====== Text generation with LSTM ======

{{tag>deep_learning}}

A long time ago I remember first trying an LSTM implementation in Torch to generate text based on a large text corpus from Shakespeare (see the references below). The results that could be achieved at that time were already quite surprising. But now I want to give this concept another try, except that this time it should be implemented in TensorFlow instead of Torch. So let's get started!

====== ======

===== References =====

  * [[http://karpathy.github.io/2015/05/21/rnn-effectiveness/|The Unreasonable Effectiveness of Recurrent Neural Networks]]
  * [[https://www.tensorflow.org/tutorials/sequences/text_generation|Text generation using a RNN with eager execution]]
  * [[https://towardsdatascience.com/deep-learning-with-tensorflow-part-3-music-and-text-generation-8a3fbfdc5e9b|Music and text generation]]
  * https://magenta.tensorflow.org/
  * https://github.com/martin-gorner/tensorflow-rnn-shakespeare
  * https://github.com/burliEnterprises/tensorflow-shakespeare-poem-generator
  * https://github.com/jcjohnson/torch-rnn
  * https://www.tensorflow.org/guide/keras

===== Initial implementation =====

  * At first I didn't want to enter the **tf.keras** world just yet, but after reading this [[https://www.tensorflow.org/guide/keras|guide on Keras]] I finally changed my mind and decided to give it a try here.

  * I followed this tutorial to build the model: https://www.tensorflow.org/tutorials/sequences/text_generation (a minimal sketch of this kind of model is given after the loss workaround below).

  * Everything went right except for the **sparse_categorical_crossentropy()** function, which didn't accept the optional "from_logits" argument. So I checked [[https://stackoverflow.com/questions/53919290/tensorflow-sparse-categorical-cross-entropy-with-logits|this page]] and realized this was due to the version of TensorFlow I'm using (1.12 instead of 1.13). Fortunately there is a way around the problem: we just need to import the corresponding backend function directly and use it instead, since that one already accepts the new signature:

<sxh python>
from tensorflow.python.keras import backend as K

def loss(labels, logits):
    # The backend implementation already supports from_logits in TF 1.12,
    # unlike tf.keras.losses.sparse_categorical_crossentropy at that version.
    return K.sparse_categorical_crossentropy(labels, logits, from_logits=True)
</sxh>

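For reference, here is a minimal sketch of the kind of character-level model the tutorial builds (an embedding layer, a single recurrent layer, and a dense output layer producing logits), compiled with the workaround loss from above. The **vocab_size** and **embedding_dim** values below are just placeholders; only the 1024 GRU units match what I mention later in the post:

<sxh python>
import tensorflow as tf
from tensorflow.python.keras import backend as K

def loss(labels, logits):
    return K.sparse_categorical_crossentropy(labels, logits, from_logits=True)

vocab_size = 65      # placeholder: number of distinct characters in the corpus
embedding_dim = 256  # placeholder: size of the character embedding
rnn_units = 1024     # single hidden recurrent layer

model = tf.keras.Sequential([
    # Map each character id to a dense vector (variable batch and sequence length).
    tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[None, None]),
    # Recurrent layer returning one output per timestep.
    tf.keras.layers.GRU(rnn_units, return_sequences=True),
    # Project every timestep back onto the vocabulary (raw logits, no softmax).
    tf.keras.layers.Dense(vocab_size)
])

model.compile(optimizer='adam', loss=loss)
</sxh>
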
===== First results =====

  * So here is a small example of the text I could generate (a sketch of the sampling loop used to produce it is given a bit further below): <code>QUEENE:
A fieter carver in the place,
This child was parted body to the state,
Here at the prince that is all the world,
So two thy brother and what thou dost consent
That is straight deliver them.

ROMEO:
O, if I had straight shall you be meterch'd;
One cords apong no other reasons.

MISTRESS OVERDONE:
What's thy name?

CORIOLANUS:
Come, come, we'll play a better-house
But that the sea prove a stolemnman:
Why, I, pleased in the park,
And bring all the rest from the king in lived to slander with the flood,
And fellow'st thou, which lies your king and quick and will not but
bid me fain, of what containt do amend him roar
Than honourable and familiar curses;
And soon perish.

CORIOLANUS:
You bless my daughter Katharina,
A mother said that you may call him
Aid the Lord Auberchere how 'tis boot doth light again.
This is the heat's face.

Second Senator:
Then we may cry 'Charge it is that one would seem to be;
The fault of heaven shall not be distraught,
With such a miserable stay.
</code>

=> Not too bad for a training phase of less than 1 hour! (60 epochs, 1024 GRU units in a single hidden layer)

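For completeness, here is a rough sketch of the kind of sampling loop that produces an excerpt like the one above. It is not the tutorial's stateful single-character version: this simplified variant just re-feeds the whole growing sequence at each step, and it assumes hypothetical **char2idx**/**idx2char** mappings built from the corpus:

<sxh python>
import numpy as np

def generate_text(model, start_string, char2idx, idx2char,
                  num_chars=400, temperature=1.0):
    # Encode the seed string as a batch containing one integer sequence.
    input_ids = np.array([[char2idx[c] for c in start_string]])
    generated = []
    for _ in range(num_chars):
        # Predict logits for every position and keep only the last timestep.
        logits = model.predict(input_ids)[0, -1, :].astype(np.float64) / temperature
        # Softmax over the logits, then sample the next character id.
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        next_id = np.random.choice(len(probs), p=probs)
        generated.append(idx2char[next_id])
        # Feed the sampled character back as part of the next input.
        input_ids = np.concatenate([input_ids, [[next_id]]], axis=1)
    return start_string + ''.join(generated)

# Example: print(generate_text(model, u"ROMEO: ", char2idx, idx2char))
</sxh>
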
  * That first model gave me a loss of about 0.69, so I then tried to increase the model complexity by adding a second LSTM layer with 512 units (the first LSTM layer has 1024 units), and this time I got a loss of about 0.45 (a sketch of this stacked setup is shown just below).

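As a sketch of what this stacked configuration could look like (reusing the placeholder **vocab_size** and **embedding_dim** values and the custom loss from the first snippet, and switching the recurrent layers to LSTM as described above):

<sxh python>
import tensorflow as tf
from tensorflow.python.keras import backend as K

vocab_size, embedding_dim = 65, 256  # placeholders, as before

def loss(labels, logits):
    return K.sparse_categorical_crossentropy(labels, logits, from_logits=True)

# Stacked variant: a second 512-unit recurrent layer on top of the 1024-unit one,
# both returning full sequences so the Dense layer still sees every timestep.
stacked_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[None, None]),
    tf.keras.layers.LSTM(1024, return_sequences=True),
    tf.keras.layers.LSTM(512, return_sequences=True),
    tf.keras.layers.Dense(vocab_size)
])
stacked_model.compile(optimizer='adam', loss=loss)
</sxh>
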
  * Now let's try again with yet another LSTM layer! And the results are disappointing, since we only reach a loss of 0.43 with 60 epochs. Even with 180 epochs, we are still only reaching a **loss of 0.43**.
