# [TensorFlow 2.0] Text Classification with an RNN in Keras

*For those of you who have not subscribed to Medium, use our **Friend's Link**!*

How was your weekend? My cousin got married this Sunday!!! I am still stuck in those emotional moments and under the side effects of too much wine.

Anyway, we have to learn RNNs this week. My plan is to start with the RNN basics and some explanations in this post, and then cover some advancements in the following posts.

I hereby acknowledge that this post is based on *Deep Learning with Python* by François Chollet, pp. 196-206.

# I. What is RNN?

In the first place, what does "**recurrent**" mean? The word reminds me of a piece of musical notation: **the repeat sign**. As in music, the neural network is repeated in loops in an RNN. The networks that we have dealt with so far are called feedforward networks. They are like a song without any repeat sign, so all you have to do is go forward.

On the other hand, an RNN has an internal loop and maintains a state, which holds information about what the network has seen so far in the sequence. There are only 2 concepts that we have to clarify to understand the RNN: **loop and state**. It is fairly simple if we put it this way: an RNN is a for **loop** that reuses quantities computed during the previous iteration of the loop, in other words, the **state**.

Now, we have to mention how the output is computed. The input and the state are parameterized by two matrices, W and U respectively, plus a bias vector. The result then goes through an activation function, whichever one it is.

`output_t = activation(W*input_t + U*state_t + bias)`

- t stands for the time step.
- The initial state can be any vector, even the zero vector.
- The RNN layers that we will be looking at very soon (SimpleRNN, LSTM and GRU) follow a very similar mechanism, in the sense that training finds the most adequate W's and U's, i.e. the weights.
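The loop-and-state idea can be sketched in plain Python. This is only a toy illustration, not how Keras implements it: the function name `simple_rnn_forward`, the choice of tanh as the activation, and all the numbers are assumptions made up for this example.

```python
import math


def simple_rnn_forward(inputs, W, U, bias):
    """Run a toy RNN over one sequence.

    inputs: list of time steps, each a list of input features
    W:      output_features x input_features weights applied to the input
    U:      output_features x output_features weights applied to the state
    bias:   list of output_features values
    Returns the list of outputs, one per time step.
    """
    output_features = len(bias)
    state_t = [0.0] * output_features  # initial state: the zero vector
    outputs = []
    for input_t in inputs:  # the loop over time steps
        output_t = []
        for j in range(output_features):
            z = bias[j]
            z += sum(w * x for w, x in zip(W[j], input_t))
            z += sum(u * s for u, s in zip(U[j], state_t))
            output_t.append(math.tanh(z))  # tanh assumed as the activation
        outputs.append(output_t)
        state_t = output_t  # this output becomes the state of the next step
    return outputs


# Made-up toy values: 3 time steps, 2 input features, 2 output features
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.1, 0.2]]
U = [[0.0, 0.3], [0.3, 0.0]]
bias = [0.0, 0.1]

outputs = simple_rnn_forward(inputs, W, U, bias)
print(len(outputs), len(outputs[0]))  # 3 2
```

Note that the same W, U and bias are reused at every time step. Only the state changes as the loop moves along the sequence.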

Enough of the brief overview, let's go deeper with more details.

# II. SimpleRNN in Keras

Let's start with the simplest RNN. In this section we will see some basics of RNNs. The concept is very simple: the output of the previous time step is used as the state information, and this is repeated for every time step of the sequence.

**1. SimpleRNN has 2 modes of output**

- It takes inputs as a 3D tensor of shape (batch_size, time_steps, input_features).
- Then, it can return a 2D tensor of shape **(batch_size, output_features)**, which is the last output for each input sequence.

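```python
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(10000, 32))  # 10,000-word vocabulary, 32-dimensional embeddings
model.add(SimpleRNN(32))         # returns only the last output of each sequence
model.summary()
```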

- Or it can return a 3D tensor of shape **(batch_size, time_steps, output_features)**, which is the full sequence of successive outputs for each time step, by adding **return_sequences=True**.

```python
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))  # returns the output of every time step
model.summary()
```

**Can you see the difference in the output shape?**

**2. As usual, let's check the param #. While doing so, we will get used to the mechanism of SimpleRNN**

- 10,000 * 32 = 320,000 (we have done this in word embeddings)
- (32+32+1) * 32 = 2080
  - The first 32 is from the 32-dimensional word embedding layer, which is the input of the RNN layer in each iteration. (W)
  - The second 32 is the dimension of the output of the previous time step, which is defined in SimpleRNN(32). (U)
  - The 1 is for the bias.
  - **Lastly, the multiplying 32 is the number of units in SimpleRNN(32). It is not the number of time steps: the same weights are reused at every time step, so the sequence length does not change the parameter count.**
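The arithmetic above can be wrapped in two small helpers. The function names `embedding_params` and `simple_rnn_params` are made up for this sketch and are not part of Keras; they only restate the counting rule.

```python
def embedding_params(vocab_size, embedding_dim):
    """One embedding_dim-sized vector per word in the vocabulary."""
    return vocab_size * embedding_dim


def simple_rnn_params(input_dim, units):
    """Per unit: input_dim weights (W) + units weights (U) + 1 bias."""
    return (input_dim + units + 1) * units


print(embedding_params(10000, 32))  # 320000
print(simple_rnn_params(32, 32))    # 2080
```

When SimpleRNN layers are stacked, the `units` of one layer becomes the `input_dim` of the next one.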

Then, can you see how the number of parameters in the following model is computed?

```python
model = Sequential()
model.add(Embedding(10000, 32))                  # 32 * 10,000 = 320,000
model.add(SimpleRNN(64, return_sequences=True))  # (32+64+1) * 64 = 6208
model.add(SimpleRNN(32, return_sequences=True))  # (64+32+1) * 32 = 3104
model.add(SimpleRNN(32, return_sequences=True))  # (32+32+1) * 32 = 2080
model.summary()
```

Pop quiz! Experiment yourself, then check the comments.

```python
model = Sequential()
model.add(Embedding(10000, 1))
model.add(SimpleRNN(64, return_sequences=True))
model.add(SimpleRNN(3, return_sequences=True))
model.add(SimpleRNN(2))
model.summary()
```

**3. Implement on IMDB data**

```python
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.layers import Dense

max_features = 10000  # number of words to consider as features
maxlen = 500          # number of words kept per review
batch_size = 32
```

**Import data**

```python
print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

print('Pad sequences (samples x time)')
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
```

```
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
input_train shape: (25000, 500)
input_test shape: (25000, 500)
```
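To see why every review ends up with exactly 500 entries, here is a rough pure-Python sketch of what padding does to a single sequence. The helper `pad_sequence` is hypothetical. It assumes the default settings of Keras `pad_sequences`, which as far as I know pad with zeros at the front and also cut long sequences from the front.

```python
def pad_sequence(seq, maxlen, value=0):
    """Mimic the assumed default behaviour of pad_sequences for one sequence."""
    seq = list(seq)
    if len(seq) > maxlen:
        seq = seq[len(seq) - maxlen:]  # too long: keep only the last maxlen items
    return [value] * (maxlen - len(seq)) + seq  # too short: add zeros at the front


print(pad_sequence([5, 6, 7], maxlen=5))              # [0, 0, 5, 6, 7]
print(pad_sequence([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```

With maxlen = 500, short reviews are filled up with zeros and long reviews lose part of their text, which is why both arrays have the shape (25000, 500).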

**Modeling**

```python
model = Sequential()
model.add(Embedding(max_features, 32))     # max_features = 10,000, so 320,000
model.add(SimpleRNN(32))                   # (32+32+1) * 32 = 2080
model.add(Dense(1, activation='sigmoid'))  # (32+1) * 1 = 33
model.summary()
```

**Compiling and fitting**

```python
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# 25,000 * 0.8 = 20,000 samples for training, 5,000 left for validation
history = model.fit(input_train, y_train,
                    epochs=10, batch_size=128, validation_split=0.2)
```

**Plotting**

```python
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

# Accuracy curves
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

# Loss curves, in a separate figure
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
```

The validation accuracy goes up to 86%, which is not very outstanding. One drawback is that we only considered 500 words of each review (maxlen = 500). Also, SimpleRNN isn't well suited to processing long sequences, and texts are long sequences!
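One way to get a feel for why long sequences are hard is a toy scalar RNN, where we can track how much the very first state still influences the state after many steps. This is only an illustration under made-up assumptions: one unit, tanh activation, the same input at every step, and the hypothetical values u = 0.5, w = 1.0, x = 1.0.

```python
import math


def influence_of_first_state(num_steps, u=0.5, w=1.0, x=1.0):
    """Return d(state_T)/d(state_0) for state_t = tanh(u * state_{t-1} + w * x)."""
    state = 0.0
    gradient = 1.0
    for _ in range(num_steps):
        state = math.tanh(u * state + w * x)
        # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2 and dz/d(previous state) = u
        gradient *= u * (1.0 - state ** 2)
    return gradient


for steps in (1, 10, 50):
    print(steps, influence_of_first_state(steps))
```

Every step multiplies the influence by a factor smaller than 1 here, so it shrinks very quickly as the sequence gets longer.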

# III. LSTM in Keras

LSTM is an abbreviation of Long Short-Term Memory. It is one of the solutions to the **vanishing gradient problem** of our SimpleRNN. That does sound very vague.

**Let me put it this way.**

Have you seen the movie Finding Dory? It is a lovely movie.

In that movie Dory suffers from short-term memory loss. She suddenly recalls her parents, and that is how the journey begins.

Our poor Dory is **SimpleRNN**, which fails to retain at time t information about the inputs seen many time steps before (although it should be able to do so, theoretically). This problem is called the **vanishing gradient problem** or, between you and me, short-term memory loss.

And, as you know, Dory is accompanied by Nemo and his father. Dory alone would never have been able to find her parents (spoiler alert!). Whenever Dory is lost and forgets why she is away from home, her friends are there to remind her.

So is SimpleRNN. It needs another data flow that carries information across time-steps (or a journey). So, in LSTM we add **a carry track.**

Then, how is the carry track computed?

`output_t = activation(Uo*state_t + Wo*input_t + Vo*c_t + bo)`

`i_t = activation(Ui*state_t + Wi*input_t + bi)`

`f_t = activation(Uf*state_t + Wf*input_t + bf)`

`k_t = activation(Uk*state_t + Wk*input_t + bk)`

`c_t+1 = i_t * k_t + c_t * f_t`
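Reading the carry update as `c_t+1 = i_t * k_t + c_t * f_t`, as in the simplified formulation from Chollet's book, a tiny sketch shows how the carry track can keep information for many time steps. The helper name `next_carry` and the gate values are made up for illustration. In a real LSTM the gates are computed from the input and the state at every step.

```python
def next_carry(c_t, i_t, f_t, k_t):
    """Element-wise carry update: c_{t+1} = i_t * k_t + c_t * f_t."""
    return [i * k + c * f for c, i, f, k in zip(c_t, i_t, f_t, k_t)]


carry = [0.8, -0.3]

# f_t = 1 and i_t = 0: keep everything in the carry and write nothing new
for _ in range(100):
    carry = next_carry(carry, i_t=[0.0, 0.0], f_t=[1.0, 1.0], k_t=[0.5, 0.5])
print(carry)  # [0.8, -0.3]

# f_t = 0 and i_t = 1: forget the old carry and replace it with k_t
carry = next_carry(carry, i_t=[1.0, 1.0], f_t=[0.0, 0.0], k_t=[0.5, 0.5])
print(carry)  # [0.5, 0.5]
```

So f_t decides how much of the old carry survives and i_t decides how much of the new candidate k_t is written. That is how Dory's friends keep reminding her.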

Now, let's get some work done.

**Modeling**

We have already imported the dataset. If you are starting from this line, go up and copy the import and data-loading parts.

```python
from keras.layers import LSTM

model = Sequential()
model.add(Embedding(max_features, 32))  # max_features = 10,000, so 320,000
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.summary()
```

Where did 8320 come from?

`output_t = activation(Uo*state_t + Wo*input_t + Vo*c_t + bo)`

Here, for each of the 32 units, Uo and Wo have 32 parameters each and bo adds 1 bias, so 32+32+1 = 65. Then, what about c_t? It is built from these three transformations:

`i_t-1 = activation(Ui*state_t-1 + Wi*input_t-1 + bi)`

`f_t-1 = activation(Uf*state_t-1 + Wf*input_t-1 + bf)`

`k_t-1 = activation(Uk*state_t-1 + Wk*input_t-1 + bk)`

These 3 each have 32+32+1 = 65 parameters per unit, so c_t, which is a combination of i_t-1, f_t-1 and k_t-1, brings 65 * 3 = 195 parameters. The Keras LSTM layer does not keep a separate Vo matrix, so the carry itself adds no further weights.

So, **(32+32+1+195) * 32 = 4 * (32+32+1) * 32 = 8320.**
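The same count can be checked with a small helper. The name `lstm_params` is made up for this sketch. It assumes that the layer holds four sets of W, U and bias (for i, f, k and the output) and nothing else.

```python
def lstm_params(input_dim, units):
    """Four transformations (i, f, k, output), each with W, U and a bias."""
    return 4 * (input_dim + units + 1) * units


print(lstm_params(32, 32))  # 8320

# Whole model: Embedding + LSTM + Dense
total = 10000 * 32 + lstm_params(32, 32) + (32 + 1) * 1
print(total)  # 328353
```

In other words, an LSTM layer has exactly 4 times the parameters of a SimpleRNN layer with the same input size and number of units.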

**Compiling and fitting**

```python
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)
```

**Plotting**

This time, the validation accuracy goes up to 88%, better than SimpleRNN but still not perfect. This is mainly because LSTM is good at capturing the global, long-term structure of texts, which does not help much in sentiment analysis. LSTM shows its power in harder problems such as translation.

So this is it for this post, and I will soon be back with RNN in TensorFlow 2.0. See you then!