Table of Contents
- Introduction to Recurrent Neural Networks with Keras and TensorFlow
- Introduction
- Configuring Your Development Environment
- Having Problems Configuring Your Development Environment?
- Project Structure
- What Is Sequential Data?
- A Caveat: Masking and Padding
- Modeling Sequential Data with MLPs
- The Recurrence Formula
- Recurrent Neural Network (an overview)
- Training and Visualizations
- Loading and Inference
- Summary
Introduction to Recurrent Neural Networks with Keras and TensorFlow
It’s a standard Monday morning for you. You are sitting at your workstation, waiting for another computer vision problem statement. By now, you have become an absolute maestro at computer vision (CV) problems.
But to your horror, your company hands you a sequential text classification problem instead of a CV one. They say they are expanding, and so, in front of you lies an absolutely alien domain of which you know nothing.
Thankfully, we would never want that to happen to you. So, finally, due to popular demand, we bring you our first tutorial on dealing with sequential text data: Recurrent Neural Networks (RNNs).
The world of deep learning has progressed immensely, with Transformer models ruling both NLP and CV domains. But to understand Transformers, it is important to grasp the intuition behind RNNs, your gateway to working with sequential data.
Oh, we are really excited about this one! This tutorial marks our first venture into deep learning with sequential data. We have been a vision-first firm for a long time now, and it is about time we learn and process the information provided by the language world.
In this tutorial, we talk about sequential data and how to model it. We build a Recurrent Neural Network and train it on a well-defined application of the real world.
This lesson is the first in a 3-part series on NLP 102:
- Introduction to Recurrent Neural Networks with Keras and TensorFlow (today’s tutorial)
- Long Short-Term Memory Networks
- Neural Machine Translation
To learn how to build a Recurrent Neural Network with TensorFlow and Keras, just keep reading.
Introduction
Imagine you have been employed by a movie critique firm. Movies receive a lot of reviews from all over the globe. Your mission, should you choose to accept it, is to predict each review’s sentiment to catch the audience’s drift.
The task is simple: given a movie review, classify it as either a positive or a negative review. As it happens, this task is known as Sentiment Classification in the deep learning world.
Don’t confuse this with a computer vision problem. We are not reading facial expressions by employing our old friend OpenCV. Here, we deal with text data, specifically volumes of text data. To mimic the task, we chose imdb_reviews, a dataset of 25,000 highly polar movie reviews.
Configuring Your Development Environment
To follow this guide, you need to have the TensorFlow, TensorFlow Datasets, and matplotlib libraries installed on your system.
Luckily, all three are pip-installable:
```
$ pip install tensorflow
$ pip install tensorflow_datasets
$ pip install matplotlib
```
Having Problems Configuring Your Development Environment?
All that said, are you:
- Short on time?
- Learning on your employer’s administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
- Ready to run the code right now on your Windows, macOS, or Linux system?
Then join PyImageSearch University today!
Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.
And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
Project Structure
We first need to review our project directory structure.
Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.
From there, take a look at the directory structure:
```
$ tree --dirsfirst
.
|____ output
| |____ lstm_plot.png
| |____ rnn_plot.png
|____ pyimagesearch
| |____ plot.py
| |____ save_load.py
| |____ config.py
| |____ standardization.py
| |____ __init__.py
| |____ model.py
| |____ dataset.py
|____ train.py
|____ inference.py
|____ terminal_output.txt
```
In the pyimagesearch directory, we have:

- plot.py: Script to help us visualize outputs.
- save_load.py: Script to load and save model weights.
- config.py: Script containing the entire configuration pipeline.
- standardization.py: Script containing utilities to help us prepare the data.
- __init__.py: Script which turns the directory into a Python package.
- model.py: Script housing the model.
- dataset.py: Script to help us load the data into our project.
In the core directory, we have two scripts:

- train.py: Script to train the RNN model.
- inference.py: Script to draw inference from our trained model.
Note: The code download for this blog post contains code snippets for Long Short-Term Memory (LSTM) as well. These will be covered in the following blog post on LSTM.
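The scripts that follow pull their constants (dataset path, batch size, vocabulary size, learning rate, etc.) from config.py. We do not reproduce that file here, but for orientation, a minimal sketch of what such a configuration might look like is shown below. Every value is an assumption for illustration only; the actual numbers ship with the code download.

```python
# config.py -- illustrative sketch only; the real values ship with the downloads
import os

# path where tensorflow_datasets will store the IMDB reviews dataset
DATASET_PATH = "dataset"

# data pipeline constants (assumed values)
BATCH_SIZE = 1024
BUFFER_SIZE = 1024

# text vectorization constants (assumed values)
VOCAB_SIZE = 10000
MAX_SEQUENCE_LENGTH = 100

# training constants (assumed values)
LR = 1e-3
EPOCHS = 10

# output paths for plots, saved models, and the vectorization layer
OUTPUT_PATH = "output"
RNN_PLOT = os.path.join(OUTPUT_PATH, "rnn_plot.png")
LSTM_PLOT = os.path.join(OUTPUT_PATH, "lstm_plot.png")
RNN_MODEL_PATH = os.path.join(OUTPUT_PATH, "rnn_model")
LSTM_MODEL_PATH = os.path.join(OUTPUT_PATH, "lstm_model")
TEXT_VEC_PATH = os.path.join(OUTPUT_PATH, "text_vectorizer")
```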
What Is Sequential Data?
Before we go into movie reviews and understand their sentiment, we first need to understand the data.
We are all Computer Vision engineers here and know how images are an array of numbers. But how do we interpret a corpus of text?
If we think about it, any text can easily be represented as a sequence of characters, as shown in Figure 2. Notice how text is not just a collection but a sequence of characters. This signifies that the order in which the characters appear is just as important as the characters themselves.
Any data where the order or sequence is as essential as the data itself is called Sequential Data. Some examples of Sequential Data are sentences, stock market data, audio data, etc.
Let us now try to understand how movie reviews relate to Sequential Data. We first open the dataset.py file, which helps load the dataset onto disk.
```python
# import the necessary packages
import tensorflow_datasets as tfds
```
We begin with the necessary imports on Line 2. The dataset we will use (imdb_reviews) is already in the tensorflow_datasets package.
Next, we define the get_imdb_dataset function to load the dataset onto the disk.
```python
def get_imdb_dataset(folderName, batchSize, bufferSize, autotune, test=False):
    # check whether the test flag is true
    if test:
        # load the test dataset, batch it, and prefetch it
        testDs = tfds.load(
            name="imdb_reviews",
            data_dir=folderName,
            as_supervised=True,
            shuffle_files=True,
            split="test"
        )
        testDs = testDs.batch(batchSize).prefetch(autotune)

        # return the test dataset
        return testDs

    # otherwise we will be loading the training and validation dataset
    else:
        # load the training and validation dataset
        (trainDs, valDs) = tfds.load(
            name="imdb_reviews",
            data_dir=folderName,
            as_supervised=True,
            shuffle_files=True,
            split=["train[:90%]", "train[90%:]"]
        )

        # shuffle, batch, and prefetch the train and the validation
        # dataset
        trainDs = (trainDs
            .shuffle(bufferSize)
            .batch(batchSize)
            .prefetch(autotune)
        )
        valDs = (valDs
            .shuffle(bufferSize)
            .batch(batchSize)
            .prefetch(autotune)
        )

        # return the train and the validation dataset
        return (trainDs, valDs)
```
This function takes in the following inputs:
- folderName: the path on the local system to which the dataset will be downloaded
- batchSize: the size in which we want to batch our data
- bufferSize: the size of the buffer from which elements are randomly selected
- autotune: a constant provided by the tf.data API for space optimization while prefetching
- test: a Boolean flag used to determine if the dataset to be loaded is for testing or training purposes
Lines 7-16 execute when the test flag is set to True. This code snippet downloads (or uses the cached) test split of the dataset, then batches and prefetches it.
Lines 21-45 execute when the test flag is set to False. This means that the dataset will be used for training. The code snippet downloads (or uses the cached) train and validation splits of the dataset, then shuffles, batches, and prefetches them.

The only difference between the two clauses (training and testing) is that we shuffle the training dataset while leaving the testing dataset unshuffled.
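As a quick sanity check, here is how one might call the function and peek at a single raw batch. The argument values below are placeholders for illustration; in the actual pipeline they come from config.py.

```python
# illustrative usage of get_imdb_dataset (placeholder argument values)
import tensorflow as tf

(trainDs, valDs) = get_imdb_dataset(folderName="dataset", batchSize=64,
    bufferSize=1000, autotune=tf.data.AUTOTUNE, test=False)

# grab one batch of raw (text, label) pairs and inspect it
for (texts, labels) in trainDs.take(1):
    print(texts.shape, labels.shape)         # (64,) (64,)
    print(texts[0].numpy()[:80], labels[0].numpy())
```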
But having the data at hand and loading, batching, and prefetching it is not enough, primarily because the data still looks like the raw review shown in Figure 3.
To make this data usable:
- We need to remove the unnecessary characters (standardization)
- Tokenize the dataset
- Vectorize the tokens
We will follow each of the steps gradually. First, let us see how to standardize the dataset for our purposes.
But what is standardization? It removes unnecessary punctuation and HTML tags from the text corpus. It is a very important pre-processing step in any text data pipeline.
We create a custom standardization function in the standardization.py file.
```python
# import the necessary packages
import tensorflow as tf
import string
import re

def custom_standardization(inputData):
    # transform everything to lowercase
    lowercase = tf.strings.lower(inputData)

    # strip off the html break point and punctuations and return it
    strippedHtml = tf.strings.regex_replace(lowercase, "<br />", " ")
    strippedPunctuation = tf.strings.regex_replace(strippedHtml,
        f"[{re.escape(string.punctuation)}]", "")
    return strippedPunctuation
```
On Lines 2-4, we import the necessary packages.

Next, we define the custom_standardization function to standardize our dataset. The text is first converted to lowercase on Line 8. The HTML break tags and punctuation are removed on Lines 11-13, and finally, the standardized text is returned on Line 14.
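To get a feel for what the function does, here is a quick, made-up round trip (the review string is invented for illustration):

```python
# illustrative check of custom_standardization on a made-up review
import tensorflow as tf

sample = tf.constant(["This movie was GREAT!<br />Would watch again..."])
print(custom_standardization(sample).numpy())
# expected output along the lines of:
# [b'this movie was great would watch again']
```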
After standardizing our text dataset, the next step is to tokenize and vectorize it. Tokenization refers to the process of splitting the text into units called tokens. A token can be a character, a word, or a sentence, chosen according to the task at hand.
For our sentiment classifier, we will consider a token to be a word.
Can we feed tokens to the deep learning model after tokenization? Not just yet.
We still need to convert these tokens into numbers. The process of representing tokens as numbers is called text vectorization. With vectorization, each token (word) in our text corpus will be represented by a number.
The following are the basic steps of text vectorization:
- We will create a dictionary of all the unique words from the text corpus and assign a unique number to every word. This dictionary is called the vocabulary.
- We will then substitute the tokens (words) in the dataset with their respective unique numbers from the vocabulary dictionary.
The entire process is demonstrated in Figure 4.
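To make the two steps concrete, here is a tiny, self-contained sketch (toy sentences, not the IMDB data) of building a vocabulary and substituting each word token with its index:

```python
# toy word-level tokenization and vectorization (illustrative only)
corpus = ["practice makes perfect", "perfection is a dream worth chasing"]

# step 1: build the vocabulary -- a unique integer for every unique word
# (index 0 is reserved for padding, so real tokens start at 1)
vocab = {}
for sentence in corpus:
    for token in sentence.split():
        if token not in vocab:
            vocab[token] = len(vocab) + 1

# step 2: substitute every token with its number from the vocabulary
vectorized = [[vocab[token] for token in sentence.split()]
    for sentence in corpus]
print(vocab)
print(vectorized)  # e.g., [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
```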
A Caveat: Masking and Padding
Padding and masking are very useful techniques that will optimize our training. But what exactly do they do, and why do we need them?
Imagine you have a corpus of text. The sentences within this corpus are of different lengths, but all the sequences in a training batch must share a common shape. This means that every sentence has to be padded to a common length.
Practice makes perfect
Perfection is a dream worth chasing
Becomes …
Practice makes perfect [pad] [pad] [pad] [pad]
Perfection is a dream worth chasing [pad]
Now, these [pad] tokens are of no use to the model except for standardizing the length. So we mask these tokens while training. The process of masking and padding is demonstrated in Figure 5.
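In Keras, we rarely pad and mask by hand. Setting mask_zero=True on the Embedding layer (exactly what our model will do later) treats index 0 as the [pad] token and propagates a mask so downstream layers skip those positions. A minimal sketch, using token ids that stand in for the two example sentences above:

```python
# minimal padding + masking sketch (toy token ids; 0 is the padding index)
import tensorflow as tf
from tensorflow.keras import layers

# two sequences of different lengths, padded with 0s to a common length of 7
padded = tf.constant([
    [1, 2, 3, 0, 0, 0, 0],   # "practice makes perfect" + 4 pads
    [4, 5, 6, 7, 8, 9, 0],   # "perfection is a dream worth chasing" + 1 pad
])

# mask_zero=True tells downstream layers which positions are real tokens
embedding = layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)
embedded = embedding(padded)
print(embedded._keras_mask)
# [[ True  True  True False False False False]
#  [ True  True  True  True  True  True False]]
```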
Modeling Sequential Data with MLPs
But before we dive into modeling sequential data with Recurrent Neural Networks, let us first understand the shortcomings of the models we would have generally used.
What if we tried to model sequential data with Multilayer Perceptrons (MLPs)?
Let us try exactly that. We create a sequence of our data and pass it through the MLP. The MLP starts learning the data, but something is missing. The process is demonstrated in Figure 6.
Can you tell what is missing in Figure 6?
The MLP learns individual data points. Each time a data point is passed through the MLP, the color of the entire network changes. This signifies that the MLP models that particular data point. So far, so good.
But there is something else here: $x_1$ is succeeded by $x_2$ and then by $x_3$, and so on. This makes the data sequential. While the MLP can learn the individual data points perfectly, it will not be able to learn the order of the data.
And therein lies the problem.
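A tiny sketch makes the shortcoming concrete. A hypothetical MLP that scores a sentence from an order-free summary of its tokens (here, the average of their embeddings) returns exactly the same output when the words are shuffled, so the order is invisible to it:

```python
# sketch: an MLP over averaged (order-free) token embeddings ignores word order
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10, 8))   # toy embedding table: 10 tokens, 8 dims
W = rng.normal(size=(8, 1))             # toy single dense layer

def mlp_score(tokenIds):
    # average the token embeddings, then apply the dense layer
    features = embeddings[tokenIds].mean(axis=0)
    return float(features @ W)

sentence = [1, 2, 3, 4]    # some sentence as token ids
shuffled = [4, 3, 2, 1]    # the same tokens in a different order
print(mlp_score(sentence), mlp_score(shuffled))  # identical scores
```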
The Recurrence Formula
To solve this problem, we need a function that takes care of the current and previous states.
$$y_t = f(y_{t-1}, x_t)$$

Here, $x_t$ is the current input, $y_t$ is the current output, and $y_{t-1}$ is the past output. The function that models this relation is represented by $f$.
Let’s unfold this equation and understand its importance.
- At $t = 1$: $y_1 = f(y_0, x_1)$
- At $t = 2$: $y_2 = f(y_1, x_2) = f(f(y_0, x_1), x_2)$
It is quite evident from the above examples that the equation can model the previous states along with the current state. This is an extremely important equation we will use throughout our tutorial.
In the case of Recurrent Neural Networks, the function $f$ is a simple $\tanh$ that introduces nonlinearity. If you need a quick brush-up on the tanh function, feel free to play around with an interactive grapher such as Desmos (linked in the references).
The recurrence formula for a Recurrent Neural Network can be represented as:

$$h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1})$$

where $W_{xh}$ and $W_{hh}$ are the learnable weights. $h_t$ is called the hidden state of the network. Notice how the current hidden state $h_t$ is a function of the current input $x_t$ and the previous hidden state $h_{t-1}$. This is demonstrated in Figure 7.
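To make the recurrence tangible, here is a tiny NumPy sketch that unrolls such a cell over a toy sequence. The dimensions and random weights are purely illustrative:

```python
# a SimpleRNN-style recurrence unrolled by hand (toy dimensions, random weights)
import numpy as np

rng = np.random.default_rng(42)
(inputDim, hiddenDim, timeSteps) = (4, 3, 5)

Wxh = rng.normal(size=(inputDim, hiddenDim))   # input-to-hidden weights
Whh = rng.normal(size=(hiddenDim, hiddenDim))  # hidden-to-hidden weights

x = rng.normal(size=(timeSteps, inputDim))     # a toy input sequence
h = np.zeros(hiddenDim)                        # initial hidden state h_0

for t in range(timeSteps):
    # h_t = tanh(x_t Wxh + h_{t-1} Whh)
    h = np.tanh(x[t] @ Wxh + h @ Whh)
    print(f"h_{t + 1} = {np.round(h, 3)}")
```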
Recurrent Neural Network (an overview)
Now that we know the formula behind Recurrent Neural Networks, let us see how to build one. The diagram of a Recurrent Neural Network is shown in Figure 8.
An RNN cell consists of three parts:
- The input, $x_t$
- The output, $y_t$
- And the hidden state, $h_t$
The hidden state is fed back into the next time step, along with that step's input. Fortunately, this is implemented for us by the SimpleRNN layer inside the tf.keras.layers API.
The relation between the input, output, and the hidden state is demonstrated in Figure 9, which shows an unfolded RNN.
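Before assembling the full model, it helps to see the SimpleRNN layer in isolation. A short sketch (toy shapes) shows how return_sequences controls whether we get the hidden state at every time step or only the final one:

```python
# SimpleRNN output shapes (toy batch of 2 sequences, 7 time steps, 16 features)
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((2, 7, 16))

perStep = layers.SimpleRNN(64, return_sequences=True)(x)
lastOnly = layers.SimpleRNN(64)(x)

print(perStep.shape)   # (2, 7, 64) -> one 64-d hidden state per time step
print(lastOnly.shape)  # (2, 64)    -> only the final hidden state
```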
And now, we begin to build our RNN model. Let’s go through the model.py file.
```python
# import the necessary packages
from tensorflow.keras import layers
from tensorflow import keras
```
We begin with the necessary imports on Lines 2-3.
```python
def get_rnn_model(vocabSize):
    # input for variable-length sequences of integers
    inputs = keras.Input(shape=(None,), dtype="int32")

    # embed the tokens in a 128-dimensional vector with masking
    # applied and apply dropout
    x = layers.Embedding(vocabSize, 128, mask_zero=True)(inputs)
    x = layers.Dropout(0.2)(x)

    # add 3 simple RNNs
    x = layers.SimpleRNN(64, return_sequences=True)(x)
    x = layers.SimpleRNN(64, return_sequences=True)(x)
    x = layers.SimpleRNN(64)(x)

    # add a classifier head
    x = layers.Dense(units=64, activation="relu")(x)
    x = layers.Dense(units=32, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    # build the RNN model
    model = keras.Model(inputs, outputs, name="RNN")

    # return the RNN model
    return model
```
On Line 5, we define the get_rnn_model function we use to create the model. On Line 7, we create the input layer for a variable-length sequence of integers. The padding and masking mechanism is taken care of by our Embedding layer, which we initialize on Line 11. The Embedding layer also embeds each token of the variable-length sequence into a 128-dimensional vector. For those unclear, consider this 128-dimensional vector as the way the computer views and assigns meaning to the text.
On Line 12, we apply dropout to the inputs. We add 3 simple RNN cells on Lines 15-17 and a classifier head with a sigmoid activation on Lines 20-23.
Finally, the model is built using the keras.Model API on Line 26 and returned on Line 29.
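If you want to sanity-check the architecture, a quick sketch (assuming a vocabulary size of 10,000 for illustration; the real value lives in config.py) is to build the model and print its summary:

```python
# build the model with an assumed vocabulary size and inspect it
model = get_rnn_model(vocabSize=10000)
model.summary()
# expect an Embedding layer (10000 x 128), three SimpleRNN layers, and a
# Dense classifier head that ends in a single sigmoid unit
```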
Training and Visualizations
With the model created and ready, we can finally begin our training procedure. But before that, we must first look at some helper functions that will assist with visualizations and saving.
We begin with a function inside plot.py that will help us plot the loss and accuracy.
```python
# import the necessary packages
import matplotlib.pyplot as plt

def plot_loss_accuracy(history, filepath):
    # plot the training and validation loss
    plt.style.use("ggplot")
    (fig, axs) = plt.subplots(2, 1)

    axs[0].plot(history["loss"], label="train_loss")
    axs[0].plot(history["val_loss"], label="val_loss")
    axs[0].set_xlabel("Epoch #")
    axs[0].set_ylabel("Loss")
    axs[0].legend()

    axs[1].plot(history["accuracy"], label="train_accuracy")
    axs[1].plot(history["val_accuracy"], label="val_accuracy")
    axs[1].set_xlabel("Epoch #")
    axs[1].set_ylabel("Accuracy")
    axs[1].legend()

    fig.savefig(filepath)
```
On Line 2, we begin by importing matplotlib. On Line 4, we define the plot_loss_accuracy function that takes the model history and filepath as input.

Next, on Lines 6-17, we plot the loss and accuracy for training and validation and save the figure to the specified filepath (Line 18).
Next, inside another helper script called save_load.py, we define a function to save the adapted vectorization layer for later use.
```python
from tensorflow.keras.layers import TextVectorization
import tensorflow as tf
import pickle

def save_vectorizer(vectorizer, name):
    # pickle the weights of the vectorization layer
    pickle.dump({"weights": vectorizer.get_weights()},
        open(f"{name}.pkl", "wb"))
```
We begin with the necessary imports on Lines 1-3. Next, we define a function called save_vectorizer on Lines 5-8 that pickles and saves the weights of the vectorization layer.
And finally, with all the necessary functions defined, we can start with train.py, where we actually train our RNN model.
```python
# USAGE
# python train.py

# set the seed for reproducibility
import tensorflow as tf
tf.keras.utils.set_random_seed(42)

# import the necessary packages
from pyimagesearch.standardization import custom_standardization
from pyimagesearch.plot import plot_loss_accuracy
from pyimagesearch.save_load import save_vectorizer
from pyimagesearch.dataset import get_imdb_dataset
from pyimagesearch.model import get_rnn_model
from pyimagesearch.model import get_lstm_model
from pyimagesearch import config
from tensorflow.keras import layers
from tensorflow import keras
import os
```
We begin with all the necessary imports on Lines 5-18.
```python
# get the IMDB dataset
print("[INFO] getting the IMDB dataset...")
(trainDs, valDs) = get_imdb_dataset(folderName=config.DATASET_PATH,
    batchSize=config.BATCH_SIZE, bufferSize=config.BUFFER_SIZE,
    autotune=tf.data.AUTOTUNE, test=False)

# initialize the text vectorization layer
vectorizeLayer = layers.TextVectorization(
    max_tokens=config.VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=config.MAX_SEQUENCE_LENGTH,
    standardize=custom_standardization,
)

# grab the text from the training dataset and adapt the text
# vectorization layer on it
trainText = trainDs.map(lambda text, label: text)
vectorizeLayer.adapt(trainText)

# vectorize the training and the validation dataset
trainDs = trainDs.map(lambda text, label: (vectorizeLayer(text), label))
valDs = valDs.map(lambda text, label: (vectorizeLayer(text), label))

# get the RNN model and compile it
print("[INFO] building the RNN model...")
modelRNN = get_rnn_model(vocabSize=config.VOCAB_SIZE)
modelRNN.compile(metrics=["accuracy"],
    optimizer=keras.optimizers.Adam(learning_rate=config.LR),
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)

# train the RNN model
print("[INFO] training the RNN model...")
historyRNN = modelRNN.fit(trainDs,
    epochs=config.EPOCHS,
    validation_data=valDs,
)
```
Next, on Lines 22-24, we get the IMDb dataset for movie reviews.
On Lines 27-32, we initialize the text vectorization layer with:
- max_tokens: The maximum number of tokens inside the vocabulary.
- output_mode: The data type for the output.
- output_sequence_length: The maximum sequence length that we will need for padding and masking.
- standardize: The custom standardization function that we defined previously.
On Lines 36 and 37, we adapt the vectorization layer on the training dataset.
When a text vectorization layer is initialized, it does not hold any information about the vocabulary of the training corpus. To build a vocabulary, we need to pass the entire training dataset through the vectorization layer. This is known as adapting to the text.
Finally, on Lines 40 and 41, we vectorize the text of the training and validation dataset using the adapted vectorization layer.
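As a quick, hypothetical peek at what the adapted layer produces (the exact indices depend on the learned vocabulary, so the numbers below are only indicative):

```python
# hypothetical peek at the adapted vectorization layer (indices will vary)
sample = tf.constant(["This movie was great!"])
print(vectorizeLayer(sample))
# a (1, MAX_SEQUENCE_LENGTH) integer tensor: word indices followed by 0-padding,
# e.g., [[ 11  18  14  87   0   0 ...   0 ]]
```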
On Lines 44-49, we initialize the predefined RNN model and compile it with Adam optimizer and Binary Cross-Entropy loss.
On Lines 52-55, we fit the model on the training data and save its history to the historyRNN variable.
```python
# check whether the output folder exists, if not build the output folder
if not os.path.exists(config.OUTPUT_PATH):
    os.makedirs(config.OUTPUT_PATH)

# save the loss and accuracy plots of RNN and LSTM models
plot_loss_accuracy(history=historyRNN.history, filepath=config.RNN_PLOT)
plot_loss_accuracy(history=historyLSTM.history, filepath=config.LSTM_PLOT)
```
On Lines 72 and 73, we check whether the output path exists and create it otherwise. The plot_loss_accuracy function is called on Line 76 to visualize the loss and accuracy of the RNN model.
```python
# save the trained RNN and LSTM models to disk
print(f"[INFO] saving the RNN model to {config.RNN_MODEL_PATH}...")
keras.models.save_model(model=modelRNN, filepath=config.RNN_MODEL_PATH,
    include_optimizer=False)
print(f"[INFO] saving the LSTM model to {config.LSTM_MODEL_PATH}...")
keras.models.save_model(model=modelLSTM, filepath=config.LSTM_MODEL_PATH,
    include_optimizer=False)
```
On Lines 80-82, we then save the trained RNN model on our disk.
```python
# save the text vectorization layer to disk
save_vectorizer(vectorizer=vectorizeLayer, name=config.TEXT_VEC_PATH)
```
Finally, on Line 88, we save the text vectorization layer we used to vectorize the training and validation data.
From the training output below, we can verify the training and validation loss and accuracy. We reach a 67.88% validation accuracy in just 10 epochs!
```
$ python train.py
[INFO] getting the IMDB dataset...
[INFO] building the RNN model...
[INFO] training the RNN model...
Epoch 1/10
22/22 [==============================] - 11s 329ms/step - loss: 0.7018 - accuracy: 0.4971 - val_loss: 0.6981 - val_accuracy: 0.4820
Epoch 2/10
22/22 [==============================] - 7s 320ms/step - loss: 0.6935 - accuracy: 0.5152 - val_loss: 0.6967 - val_accuracy: 0.4916
Epoch 3/10
22/22 [==============================] - 7s 293ms/step - loss: 0.6883 - accuracy: 0.5405 - val_loss: 0.6959 - val_accuracy: 0.5000
Epoch 4/10
22/22 [==============================] - 7s 307ms/step - loss: 0.6850 - accuracy: 0.5509 - val_loss: 0.6952 - val_accuracy: 0.5064
Epoch 5/10
22/22 [==============================] - 7s 303ms/step - loss: 0.6802 - accuracy: 0.5673 - val_loss: 0.6950 - val_accuracy: 0.5100
Epoch 6/10
22/22 [==============================] - 7s 302ms/step - loss: 0.6729 - accuracy: 0.5915 - val_loss: 0.6953 - val_accuracy: 0.5136
Epoch 7/10
22/22 [==============================] - 7s 294ms/step - loss: 0.6650 - accuracy: 0.6094 - val_loss: 0.6943 - val_accuracy: 0.5232
Epoch 8/10
22/22 [==============================] - 7s 303ms/step - loss: 0.6493 - accuracy: 0.6402 - val_loss: 0.6812 - val_accuracy: 0.5668
Epoch 9/10
22/22 [==============================] - 7s 294ms/step - loss: 0.6141 - accuracy: 0.6774 - val_loss: 0.6379 - val_accuracy: 0.6380
Epoch 10/10
22/22 [==============================] - 7s 296ms/step - loss: 0.5501 - accuracy: 0.7335 - val_loss: 0.5945 - val_accuracy: 0.6788
```
Loading and Inference
Now that the model has been trained and saved to disk, we need to perform inference to understand the model performance. Before starting with inference, let us first open save_load.py again and look at the load_vectorizer function.
```python
def load_vectorizer(name, maxTokens, outputLength, standardize=None):
    # load the pickled data
    fromDisk = pickle.load(open(f"{name}.pkl", "rb"))

    # build a new vectorization layer
    newVectorizer = TextVectorization(max_tokens=maxTokens,
        output_mode="int",
        output_sequence_length=outputLength,
        standardize=standardize)

    # call the adapt method with some dummy data for the vectorization
    # layer to initialize properly
    newVectorizer.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
    newVectorizer.set_weights(fromDisk["weights"])

    # return the vectorization layer
    return newVectorizer
```
We define the load_vectorizer function on Line 10, which takes the following inputs:

- name: the file path of the saved TextVectorization layer weights
- maxTokens: the maximum number of tokens in the vocabulary
- outputLength: the length of the output sequence
- standardize: we do not need any standardization function here
In the training pipeline, we built a TextVectorization layer to tokenize and vectorize our training data. In the inference pipeline, we need the very same TextVectorization layer as in the training pipeline.

To do this, we save the weights of the adapted TextVectorization layer and load them on top of a newly initialized layer. On Line 12, we load the weights of the saved TextVectorization layer. On Line 25, we return the loaded vectorizer.
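A small round-trip sketch shows why this works. Assuming we are still at the end of train.py, with the adapted vectorizeLayer in scope and both helpers imported, the reloaded layer reproduces the same token indices as the original (the file name below is illustrative):

```python
# round-trip sketch: save the adapted layer, reload it, and compare outputs
save_vectorizer(vectorizer=vectorizeLayer, name="text_vectorizer")
reloaded = load_vectorizer(name="text_vectorizer",
    maxTokens=config.VOCAB_SIZE,
    outputLength=config.MAX_SEQUENCE_LENGTH)

# a clean, lowercase sample so standardization does not matter here
sample = tf.constant(["this movie was great"])
print(tf.reduce_all(vectorizeLayer(sample) == reloaded(sample)).numpy())  # True
```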
```python
# USAGE
# python inference.py

# import the necessary packages
from pyimagesearch.standardization import custom_standardization
from pyimagesearch.save_load import load_vectorizer
from pyimagesearch.dataset import get_imdb_dataset
from pyimagesearch import config
from tensorflow import keras
import tensorflow as tf

# load the pre-trained RNN and LSTM model
print("[INFO] loading the pre-trained RNN model...")
modelRnn = keras.models.load_model(filepath=config.RNN_MODEL_PATH)
modelRnn.compile(optimizer="adam",
    metrics=["accuracy"],
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)
print("[INFO] loading the pre-trained LSTM model...")
modelLstm = keras.models.load_model(filepath=config.LSTM_MODEL_PATH)
modelLstm.compile(optimizer="adam",
    metrics=["accuracy"],
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)
```
On Lines 5-10, we import the necessary packages. On Line 14, the saved RNN model is loaded back from disk. The loaded model is then compiled with the suitable metrics, loss, and optimizer on Lines 15-17.
```python
# get the IMDB dataset
print("[INFO] getting the IMDB test dataset...")
testDs = get_imdb_dataset(folderName=config.DATASET_PATH,
    batchSize=config.BATCH_SIZE, bufferSize=config.BUFFER_SIZE,
    autotune=tf.data.AUTOTUNE, test=True)

# load the pre-trained text vectorization layer
vectorizeLayer = load_vectorizer(name=config.TEXT_VEC_PATH,
    maxTokens=config.VOCAB_SIZE,
    outputLength=config.MAX_SEQUENCE_LENGTH,
    standardize=custom_standardization)

# vectorize the test dataset
testDs = testDs.map(lambda text, label: (vectorizeLayer(text), label))

# evaluate the trained RNN and LSTM model on the test dataset
for model in [modelRnn, modelLstm]:
    print(f"[INFO] test evaluation for {model.name}:")
    (testLoss, testAccuracy) = model.evaluate(testDs)
    print(f"\t[INFO] test loss: {testLoss:0.2f}")
    print(f"\t[INFO] test accuracy: {testAccuracy * 100:0.2f}%")
```
On Lines 26-28, we get the testing dataset and pre-process it. Line 37 will map the dataset to get the vectorized tokens and labels.
On Lines 40-44, we evaluate the testing accuracy and the testing loss of the RNN model. Our model achieves 68.42% accuracy at inference!
```
$ python inference.py
[INFO] loading the pre-trained RNN model...
[INFO] loading the pre-trained LSTM model...
[INFO] getting the IMDB test dataset...
[INFO] test evaluation for RNN:
25/25 [==============================] - 4s 96ms/step - loss: 0.6035 - accuracy: 0.6842
	[INFO] test loss: 0.60
	[INFO] test accuracy: 68.42%
```
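Beyond the aggregate metrics, the same pieces can score an individual review. A hedged sketch, continuing from the loaded model and vectorization layer above (the review text is made up):

```python
# score a single, made-up review with the loaded RNN (illustrative only)
review = tf.constant(["An absolute delight, I would happily watch it again."])
vectorized = vectorizeLayer(review)
probability = modelRnn.predict(vectorized)[0][0]
print("positive" if probability >= 0.5 else "negative", f"({probability:0.2f})")
```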
Summary
In this tutorial, we covered the basics of sequential data and how to model it. We at PyImageSearch always want our readers to have a strong foundation, and this was the first big step toward some exciting upcoming blog posts on Attention and Transformers (coming soon, really soon).
You are now familiar with a modest set of text processing utilities and the recurrence formula, and you can build a Recurrent Neural Network for modeling sequential data of any sort.
In the next tutorial of this series, we will study the shortcomings of RNNs and understand how to bypass them using Long Short-Term Memory networks.
References
- https://www.tensorflow.org/tutorials/keras/text_classification
- https://wandb.ai/authors/rnn-viz/reports/Under-the-Hood-of-RNNs–VmlldzoyNTQ4MjY
- https://keras.io/api/layers/recurrent_layers/simple_rnn/
- Mathematical Animations generated using: Manim CE
- Interactive graph generated using: Desmos
Citation Information
A. R. Gosthipaty, D. Chakraborty, and R. Raha. “Introduction to Recurrent Neural Networks with Keras and TensorFlow,” PyImageSearch, P. Chugh, S. Huot, K. Kidriavsteva, and A. Thanki, eds., 2022, https://pyimg.co/a3dwm
```
@incollection{GCR_2022_RNN,
	author = {Aritra Roy Gosthipaty and Devjyoti Chakraborty and Ritwik Raha},
	title = {Introduction to Recurrent Neural Networks with Keras and TensorFlow},
	booktitle = {PyImageSearch},
	editor = {Puneet Chugh and Susan Huot and Kseniia Kidriavsteva and Abhishek Thanki},
	year = {2022},
	note = {https://pyimg.co/a3dwm},
}
```