Inside this Keras tutorial, you will discover how easy it is to get started with deep learning and Python. You will use the Keras deep learning library to train your first neural network on a custom image dataset, and from there, you’ll implement your first Convolutional Neural Network (CNN) as well.
The inspiration for this guide came from PyImageSearch reader, Igor, who emailed me a few weeks ago and asked:
Hey Adrian, thanks for the PyImageSearch blog. I’ve noticed that nearly every “getting started” guide I come across for Keras and image classification uses either the MNIST or CIFAR-10 datasets which are built into Keras. I just call one of those functions and the data is automatically loaded for me.
But how do I go about using my own image dataset with Keras?
What steps do I have to take?
Igor has a great point — most Keras tutorials you come across will try to teach you the basics of the library using an image classification dataset such as MNIST (handwriting recognition) or CIFAR-10 (basic object recognition).
These image datasets are standard benchmarks in the computer vision and deep learning literature, and sure, they will absolutely get you started using Keras…
…but they aren’t necessarily practical in the sense that they don’t teach you how to work with your own set of images residing on disk. Instead, you’re just calling helper functions to load pre-compiled datasets.
I’m going with a different take on an introductory Keras tutorial.
Instead of teaching you how to utilize one of these pre-compiled datasets, I’m going to teach you how to train your first neural network and Convolutional Neural Network using a custom dataset — because let’s face it, your goal is to apply deep learning to your own dataset, not one built into Keras, am I right?
To learn how to get started with Keras, Deep Learning, and Python, just keep reading!
Keras Tutorial: How to get started with Keras, Deep Learning, and Python
2020-05-13 Update: This blog post is now TensorFlow 2+ compatible!
Today’s Keras tutorial is designed with the practitioner in mind — it is meant to be a practitioner’s approach to applied deep learning.
That means that we’ll learn by doing.
We’ll be getting our hands dirty.
Writing some Keras code.
And then training our networks on our custom datasets.
This tutorial is not meant to be a deep dive into the theory surrounding deep learning.
If you’re interested in studying deep learning in depth, including both (1) hands-on implementations and (2) a discussion of theory, I would suggest you check out my book, Deep Learning for Computer Vision with Python.
Overview of what’s going to be covered
Training your first simple neural network with Keras doesn’t require a lot of code, but we’re going to start slow, taking it step-by-step, ensuring you understand the process of how to train a network on your own custom dataset.
The steps we’ll cover today include:
- Installing Keras and other dependencies on your system
- Loading your data from disk
- Creating your training and testing splits
- Defining your Keras model architecture
- Compiling your Keras model
- Training your model on your training data
- Evaluating your model on your test data
- Making predictions using your trained Keras model
I’ve also included an additional section on training your first Convolutional Neural Network.
This may seem like a lot of steps, but I promise you, once we start getting into the examples you’ll see that they are linear, make intuitive sense, and will help you understand the fundamentals of training a neural network with Keras.
Our example dataset
Most Keras tutorials you come across for image classification will utilize MNIST or CIFAR-10 — I’m not going to do that here.
To start, MNIST and CIFAR-10 aren’t very exciting examples.
These tutorials don’t actually cover how to work with your own custom image datasets. Instead, they simply call built-in Keras utilities that magically return the MNIST and CIFAR-10 datasets as NumPy arrays. In fact, your training and testing splits have already been pre-split for you!
Secondly, if you want to use your own custom datasets you really don’t know where to start. You’ll find yourself scratching your head and asking questions such as:
- Where are those helper functions loading the data from?
- What format should my dataset on disk be in?
- How can I load my dataset into memory?
- What preprocessing steps do I need to perform?
Let’s be honest — your goal in studying Keras and deep learning isn’t to work with these pre-baked datasets.
Instead, you want to work with your own custom datasets.
And those introductory Keras tutorials you’ve come across only take you so far.
That’s why, inside this Keras tutorial, we’ll be working with a custom dataset called the “Animals dataset” I created for my book, Deep Learning for Computer Vision with Python:
The purpose of this dataset is to correctly classify an image as containing either:
- Cats
- Dogs
- Pandas
Containing only 3,000 images, the Animals dataset is meant to be an introductory dataset that we can quickly train a deep learning model on using either our CPU or GPU (and still obtain reasonable accuracy).
Furthermore, using this custom dataset enables you to understand:
- How you should organize your dataset on disk
- How to load your images and class labels from disk
- How to partition your data into training and testing splits
- How to train your first Keras neural network on the training data
- How to evaluate your model on the testing data
- How you can reuse your trained model on data that is brand new and outside your training and testing splits
By following the steps in this Keras tutorial you’ll be able to swap out my Animals dataset for any dataset of your choice, provided you utilize the project/directory structure detailed below.
Need data? If you need to scrape images from the internet to create a dataset, check out how to do it the easy way with Bing Image Search, or the slightly more involved way with Google Images.
Project structure
There are a number of files associated with this project. Grab the zip from the “Downloads” section and then use the `tree` command to show the project structure in your terminal (I’ve provided two command line argument flags to `tree` to make the output nice and clean):
```
$ tree --dirsfirst --filelimit 10
.
├── animals
│   ├── cats [1000 entries exceeds filelimit, not opening dir]
│   ├── dogs [1000 entries exceeds filelimit, not opening dir]
│   └── panda [1000 entries exceeds filelimit, not opening dir]
├── images
│   ├── cat.jpg
│   ├── dog.jpg
│   └── panda.jpg
├── output
│   ├── simple_nn.model
│   ├── simple_nn_lb.pickle
│   ├── simple_nn_plot.png
│   ├── smallvggnet.model
│   ├── smallvggnet_lb.pickle
│   └── smallvggnet_plot.png
├── pyimagesearch
│   ├── __init__.py
│   └── smallvggnet.py
├── predict.py
├── train_simple_nn.py
└── train_vgg.py

7 directories, 14 files
```
As previously discussed, today we’ll be working with the Animals dataset. Notice how `animals` is organized in the project tree. Inside of `animals/`, there are three class directories: `cats/`, `dogs/`, and `panda/`. Within each of those directories are 1,000 images pertaining to the respective class.
If you work with your own dataset, just organize it the same way! Ideally you’ll gather 1,000 images per class at a minimum. This isn’t always possible, but you should at least have class balance. Significantly more images in one class folder could cause model bias.
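For instance, a hypothetical two-class dataset of your own (every name here is a placeholder) would follow the same pattern:

```
my_dataset/
├── class_a/
│   ├── 00001.jpg
│   ├── 00002.jpg
│   └── ...
└── class_b/
    ├── 00001.jpg
    └── ...
```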
Next is the `images/` directory. This directory contains three images for testing purposes which we’ll use to demonstrate how to (1) load a trained model from disk and then (2) classify an input image that is not part of our original dataset.
The `output/` folder contains three types of files which are generated by training:

- `.model`: A serialized Keras model file is generated after training and can be used in future inference scripts.
- `.pickle`: A serialized label binarizer file. This file contains an object which contains class names. It accompanies a model file.
- `.png`: I always place my training/validation plot images in the output folder as it is an output of the training process.
The `pyimagesearch/` directory is a module. Contrary to the many questions I receive, `pyimagesearch` is not a pip-installable package. Instead, it resides in the project folder and the classes contained within can be imported into your scripts. It is provided in the “Downloads” section of this Keras tutorial.
Today we’ll be reviewing four .py files:

- In the first half of the blog post, we’ll train a simple model. The training script is `train_simple_nn.py`.
- We’ll advance to training `SmallVGGNet` using the `train_vgg.py` script.
- The `smallvggnet.py` file contains our `SmallVGGNet` class, a Convolutional Neural Network.
- What good is a serialized model unless we can deploy it? In `predict.py`, I’ve provided sample code for you to load a serialized model + label file and make an inference on an image. The prediction script is only useful after we have successfully trained a model with reasonable accuracy. It is always useful to run this script to test with images that are not contained within the dataset.
Configuring your development environment
To configure your system for this tutorial, I first recommend following either of these tutorials:
Either tutorial will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.
Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.
2. Load your data from disk
Now that Keras is installed on our system we can start implementing our first simple neural network training script using Keras. We’ll later implement a full-blown Convolutional Neural Network, but let’s start easy and work our way up.
Open up `train_simple_nn.py` and insert the following code:
```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
```
Lines 2-19 import our required packages. As you can see there are quite a few tools this script is taking advantage of. Let’s review the important ones:
- `matplotlib`: This is the go-to plotting package for Python. That said, it does have its nuances, and if you’re having trouble with it, refer to this blog post. On Line 3, we instruct `matplotlib` to use the `"Agg"` backend, enabling us to save plots to disk — that’s your first nuance!
- `sklearn`: The scikit-learn library will help us with binarizing our labels, splitting data for training/testing, and generating a training report in our terminal.
- `tensorflow.keras`: You’re reading this tutorial to learn about Keras — it is our high-level frontend into TensorFlow and other deep learning backends.
- `imutils`: My package of convenience functions. We’ll use the `paths` module to generate a list of image file paths for training.
- `numpy`: NumPy is for numerical processing with Python. It is another go-to package. If you have OpenCV for Python and scikit-learn installed, then you’ll have NumPy, as it is a dependency.
- `cv2`: This is OpenCV. At this point, it is both tradition and a requirement to tack on the 2 even though you’re likely using OpenCV 3 or higher.
- …the remaining imports are built into your installation of Python!
Wheww! That was a lot, but having a good idea of what each import is used for will aid your understanding as we walk through these scripts.
Let’s parse our command line arguments with argparse:
```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True,
	help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True,
	help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
```
Our script will dynamically handle additional information provided via the command line when we execute our script. The additional information is in the form of command line arguments. The `argparse` module is built into Python and will handle parsing the information you provide in your command string. For additional explanation, refer to this blog post.
We have four command line arguments to parse:

- `--dataset`: The path to our dataset of images on disk.
- `--model`: Our model will be serialized and output to disk. This argument contains the path to the output model file.
- `--label-bin`: Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
- `--plot`: The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data.
With the dataset information in hand, let’s load our images and class labels:
```python
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []

# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# loop over the input images
for imagePath in imagePaths:
	# load the image, resize the image to be 32x32 pixels (ignoring
	# aspect ratio), flatten the 32x32x3=3072 pixel image into a
	# list, and store the image in the data list
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (32, 32)).flatten()
	data.append(image)

	# extract the class label from the image path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)
```
Here we:
- Initialize lists for our `data` and `labels` (Lines 35 and 36). These will later become NumPy arrays.
- Grab `imagePaths` and randomly shuffle them (Lines 39-41). The `paths.list_images` function conveniently will find all the paths to all input images in our `--dataset` directory before we sort and `shuffle` them. I set a `seed` so that the random reordering is reproducible.
- Begin looping over all `imagePaths` in our dataset (Line 44).
For each `imagePath`, we proceed to:
- Load the `image` into memory (Line 48).
- Resize the `image` to `32x32` pixels (ignoring aspect ratio) as well as `flatten` the image (Line 49). It is critical to `resize` our images properly because this neural network requires these dimensions. Each neural network will require different dimensions, so just be aware of this. Flattening the data allows us to pass the raw pixel intensities to the input layer neurons easily. You’ll see later that for VGGNet we pass the volume to the network since it is convolutional. Keep in mind that this example is just a simple non-convolutional network — we’ll be looking at a more advanced example later in the post.
- Append the resized image to `data` (Line 50).
- Extract the class `label` of the image from the path (Line 54) and add it to the `labels` list (Line 55). The `labels` list contains the classes that correspond to each image in the data list.
Now in one fell swoop, we can apply array operations to the data and labels:
```python
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
```
On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] (a common preprocessing step).
We also convert the `labels` list to a NumPy array (Line 59).
3. Construct your training and testing splits
Now that we have loaded our image data from disk, next we need to construct our training and testing splits:
```python
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)
```
It is typical to allocate a percentage of your data for training and a smaller percentage for testing. scikit-learn provides a handy `train_test_split` function which will split the data for us.
Both `trainX` and `testX` make up the image data itself, while `trainY` and `testY` make up the labels.
Our class labels are currently represented as strings; however, Keras will assume that both:
- Labels are encoded as integers
- And furthermore, one-hot encoding is performed on these labels making each label represented as a vector rather than an integer
To accomplish this encoding, we can use the `LabelBinarizer` class from scikit-learn:
```python
# convert the labels from integers to vectors (for 2-class, binary
# classification you should use Keras' to_categorical function
# instead as the scikit-learn's LabelBinarizer will not return a
# vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
```
On Line 70, we initialize the `LabelBinarizer` object.
A call to `fit_transform` finds all unique class labels in `trainY` and then transforms them into one-hot encoded labels. A call to just `.transform` on `testY` performs just the one-hot encoding step — the unique set of possible class labels was already determined by the call to `.fit_transform`.
Here’s an example:
```python
[1, 0, 0] # corresponds to cats
[0, 1, 0] # corresponds to dogs
[0, 0, 1] # corresponds to panda
```
Notice how only one of the array elements is “hot” which is why we call this “one-hot” encoding.
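If you’d like to see this behavior for yourself, here is a tiny standalone sketch (the toy labels are hypothetical, not pulled from the Animals dataset):

```python
# quick demonstration of LabelBinarizer on toy string labels
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
# fit_transform learns the unique classes AND one-hot encodes
print(lb.fit_transform(["cats", "dogs", "panda", "cats"]))
# transform reuses the classes learned above for new labels
print(lb.transform(["panda"]))
# the learned class ordering
print(lb.classes_)
```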
4. Define your Keras model architecture
The next step is to define our neural network architecture using Keras. Here we will be using a network with one input layer, two hidden layers, and one output layer:
```python
# define the 3072-1024-512-3 architecture using Keras
model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation="sigmoid"))
model.add(Dense(512, activation="sigmoid"))
model.add(Dense(len(lb.classes_), activation="softmax"))
```
Since our model is really simple, we go ahead and define it in this script (typically I like to make a separate class in a separate file for the model architecture).
The input layer and first hidden layer are defined on Line 76. The layer will have an `input_shape` of `3072`, as there are `32x32x3=3072` pixels in a flattened input image. The first hidden layer will have `1024` nodes.
The second hidden layer will have `512` nodes (Line 77).
Finally, the number of nodes in the final output layer (Line 78) will be the number of possible class labels — in this case, the output layer will have three nodes, one for each of our class labels (“cats”, “dogs”, and “panda”, respectively).
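If you want to double-check the layer dimensions and parameter counts, you can optionally print a summary after defining the model (this call is not part of the training script itself):

```python
# optional: print each layer's output shape and parameter count
model.summary()
```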
5. Compile your Keras model
Once we have defined our neural network architecture, the next step is to “compile” it:
```python
# initialize our initial learning rate and # of epochs to train for
INIT_LR = 0.01
EPOCHS = 80

# compile the model using SGD as our optimizer and categorical
# cross-entropy loss (you'll want to use binary_crossentropy
# for 2-class classification)
print("[INFO] training network...")
opt = SGD(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
```
First, we initialize our learning rate and total number of epochs to train for (Lines 81 and 82).
Then we `compile` our model using the Stochastic Gradient Descent (`SGD`) optimizer with `"categorical_crossentropy"` as the `loss` function.
Categorical cross-entropy is used as the loss for nearly all networks trained to perform classification. The only exception is for 2-class classification, where there are only two possible class labels. In that event, you would want to swap out `"categorical_crossentropy"` for `"binary_crossentropy"`.
6. Fit your Keras model to the data
Now that our Keras model is compiled, we can “fit” (i.e., train) it on our training data:
```python
# train the neural network
H = model.fit(x=trainX, y=trainY, validation_data=(testX, testY),
	epochs=EPOCHS, batch_size=32)
```
We’ve discussed all the inputs except `batch_size`. The `batch_size` controls the size of each group of data to pass through the network. Larger GPUs would be able to accommodate larger batch sizes. I recommend starting with `32` or `64` and going up from there.
7. Evaluate your Keras model
We’ve trained our actual model but now we need to evaluate it on our testing data.
It’s important that we evaluate on our testing data so we can obtain an unbiased (or as close to unbiased as possible) representation of how well our model is performing with data it has never been trained on.
To evaluate our Keras model, we can use a combination of the `.predict` method of the model along with the `classification_report` from scikit-learn:
```python
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy (Simple NN)")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])
```
2020-05-13 Update: In order for this plotting snippet to be TensorFlow 2+ compatible, the H.history dictionary keys are updated to fully spell out “accuracy” rather than the abbreviated “acc” (i.e., `H.history["val_accuracy"]` and `H.history["accuracy"]`). It is semi-confusing that “val” is not spelled out as “validation”; we have to learn to love and live with the API and always remember that it is a work in progress that many developers around the world contribute to.
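If you’re ever unsure which keys your version of TensorFlow/Keras recorded, you can simply print them (a quick sketch):

```python
# list the metric names stored in the training history
print(H.history.keys())
# e.g., dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
```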
When running this script you’ll notice that our Keras neural network will start to train, and once training is complete, we’ll evaluate the network on our testing set:
```
$ python train_simple_nn.py --dataset animals --model output/simple_nn.model \
	--label-bin output/simple_nn_lb.pickle --plot output/simple_nn_plot.png
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train on 2250 samples, validate on 750 samples
Epoch 1/80
2250/2250 [==============================] - 1s 311us/sample - loss: 1.1041 - accuracy: 0.3516 - val_loss: 1.1578 - val_accuracy: 0.3707
Epoch 2/80
2250/2250 [==============================] - 0s 183us/sample - loss: 1.0877 - accuracy: 0.3738 - val_loss: 1.0766 - val_accuracy: 0.3813
Epoch 3/80
2250/2250 [==============================] - 0s 181us/sample - loss: 1.0707 - accuracy: 0.4240 - val_loss: 1.0693 - val_accuracy: 0.3533
...
Epoch 78/80
2250/2250 [==============================] - 0s 184us/sample - loss: 0.7688 - accuracy: 0.6160 - val_loss: 0.8696 - val_accuracy: 0.5880
Epoch 79/80
2250/2250 [==============================] - 0s 181us/sample - loss: 0.7675 - accuracy: 0.6200 - val_loss: 1.0294 - val_accuracy: 0.5107
Epoch 80/80
2250/2250 [==============================] - 0s 181us/sample - loss: 0.7687 - accuracy: 0.6164 - val_loss: 0.8361 - val_accuracy: 0.6120
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.57      0.59      0.58       236
        dogs       0.55      0.31      0.39       236
       panda       0.66      0.89      0.76       278

    accuracy                           0.61       750
   macro avg       0.59      0.60      0.58       750
weighted avg       0.60      0.61      0.59       750

[INFO] serializing network and label binarizer...
```
This network is small, and when combined with a small dataset, takes only 2 seconds per epoch on my CPU.
Here you can see that our network is obtaining 60% accuracy.
Since we would have a 1/3 chance of randomly picking the correct label for a given image, we know that our network has actually learned patterns that can be used to discriminate between the three classes.
We also save a plot of our:
- Training loss
- Validation loss
- Training accuracy
- Validation accuracy
…ensuring that we can easily spot overfitting or underfitting in our results.
Looking at our plot we see a small amount of overfitting start to occur past epoch ~45 where our training and validation losses start to diverge and a pronounced gap appears.
Finally, we can save our model to disk so we can reuse it later without having to retrain it:
```python
# save the model and label binarizer to disk
print("[INFO] serializing network and label binarizer...")
model.save(args["model"], save_format="h5")
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()
```
8. Make predictions on new data using your Keras model
At this point our model is trained — but what if we wanted to make predictions on images after our network has already been trained?
What would we do then?
How would we load the model from disk?
How can we load an image and then preprocess it for classification?
Inside the `predict.py` script, I’ll show you how, so open it and insert the following code:
```python
# import the necessary packages
from tensorflow.keras.models import load_model
import argparse
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image we are going to classify")
ap.add_argument("-m", "--model", required=True,
	help="path to trained Keras model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to label binarizer")
ap.add_argument("-w", "--width", type=int, default=28,
	help="target spatial dimension width")
ap.add_argument("-e", "--height", type=int, default=28,
	help="target spatial dimension height")
ap.add_argument("-f", "--flatten", type=int, default=-1,
	help="whether or not we should flatten the image")
args = vars(ap.parse_args())
```
First, we’ll import our required packages and modules.
You’ll need to explicitly import `load_model` from `tensorflow.keras.models` whenever you write a script to load a Keras model from disk. OpenCV will be used for annotation and display. The `pickle` module will be used to load our label binarizer.
Next, let’s parse our command line arguments:
- `--image`: The path to our input image.
- `--model`: Our trained and serialized Keras model path.
- `--label-bin`: Path to the serialized label binarizer.
- `--width`: The width of the input shape for our CNN. Remember — you can’t just specify anything here. You need to specify the width that the model is designed for.
- `--height`: The height of the image input to the CNN. The height specified must also match the network’s input shape.
- `--flatten`: Whether or not we should flatten the image. By default, we won’t flatten the image. If you need to flatten the image, you should pass a `1` for this argument.
Next, let’s load the image and resize it based on the command line arguments:
```python
# load the input image and resize it to the target spatial dimensions
image = cv2.imread(args["image"])
output = image.copy()
image = cv2.resize(image, (args["width"], args["height"]))

# scale the pixel values to [0, 1]
image = image.astype("float") / 255.0
```
And then we’ll `flatten` the image if necessary:
```python
# check to see if we should flatten the image and add a batch
# dimension
if args["flatten"] > 0:
	image = image.flatten()
	image = image.reshape((1, image.shape[0]))

# otherwise, we must be working with a CNN -- don't flatten the
# image, simply add the batch dimension
else:
	image = image.reshape((1, image.shape[0], image.shape[1],
		image.shape[2]))
```
Flattening the image for standard fully-connected networks is straightforward (Lines 33-35).
In the case of a CNN, we also add the batch dimension, but we do not flatten the image (Lines 39-41). An example CNN is covered in the next section.
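As a quick standalone sketch of what the two branches produce for a 32x32 RGB input (shapes only, no model required):

```python
# shape check for the two preprocessing branches above
import numpy as np

image = np.zeros((32, 32, 3))
flattened = image.flatten().reshape((1, -1))
print(flattened.shape)  # (1, 3072) -- simple fully-connected network
batched = image.reshape((1, 32, 32, 3))
print(batched.shape)    # (1, 32, 32, 3) -- CNN with a batch dimension
```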
From there, let’s load the model + label binarizer into memory and make a prediction:
```python
# load the model and label binarizer
print("[INFO] loading network and label binarizer...")
model = load_model(args["model"])
lb = pickle.loads(open(args["label_bin"], "rb").read())

# make a prediction on the image
preds = model.predict(image)

# find the class label index with the largest corresponding
# probability
i = preds.argmax(axis=1)[0]
label = lb.classes_[i]
```
Our model and label binarizer are loaded via Lines 45 and 46. We can make predictions on the input `image` by calling `model.predict` (Line 49).
What does the `preds` array look like?
```
(Pdb) preds
array([[5.4622066e-01, 4.5377851e-01, 7.7963534e-07]], dtype=float32)
```
The 2D array has one row per image in the batch (here there is only one row, as only one image was passed into the NN for classification), and each row holds the probabilities corresponding to each class label, as shown by querying the variable in my Python debugger:
- cats: 54.6%
- dogs: 45.4%
- panda: ~0%
In other words, our network “thinks” that it sees “cats” and it sure as hell “knows” that it doesn’t see a “panda”.
Line 53 finds the index of the max value (the 0-th “cats” index).
And Line 54 extracts the “cats” string label from the label binarizer.
Easy right?
Now let’s display the results:
```python
# draw the class label + probability on the output image
text = "{}: {:.2f}%".format(label, preds[0][i] * 100)
cv2.putText(output, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7,
	(0, 0, 255), 2)

# show the output image
cv2.imshow("Image", output)
cv2.waitKey(0)
```
We format our `text` string on Line 57. This includes the `label` and the prediction value in percentage format. Then we place the `text` on the `output` image (Lines 58 and 59).
Finally, we show the output image on the screen and wait until the user presses any key on Lines 62 and 63 (watch Homer Simpson try to locate the “any” key).
Our prediction script was rather straightforward.
Once you’ve used the “Downloads” section of this tutorial to download the code, you can open up a terminal and try running our trained network on custom images:
```
$ python predict.py --image images/cat.jpg --model output/simple_nn.model \
	--label-bin output/simple_nn_lb.pickle --width 32 --height 32 --flatten 1
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
Be sure that you copy/pasted or typed the entire command (including command line arguments) from within the folder relative to the script. If you’re having trouble with the command line arguments, give this blog post a read.
Here you can see that our simple Keras neural network has classified the input image as “cats” with 55.87% probability, despite the cat’s face being partially obscured by a piece of bread.
9. BONUS: Training your first Convolutional Neural Network with Keras
Admittedly, using a standard feedforward neural network to classify images is not a wise choice.
Instead, we should leverage Convolutional Neural Networks (CNNs) which are designed to operate over the raw pixel intensities of images and learn discriminating filters that can be used to classify images with high accuracy.
The model we’ll be discussing here today is a smaller variant of VGGNet which I have named “SmallVGGNet”.
VGGNet-like models share two common characteristics:
- Only 3×3 convolutions are used
- Convolution layers are stacked on top of each other deeper in the network architecture prior to applying a destructive pooling operation
Let’s go ahead and implement SmallVGGNet now.
Open up the `smallvggnet.py` file and insert the following code:
```python
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
```
As you can see from the imports on Lines 2-10, everything needed for the `SmallVGGNet` comes from `tensorflow.keras`. I encourage you to familiarize yourself with each in the Keras documentation and in my deep learning book.
We then begin to define our `SmallVGGNet` class and the `build` method:
```python
class SmallVGGNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1
```
Our class is defined on Line 12 and the sole `build` method is defined on Line 14. Four parameters are required for `build`: the `width` of the input images, the `height` of the input images, the `depth`, and the number of `classes`.
The `depth` can also be thought of as the number of channels. Our images are in the RGB color space, so we’ll pass a `depth` of `3` when we call the `build` method.
First, we initialize a `Sequential` model (Line 17).
Then, we determine channel ordering. Keras supports `"channels_last"` (i.e., TensorFlow) and `"channels_first"` (i.e., Theano) ordering. Lines 18-25 allow our model to support either type of backend.
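If you’re curious which ordering your own installation defaults to, you can check it directly (a quick interactive sketch):

```python
# prints "channels_last" on a standard TensorFlow install
from tensorflow.keras import backend as K
print(K.image_data_format())
```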
Now, let’s add some layers to the network:
```python
		# CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Our first `CONV => RELU => POOL` layers are added by this block. Our first `CONV` layer has `32` filters of size `3x3`.
It is very important that we specify the `inputShape` for the first layer, as all subsequent layer dimensions will be calculated using a trickle-down approach.
We’ll use the ReLU (Rectified Linear Unit) activation function in this network architecture. There are a number of activation methods and I encourage you to familiarize yourself with the popular ones inside Deep Learning for Computer Vision with Python where pros/cons and tradeoffs are discussed.
Batch Normalization, MaxPooling, and Dropout are also applied.
Batch Normalization is used to normalize the activations of a given input volume before passing it to the next layer in the network. It has been proven to be very effective at reducing the number of epochs required to train a CNN as well as stabilizing training itself.
POOL layers have a primary function of progressively reducing the spatial size (i.e. width and height) of the input volume to a layer. It is common to insert POOL layers between consecutive CONV layers in a CNN architecture.
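To make that concrete, here is a small worked sketch (not project code) of how SmallVGGNet’s three 2x2 pooling stages shrink a 64x64 input:

```python
# each 2x2 max-pool with stride 2 halves the width and height
size = 64
for stage in range(1, 4):
	size //= 2
	print("after POOL {}: {}x{}".format(stage, size, size))
# after POOL 1: 32x32 -> after POOL 2: 16x16 -> after POOL 3: 8x8
```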
Dropout is an interesting concept not to be overlooked. In an effort to force the network to be more robust we can apply dropout, the process of disconnecting random neurons between layers. This process is proven to reduce overfitting, increase accuracy, and allow our network to generalize better for unfamiliar images. As denoted by the parameter, 25% of the node connections are randomly disconnected (dropped out) between layers during each training iteration.
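If you want to see dropout in action, a tiny standalone sketch (independent of this project) makes the behavior visible:

```python
# at training time, Dropout(0.25) zeroes roughly 25% of activations
# and scales the survivors by 1/0.75 (inverted dropout)
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.25)
x = tf.ones((1, 8))
print(layer(x, training=True))   # some entries zeroed out
print(layer(x, training=False))  # identity at inference time
```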
Note: If you’re new to deep learning, this may all sound like a different language to you. Just like learning a new spoken language, it takes time, study, and practice. If you’re yearning to learn the language of deep learning, why not grab my highly rated book, Deep Learning for Computer Vision with Python? I promise that I break down these concepts in the book and reinforce them via practical examples.
Moving on, we reach our next block of `(CONV => RELU) * 2 => POOL` layers:
```python
		# (CONV => RELU) * 2 => POOL layer set
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Notice that our filter dimensions remain the same (`3x3`, which is common for VGG-like networks); however, we’re increasing the total number of filters learned from 32 to 64.
This is followed by a `(CONV => RELU) * 3 => POOL` layer set:
```python
		# (CONV => RELU) * 3 => POOL layer set
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Again, notice how all CONV layers learn `3x3` filters, but the total number of filters learned by the CONV layers has doubled from 64 to 128. Increasing the total number of filters learned the deeper you go into a CNN (and as your input volume size becomes smaller and smaller) is common practice.
And finally we have a set of `FC => RELU` layers:
```python
		# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(512))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model
```
Fully connected layers are denoted by `Dense` in Keras. The final layer is fully connected with three outputs (since we have three `classes` in our dataset). The `softmax` layer returns the class probabilities for each label.
Now that `SmallVGGNet` is implemented, let’s write the driver script that will be used to train it on our Animals dataset.
Much of the code here will be similar to the previous example, but I’ll:
- Review the entire script as a matter of completeness
- And call out any differences along the way
Open up the `train_vgg.py` script and let’s get started:
```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.smallvggnet import SmallVGGNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
```
The imports are the same as our previous training script with two exceptions:

- Instead of `from keras.models import Sequential`, this time we import `SmallVGGNet` via `from pyimagesearch.smallvggnet import SmallVGGNet`. Scroll up slightly to see the SmallVGGNet implementation.
- We will be augmenting our data with `ImageDataGenerator`. Data augmentation is almost always recommended and leads to models that generalize better. Data augmentation involves applying random rotations, shifts, shears, and scaling to existing training data. You won’t see a bunch of new .png and .jpg files — it is done on the fly as the script executes.
You should recognize the other imports at this point. If not, just refer to the bulleted list above.
Let’s parse our command line arguments:
```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True,
	help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True,
	help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
```
We have four command line arguments to parse:

- `--dataset`: The path to our dataset of images on disk. This can be the path to `animals/` or another dataset organized the same way.
- `--model`: Our model will be serialized and output to disk. This argument contains the path to the output model file. Be sure to name your model accordingly so you don’t overwrite any previously trained models (such as the simple neural network one).
- `--label-bin`: Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
- `--plot`: The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data. Each time you train your model with changes to parameters, you should specify a different plot filename in the command line so that you’ll have a history of plots corresponding to training notes in your notebook or notes file. This tutorial makes deep learning seem easy, but keep in mind that I went through several iterations of training before I settled on all parameters to share with you in this script.
Let’s load and preprocess our data:
```python
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []

# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# loop over the input images
for imagePath in imagePaths:
	# load the image, resize it to 64x64 pixels (the required input
	# spatial dimensions of SmallVGGNet), and store the image in the
	# data list
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (64, 64))
	data.append(image)

	# extract the class label from the image path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)

# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
```
Exactly as in the simple neural network script, here we:
- Initialize lists for our `data` and `labels` (Lines 35 and 36).
- Grab `imagePaths` and randomly `shuffle` them (Lines 39-41). The `paths.list_images` function conveniently will find all images in our input dataset directory before we sort and `shuffle` them.
- Begin looping over all `imagePaths` in our dataset (Line 44).
As we loop over each `imagePath`, we proceed to:

- Load the `image` into memory (Line 48).
- Resize the image to `64x64`, the required input spatial dimensions of `SmallVGGNet` (Line 49). One key difference is that we are not flattening our data for this neural network, because it is convolutional.
- Append the resized `image` to `data` (Line 50).
- Extract the class `label` of the image from the `imagePath` and add it to the `labels` list (Lines 54 and 55).
On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] in array form. We also convert the `labels` list to a NumPy array format (Line 59).
Then we’ll split our data and binarize our labels:
```python
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)

# convert the labels from integers to vectors (for 2-class, binary
# classification you should use Keras' to_categorical function
# instead as the scikit-learn's LabelBinarizer will not return a
# vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
```
We perform a 75/25 training and testing split on the data (Lines 63 and 64). An experiment I would encourage you to try is to change the training split to 80/20 and see if the results change significantly.
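That experiment is a one-line change (a sketch, not part of the downloaded script):

```python
# hypothetical 80/20 split for the experiment suggested above
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, random_state=42)
```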
Label binarizing takes place on Lines 70-72. This allows for one-hot encoding as well as serializing our label binarizer to a pickle file later in the script.
Now comes the data augmentation:
```python
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
	height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
	horizontal_flip=True, fill_mode="nearest")

# initialize our VGG-like Convolutional Neural Network
model = SmallVGGNet.build(width=64, height=64, depth=3,
	classes=len(lb.classes_))
```
On Lines 75-77, we initialize our image data generator to perform image augmentation.
Image augmentation allows us to construct “additional” training data from our existing training data by randomly rotating, shifting, shearing, zooming, and flipping.
Data augmentation is often a critical step to:
- Avoiding overfitting
- Ensuring your model generalizes well
I recommend that you always perform data augmentation unless you have an explicit reason not to.
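If you’d like to see what the generator actually produces, the following sketch writes a handful of augmented variants of a single training image to disk (the "aug_preview" directory name is just an assumption for this illustration):

```python
# optional sanity check: save a few augmented samples for inspection
import os
import numpy as np

os.makedirs("aug_preview", exist_ok=True)
sample = np.expand_dims(trainX[0], axis=0)
gen = aug.flow(sample, batch_size=1, save_to_dir="aug_preview",
	save_prefix="sample", save_format="jpg")
for _ in range(5):
	next(gen)
```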
To build our `SmallVGGNet`, we simply call `SmallVGGNet.build` while passing the necessary parameters (Lines 80 and 81).
Let’s compile and train our model:
```python
# initialize our initial learning rate, # of epochs to train for,
# and batch size
INIT_LR = 0.01
EPOCHS = 75
BS = 32

# initialize the model and optimizer (you'll want to use
# binary_crossentropy for 2-class classification)
print("[INFO] training network...")
opt = SGD(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit(x=aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
	epochs=EPOCHS)
```
First, we establish our learning rate, number of epochs, and batch size (Lines 85-87).
Then we initialize our Stochastic Gradient Descent (SGD) optimizer (Line 92).
We’re now ready to compile and train our model (Lines 93-99). Our model.fit
call handles both training and on-the-fly data augmentation. We must pass the generator with our training data as the first parameter. The generator will produce batches of augmented training data according to the settings we previously made.
2020-05-13 Update: Formerly, TensorFlow/Keras required use of a method called `fit_generator` in order to accomplish data augmentation. Now, the `fit` method can handle data augmentation as well, making for more-consistent code. Be sure to check out my articles about fit and fit_generator as well as data augmentation.
Finally, we’ll evaluate our model, plot the loss/accuracy curves, and save the model:
```python
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy (SmallVGGNet)")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])

# save the model and label binarizer to disk
print("[INFO] serializing network and label binarizer...")
model.save(args["model"], save_format="h5")
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()
```
We make predictions on the testing set, and then scikit-learn is employed to calculate and print our `classification_report` (Lines 103-105).
Matplotlib is utilized for plotting the loss/accuracy curves — Lines 108-118 demonstrate my typical plot setup. Line 119 saves the figure to disk.
2020-05-13 Update: In order for this plotting snippet to be TensorFlow 2+ compatible, the H.history dictionary keys are updated to fully spell out “accuracy” rather than the abbreviated “acc” (i.e., `H.history["val_accuracy"]` and `H.history["accuracy"]`). It is semi-confusing that “val” is not spelled out as “validation”; we have to learn to love and live with the API and always remember that it is a work in progress that many developers around the world contribute to.
Finally, we save our model and label binarizer to disk (Lines 123-126).
Let’s go ahead and train our model.
Make sure you’ve used the “Downloads” section of this blog post to download the source code and the example dataset.
From there, open up a terminal and execute the following command:
```
$ python train_vgg.py --dataset animals --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle \
	--plot output/smallvggnet_plot.png
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train for 70 steps, validate on 750 samples
Epoch 1/75
70/70 [==============================] - 13s 179ms/step - loss: 1.4178 - accuracy: 0.5081 - val_loss: 1.7470 - val_accuracy: 0.3147
Epoch 2/75
70/70 [==============================] - 12s 166ms/step - loss: 0.9799 - accuracy: 0.6001 - val_loss: 1.6043 - val_accuracy: 0.3253
Epoch 3/75
70/70 [==============================] - 12s 166ms/step - loss: 0.9156 - accuracy: 0.5920 - val_loss: 1.7941 - val_accuracy: 0.3320
...
Epoch 73/75
70/70 [==============================] - 12s 166ms/step - loss: 0.3791 - accuracy: 0.8318 - val_loss: 0.6827 - val_accuracy: 0.7453
Epoch 74/75
70/70 [==============================] - 12s 167ms/step - loss: 0.3823 - accuracy: 0.8255 - val_loss: 0.8157 - val_accuracy: 0.7320
Epoch 75/75
70/70 [==============================] - 12s 166ms/step - loss: 0.3693 - accuracy: 0.8408 - val_loss: 0.5902 - val_accuracy: 0.7547
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.66      0.73      0.69       236
        dogs       0.66      0.62      0.64       236
       panda       0.93      0.89      0.91       278

    accuracy                           0.75       750
   macro avg       0.75      0.75      0.75       750
weighted avg       0.76      0.75      0.76       750

[INFO] serializing network and label binarizer...
```
When you paste the command, ensure that you have all the command line arguments to avoid a “usage” error. If you are new to command line arguments, make sure you read about them before continuing.
Training on a CPU will take some time — each of the 75 epochs requires over one minute. Training will take well over an hour.
A GPU will finish the process in a matter of minutes, as each epoch requires only about 2 seconds!
Let’s take a look at the resulting training plot that is in the `output/` directory:
As our results demonstrate, you can see that we are achieving 76% accuracy on our Animals dataset using a Convolutional Neural Network, significantly higher than the previous accuracy of 60% using a standard fully-connected network.
We can also apply our newly trained Keras CNN to example images:
```
$ python predict.py --image images/panda.jpg --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
Our CNN is very confident that this is a “panda”. I am too, but I just wish he would stop staring at me!
Let’s try a cute little beagle:
```
$ python predict.py --image images/dog.jpg --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
A couple beagles have been part of my family and childhood. I’m glad that this beagle picture I found online is recognized as a dog!
I could use a similar CNN to find dog photos of my beagles on my computer.
In fact, in Google Photos, if you type “dog” in the search box, pictures of dogs in your photo library will be returned — I’m pretty sure a CNN has been used for that image search engine feature. Image search engines aren’t the only use case for CNNs — I bet your mind is starting to come up with all sorts of ideas upon which to apply deep learning.
Summary
In today’s tutorial, you learned how to get started with Keras, Deep Learning, and Python.
Specifically, you learned the seven key steps to working with Keras and your own custom datasets:
- How to load your data from disk
- How to create your training and testing splits
- How to define your Keras model architecture
- How to compile and prepare your Keras model
- How to train your model on your training data
- How to evaluate your model on testing data
- How to make predictions using your trained Keras model
From there you also learned how to implement a Convolutional Neural Network, enabling you to obtain higher accuracy than a standard fully-connected network.
If you have any questions regarding Keras be sure to leave a comment — I’ll do my best to answer.
And to be notified when future Keras and deep learning posts are published here on PyImageSearch, be sure to enter your email address in the form below!
mohamed
Great: Adrian!
always forward
Thank you and thank you Igor
I have a suggestion as to how to apply some basic concepts of deep learning.
About how to write those equations in Python.
Many people know the concepts but there is a barrier between them and the application.
Adrian Rosebrock
Hey Mohamed — is there a particular algorithm/equation that you’re struggling with? Or are you speaking in more general terms? If you’re speaking more generically, then Deep Learning for Computer Vision with Python covers the basic concepts of both machine learning and deep learning, including some basic equations and theory before moving into actual applications and code.
mohamed
Thanks Adrian. Yes, there are certain things that are facing me, but anyway, I meant in general.
The book is really wonderful. I will work on getting the rest of the versions of it.
Newman
Not working here: different numbers during training, and a lot of wrong detections.
first:
```
             precision    recall  f1-score   support

       cats       0.46      0.66      0.54       244
       dogs       0.49      0.22      0.30       242
      panda       0.69      0.78      0.73       264

avg / total       0.55      0.56      0.53       750
```
second method:
```
             precision    recall  f1-score   support

       cats       0.66      0.77      0.71       244
       dogs       0.76      0.55      0.63       242
      panda       0.85      0.95      0.90       264

avg / total       0.76      0.76      0.75       750
```
everything seems fine but not the results.
Adrian Rosebrock
Could you share which version of Keras and TensorFlow (assuming a TF backend) you are running? Secondly, keep in mind that NNs are stochastic algorithms — there will naturally be variations in results and you should not expect your results to 100% match mine. The effects of random weight initializations are even more pronounced due to the fact that we’re working with such a small dataset.
Enrique
Hi Adrian:
I tested an image like this https://www.petdarling.com/articulos/wp-content/uploads/2014/06/como-quitarle-las-pulgas-a-mi-perro.jpg, however the result shown is “Panda 100%”. Why did this happen?
Adrian Rosebrock
Pandas are largely white and black and the dog itself is dark brown and white. The network could have overfit to the panda class. The example used here is just that — an example. It’s not meant to be a model that can correctly classify each image class 100% of the time. For that, you will need more data and more advanced techniques. I would encourage you to take a look at Deep Learning for Computer Vision with Python for more information.
Aiden Ralph
Brilliant post Adrian!
Adrian Rosebrock
Thanks Aiden, I’m glad you liked it!
Aline
Amazing tutorial! So clear!
Adrian Rosebrock
Thanks so much, Aline! I’m glad you found it helpful 🙂
Reed Guo
Hi, Adrian
Excellent post.
Can we improve its accuracy?
Adrian Rosebrock
You could improve the accuracy by:
1. Using more images
2. Applying transfer learning
Models trained on ImageNet already have a panda, dog, and cat class as well so you could even use an out-of-the-box classifier.
Viktor
Hello! How can I download the Animals dataset?
Adrian Rosebrock
Just use the “Downloads” section of the blog post and you will be able to download the code and “Animals” dataset.
Vignesh Suresh
Great Video Adrian. Thanks
Hamid
Nice one, Adrian! I really appreciate it.
Such a wonderful post with an elegant and simple explanation.
I wonder if increasing the number of hidden layers and raising the dropout to 0.5 would further increase the accuracy from 78%.
Adrian Rosebrock
It may as those are hyperparameters. Give it a try and see!
Marcelo Mota
My friend, this is the best tutorial I have ever seen!! Thank you so much.
I am struggling on just one point: I have a binary problem and have to use the to_categorical function from Keras. As far as I can see, I can only use it with integer categories, not strings. Is this true?
And how do I write these integer binary categories (from to_categorical) to the pickle file, and how do I use it in the classification_report (the code uses “lb”)?
Thank you again and congratulations for such good and complete explanations!
Adrian Rosebrock
Thanks Marcelo, I’m glad you found the tutorial helpful!
For a binary problem you should use the LabelEncoder class instead of LabelBinarizer. The LabelEncoder will integer-encode the labels, which you can then pass into to_categorical to obtain the vector representation of the labels. The LabelEncoder can be serialized to disk and converts labels just like the LabelBinarizer does.
Mutlucan Tokat
Hi Adrian,
The range of the pixels is the same: every pixel takes values between 0 and 255. Why do we need to scale them to between 0 and 1?
Adrian Rosebrock
Most (but not all) neural networks expect inputs to be in the range [0, 1]. You may see other scaling and normalization techniques, such as mean subtraction, but those are more advanced methods. Your goal here is to bring every input to a common range. Exactly which range you use may depend on:
1. Weight initialization technique
2. Activation technique
3. Whether or not you are performing mean subtraction
In general, scaling to [0, 1] gives your variables less “freedom” and makes them less likely to cause gradient or overflow errors than larger value ranges (such as [0, 255]). Scaling to [0, 1] is typically the “first” scaling technique you’ll see.
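In code, the scaling is a single conversion and division, as done in this tutorial’s training scripts (assuming NumPy is imported as np):
# convert pixel intensities from [0, 255] to [0, 1]
data = np.array(data, dtype="float") / 255.0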
Bob de Graaf
Hi Adrian,
Great tutorial as always! I’m wondering though, isn’t this almost the same tutorial as the Pokemon one? The one where you classify 5 different Pokemon in images? The code seems mostly the same 🙂
I do see some small differences though; for example, in the Pokemon one you use the Adam optimizer instead of SGD, and the initial learning rate is 0.001 instead of 0.01.
Are these changes things you’ve learned these past months to achieve better results? Or were these randomly chosen?
Adrian Rosebrock
The code is similar but not the same. This tutorial is meant to be significantly more introductory and is intended for readers with little to no experience with Keras and deep learning.
The parameters were also not randomly chosen. They were determined via experiments to find the optimal values for this particular post.
Bob de Graaf
Ah ok, good to know, thanks! I wasn’t trying to be offensive or anything, just curious. Apologies if I came across that way!
Adrian Rosebrock
You certainly were not being offensive, Bob. I just wanted to clarify, that’s all 🙂 Have a great day, friend!
Mutlu
Hi Adrian,
What are chanDim = -1 and chanDim = 1 at the beginning of SmallVGGNet?
Great tutorial BTW.
Adrian Rosebrock
It’s the channel dimension. With channels-first ordering (e.g., Theano) the channels come first, while with channels-last ordering (like TensorFlow) the channels come last; a “-1” value when indexing in Python means “the last dimension/value”.
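For reference, a sketch of how a SmallVGGNet-style build() method typically sets it (height, width, and depth are assumed to be the build() arguments):
from tensorflow.keras import backend as K
# default: channels-last ordering (the TensorFlow convention)
inputShape = (height, width, depth)
chanDim = -1
# switch to channels-first if the backend is configured that way (e.g., Theano)
if K.image_data_format() == "channels_first":
    inputShape = (depth, height, width)
    chanDim = 1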
Roshan
Hi Adrian,
Thank you for the excellent tutorial.
I have a basic question:
During validation, we use a train/test split of 75% and 25%, respectively.
So the split randomly picks 25% of the images for testing.
But if I want to find out which images are used for testing, how can I do that?
I want to know the names of the images used for testing.
Please help me
Adrian Rosebrock
The names of the images won’t be returned by scikit-learn. Instead, if you want the exact image names I would suggest you split your image paths rather than the raw images/labels. That will enable you to still perform the split and have the image paths.
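A minimal sketch of that approach (variable and path names are illustrative):
from imutils import paths
from sklearn.model_selection import train_test_split
# split the file paths instead of the loaded images/labels
imagePaths = sorted(list(paths.list_images("animals")))
(trainPaths, testPaths) = train_test_split(imagePaths, test_size=0.25, random_state=42)
# testPaths now holds the exact filenames used for testing;
# load, preprocess, and label the images from each list as before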
andreas
Hi Adrian,
This was an excellent tutorial, very well presented and clear. I have a question: how would I add bounding boxes, using either NMS or my own algorithm, to show boxes around objects in an image, like what is done in face detection?
Thanks,
Andreas
Adrian Rosebrock
We are performing image classification in this post. What you are looking to perform is called object detection. I would suggest you read this tutorial to get you started.
merly
I have never seen a simpler and better tutorial.
Adrian Rosebrock
Thank you for the kind words Merly 🙂 Congratulations on getting your start with Keras!
merly
UserWarning: Trying to unpickle estimator LabelBinarizer from version 0.19.1 when using version 0.19.2. This might lead to breaking code or invalid results. Use at your own risk.
I am getting this error. What should I do?
Adrian Rosebrock
Hey there, it’s not an error, it’s a warning. I would suggest you train the model first before you try to run it and make predictions.
Hashir
Hi Adrian,
This blog was awesome. I really appreciate this great effort and I am a big fan of yours.
After reading this Keras + TF tutorial I understood a lot of things. But I have to initialize my model weights manually using my own method, like a custom random initialization. What steps should I take to initialize the model manually?
Thanks in advance
Adrian Rosebrock
The model weights are automatically initialized during the call to .compile. You can change the initialization method by choosing one of the Keras initializers.
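As a sketch (not from the original post), you can pass a kernel_initializer to any layer:
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
model = Sequential()
# draw the initial weights from a zero-mean Gaussian instead of the default
model.add(Dense(512, input_shape=(3072,), activation="relu",
    kernel_initializer=RandomNormal(mean=0.0, stddev=0.05)))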
Salman Sajid
Thanks, Adrian.
Can we use this technique for activity recognition, or is it only for static object detection?
Adrian Rosebrock
The method covered here is only for image classification, not activity recognition or object detection.
Wilf
Trained using Keras 2.2.2 and TensorFlow 1.10.0.
Prediction for both simple_nn and smallvggnet failed on the dog.jpg image.
My question: how do you analyze/understand what went wrong? Is it overfitting, too few training images, poor training image selection, or something else?
Adrian Rosebrock
In order to help get everyone up and running with Keras and deep learning, we used a very small dataset for this example. Typically, we would have at least 1,000 images per class. Our network is also far from perfect. We can increase the accuracy of our model by introducing regularization methods, such as L2 weight regularization, additional data augmentation, etc. If you’re interested in learning more about overfitting/underfitting, including how to detect them, I would suggest you read through Deep Learning for Computer Vision with Python.
Mattia
Hi Adrian,
After many issues installing OpenCV, I finally got started with it.
I was trying this tutorial, and when I launch the program with this command:
python train_vgg.py --dataset animals --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle \
--plot output/smallvggnet_plot.png
this is the result:
> /home/luca/Scrivania/keras-tutorial/train_simple_nn.py(78)()
-> model = Sequential()
(Pdb)
And it doesn’t move on.
What should I do?
Adrian Rosebrock
How are you trying to execute the script? Via the command line?
Kirill
Got the same problem. I run it via the command line (using fish, virtualenv, Python 3.6.5, macOS).
Adrian Rosebrock
Does bash produce the same error as fish?
inf111
Just execute the “continue” command; the script is paused at a pdb breakpoint (note the (Pdb) prompt above).
Hélder Ribeiro
Hi Adrian,
Using the VGG training as you describe, everything goes smoothly and I’m getting 70%-plus accuracy, but when I try to predict something using predict.py I always get the panda prediction.
After doing some research I think it might be something related to the preprocessing of the images?
I’m not sure. One thing is clear: when I add
image = image.astype("float32")
image = image / 255
after the image read, I start to get better results, but I’m not sure if this is the way. Can you help me?
Thanks
Adrian Rosebrock
It sounds like the network is overfitting to the “panda” class. One method to increase accuracy would be to introduce more regularization, including additional data augmentation.
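As a sketch of what additional augmentation might look like (the parameter values are illustrative, not tuned):
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# randomly rotate, shift, shear, zoom, and flip the training images
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")
# then train via aug.flow(trainX, trainY, batch_size=BS), as train_vgg.py does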
Stonez
Thanks for the great tutorial! Can I add more classes to the file structure, say, a “cow” class?
There should be no changes in the code for it to recognize dogs, cats, pandas, and cows, correct?
Thanks
Stonez
Adrian Rosebrock
As long as you follow my directory structure for the project and add a directory named “cow” with cow images to the dataset directory, then yes, no code changes are required.
Balaji
Hi,
I am getting the following error when I try to run the predict script.
…
line 294, in from_config
model = cls(name=name)
UnboundLocalError: local variable ‘name’ referenced before assignment
Adrian Rosebrock
Hi Balaji, could you clarify which version of Keras and TensorFlow you are using? Additionally, did you train your model before trying to run the prediction script?
Alan
Hi Adrian.
I have the same problem using TensorFlow 1.5.0 (because my computer does not support AVX instructions) and Keras 2.2.3, and TensorFlow 1.10.0 with Keras 2.2.3 on another machine.
I am trying:
python predict.py --image images/cat.jpg --model output/simple_nn.model \
--label-bin output/simple_nn_lb.pickle --width 32 --height 32 --flatten 1
python predict.py --image images/panda.jpg --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
python predict.py --image images/dog.jpg --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Do I need to train the model? I downloaded your files and am trying to execute them without training.
Thanks.
Adrian Rosebrock
Yes, make sure you train the model before you try to make predictions on images.
Niranjan A
Hello Adrian,
Every article that i check out on pyImageSearch always leaves me impressed. Great work.
I noticed that in every tutorial you use “argparse”. I wish to know if it makes any difference if we load our image into a variable directly instead of using argparse. If so, can you let me know what the difference is?
Thanks.
Adrian Rosebrock
Hey Niranjan — I think your confusion can be resolved by reading this guide on how argparse works. As you’ll find out, argparse just allows us to supply arguments via the command line instead of manually hardcoding them 🙂
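If you would rather hardcode values while experimenting, a sketch (the keys mirror predict.py’s arguments; note argparse turns --label-bin into args["label_bin"]):
# instead of: args = vars(ap.parse_args())
args = {
    "image": "images/dog.jpg",
    "model": "output/smallvggnet.model",
    "label_bin": "output/smallvggnet_lb.pickle",
    "width": 64,
    "height": 64,
    "flatten": -1,  # assumption: the script treats -1 as "do not flatten"
}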
jacob
hi adrian,
can you give me an example of a path that can be added in help="…"?
Because when I start the simple training example it arrives at “[INFO] loading images…”
and then it doesn’t go on!
Ah, thanks for these amazing tutorials!
Adrian Rosebrock
Hey Jacob, I think your confusion is related to how command line arguments work. Make sure you read this tutorial to help you clear up your confusion.
北凉徐凤年
hi adrian,
I don’t understand: when training train_vgg.py there is 70/70 [==================].
Where does the 70 come from? What does it mean? 70 pictures every epoch?
Thanks for your articles!
Adrian Rosebrock
That is actually the number of batches per epoch. There are 70 batches of images per epoch.
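To make the arithmetic concrete (assuming the post’s 3,000-image animals dataset, the 75% training split, and a batch size of 32): 3,000 x 0.75 = 2,250 training images, and 2,250 // 32 = 70 batches per epoch.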
YewBoon
Hi Adrian,
Could you please explain further how the 70 is calculated?
I assume it is from this line of code, right?
model.fit_generator(aug.flow(trainX, trainY, batch_size=BS), validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS, epochs=EPOCHS)
Adrian Rosebrock
What specifically are you asking how to calculate? The steps_per_epoch value?
Jorge
Hello Adrian. Thank you very much for this excellent tutorial. I have a little question. Could you please describe what the Keras architecture of the convolutional network would be if the dataset had only two categories, for example cats and dogs (eliminating the pandas folder), still in the RGB color space? If you could draw a parallel with the code in this tutorial for the three categories. I know it may be a basic question, but it would help me understand the architecture of the CNN in Keras, which I still do not have very clear. Thank you very much for your excellent work and congratulations on your wedding. Jorge from Argentina.
Adrian Rosebrock
The architecture itself would not change except for the final FC layer where there will be two nodes rather than three nodes. Other than that, there will be no other changes to the architecture itself. If you want to train a network for binary classification just make sure you use “binary_crossentropy” for your loss.
Yuthika Shekhar
Hi Adrian,
Thanks for the amazing explanation. I have few doubts.
If we try implementing this with another dataset, is it supposed to be organized the way you have organized yours?
Also, what does 32x32x3 = 3072 pixels in a flattened input image mean? I am not able to understand the multiplication by 3.
Adrian Rosebrock
I would recommend you use the same directory structure that I use in the blog post. It will ensure that the code doesn’t have to be changed at all and you can just run the script to train on your own custom dataset.
As for your second question, images are represented as a 3 channel RGB image. Thus, for a 32×32 RGB image there are a total of 32x32x3=3072 values.
Nick
Hi, this tutorial is self-explanatory. I have just started learning machine learning, and this image recognition sounds really interesting and cool. I have downloaded all the required files and code from your site. I have Spyder installed via Anaconda and I want to run these files. I need help with how to get these scripts running.
How can I run this model in Spyder?
Thank You.
Adrian Rosebrock
Hey Nick, you can certainly use an IDE if you would like, but I don’t recommend it if you are new to computer vision and deep learning. Take the time to invest in your ability to execute the scripts via the command line. We use the command line quite a bit, so become comfortable with it now. Additionally, while I don’t use the Spyder IDE, you can use this tutorial on how to use an IDE with Python.
Megan
In Section 9, how do you choose 512 in the model.add(Dense(512)) line of code (Line 60) after you’re done with the CONV => RELU steps?
Adrian Rosebrock
It’s a hyperparameter to the model architecture. You run experiments to tune the hyperparameters of the network. I discuss my best practices, tips, and suggestions to hyperparameter tuning inside my book, Deep Learning for Computer Vision with Python.
Farshad
Hi Adrian. Thanks for the nice explanation. Is there any way to create a CNN model from scratch for object detection or object localization using Keras? Can Keras do it at all? I searched many posts on websites and all of them used Keras for image classification only. If yes, I hope you publish a blog post tutorial about object detection with Keras. Thanks for your amazing work.
Adrian Rosebrock
Great question, thanks for asking Farshad. I actually cover how to train your own custom Keras object detector inside Deep Learning for Computer Vision with Python.
Juanlu
Great post, but there is one thing missing which makes the predictions fail.
The same way we divide the inputs by 255.0 during training, we need to do the same thing in the prediction script before providing the image as input to the NN.
Adrian Rosebrock
Thanks so much for pointing this out, Juanlu! It was a typo on my part. I have fixed the typo as well as the code download so the issue no longer exists.
andreas
Hi Adrian,
How do we add the object detection bounding boxes to the images?
Adrian Rosebrock
You cannot use a model trained for image classification as an object detector. I would suggest you read this tutorial on deep learning object detection so you can learn the fundamentals.
andreas
Hi Adrian,
I get this error..any suggestions?
(-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
Adrian Rosebrock
Double-check the path to your input dataset. Your path is likely incorrect and the cv2.imread function is returning “None”.
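A quick sketch for finding the offending file (inside your loop over imagePaths):
import cv2
image = cv2.imread(imagePath)
# imread returns None instead of raising an error on a bad path
if image is None:
    print("[WARNING] could not read {}, skipping".format(imagePath))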
Ctibor
Hi Adrian.
Thank you for your excellent tutorial, but it’s just for pictures. How could a CNN be used to recognize sounds?
Adrian Rosebrock
Sorry, I don’t have any experience with deep learning for audio applications. I only work with computer vision here. Sorry I couldn’t be of more help!
Zachary Miller
This is by far the most simple to understand and useful tutorial on Keras that I have ever seen. You do a great job of explaining BOTH the concepts behind how the neural network works and what the different functions in the libraries are doing for us (the last part is often left out). Thank you so much!
Adrian Rosebrock
Thank you so much for the kind words, Zachary — I really appreciate that 🙂
moh
Hi, Adrian
Excellent work
If I have 1-channel images (e.g., medical images) and I want to apply this program to classify them, what should I change in the program, especially the input_shape?
Adrian Rosebrock
First, convert your images to grayscale when you load them:
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
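One assumption worth noting: Keras expects an explicit channel axis, so after converting you will likely also need (assuming NumPy is imported as np):
image = np.expand_dims(image, axis=-1)  # shape becomes (height, width, 1)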
Secondly, change your depth=1 when initializing SmallVGGNet.
David
Please send me the source code for this post.
Adrian Rosebrock
You can use the “Downloads” section of the post to download the source code.
Akshay
Where can I find the dataset for this tutorial?
Adrian Rosebrock
You can use the “Downloads” section of this post to download the source code and dataset.
Jarvis
Hi Adrian
Thank you for this tutorial.
I have two doubts :
1. At the data = np.array(data, dtype="float") line I am getting a sequence error; in short, I am not able to convert data of dtype('O') into float.
I worked around the error by copying the data into another float array, but after trying for many hours I could not properly solve it.
2. I am getting loss = NaN.
I have checked my input data and I am sure that none of the input values are NaN.
Any help will be appreciated.
thank you
Adrian Rosebrock
Are you using the exact code and datasets from this tutorial? Or are you working with your own custom dataset?
Jarvis
Hi thanks Adrian for the post again.
I figured out the errors myself. I was using a custom dataset, and some of the images were corrupted, which is why I was getting these errors.
Adrian Rosebrock
Congrats on resolving the issue!
AHMED ARUP KAMAL
Hi Adrian,
I managed to run it. But after 1,200 epochs and a 0.01 learning rate, my training accuracy is ~1.0 and my validation accuracy is ~0.50!
What’s happening?
Adrian Rosebrock
Your network is overfitting to the data. Training for longer isn’t necessarily going to give you better accuracy. Instead, you need to learn how to properly set the hyperparameters of the network. To improve your accuracy and learn my tips, suggestions, and best practices to improve the accuracy of your networks, make sure you refer to Deep Learning for Computer Vision with Python.
Saptarshi
Hey! Loved the tutorial.
Can the same VGG network be used for a hand gesture recognition system for classifying gestures from A-Z (26 classes)?
Adrian Rosebrock
Not as it stands. VGG is a classification network, and presuming you are referring to the VGG network pre-trained on ImageNet, there are no hand gesture classes. You would need to fine-tune VGG to recognize hand gestures. For what it’s worth, I cover transfer learning and fine-tuning inside my book, Deep Learning for Computer Vision with Python.
fishwolf
Is it possible to know where the cat/dog/panda is in the image?
Is it possible to run this process in real time on a video stream?
Thanks
Adrian Rosebrock
What you are referring to is called “object detection”. See this post for more details on object detection.
riyaz
Can you do the same problem for binary classification? I got stuck doing that… I have only 2 classes, and I also want to save the model.
Adrian Rosebrock
You’ll want to change your loss function to “binary_crossentropy” for 2-class, binary classification. This tutorial covers how to save and load your models with Keras. For more details on deep learning, including how to get started, I would suggest working through Practical Python and OpenCV.
sruthi
Hi,
I didn’t understand how feature extraction is done in this code. I have applied the same code for gender recognition, and the only difference from your code is the training set images. I would like to know how feature extraction is done and what features have been extracted.
Adrian Rosebrock
If you’re interested in feature extraction via pre-trained CNNs (including gender recognition) then definitely take a look at Deep Learning for Computer Vision with Python where I cover the topic in detail.
sruthi
I have used this exact same code for gender recognition. It is working, but I would like to know what features are being extracted, as well as how feature extraction is done. Can you please reply ASAP? The only difference from your code is the dataset used.
Adrian Rosebrock
I cover that exact topic inside Deep Learning for Computer Vision with Python; my suggestion is to start there.
Nguyen Anh Duy
Hi Adrian,
I only want to classify dogs and cats, so I changed “categorical_crossentropy” to “binary_crossentropy”, and then I get the error:
“expected activation_2 to have shape (2,) but got array with shape (1,)”
Then I changed to “sparse_categorical_crossentropy” and it works.
But if I want to classify grayscale images, for example “number 0” and “number 1” in the MNIST dataset, I use “binary_crossentropy” and change the input shape to:
model = SmallVGGNet.build(width=28, height=28, depth=1, classes=len(lb.classes_))
and then it shows a similar error:
“expected activation_2 to have shape (2,) but got array with shape (1,)”
Could you help me?
Thank you very much.
Adrian Rosebrock
1. See my note on Lines 66-69 about using Keras’ “to_categorical” function.
2. You should be using “binary_crossentropy” as your loss.
Once you switch both of those you will be able to train the network.
Daniel
Adrian,
how do you use the “to_categorical” function in this context? I don’t have a lot of coding experience with Python. I googled for examples but it did not work for me.
Also, in previous related questions you mentioned changing to LabelEncoder (instead of LabelBinarizer). I tried my hand at that but it did not work:
I changed:
lb = LabelBinarizer() to lb = LabelEncoder()
I also changed the loss function to binary_crossentropy. But I am getting an error like the one Nguyen mentioned above.
Many thanks for the tutorial. It is REALLY helpful.
Adrian Rosebrock
You first encode using LabelEncoder and then call to_categorical, similar to the following:
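(a minimal sketch; the label values are illustrative)
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
labels = ["cat", "dog", "dog", "cat"]
le = LabelEncoder()
labels = le.fit_transform(labels)    # integer-encode: [0, 1, 1, 0]
labels = to_categorical(labels, 2)   # one-hot vectors, one column per class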
bramata vikana
Hi Adrian, thank you for your explanation, but can you explain to me what numClasses is? Thank you so much.
Adrian Rosebrock
The “numClasses” is the total number of unique class labels. For example, suppose you had a three class dataset: dogs, cats, and pandas. Then “numClasses=3” since you have three total classes.
Parvez Alam
Hello Adrian, brother. Being in my final year of college, I find your resources quite awesome.
I have a doubt about using the number of classes:
You have used 3 classes(cat,dog,panda) and you vectorized to trainY and testY as below…
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
and in the comments you mentioned that if 2 classes are used, then LabelEncoder is used instead of LabelBinarizer, and fit_transform and to_categorical are applied to the labels, as below:
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, numClasses)
Could you please explain to me when to binarize the labels, when to binarize trainY and testY, and when to use LabelBinarizer, LabelEncoder, and to_categorical?
Adrian Rosebrock
Hey Parvez — I address that exact question inside Deep Learning for Computer Vision with Python. I suggest you start there.
Beatrice van Eden
Did you get this working? I checked this out today, but obviously I do not understand exactly what is going on. I keep getting errors even after using the well-explained code on this.
Adrian Rosebrock
Hi Beatrice — did you see my previous comment? I provided you with code you could use.
Robert
Hi
This is a great tutorial.
I’ve purchased your book and its supporting material and look forward to reading and learning even more about this topic.
Keep up the good work.
Robert.
Adrian Rosebrock
Thanks so much, Robert! I hope you are enjoying. By all means, feel free to reach out if you have any questions on it 🙂
Beatrice van Eden
Hi
Yes, I did. I made the modifications but then get an error when it comes time to train the neural net. The shape of the array is not what it is expecting any more.
Alexander
Hello and nice guide!
I have a question: is this tutorial for Windows or Linux?
Adrian Rosebrock
Provided you have Keras properly installed this tutorial will work on Linux, macOS, and Windows.
Alexander
Thank you for the response!
I have another question. I tried running the train_vgg script and it takes about 3-4 minutes per epoch on my computer. How do I tell TensorFlow to use my GPU instead of my CPU? I assume it uses my CPU since the timings are well over 1 minute.
Adrian Rosebrock
You can use the “nvidia-smi” command to check and see if your GPU is being utilized. You’ll also want to ensure the “tensorflow-gpu” package is installed.
Sky
I learned that the evaluation dataset is used for tuning the hyperparameters.
In this blog post, what are the hyperparameters?
Adrian Rosebrock
The hyperparameters include the learning rate, number of nodes/filters for each layer, and any regularization. I would definitely suggest reading through Deep Learning for Computer Vision with Python where I cover hyperparameters (and how to properly tune them) in detail.
Beatrice van Eden
Thank you for sharing this with us. I found it to be of great benefit for me.
Do you have a similar tutorial for RGB-D data? I know you add the extra channel, but I suppose my struggle is even before that, with the preprocessing of the data. I recorded a ROS bag with the RGB-D data; I can extract the RGB into one folder and the D into another, but then I get confused when trying to give it as input to the convnet. (I struggle with the coding.)
Adrian Rosebrock
Sorry, I do not have any tutorials for RGB-D data.
Adrian Rosebrock
See this tutorial.
Tuan Anh Nguyen
Hello! thank you for sharing this with us!!!
I still do not understand how you label the dogs, cats, and pandas. Could you explain the labeling to me?
Adrian Rosebrock
I manually labeled those images myself. I created a directory for each of the dog, cat, and panda images, then placed each image into its corresponding directory.
Beatrice van Eden
Thank you.
Adrian Rosebrock
No problem, I’m glad you found it helpful!
mary
Hey, this tutorial is awesome. The code for the non-CNN version worked just fine, but when I ran the CNN with SmallVGGNet it gave me the error:
ImportError: No module named ‘pyimagesearch’
How do I resolve this?
Secondly, if I use the line image = cv2.resize(image, (64, 64)), will it resize all my images to 64×64 no matter what the original size? Also, how do I know that the images being fed into the neural network are fine for training? Won’t the larger images be distorted like that (their details unable to be observed during training)?
My last question: for the line inputShape = (width, height, depth) in the SmallVGGNet script, do I write the dimensions I want the image in, or the dimensions the image already has? (In a dataset with many images, how can I tell from just one image?)
Adrian Rosebrock
Hey Mary — make sure you use the “Downloads” section of the code to download the source code. It sounds like you may have copied and pasted which likely caused the error.
Secondly, I would recommend you read Deep Learning for Computer Vision with Python so you can learn the fundamentals of deep learning. That book will help you understand how we preprocess images and better enable you to train your own CNNs.
Chinmaya Panda
Dear Sir,
This is the best literature I have come across on the internet for an ML implementation.
It is exactly what I need for my assigned work.
I started this morning at 9am and finished everything by 8pm.
I understood the concepts, implemented it in a Jupyter Notebook, and got results after a few changes.
The CNN-based model testing is pending, but I will do that with your other blog post.
Such a nice way of explaining, with detailed code, deserves lots of appreciation, so I am dropping this message.
Thanks a lot for your contribution to society and the human race.
Adrian Rosebrock
Thanks Chinmaya, I really appreciate the kind words 🙂 Congrats on training your own NNs and CNNs!
Ali
Hi dear Adrian!
Can you help me train it for two classes only?
Adrian Rosebrock
See this comment thread.
Henrique
Hi Adrian,
Can I use this code to train on only 1 class?
I’m trying to identify an object in a photo. If the object is there I will receive an “ok”, and if it’s not I will receive a “nok”.
Ammu
Hi, how do I download the animals dataset?
I couldn’t find it in the downloads section.
Thanks
Adrian Rosebrock
Download the .zip of the file using the “Downloads” section of the tutorial. You’ll find the “animals” dataset there.
Andres
This was a very detailed tutorial. If I wanted to use Tensorflow 2.0 with the new keras interface, would I need to simply do something like: “import tensorflow.keras as keras” and the rest would work the same?
Thanks
Adrian Rosebrock
You are absolutely correct! Since TensorFlow 2.0 is making big moves to use the “tensorflow.keras” package you can just import all Keras classes/functions directly from “tensorflow.keras”.
Abdullah
Hi Adrian, if I am adding another class, “cow” for example, isn’t it necessary to change the number of epochs?
Adrian Rosebrock
Not necessarily. The number of epochs doesn’t depend on the number of classes, or vice versa. Try training with the same number of epochs. Additionally, you should read Deep Learning for Computer Vision with Python to learn my best practices, tips, and suggestions for training your own deep learning models.
Agnes
Hi Adrian,
I would like to know if there is an explanation for fixing the number of neurons in the first hidden layer at 1024, given the input shape of 3072, on Line 76 of the train_simple_nn.py file. I understand that in every hidden layer the representation is reduced to one half of its previous size; hence from hidden layer 1 to hidden layer 2 the nodes are reduced from 1024 to 512. But how does it go from 3072 to 1024?
Thanks in Advance…..
Srinivas and Mangipudi
Hi I got an error after the training and network evaluation finished. The error was in generating the plot:
Traceback (most recent call last):
File “train_simple_nn.py”, line 111, in
plt.plot(N, H.history["acc"], label="train_acc")
KeyError: ‘acc’
Srinivas and Mangipudi
Hi, I managed to get rid of the error by using metrics=["acc"] in model.compile.
But after training, I notice that the accuracy is below 50%, which means it is performing worse than random chance. In fact, I gave it a cat image to predict, but it predicted dog with 63% confidence.
I don’t understand why it’s doing this.
Adrian Rosebrock
In TensorFlow 2.0 the “acc” key was changed to “accuracy” and “val_accuracy”, respectively.
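A sketch of the updated plotting lines (assuming the tutorial’s H history object and N epoch range):
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")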
Arif
Hi Adrian,
If I would like to implement a face recognition application based on your code, what should I do besides adding face detection?
Thank you
Adrian Rosebrock
You should follow my tutorials on face applications and face recognition.
Aditi
Hi Adrain
Thanks for your post. I am using Keras 2.3.1 and TensorFlow 2.0.0. I read the previous comments and changed “acc” to “accuracy”, and I got my plot as a PNG. But the model file and the pickle file are still not written to the output folder.
Thanks:)
Adrian Rosebrock
Make sure you train your model first. Once the model is trained you can then make predictions using it.
Anja
Hi Adrian,
I created my own model with train_vgg.py and it works great. 🙂
With predict.py I can check individual images. However, I would like to check a live video with the created model.
That’s why I changed the code so that it checks the frames from the webcam (predictVideo.py), or alternatively video files.
Unfortunately, the recognition (labeling) does not work well here, although I use the same model as when checking individual images.
Example:
I extracted individual pictures from a video file and checked them with predict.py.
Result: everything is recognized correctly.
If I then check the video file itself with predictVideo.py, nothing is recognized correctly.
Is it because you cannot use the model for live or video file recognition?
Do I have to train the model differently?
Thanks a lot!
Anja
Adrian Rosebrock
It’s hard to say what the issue is without seeing your code or video, but I would suggest you start with this tutorial to help you learn how to apply a Keras model to a video stream.
Secondly, double and triple-check that your preprocessing steps are the same for inference/prediction as they are for training. A common mistake I see beginners make is forgetting to preprocess their images in the same manner as training.
teimoor
Hi, how do I supply a trained model, since I don’t have any trained model on my disk? It is a required argument in your code.
Adrian Rosebrock
You need to train your model before you serialize it to disk. From there you can use it to classify new input images.
Tharumudu
Hi Adrian,
This is a great tutorial and made everything easy for me, as always. I would like to know a robust way to predict when I have around 50,000 images. I’m currently looping through the images with a tensorflow.keras.backend.clear_session() call after the prediction line.
Is there any way to predict all the images at once and then loop through the results and save them?
Adrian Rosebrock
You mean make predictions on all 50,000 images? Yes, absolutely, just use Keras’ predict_generator function.
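As a sketch, batched prediction over a preprocessed array also works (the array name is illustrative; in newer Keras versions model.predict accepts a batch_size):
# allImages: NumPy array of shape (50000, height, width, 3), already scaled
preds = model.predict(allImages, batch_size=32)
# preds has shape (50000, numClasses); loop over it to save each result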
vikas
Hello Adrian sir,
Thank you very much for the great tutorial. It’s awesome and very easy to understand.
I have implemented it on my own dataset of 3 classes of documents (driving licences). I got good accuracy and good results on unseen images belonging to the classes. But when I try to predict an image outside these 3 classes (e.g., a dog or cat), it still shows a match with one of the classes. Why is this so? Please help.
Adrian Rosebrock
You need to create a 4th class called “ignore” that does not contain any of the documents. That way your model can predict one of the 3 document classes or the 4th “ignore” class.
Luis
What should we change if we are using a binary-class dataset?
Using to_categorical() would change some details in the code; what would they be?
Adrian Rosebrock
Take a look at the comments on this post as I have addressed that question a few times.