Inside this Keras tutorial, you will discover how easy it is to get started with deep learning and Python. You will use the Keras deep learning library to train your first neural network on a custom image dataset, and from there, you’ll implement your first Convolutional Neural Network (CNN) as well.
The inspiration for this guide came from PyImageSearch reader, Igor, who emailed me a few weeks ago and asked:
Hey Adrian, thanks for the PyImageSearch blog. I’ve noticed that nearly every “getting started” guide I come across for Keras and image classification uses either the MNIST or CIFAR-10 datasets which are built into Keras. I just call one of those functions and the data is automatically loaded for me.
But how do I go about using my own image dataset with Keras?
What steps do I have to take?
Igor has a great point — most Keras tutorials you come across will try to teach you the basics of the library using an image classification dataset such as MNIST (handwriting recognition) or CIFAR-10 (basic object recognition).
These image datasets are standard benchmarks in the computer vision and deep learning literature, and sure, they will absolutely get you started using Keras…
…but they aren’t necessarily practical in the sense that they don’t teach you how to work with your own set of images residing on disk. Instead, you’re just calling helper functions to load pre-compiled datasets.
I’m going with a different take on an introductory Keras tutorial.
Instead of teaching you how to utilize one of these pre-compiled datasets, I’m going to teach you how to train your first neural network and Convolutional Neural Network using a custom dataset — because let’s face it, your goal is to apply deep learning to your own dataset, not one built into Keras, am I right?
To learn how to get started with Keras, Deep Learning, and Python, just keep reading!
Keras Tutorial: How to get started with Keras, Deep Learning, and Python
2020-05-13 Update: This blog post is now TensorFlow 2+ compatible!
Today’s Keras tutorial is designed with the practitioner in mind — it is meant to be a practitioner’s approach to applied deep learning.
That means that we’ll learn by doing.
We’ll be getting our hands dirty.
Writing some Keras code.
And then training our networks on our custom datasets.
This tutorial is not meant to be a deep dive into the theory surrounding deep learning.
If you’re interested in studying deep learning in depth, including both (1) hands-on implementations and (2) a discussion of theory, I would suggest you check out my book, Deep Learning for Computer Vision with Python.
Overview of what’s going to be covered
Training your first simple neural network with Keras doesn’t require a lot of code, but we’re going to start slow, taking it step-by-step, ensuring you understand the process of how to train a network on your own custom dataset.
The steps we’ll cover today include:
- Installing Keras and other dependencies on your system
- Loading your data from disk
- Creating your training and testing splits
- Defining your Keras model architecture
- Compiling your Keras model
- Training your model on your training data
- Evaluating your model on your test data
- Making predictions using your trained Keras model
I’ve also included an additional section on training your first Convolutional Neural Network.
This may seem like a lot of steps, but I promise you, once we start getting into the examples you’ll see that they are linear, make intuitive sense, and will help you understand the fundamentals of training a neural network with Keras.
Our example dataset
Most Keras tutorials you come across for image classification will utilize MNIST or CIFAR-10 — I’m not going to do that here.
To start, MNIST and CIFAR-10 aren’t very exciting examples.
These tutorials don’t actually cover how to work with your own custom image datasets. Instead, they simply call built-in Keras utilities that magically return the MNIST and CIFAR-10 datasets as NumPy arrays. In fact, your training and testing splits have already been pre-split for you!
Secondly, if you want to use your own custom datasets you really don’t know where to start. You’ll find yourself scratching your head and asking questions such as:
- Where are those helper functions loading the data from?
- What format should my dataset on disk be in?
- How can I load my dataset into memory?
- What preprocessing steps do I need to perform?
Let’s be honest — your goal in studying Keras and deep learning isn’t to work with these pre-baked datasets.
Instead, you want to work with your own custom datasets.
And those introductory Keras tutorials you’ve come across only take you so far.
That’s why, inside this Keras tutorial, we’ll be working with a custom dataset called the “Animals dataset” I created for my book, Deep Learning for Computer Vision with Python:
The purpose of this dataset is to correctly classify an image as containing either:
- Cats
- Dogs
- Pandas
Containing only 3,000 images, the Animals dataset is meant to be an introductory dataset that we can quickly train a deep learning model on using either our CPU or GPU (and still obtain reasonable accuracy).
Furthermore, using this custom dataset enables you to understand:
- How you should organize your dataset on disk
- How to load your images and class labels from disk
- How to partition your data into training and testing splits
- How to train your first Keras neural network on the training data
- How to evaluate your model on the testing data
- How you can reuse your trained model on data that is brand new and outside your training and testing splits
By following the steps in this Keras tutorial you’ll be able to swap out my Animals dataset for any dataset of your choice, provided you utilize the project/directory structure detailed below.
Need data? If you need to scrape images from the internet to create a dataset, check out how to do it the easy way with Bing Image Search, or the slightly more involved way with Google Images.
Project structure
There are a number of files associated with this project. Grab the zip from the “Downloads” section and then use the `tree` command to show the project structure in your terminal (I’ve provided two command line argument flags to `tree` to make the output nice and clean):
```
$ tree --dirsfirst --filelimit 10
.
├── animals
│   ├── cats [1000 entries exceeds filelimit, not opening dir]
│   ├── dogs [1000 entries exceeds filelimit, not opening dir]
│   └── panda [1000 entries exceeds filelimit, not opening dir]
├── images
│   ├── cat.jpg
│   ├── dog.jpg
│   └── panda.jpg
├── output
│   ├── simple_nn.model
│   ├── simple_nn_lb.pickle
│   ├── simple_nn_plot.png
│   ├── smallvggnet.model
│   ├── smallvggnet_lb.pickle
│   └── smallvggnet_plot.png
├── pyimagesearch
│   ├── __init__.py
│   └── smallvggnet.py
├── predict.py
├── train_simple_nn.py
└── train_vgg.py

7 directories, 14 files
```
As previously discussed, today we’ll be working with the Animals dataset. Notice how `animals` is organized in the project tree. Inside of `animals/`, there are three class directories: `cats/`, `dogs/`, and `panda/`. Within each of those directories are 1,000 images pertaining to the respective class.
If you work with your own dataset, just organize it the same way! Ideally you’ll gather 1,000 images per class at a minimum. This isn’t always possible, but you should at least have class balance. Significantly more images in one class folder could cause model bias.
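For instance, a hypothetical two-class dataset of your own (every name here is a placeholder) would follow the same pattern:

```
my_dataset/
├── class_a/
│   ├── 00001.jpg
│   ├── 00002.jpg
│   └── ...
└── class_b/
    ├── 00001.jpg
    └── ...
```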
Next is the `images/` directory. This directory contains three images for testing purposes which we’ll use to demonstrate how to (1) load a trained model from disk and then (2) classify an input image that is not part of our original dataset.
The `output/` folder contains three types of files which are generated by training:

- `.model`: A serialized Keras model file is generated after training and can be used in future inference scripts.
- `.pickle`: A serialized label binarizer file. This file contains an object which contains class names. It accompanies a model file.
- `.png`: I always place my training/validation plot images in the output folder as it is an output of the training process.
The `pyimagesearch/` directory is a module. Contrary to the many questions I receive, `pyimagesearch` is not a pip-installable package. Instead, it resides in the project folder and the classes contained within can be imported into your scripts. It is provided in the “Downloads” section of this Keras tutorial.
Today we’ll be reviewing four .py files:

- In the first half of the blog post, we’ll train a simple model. The training script is `train_simple_nn.py`.
- We’ll advance to training `SmallVGGNet` using the `train_vgg.py` script.
- The `smallvggnet.py` file contains our `SmallVGGNet` class, a Convolutional Neural Network.
- What good is a serialized model unless we can deploy it? In `predict.py`, I’ve provided sample code for you to load a serialized model + label file and make an inference on an image. The prediction script is only useful after we have successfully trained a model with reasonable accuracy. It is always useful to run this script to test with images that are not contained within the dataset.
Configuring your development environment
To configure your system for this tutorial, I first recommend following either of these tutorials:
Either tutorial will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.
Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.
2. Load your data from disk
Now that Keras is installed on our system we can start implementing our first simple neural network training script using Keras. We’ll later implement a full-blown Convolutional Neural Network, but let’s start easy and work our way up.
Open up `train_simple_nn.py` and insert the following code:
```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
```
Lines 2-19 import our required packages. As you can see there are quite a few tools this script is taking advantage of. Let’s review the important ones:
- `matplotlib`: This is the go-to plotting package for Python. That said, it does have its nuances, and if you’re having trouble with it, refer to this blog post. On Line 3, we instruct `matplotlib` to use the `"Agg"` backend, enabling us to save plots to disk — that’s your first nuance!
- `sklearn`: The scikit-learn library will help us with binarizing our labels, splitting data for training/testing, and generating a training report in our terminal.
- `tensorflow.keras`: You’re reading this tutorial to learn about Keras — it is our high-level frontend into TensorFlow and other deep learning backends.
- `imutils`: My package of convenience functions. We’ll use the `paths` module to generate a list of image file paths for training.
- `numpy`: NumPy is for numerical processing with Python. It is another go-to package. If you have OpenCV for Python and scikit-learn installed, then you’ll have NumPy, as it is a dependency.
- `cv2`: This is OpenCV. At this point, it is both tradition and a requirement to tack on the 2 even though you’re likely using OpenCV 3 or higher.
- …the remaining imports are built into your installation of Python!
Wheww! That was a lot, but having a good idea of what each import is used for will aid your understanding as we walk through these scripts.
Let’s parse our command line arguments with argparse:
```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True,
	help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True,
	help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
```
Our script will dynamically handle additional information provided via the command line when we execute our script. The additional information is in the form of command line arguments. The `argparse` module is built into Python and will handle parsing the information you provide in your command string. For additional explanation, refer to this blog post.
We have four command line arguments to parse:

- `--dataset`: The path to our dataset of images on disk.
- `--model`: Our model will be serialized and output to disk. This argument contains the path to the output model file.
- `--label-bin`: Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
- `--plot`: The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data.
With the dataset information in hand, let’s load our images and class labels:
```python
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []

# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# loop over the input images
for imagePath in imagePaths:
	# load the image, resize the image to be 32x32 pixels (ignoring
	# aspect ratio), flatten the 32x32x3=3072 pixel image into a
	# list, and store the image in the data list
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (32, 32)).flatten()
	data.append(image)

	# extract the class label from the image path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)
```
Here we:
- Initialize lists for our `data` and `labels` (Lines 35 and 36). These will later become NumPy arrays.
- Grab `imagePaths` and randomly shuffle them (Lines 39-41). The `paths.list_images` function conveniently will find all the paths to all input images in our `--dataset` directory before we sort and `shuffle` them. I set a `seed` so that the random reordering is reproducible.
- Begin looping over all `imagePaths` in our dataset (Line 44).
For each `imagePath`, we proceed to:
- Load the `image` into memory (Line 48).
- Resize the `image` to `32x32` pixels (ignoring aspect ratio) as well as `flatten` the image (Line 49). It is critical to `resize` our images properly because this neural network requires these dimensions. Each neural network will require different dimensions, so just be aware of this. Flattening the data allows us to pass the raw pixel intensities to the input layer neurons easily. You’ll see later that for VGGNet we pass the volume to the network since it is convolutional. Keep in mind that this example is just a simple non-convolutional network — we’ll be looking at a more advanced example later in the post.
- Append the resized image to `data` (Line 50).
- Extract the class `label` of the image from the path (Line 54) and add it to the `labels` list (Line 55). The `labels` list contains the classes that correspond to each image in the data list.
Now in one fell swoop, we can apply array operations to the data and labels:
```python
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
```
On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] (a common preprocessing step).
We also convert the `labels` list to a NumPy array (Line 59).
3. Construct your training and testing splits
Now that we have loaded our image data from disk, next we need to construct our training and testing splits:
```python
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)
```
It is typical to allocate a percentage of your data for training and a smaller percentage for testing. scikit-learn provides a handy `train_test_split` function which will split the data for us.
Both `trainX` and `testX` make up the image data itself, while `trainY` and `testY` make up the labels.
Our class labels are currently represented as strings; however, Keras will assume that both:
- Labels are encoded as integers
- And furthermore, one-hot encoding is performed on these labels making each label represented as a vector rather than an integer
To accomplish this encoding, we can use the `LabelBinarizer` class from scikit-learn:
```python
# convert the labels from integers to vectors (for 2-class, binary
# classification you should use Keras' to_categorical function
# instead as the scikit-learn's LabelBinarizer will not return a
# vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
```
On Line 70, we initialize the `LabelBinarizer` object.
A call to `fit_transform` finds all unique class labels in `trainY` and then transforms them into one-hot encoded labels. A call to just `.transform` on `testY` performs just the one-hot encoding step — the unique set of possible class labels was already determined by the call to `.fit_transform`.
Here’s an example:
```python
[1, 0, 0] # corresponds to cats
[0, 1, 0] # corresponds to dogs
[0, 0, 1] # corresponds to panda
```
Notice how only one of the array elements is “hot” which is why we call this “one-hot” encoding.
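If you’d like to see this behavior for yourself, here is a tiny standalone sketch (the toy labels are hypothetical, not pulled from the Animals dataset):

```python
# quick demonstration of LabelBinarizer on toy string labels
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
# fit_transform learns the unique classes AND one-hot encodes
print(lb.fit_transform(["cats", "dogs", "panda", "cats"]))
# transform reuses the classes learned above for new labels
print(lb.transform(["panda"]))
# the learned class ordering
print(lb.classes_)
```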
4. Define your Keras model architecture
The next step is to define our neural network architecture using Keras. Here we will be using a network with one input layer, two hidden layers, and one output layer:
```python
# define the 3072-1024-512-3 architecture using Keras
model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation="sigmoid"))
model.add(Dense(512, activation="sigmoid"))
model.add(Dense(len(lb.classes_), activation="softmax"))
```
Since our model is really simple, we go ahead and define it in this script (typically I like to make a separate class in a separate file for the model architecture).
The input layer and first hidden layer are defined on Line 76. The layer will have an `input_shape` of `3072`, as there are `32x32x3=3072` pixels in a flattened input image. The first hidden layer will have `1024` nodes.
The second hidden layer will have `512` nodes (Line 77).
Finally, the number of nodes in the final output layer (Line 78) will be the number of possible class labels — in this case, the output layer will have three nodes, one for each of our class labels (“cats”, “dogs”, and “panda”, respectively).
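If you want to double-check the layer dimensions and parameter counts, you can optionally print a summary after defining the model (this call is not part of the training script itself):

```python
# optional: print each layer's output shape and parameter count
model.summary()
```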
5. Compile your Keras model
Once we have defined our neural network architecture, the next step is to “compile” it:
```python
# initialize our initial learning rate and # of epochs to train for
INIT_LR = 0.01
EPOCHS = 80

# compile the model using SGD as our optimizer and categorical
# cross-entropy loss (you'll want to use binary_crossentropy
# for 2-class classification)
print("[INFO] training network...")
opt = SGD(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
```
First, we initialize our learning rate and total number of epochs to train for (Lines 81 and 82).
Then we `compile` our model using the Stochastic Gradient Descent (`SGD`) optimizer with `"categorical_crossentropy"` as the `loss` function.
Categorical cross-entropy is used as the loss for nearly all networks trained to perform classification. The only exception is for 2-class classification, where there are only two possible class labels. In that event, you would want to swap out `"categorical_crossentropy"` for `"binary_crossentropy"`.
6. Fit your Keras model to the data
Now that our Keras model is compiled, we can “fit” (i.e., train) it on our training data:
```python
# train the neural network
H = model.fit(x=trainX, y=trainY, validation_data=(testX, testY),
	epochs=EPOCHS, batch_size=32)
```
We’ve discussed all the inputs except `batch_size`. The `batch_size` controls the size of each group of data to pass through the network. Larger GPUs would be able to accommodate larger batch sizes. I recommend starting with `32` or `64` and going up from there.
7. Evaluate your Keras model
We’ve trained our actual model but now we need to evaluate it on our testing data.
It’s important that we evaluate on our testing data so we can obtain an unbiased (or as close to unbiased as possible) representation of how well our model is performing with data it has never been trained on.
To evaluate our Keras model, we can use a combination of the `.predict` method of the model along with the `classification_report` from scikit-learn:
```python
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy (Simple NN)")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])
```
2020-05-13 Update: In order for this plotting snippet to be TensorFlow 2+ compatible, the H.history dictionary keys are updated to fully spell out “accuracy” rather than the abbreviated “acc” (i.e., `H.history["val_accuracy"]` and `H.history["accuracy"]`). It is semi-confusing that “val” is not spelled out as “validation”; we have to learn to love and live with the API and always remember that it is a work in progress that many developers around the world contribute to.
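If you’re ever unsure which keys your version of TensorFlow/Keras recorded, you can simply print them (a quick sketch):

```python
# list the metric names stored in the training history
print(H.history.keys())
# e.g., dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
```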
When running this script you’ll notice that our Keras neural network will start to train, and once training is complete, we’ll evaluate the network on our testing set:
```
$ python train_simple_nn.py --dataset animals --model output/simple_nn.model \
	--label-bin output/simple_nn_lb.pickle --plot output/simple_nn_plot.png
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train on 2250 samples, validate on 750 samples
Epoch 1/80
2250/2250 [==============================] - 1s 311us/sample - loss: 1.1041 - accuracy: 0.3516 - val_loss: 1.1578 - val_accuracy: 0.3707
Epoch 2/80
2250/2250 [==============================] - 0s 183us/sample - loss: 1.0877 - accuracy: 0.3738 - val_loss: 1.0766 - val_accuracy: 0.3813
Epoch 3/80
2250/2250 [==============================] - 0s 181us/sample - loss: 1.0707 - accuracy: 0.4240 - val_loss: 1.0693 - val_accuracy: 0.3533
...
Epoch 78/80
2250/2250 [==============================] - 0s 184us/sample - loss: 0.7688 - accuracy: 0.6160 - val_loss: 0.8696 - val_accuracy: 0.5880
Epoch 79/80
2250/2250 [==============================] - 0s 181us/sample - loss: 0.7675 - accuracy: 0.6200 - val_loss: 1.0294 - val_accuracy: 0.5107
Epoch 80/80
2250/2250 [==============================] - 0s 181us/sample - loss: 0.7687 - accuracy: 0.6164 - val_loss: 0.8361 - val_accuracy: 0.6120
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.57      0.59      0.58       236
        dogs       0.55      0.31      0.39       236
       panda       0.66      0.89      0.76       278

    accuracy                           0.61       750
   macro avg       0.59      0.60      0.58       750
weighted avg       0.60      0.61      0.59       750

[INFO] serializing network and label binarizer...
```
This network is small, and when combined with a small dataset, takes only 2 seconds per epoch on my CPU.
Here you can see that our network is obtaining 60% accuracy.
Since we would have a 1/3 chance of randomly picking the correct label for a given image, we know that our network has actually learned patterns that can be used to discriminate between the three classes.
We also save a plot of our:
- Training loss
- Validation loss
- Training accuracy
- Validation accuracy
…ensuring that we can easily spot overfitting or underfitting in our results.
Looking at our plot we see a small amount of overfitting start to occur past epoch ~45 where our training and validation losses start to diverge and a pronounced gap appears.
Finally, we can save our model to disk so we can reuse it later without having to retrain it:
```python
# save the model and label binarizer to disk
print("[INFO] serializing network and label binarizer...")
model.save(args["model"], save_format="h5")
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()
```
8. Make predictions on new data using your Keras model
At this point our model is trained — but what if we wanted to make predictions on images after our network has already been trained?
What would we do then?
How would we load the model from disk?
How can we load an image and then preprocess it for classification?
Inside the `predict.py` script, I’ll show you how, so open it and insert the following code:
```python
# import the necessary packages
from tensorflow.keras.models import load_model
import argparse
import pickle
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image we are going to classify")
ap.add_argument("-m", "--model", required=True,
	help="path to trained Keras model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to label binarizer")
ap.add_argument("-w", "--width", type=int, default=28,
	help="target spatial dimension width")
ap.add_argument("-e", "--height", type=int, default=28,
	help="target spatial dimension height")
ap.add_argument("-f", "--flatten", type=int, default=-1,
	help="whether or not we should flatten the image")
args = vars(ap.parse_args())
```
First, we’ll import our required packages and modules.
You’ll need to explicitly import `load_model` from `tensorflow.keras.models` whenever you write a script to load a Keras model from disk. OpenCV will be used for annotation and display. The `pickle` module will be used to load our label binarizer.
Next, let’s parse our command line arguments:
- `--image`: The path to our input image.
- `--model`: Our trained and serialized Keras model path.
- `--label-bin`: Path to the serialized label binarizer.
- `--width`: The width of the input shape for our CNN. Remember — you can’t just specify anything here. You need to specify the width that the model is designed for.
- `--height`: The height of the image input to the CNN. The height specified must also match the network’s input shape.
- `--flatten`: Whether or not we should flatten the image. By default, we won’t flatten the image. If you need to flatten the image, you should pass a `1` for this argument.
Next, let’s load the image and resize it based on the command line arguments:
```python
# load the input image and resize it to the target spatial dimensions
image = cv2.imread(args["image"])
output = image.copy()
image = cv2.resize(image, (args["width"], args["height"]))

# scale the pixel values to [0, 1]
image = image.astype("float") / 255.0
```
And then we’ll `flatten` the image if necessary:
```python
# check to see if we should flatten the image and add a batch
# dimension
if args["flatten"] > 0:
	image = image.flatten()
	image = image.reshape((1, image.shape[0]))

# otherwise, we must be working with a CNN -- don't flatten the
# image, simply add the batch dimension
else:
	image = image.reshape((1, image.shape[0], image.shape[1],
		image.shape[2]))
```
Flattening the image for standard fully-connected networks is straightforward (Lines 33-35).
In the case of a CNN, we also add the batch dimension, but we do not flatten the image (Lines 39-41). An example CNN is covered in the next section.
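As a quick standalone sketch of what the two branches produce for a 32x32 RGB input (shapes only, no model required):

```python
# shape check for the two preprocessing branches above
import numpy as np

image = np.zeros((32, 32, 3))
flattened = image.flatten().reshape((1, -1))
print(flattened.shape)  # (1, 3072) -- simple fully-connected network
batched = image.reshape((1, 32, 32, 3))
print(batched.shape)    # (1, 32, 32, 3) -- CNN with a batch dimension
```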
From there, let’s load the model + label binarizer into memory and make a prediction:
```python
# load the model and label binarizer
print("[INFO] loading network and label binarizer...")
model = load_model(args["model"])
lb = pickle.loads(open(args["label_bin"], "rb").read())

# make a prediction on the image
preds = model.predict(image)

# find the class label index with the largest corresponding
# probability
i = preds.argmax(axis=1)[0]
label = lb.classes_[i]
```
Our model and label binarizer are loaded via Lines 45 and 46. We can make predictions on the input `image` by calling `model.predict` (Line 49).
What does the `preds` array look like?
```
(Pdb) preds
array([[5.4622066e-01, 4.5377851e-01, 7.7963534e-07]], dtype=float32)
```
The 2D array has one row per image in the batch (here there is only one row, as only one image was passed into the NN for classification), and each row holds the probabilities corresponding to each class label, as shown by querying the variable in my Python debugger:
- cats: 54.6%
- dogs: 45.4%
- panda: ~0%
In other words, our network “thinks” that it sees “cats” and it sure as hell “knows” that it doesn’t see a “panda”.
Line 53 finds the index of the max value (the 0-th “cats” index).
And Line 54 extracts the “cats” string label from the label binarizer.
Easy right?
Now let’s display the results:
```python
# draw the class label + probability on the output image
text = "{}: {:.2f}%".format(label, preds[0][i] * 100)
cv2.putText(output, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7,
	(0, 0, 255), 2)

# show the output image
cv2.imshow("Image", output)
cv2.waitKey(0)
```
We format our `text` string on Line 57. This includes the `label` and the prediction value in percentage format. Then we place the `text` on the `output` image (Lines 58 and 59).
Finally, we show the output image on the screen and wait until the user presses any key on Lines 62 and 63 (watch Homer Simpson try to locate the “any” key).
Our prediction script was rather straightforward.
Once you’ve used the “Downloads” section of this tutorial to download the code, you can open up a terminal and try running our trained network on custom images:
```
$ python predict.py --image images/cat.jpg --model output/simple_nn.model \
	--label-bin output/simple_nn_lb.pickle --width 32 --height 32 --flatten 1
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
Be sure that you copy/pasted or typed the entire command (including command line arguments) from within the folder relative to the script. If you’re having trouble with the command line arguments, give this blog post a read.
Here you can see that our simple Keras neural network has classified the input image as “cats” with 55.87% probability, despite the cat’s face being partially obscured by a piece of bread.
9. BONUS: Training your first Convolutional Neural Network with Keras
Admittedly, using a standard feedforward neural network to classify images is not a wise choice.
Instead, we should leverage Convolutional Neural Networks (CNNs) which are designed to operate over the raw pixel intensities of images and learn discriminating filters that can be used to classify images with high accuracy.
The model we’ll be discussing here today is a smaller variant of VGGNet which I have named “SmallVGGNet”.
VGGNet-like models share two common characteristics:
- Only 3×3 convolutions are used
- Convolution layers are stacked on top of each other deeper in the network architecture prior to applying a destructive pooling operation
Let’s go ahead and implement SmallVGGNet now.
Open up the `smallvggnet.py` file and insert the following code:
```python
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
```
As you can see from the imports on Lines 2-10, everything needed for the `SmallVGGNet` comes from `tensorflow.keras`. I encourage you to familiarize yourself with each in the Keras documentation and in my deep learning book.
We then begin to define our `SmallVGGNet` class and the `build` method:
```python
class SmallVGGNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1
```
Our class is defined on Line 12 and the sole `build` method is defined on Line 14. Four parameters are required for `build`: the `width` of the input images, the `height` of the input images, the `depth`, and the number of `classes`.
The `depth` can also be thought of as the number of channels. Our images are in the RGB color space, so we’ll pass a `depth` of `3` when we call the `build` method.
First, we initialize a `Sequential` model (Line 17).
Then, we determine channel ordering. Keras supports `"channels_last"` (i.e., TensorFlow) and `"channels_first"` (i.e., Theano) ordering. Lines 18-25 allow our model to support either type of backend.
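If you’re curious which ordering your own installation defaults to, you can check it directly (a quick interactive sketch):

```python
# prints "channels_last" on a standard TensorFlow install
from tensorflow.keras import backend as K
print(K.image_data_format())
```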
Now, let’s add some layers to the network:
```python
		# CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Our first `CONV => RELU => POOL` layers are added by this block. Our first `CONV` layer has `32` filters of size `3x3`.
It is very important that we specify the `inputShape` for the first layer, as all subsequent layer dimensions will be calculated using a trickle-down approach.
We’ll use the ReLU (Rectified Linear Unit) activation function in this network architecture. There are a number of activation methods and I encourage you to familiarize yourself with the popular ones inside Deep Learning for Computer Vision with Python where pros/cons and tradeoffs are discussed.
Batch Normalization, MaxPooling, and Dropout are also applied.
Batch Normalization is used to normalize the activations of a given input volume before passing it to the next layer in the network. It has been proven to be very effective at reducing the number of epochs required to train a CNN as well as stabilizing training itself.
POOL layers have a primary function of progressively reducing the spatial size (i.e. width and height) of the input volume to a layer. It is common to insert POOL layers between consecutive CONV layers in a CNN architecture.
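To make that concrete, here is a small worked sketch (not project code) of how SmallVGGNet’s three 2x2 pooling stages shrink a 64x64 input:

```python
# each 2x2 max-pool with stride 2 halves the width and height
size = 64
for stage in range(1, 4):
	size //= 2
	print("after POOL {}: {}x{}".format(stage, size, size))
# after POOL 1: 32x32 -> after POOL 2: 16x16 -> after POOL 3: 8x8
```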
Dropout is an interesting concept not to be overlooked. In an effort to force the network to be more robust we can apply dropout, the process of disconnecting random neurons between layers. This process is proven to reduce overfitting, increase accuracy, and allow our network to generalize better for unfamiliar images. As denoted by the parameter, 25% of the node connections are randomly disconnected (dropped out) between layers during each training iteration.
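If you want to see dropout in action, a tiny standalone sketch (independent of this project) makes the behavior visible:

```python
# at training time, Dropout(0.25) zeroes roughly 25% of activations
# and scales the survivors by 1/0.75 (inverted dropout)
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.25)
x = tf.ones((1, 8))
print(layer(x, training=True))   # some entries zeroed out
print(layer(x, training=False))  # identity at inference time
```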
Note: If you’re new to deep learning, this may all sound like a different language to you. Just like learning a new spoken language, it takes time, study, and practice. If you’re yearning to learn the language of deep learning, why not grab my highly rated book, Deep Learning for Computer Vision with Python? I promise that I break down these concepts in the book and reinforce them via practical examples.
Moving on, we reach our next block of `(CONV => RELU) * 2 => POOL` layers:
```python
		# (CONV => RELU) * 2 => POOL layer set
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Notice that our filter dimensions remain the same (`3x3`, which is common for VGG-like networks); however, we’re increasing the total number of filters learned from 32 to 64.
This is followed by a `(CONV => RELU) * 3 => POOL` layer set:
```python
		# (CONV => RELU) * 3 => POOL layer set
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
```
Again, notice how all CONV layers learn `3x3` filters, but the total number of filters learned by the CONV layers has doubled from 64 to 128. Increasing the total number of filters learned the deeper you go into a CNN (and as your input volume size becomes smaller and smaller) is common practice.
And finally we have a set of `FC => RELU` layers:
```python
		# first (and only) set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(512))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model
```
Fully connected layers are denoted by `Dense` in Keras. The final layer is fully connected with three outputs (since we have three `classes` in our dataset). The `softmax` layer returns the class probabilities for each label.
Now that `SmallVGGNet` is implemented, let’s write the driver script that will be used to train it on our Animals dataset.
Much of the code here will be similar to the previous example, but I’ll:
- Review the entire script as a matter of completeness
- And call out any differences along the way
Open up the `train_vgg.py` script and let’s get started:
```python
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.smallvggnet import SmallVGGNet
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
```
The imports are the same as our previous training script with two exceptions:

- Instead of `from keras.models import Sequential`, this time we import `SmallVGGNet` via `from pyimagesearch.smallvggnet import SmallVGGNet`. Scroll up slightly to see the SmallVGGNet implementation.
- We will be augmenting our data with `ImageDataGenerator`. Data augmentation is almost always recommended and leads to models that generalize better. Data augmentation involves applying random rotations, shifts, shears, and scaling to existing training data. You won’t see a bunch of new .png and .jpg files — it is done on the fly as the script executes.
You should recognize the other imports at this point. If not, just refer to the bulleted list above.
Let’s parse our command line arguments:
```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True,
	help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True,
	help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True,
	help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
```
We have four command line arguments to parse:

- `--dataset`: The path to our dataset of images on disk. This can be the path to `animals/` or another dataset organized the same way.
- `--model`: Our model will be serialized and output to disk. This argument contains the path to the output model file. Be sure to name your model accordingly so you don’t overwrite any previously trained models (such as the simple neural network one).
- `--label-bin`: Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
- `--plot`: The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data. Each time you train your model with changes to parameters, you should specify a different plot filename in the command line so that you’ll have a history of plots corresponding to training notes in your notebook or notes file. This tutorial makes deep learning seem easy, but keep in mind that I went through several iterations of training before I settled on all parameters to share with you in this script.
Let’s load and preprocess our data:
```python
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []

# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# loop over the input images
for imagePath in imagePaths:
	# load the image, resize it to 64x64 pixels (the required input
	# spatial dimensions of SmallVGGNet), and store the image in the
	# data list
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (64, 64))
	data.append(image)

	# extract the class label from the image path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)

# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
```
Exactly as in the simple neural network script, here we:
- Initialize lists for our `data` and `labels` (Lines 35 and 36).
- Grab `imagePaths` and randomly `shuffle` them (Lines 39-41). The `paths.list_images` function conveniently will find all images in our input dataset directory before we sort and `shuffle` them.
- Begin looping over all `imagePaths` in our dataset (Line 44).
As we loop over each `imagePath`, we proceed to:

- Load the `image` into memory (Line 48).
- Resize the image to `64x64`, the required input spatial dimensions of `SmallVGGNet` (Line 49). One key difference is that we are not flattening our data for this neural network, because it is convolutional.
- Append the resized `image` to `data` (Line 50).
- Extract the class `label` of the image from the `imagePath` and add it to the `labels` list (Lines 54 and 55).
On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] in array form. We also convert the `labels` list to a NumPy array format (Line 59).
Then we’ll split our data and binarize our labels:
```python
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)

# convert the labels from integers to vectors (for 2-class, binary
# classification you should use Keras' to_categorical function
# instead as the scikit-learn's LabelBinarizer will not return a
# vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
```
We perform a 75/25 training and testing split on the data (Lines 63 and 64). An experiment I would encourage you to try is to change the training split to 80/20 and see if the results change significantly.
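That experiment is a one-line change (a sketch, not part of the downloaded script):

```python
# hypothetical 80/20 split for the experiment suggested above
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, random_state=42)
```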
Label binarizing takes place on Lines 70-72. This allows for one-hot encoding as well as serializing our label binarizer to a pickle file later in the script.
Now comes the data augmentation:
```python
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
	height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
	horizontal_flip=True, fill_mode="nearest")

# initialize our VGG-like Convolutional Neural Network
model = SmallVGGNet.build(width=64, height=64, depth=3,
	classes=len(lb.classes_))
```
On Lines 75-77, we initialize our image data generator to perform image augmentation.
Image augmentation allows us to construct “additional” training data from our existing training data by randomly rotating, shifting, shearing, zooming, and flipping.
Data augmentation is often a critical step to:
- Avoiding overfitting
- Ensuring your model generalizes well
I recommend that you always perform data augmentation unless you have an explicit reason not to.
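If you’d like to see what the generator actually produces, the following sketch writes a handful of augmented variants of a single training image to disk (the "aug_preview" directory name is just an assumption for this illustration):

```python
# optional sanity check: save a few augmented samples for inspection
import os
import numpy as np

os.makedirs("aug_preview", exist_ok=True)
sample = np.expand_dims(trainX[0], axis=0)
gen = aug.flow(sample, batch_size=1, save_to_dir="aug_preview",
	save_prefix="sample", save_format="jpg")
for _ in range(5):
	next(gen)
```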
To build our `SmallVGGNet`, we simply call `SmallVGGNet.build` while passing the necessary parameters (Lines 80 and 81).
Let’s compile and train our model:
```python
# initialize our initial learning rate, # of epochs to train for,
# and batch size
INIT_LR = 0.01
EPOCHS = 75
BS = 32

# initialize the model and optimizer (you'll want to use
# binary_crossentropy for 2-class classification)
print("[INFO] training network...")
opt = SGD(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])

# train the network
H = model.fit(x=aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
	epochs=EPOCHS)
```
First, we establish our learning rate, number of epochs, and batch size (Lines 85-87).
Then we initialize our Stochastic Gradient Descent (SGD) optimizer (Line 92).
We’re now ready to compile and train our model (Lines 93-99). Our model.fit
call handles both training and on-the-fly data augmentation. We must pass the generator with our training data as the first parameter. The generator will produce batches of augmented training data according to the settings we previously made.
2020-05-13 Update: Formerly, TensorFlow/Keras required use of a method called `fit_generator` in order to accomplish data augmentation. Now, the `fit` method can handle data augmentation as well, making for more-consistent code. Be sure to check out my articles about fit and fit_generator as well as data augmentation.
Finally, we’ll evaluate our model, plot the loss/accuracy curves, and save the model:
```python
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))

# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy (SmallVGGNet)")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(args["plot"])

# save the model and label binarizer to disk
print("[INFO] serializing network and label binarizer...")
model.save(args["model"], save_format="h5")
f = open(args["label_bin"], "wb")
f.write(pickle.dumps(lb))
f.close()
```
We make predictions on the testing set, and then scikit-learn is employed to calculate and print our `classification_report` (Lines 103-105).
Matplotlib is utilized for plotting the loss/accuracy curves — Lines 108-118 demonstrate my typical plot setup. Line 119 saves the figure to disk.
2020-05-13 Update: In order for this plotting snippet to be TensorFlow 2+ compatible, the H.history dictionary keys are updated to fully spell out “accuracy” rather than the abbreviated “acc” (i.e., `H.history["val_accuracy"]` and `H.history["accuracy"]`). It is semi-confusing that “val” is not spelled out as “validation”; we have to learn to love and live with the API and always remember that it is a work in progress that many developers around the world contribute to.
Finally, we save our model and label binarizer to disk (Lines 123-126).
Let’s go ahead and train our model.
Make sure you’ve used the “Downloads” section of this blog post to download the source code and the example dataset.
From there, open up a terminal and execute the following command:
```
$ python train_vgg.py --dataset animals --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle \
	--plot output/smallvggnet_plot.png
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train for 70 steps, validate on 750 samples
Epoch 1/75
70/70 [==============================] - 13s 179ms/step - loss: 1.4178 - accuracy: 0.5081 - val_loss: 1.7470 - val_accuracy: 0.3147
Epoch 2/75
70/70 [==============================] - 12s 166ms/step - loss: 0.9799 - accuracy: 0.6001 - val_loss: 1.6043 - val_accuracy: 0.3253
Epoch 3/75
70/70 [==============================] - 12s 166ms/step - loss: 0.9156 - accuracy: 0.5920 - val_loss: 1.7941 - val_accuracy: 0.3320
...
Epoch 73/75
70/70 [==============================] - 12s 166ms/step - loss: 0.3791 - accuracy: 0.8318 - val_loss: 0.6827 - val_accuracy: 0.7453
Epoch 74/75
70/70 [==============================] - 12s 167ms/step - loss: 0.3823 - accuracy: 0.8255 - val_loss: 0.8157 - val_accuracy: 0.7320
Epoch 75/75
70/70 [==============================] - 12s 166ms/step - loss: 0.3693 - accuracy: 0.8408 - val_loss: 0.5902 - val_accuracy: 0.7547
[INFO] evaluating network...
              precision    recall  f1-score   support

        cats       0.66      0.73      0.69       236
        dogs       0.66      0.62      0.64       236
       panda       0.93      0.89      0.91       278

    accuracy                           0.75       750
   macro avg       0.75      0.75      0.75       750
weighted avg       0.76      0.75      0.76       750

[INFO] serializing network and label binarizer...
```
When you paste the command, ensure that you have all the command line arguments to avoid a “usage” error. If you are new to command line arguments, make sure you read about them before continuing.
Training on a CPU will take some time — each of the 75 epochs requires over one minute. Training will take well over an hour.
A GPU will finish the process in a matter of minutes, as each epoch requires only about 2 seconds!
Let’s take a look at the resulting training plot that is in the `output/` directory:
As our results demonstrate, you can see that we are achieving 76% accuracy on our Animals dataset using a Convolutional Neural Network, significantly higher than the previous accuracy of 60% using a standard fully-connected network.
We can also apply our newly trained Keras CNN to example images:
```
$ python predict.py --image images/panda.jpg --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
Our CNN is very confident that this is a “panda”. I am too, but I just wish he would stop staring at me!
Let’s try a cute little beagle:
```
$ python predict.py --image images/dog.jpg --model output/smallvggnet.model \
	--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Using TensorFlow backend.
[INFO] loading network and label binarizer...
```
A couple beagles have been part of my family and childhood. I’m glad that this beagle picture I found online is recognized as a dog!
I could use a similar CNN to find dog photos of my beagles on my computer.
In fact, in Google Photos, if you type “dog” in the search box, pictures of dogs in your photo library will be returned — I’m pretty sure a CNN has been used for that image search engine feature. Image search engines aren’t the only use case for CNNs — I bet your mind is starting to come up with all sorts of ideas upon which to apply deep learning.
Summary
In today’s tutorial, you learned how to get started with Keras, Deep Learning, and Python.
Specifically, you learned the seven key steps to working with Keras and your own custom datasets:
- How to load your data from disk
- How to create your training and testing splits
- How to define your Keras model architecture
- How to compile and prepare your Keras model
- How to train your model on your training data
- How to evaluate your model on testing data
- How to make predictions using your trained Keras model
From there you also learned how to implement a Convolutional Neural Network, enabling you to obtain higher accuracy than a standard fully-connected network.
If you have any questions regarding Keras be sure to leave a comment — I’ll do my best to answer.
And to be notified when future Keras and deep learning posts are published here on PyImageSearch, be sure to enter your email address in the form below!
mohamed
Great: Adrian!
always forward
Thank you and thank you Igor
I have a suggestion as to how to apply some basic concepts of deep learning.
About how to write those equations in Python.
Many people know the concepts but there is a barrier between them and the application.
Adrian Rosebrock
Hey Mohamed — is there a particular algorithm/equation that you’re struggling with? Or are you speaking in more general terms? If you’re speaking more generically, then Deep Learning for Computer Vision with Python covers the basic concepts of both machine learning and deep learning, including some basic equations and theory before moving into actual applications and code.
mohamed
Thanks Adrian. Yes, there are certain things that are facing me, but anyway, I meant in general.
The book is really wonderful. I will work on getting the rest of the versions of it.
Newman
Not working here: different numbers during training, and a lot of wrong detections.
first:
```
             precision    recall  f1-score   support

       cats       0.46      0.66      0.54       244
       dogs       0.49      0.22      0.30       242
      panda       0.69      0.78      0.73       264

avg / total       0.55      0.56      0.53       750
```
second method:
```
             precision    recall  f1-score   support

       cats       0.66      0.77      0.71       244
       dogs       0.76      0.55      0.63       242
      panda       0.85      0.95      0.90       264

avg / total       0.76      0.76      0.75       750
```
everything seems fine but not the results.
Adrian Rosebrock
Could you share which version of Keras and TensorFlow (assuming a TF backend) you are running? Secondly, keep in mind that NNs are stochastic algorithms — there will naturally be variations in results and you should not expect your results to 100% match mine. The effects of random weight initializations are even more pronounced due to the fact that we’re working with such a small dataset.
Enrique
Hi Adrian:
I tested an image like this https://www.petdarling.com/articulos/wp-content/uploads/2014/06/como-quitarle-las-pulgas-a-mi-perro.jpg, however the result shown is “Panda 100%”. Why did this happen?
Adrian Rosebrock
Pandas are largely white and black and the dog itself is dark brown and white. The network could have overfit to the panda class. The example used here is just that — an example. It’s not meant to be a model that can correctly classify each image class 100% of the time. For that, you will need more data and more advanced techniques. I would encourage you to take a look at Deep Learning for Computer Vision with Python for more information.
Aiden Ralph
Brilliant post Adrian!
Adrian Rosebrock
Thanks Aiden, I’m glad you liked it!
Aline
Amazing tutorial! So clear!
Adrian Rosebrock
Thanks so much, Aline! I’m glad you found it helpful 🙂
Reed Guo
Hi, Adrian
Excellent post.
Can we improve its accuracy?
Adrian Rosebrock
You could improve the accuracy by:
1. Using more images
2. Applying transfer learning
Models trained on ImageNet already have a panda, dog, and cat class as well so you could even use an out-of-the-box classifier.
Viktor
Hello! How can I download the Animals dataset?
Adrian Rosebrock
Just use the “Downloads” section of the blog post and you will be able to download the code and “Animals” dataset.
Vignesh Suresh
Great Video Adrian. Thanks
Hamid
Nice one, Adrian! I really appreciate it.
Such a wonderful post with an elegant and simple explanation.
I wonder if increasing the number of hidden layers and raising the dropout to 0.5 would further increase the accuracy from 78%.
Adrian Rosebrock
It may as those are hyperparameters. Give it a try and see!
Marcelo Mota
My friend, this is the best tutorial I have ever seen!! Thank you so much.
I am struggling on just one point: I have a binary problem and have to use the to_categorical function from Keras. As far as I can see, I can only use it with integer categories, not strings. Is this true?
And how do I write these integer binary categories (from to_categorical) to the pickle file, and how do I use it in the classification_report (the code uses “lb”)?
Thank you again and congratulations for such good and complete explanations!
Adrian Rosebrock
Thanks Marcelo, I’m glad you found the tutorial helpful!
For a binary problem you should use the LabelEncoder class instead of LabelBinarizer. The LabelEncoder will integer-encode the labels, which you can then pass into to_categorical to obtain the vector representation of the labels. The LabelEncoder can be serialized to disk and converts labels just like the LabelBinarizer does.
Mutlucan Tokat
Hi Adrian,
The range of the pixels is the same: every pixel takes values between 0 and 255. Why do we need to scale them to between 0 and 1?
Adrian Rosebrock
Most (but not all) neural networks expect inputs to be in the range [0, 1]. You may see other scaling and normalization techniques, such as mean subtraction, but those are more advanced methods. Your goal here is to bring every input to a common range. Exactly which range you use may depend on:
1. Weight initialization technique
2. Activation technique
3. Whether or not you are performing mean subtraction
In general, scaling to [0, 1] gives your variables less “freedom” and makes them less likely to cause gradient or overflow errors than larger value ranges (such as [0, 255]). Scaling to [0, 1] is typically the “first” scaling technique you’ll see.
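In code, the scaling is a single conversion and division, as done in this tutorial’s training scripts (assuming NumPy is imported as np):
# convert pixel intensities from [0, 255] to [0, 1]
data = np.array(data, dtype="float") / 255.0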
Bob de Graaf
Hi Adrian,
Great tutorial as always! I’m wondering though, isn’t this almost the same tutorial as the Pokemon one? The one where you classify 5 different Pokemon in images? The code seems mostly the same 🙂
I do see some small differences though; for example, in the Pokemon one you use the Adam optimizer instead of SGD, and the initial learning rate is 0.001 instead of 0.01.
Are these changes things you’ve learned these past months to achieve better results? Or were these randomly chosen?
Adrian Rosebrock
The code is similar but not the same. This tutorial is meant to be significantly more introductory and is intended for readers with little to no experience with Keras and deep learning.
The parameters were also not randomly chosen. They were determined via experiments to find the optimal values for this particular post.
Bob de Graaf
Ah ok, good to know, thanks! I wasn’t trying to be offensive or anything, just curious. Apologies if I came across that way!
Adrian Rosebrock
You certainly were not being offensive, Bob. I just wanted to clarify, that’s all 🙂 Have a great day, friend!
Mutlu
Hi Adrian,
What are chanDim = -1 and chanDim = 1 at the beginning of SmallVGGNet?
Great tutorial BTW.
Adrian Rosebrock
It’s the channel dimension. With channels-first ordering (e.g., Theano) the channels come first, while with channels-last ordering (like TensorFlow) the channels come last; a “-1” value when indexing in Python means “the last dimension/value”.
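For reference, a sketch of how a SmallVGGNet-style build() method typically sets it (height, width, and depth are assumed to be the build() arguments):
from tensorflow.keras import backend as K
# default: channels-last ordering (the TensorFlow convention)
inputShape = (height, width, depth)
chanDim = -1
# switch to channels-first if the backend is configured that way (e.g., Theano)
if K.image_data_format() == "channels_first":
    inputShape = (depth, height, width)
    chanDim = 1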
Roshan
Hi Adrian,
Thank you for the excellent tutorial.
I have a basic question:
During validation, we use a train/test split of 75% and 25%, respectively.
So the split randomly picks 25% of the images for testing.
But if I want to find out which images are used for testing, how can I do that?
I want to know the names of the images used for testing.
Please help me
Adrian Rosebrock
The names of the images won’t be returned by scikit-learn. Instead, if you want the exact image names I would suggest you split your image paths rather than the raw images/labels. That will enable you to still perform the split and have the image paths.
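A minimal sketch of that approach (variable and path names are illustrative):
from imutils import paths
from sklearn.model_selection import train_test_split
# split the file paths instead of the loaded images/labels
imagePaths = sorted(list(paths.list_images("animals")))
(trainPaths, testPaths) = train_test_split(imagePaths, test_size=0.25, random_state=42)
# testPaths now holds the exact filenames used for testing;
# load, preprocess, and label the images from each list as before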
andreas
Hi Adrian,
This was an excellent tutorial, very well presented and clear. I have a question: how would I add bounding boxes, using either NMS or my own algorithm, to show boxes around objects in an image, like what is done in face detection?
Thanks,
Andreas
Adrian Rosebrock
We are performing image classification in this post. What you are looking to perform is called object detection. I would suggest you read this tutorial to get you started.
merly
I have never seen a simpler and better tutorial.
Adrian Rosebrock
Thank you for the kind words Merly 🙂 Congratulations on getting your start with Keras!
merly
UserWarning: Trying to unpickle estimator LabelBinarizer from version 0.19.1 when using version 0.19.2. This might lead to breaking code or invalid results. Use at your own risk.
I am getting this error. What should I do?
Adrian Rosebrock
Hey there, it’s not an error, it’s a warning. I would suggest you train the model first before you try to run it and make predictions.
Hashir
Hi Adrian,
This blog was awesome. I really appreciate this great effort and I am a big fan of yours.
After reading this Keras + TF tutorial I understood a lot of things. But I have to initialize my model weights manually using my own method, like a custom random initialization. What steps should I take to initialize the model manually?
Thanks in advance
Adrian Rosebrock
The model weights are automatically initialized during the call to .compile. You can change the initialization method by choosing one of the Keras initializers.
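As a sketch (not from the original post), you can pass a kernel_initializer to any layer:
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
model = Sequential()
# draw the initial weights from a zero-mean Gaussian instead of the default
model.add(Dense(512, input_shape=(3072,), activation="relu",
    kernel_initializer=RandomNormal(mean=0.0, stddev=0.05)))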
Salman Sajid
Thanks, Adrian.
Can we use this technique for activity recognition, or is it only for static object detection?
Adrian Rosebrock
The method covered here is only for image classification, not activity recognition or object detection.
Wilf
Trained using Keras 2.2.2 and TensorFlow 1.10.0.
Prediction for both simple_nn and smallvggnet failed on the dog.jpg image.
My question: how do you analyze/understand what went wrong? Is it overfitting, too few training images, poor training image selection, or something else?
Adrian Rosebrock
In order to help get everyone up and running with Keras and deep learning, we used a very small dataset for this example. Typically, we would have at least 1,000 images per class. Our network is also far from perfect. We can increase the accuracy of our model by introducing regularization methods, such as L2 weight regularization, additional data augmentation, etc. If you’re interested in learning more about overfitting/underfitting, including how to detect them, I would suggest you read through Deep Learning for Computer Vision with Python.
Mattia
Hi Adrian,
After many issues installing OpenCV, I finally got started with it.
I was trying this tutorial, and when I launch the program with this command:
python train_vgg.py --dataset animals --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle \
--plot output/smallvggnet_plot.png
this is the result:
> /home/luca/Scrivania/keras-tutorial/train_simple_nn.py(78)()
-> model = Sequential()
(Pdb)
And it doesn’t move on.
What should I do?
Adrian Rosebrock
How are you trying to execute the script? Via the command line?
Kirill
Got the same problem. I run it via the command line (using fish, virtualenv, Python 3.6.5, macOS).
Adrian Rosebrock
Does bash produce the same error as fish?
inf111
Just execute the “continue” command; the script is paused at a pdb breakpoint (note the (Pdb) prompt above).
Hélder Ribeiro
Hi Adrian,
Using the VGG training as you describe, everything goes smoothly and I’m getting 70%-plus accuracy, but when I try to predict something using predict.py I always get the panda prediction.
After doing some research I think it might be something related to the preprocessing of the images?
I’m not sure. One thing is clear: when I add
image = image.astype("float32")
image = image / 255
after the image read, I start to get better results, but I’m not sure if this is the way. Can you help me?
Thanks
Adrian Rosebrock
It sounds like the network is overfitting to the “panda” class. One method to increase accuracy would be to introduce more regularization, including additional data augmentation.
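As a sketch of what additional augmentation might look like (the parameter values are illustrative, not tuned):
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# randomly rotate, shift, shear, zoom, and flip the training images
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")
# then train via aug.flow(trainX, trainY, batch_size=BS), as train_vgg.py does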
Stonez
Thanks for the great tutorial! Can I add more classes to the file structure, say, a “cow” class?
There should be no changes in the code for it to recognize dogs, cats, pandas, and cows, correct?
Thanks
Stonez
Adrian Rosebrock
As long as you follow my directory structure for the project and add a directory named “cow” with cow images to the dataset directory, then yes, no code changes are required.
Balaji
Hi,
I am getting the following error when I try to run the predict script.
…
line 294, in from_config
model = cls(name=name)
UnboundLocalError: local variable ‘name’ referenced before assignment
Adrian Rosebrock
Hi Balaji, could you clarify which version of Keras and TensorFlow you are using? Additionally, did you train your model before trying to run the prediction script?
Alan
Hi Adrian.
I have the same problem using TensorFlow 1.5.0 (because my computer does not support AVX instructions) and Keras 2.2.3, and TensorFlow 1.10.0 with Keras 2.2.3 on another machine.
I am trying:
python predict.py --image images/cat.jpg --model output/simple_nn.model \
--label-bin output/simple_nn_lb.pickle --width 32 --height 32 --flatten 1
python predict.py --image images/panda.jpg --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
python predict.py --image images/dog.jpg --model output/smallvggnet.model \
--label-bin output/smallvggnet_lb.pickle --width 64 --height 64
Do I need to train the model? I downloaded your files and am trying to execute them without training.
Thanks.
Adrian Rosebrock
Yes, make sure you train the model before you try to make predictions on images.
Niranjan A
Hello Adrian,
Every article that i check out on pyImageSearch always leaves me impressed. Great work.
I noticed that in every tutorial you use “argparse”. I wish to know if it makes any difference if we load our image into a variable directly instead of using argparse. If so, can you let me know what the difference is?
Thanks.
Adrian Rosebrock
Hey Niranjan — I think your confusion can be resolved by reading this guide on how argparse works. As you’ll find out, argparse just allows us to supply arguments via the command line instead of manually hardcoding them 🙂
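If you would rather hardcode values while experimenting, a sketch (the keys mirror predict.py’s arguments; note argparse turns --label-bin into args["label_bin"]):
# instead of: args = vars(ap.parse_args())
args = {
    "image": "images/dog.jpg",
    "model": "output/smallvggnet.model",
    "label_bin": "output/smallvggnet_lb.pickle",
    "width": 64,
    "height": 64,
    "flatten": -1,  # assumption: the script treats -1 as "do not flatten"
}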
jacob
hi adrian,
can you give me an example of a path that can be added in help="…"?
Because when I start the simple training example it arrives at “[INFO] loading images…”
and then it doesn’t go on!
Ah, thanks for these amazing tutorials!
Adrian Rosebrock
Hey Jacob, I think your confusion is related to how command line arguments work. Make sure you read this tutorial to help you clear up your confusion.
北凉徐凤年
hi adrian,
I don’t understand: when training train_vgg.py there is 70/70 [==================].
Where does the 70 come from? What does it mean? 70 pictures every epoch?
Thanks for your articles!
Adrian Rosebrock
That is actually the number of batches per epoch. There are 70 batches of images per epoch.
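To make the arithmetic concrete (assuming the post’s 3,000-image animals dataset, the 75% training split, and a batch size of 32): 3,000 x 0.75 = 2,250 training images, and 2,250 // 32 = 70 batches per epoch.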
YewBoon
Hi Adrian,
Could you please explain further how the 70 is calculated?
I assume it is from this line of code, right?
model.fit_generator(aug.flow(trainX, trainY, batch_size=BS), validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS, epochs=EPOCHS)
Adrian Rosebrock
What specifically are you asking how to calculate? The steps_per_epoch value?
Jorge
Hello Adrian. Thank you very much for this excellent tutorial. I have a little question. Could you please describe what the Keras architecture of the convolutional network would be if the dataset had only two categories, for example cats and dogs (eliminating the pandas folder), still in the RGB color space? If you could draw a parallel with the code in this tutorial for the three categories. I know it may be a basic question, but it would help me understand the architecture of the CNN in Keras, which I still do not have very clear. Thank you very much for your excellent work and congratulations on your wedding. Jorge from Argentina.
Adrian Rosebrock
The architecture itself would not change except for the final FC layer where there will be two nodes rather than three nodes. Other than that, there will be no other changes to the architecture itself. If you want to train a network for binary classification just make sure you use “binary_crossentropy” for your loss.
Yuthika Shekhar
Hi Adrian,
Thanks for the amazing explanation. I have few doubts.
If we try implementing this with another dataset, is it supposed to be organized the way you have organized yours?
Also, what does 32x32x3 = 3072 pixels in a flattened input image mean? I am not able to understand the multiplication by 3.
Adrian Rosebrock
I would recommend you use the same directory structure that I use in the blog post. It will ensure that the code doesn’t have to be changed at all and you can just run the script to train on your own custom dataset.
As for your second question, images are represented as a 3 channel RGB image. Thus, for a 32×32 RGB image there are a total of 32x32x3=3072 values.
Nick
Hi, this tutorial is self-explanatory. I have just started learning machine learning, and this image recognition sounds really interesting and cool. I have downloaded all the required files and code from your site. I have Spyder installed via Anaconda and I want to run these files. I need help with how to get these scripts running.
How can I run this model in Spyder?
Thank You.
Adrian Rosebrock
Hey Nick, you can certainly use an IDE if you would like, but I don’t recommend it if you are new to computer vision and deep learning. Take the time to invest in your ability to execute the scripts via the command line. We use the command line quite a bit, so become comfortable with it now. Additionally, while I don’t use the Spyder IDE, you can use this tutorial on how to use an IDE with Python.
Megan
In Section 9, how do you choose 512 in the model.add(Dense(512)) line of code (Line 60) after you’re done with the CONV => RELU steps?
Adrian Rosebrock
It’s a hyperparameter to the model architecture. You run experiments to tune the hyperparameters of the network. I discuss my best practices, tips, and suggestions to hyperparameter tuning inside my book, Deep Learning for Computer Vision with Python.
Farshad
Hi Adrian. Thanks for the nice explanation. Is there any way to create a CNN model from scratch for object detection or object localization using Keras? Can Keras do it at all? I searched many posts on websites and all of them used Keras for image classification only. If yes, I hope you publish a blog post tutorial about object detection with Keras. Thanks for your amazing work.
Adrian Rosebrock
Great question, thanks for asking Farshad. I actually cover how to train your own custom Keras object detector inside Deep Learning for Computer Vision with Python.
Juanlu
Great post, but there is one thing missing which makes the predictions fail.
The same way we divide the inputs by 255.0 during training, we need to do the same thing in the prediction script before providing the image as input to the NN.
Adrian Rosebrock
Thanks so much for pointing this out, Juanlu! It was a typo on my part. I have fixed the typo as well as the code download so the issue no longer exists.
andreas
Hi Adrian,
How do we add the object detection bounding boxes to the images?
Adrian Rosebrock
You cannot use a model trained for image classification as an object detector. I would suggest you read this tutorial on deep learning object detection so you can learn the fundamentals.
andreas
Hi Adrian,
I get this error..any suggestions?
(-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
Adrian Rosebrock
Double-check the path to your input dataset. Your path is likely incorrect and the cv2.imread function is returning “None”.
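A quick sketch for finding the offending file (inside your loop over imagePaths):
import cv2
image = cv2.imread(imagePath)
# imread returns None instead of raising an error on a bad path
if image is None:
    print("[WARNING] could not read {}, skipping".format(imagePath))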
Ctibor
Hi Adrian.
Thank you for your excellent tutorial, but it’s just for pictures. How could a CNN be used to recognize sounds?
Adrian Rosebrock
Sorry, I don’t have any experience with deep learning for audio applications. I only work with computer vision here. Sorry I couldn’t be of more help!
Zachary Miller
This is by far the most simple to understand and useful tutorial on Keras that I have ever seen. You do a great job of explaining BOTH the concepts behind how the neural network works and what the different functions in the libraries are doing for us (the last part is often left out). Thank you so much!
Adrian Rosebrock
Thank you so much for the kind words, Zachary — I really appreciate that 🙂
moh
Hi, Adrian
Excellent work
If I have 1-channel images (e.g., medical images) and I want to apply this program to classify them, what should I change in the program, especially the input_shape?
Adrian Rosebrock
First, convert your images to grayscale when you load them:
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
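One assumption worth noting: Keras expects an explicit channel axis, so after converting you will likely also need (assuming NumPy is imported as np):
image = np.expand_dims(image, axis=-1)  # shape becomes (height, width, 1)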
Secondly, change your depth=1 when initializing SmallVGGNet.
David
Please send me the source code for this post.
Adrian Rosebrock
You can use the “Downloads” section of the post to download the source code.
Akshay
Where can I find the dataset for this tutorial?
Adrian Rosebrock
You can use the “Downloads” section of this post to download the source code and dataset.
Jarvis
Hi Adrian
Thank you for this tutorial.
I have two doubts :
1. At the data = np.array(data, dtype="float") line I am getting a sequence error; in short, I am not able to convert data of dtype('O') into float.
I worked around the error by copying the data into another float array, but after trying for many hours I could not properly solve it.
2. I am getting loss = NaN.
I have checked my input data and I am sure that none of the input values are NaN.
Any help will be appreciated.
thank you
Adrian Rosebrock
Are you using the exact code and datasets from this tutorial? Or are you working with your own custom dataset?
Jarvis
Hi thanks Adrian for the post again.
I figured out the errors myself. I was using a custom dataset, and some of the images were corrupted, which is why I was getting these errors.
Adrian Rosebrock
Congrats on resolving the issue!
AHMED ARUP KAMAL
Hi Adrian,
I managed to run it. But after 1,200 epochs and a 0.01 learning rate, my training accuracy is ~1.0 and my validation accuracy is ~0.50!
What’s happening?
Adrian Rosebrock
Your network is overfitting to the data. Training for longer isn’t necessarily going to give you better accuracy. Instead, you need to learn how to properly set the hyperparameters of the network. To improve your accuracy and learn my tips, suggestions, and best practices to improve the accuracy of your networks, make sure you refer to Deep Learning for Computer Vision with Python.
Saptarshi
Hey! Loved the tutorial.
Can the same VGG network be used for a hand gesture recognition system for classifying gestures from A-Z (26 classes)?
Adrian Rosebrock
Not as it stands. VGG is a classification network, and presuming you are referring to the VGG network pre-trained on ImageNet, there are no hand gesture classes. You would need to fine-tune VGG to recognize hand gestures. For what it’s worth, I cover transfer learning and fine-tuning inside my book, Deep Learning for Computer Vision with Python.
fishwolf
Is it possible to know where the cat/dog/panda is in the image?
Is it possible to run this process in real time on a video stream?
Thanks
Adrian Rosebrock
What you are referring to is called “object detection”. See this post for more details on object detection.
riyaz
Can you do the same problem for binary classification? I got stuck doing that… I have only 2 classes, and I also want to save the model.
Adrian Rosebrock
You’ll want to change your loss function to “binary_crossentropy” for 2-class, binary classification. This tutorial covers how to save and load your models with Keras. For more details on deep learning, including how to get started, I would suggest working through Practical Python and OpenCV.
sruthi
Hi,
I didn’t understand how feature extraction is done in this code. I have applied the same code for gender recognition, and the only difference from your code is the training set images. I would like to know how feature extraction is done and what features have been extracted.
Adrian Rosebrock
If you’re interested in feature extraction via pre-trained CNNs (including gender recognition) then definitely take a look at Deep Learning for Computer Vision with Python where I cover the topic in detail.
sruthi
I have used this exact same code for gender recognition. It is working, but I would like to know what features are being extracted, as well as how feature extraction is done. Can you please reply ASAP? The only difference from your code is the dataset used.
Adrian Rosebrock
I cover that exact topic inside Deep Learning for Computer Vision with Python; my suggestion is to start there.
Nguyen Anh Duy
Hi Adrian,
I only want to classify dogs and cats, so I changed “categorical_crossentropy” to “binary_crossentropy”, and then I get the error:
“expected activation_2 to have shape (2,) but got array with shape (1,)”
Then I changed to “sparse_categorical_crossentropy” and it works.
But if I want to classify grayscale images, for example “number 0” and “number 1” in the MNIST dataset, I use “binary_crossentropy” and change the input shape to:
model = SmallVGGNet.build(width=28, height=28, depth=1, classes=len(lb.classes_))
and then it shows a similar error:
“expected activation_2 to have shape (2,) but got array with shape (1,)”
Could you help me?
Thank you very much.
Adrian Rosebrock
1. See my note on Lines 66-69 about using Keras’ “to_categorical” function.
2. You should be using “binary_crossentropy” as your loss.
Once you switch both of those you will be able to train the network.
Daniel
Adrian,
how do you use the “to_categorical” function in this context? I don’t have a lot of coding experience with Python. I googled for examples but it did not work for me.
Also, in previous related questions you mentioned changing to LabelEncoder (instead of LabelBinarizer). I tried my hand at that but it did not work:
I changed:
lb = LabelBinarizer() to lb = LabelEncoder()
I also changed the loss function to binary_crossentropy. But I am getting an error like the one Nguyen mentioned above.
Many thanks for the tutorial. It is REALLY helpful.
Adrian Rosebrock
You first encode using LabelEncoder and then call to_categorical, similar to the following:
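(a minimal sketch; the label values are illustrative)
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
labels = ["cat", "dog", "dog", "cat"]
le = LabelEncoder()
labels = le.fit_transform(labels)    # integer-encode: [0, 1, 1, 0]
labels = to_categorical(labels, 2)   # one-hot vectors, one column per class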
bramata vikana
Hi Adrian, thank you for your explanation, but can you explain to me what numClasses is? Thank you so much.
Adrian Rosebrock
The “numClasses” is the total number of unique class labels. For example, suppose you had a three class dataset: dogs, cats, and pandas. Then “numClasses=3” since you have three total classes.
Parvez Alam
Hello Adrian, brother. Being in my final year of college, I find your resources quite awesome.
I have a doubt about using the number of classes:
You have used 3 classes(cat,dog,panda) and you vectorized to trainY and testY as below…
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
and in the comments you mentioned that if 2 classes are used, then LabelEncoder is used instead of LabelBinarizer, and fit_transform and to_categorical are applied to the labels, as below:
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, numClasses)
Could you please explain to me when to binarize the labels, when to binarize trainY and testY, and when to use LabelBinarizer, LabelEncoder, and to_categorical?
Adrian Rosebrock
Hey Parvez — I address that exact question inside Deep Learning for Computer Vision with Python. I suggest you start there.
Beatrice van Eden
Did you get this working? I checked this out today, but obviously I do not understand exactly what is going on. I keep getting errors even after using the well-explained code on this.
Adrian Rosebrock
Hi Beatrice — did you see my previous comment? I provided you with code you could use.
Robert
Hi
This is a great tutorial.
I’ve purchased your book and its supporting material and look forward to reading and learning even more about this topic.
Keep up the good work.
Robert.
Adrian Rosebrock
Thanks so much, Robert! I hope you are enjoying. By all means, feel free to reach out if you have any questions on it 🙂
Beatrice van Eden
Hi
Yes, I did. I made the modifications but then get an error when it comes time to train the neural net. The shape of the array is not what it is expecting any more.
Alexander
Hello and nice guide!
I have a question: is this tutorial for Windows or Linux?
Adrian Rosebrock
Provided you have Keras properly installed this tutorial will work on Linux, macOS, and Windows.
Alexander
Thank you for the response!
I have another question. I tried running the train_vgg script and it takes about 3-4 minutes per epoch on my computer. How do I tell TensorFlow to use my GPU instead of my CPU? I assume it uses my CPU since the timings are well over 1 minute.
Adrian Rosebrock
You can use the “nvidia-smi” command to check and see if your GPU is being utilized. You’ll also want to ensure the “tensorflow-gpu” package is installed.
Sky
I learned that the evaluation dataset is used for tuning the hyperparameters.
In this blog post, what are the hyperparameters?
Adrian Rosebrock
The hyperparameters include the learning rate, number of nodes/filters for each layer, and any regularization. I would definitely suggest reading through Deep Learning for Computer Vision with Python where I cover hyperparameters (and how to properly tune them) in detail.
Beatrice van Eden
Thank you for sharing this with us. I found it to be of great benefit for me.
Do you have a similar tutorial for RGB-D data? I know you add the extra channel, but I suppose my struggle is even before that, with the preprocessing of the data. I recorded a ROS bag with the RGB-D data; I can extract the RGB into one folder and the D into another, but then I get confused when trying to give it as input to the convnet. (I struggle with the coding.)
Adrian Rosebrock
Sorry, I do not have any tutorials for RGB-D data.
Adrian Rosebrock
See this tutorial.
Tuan Anh Nguyen
Hello! thank you for sharing this with us!!!
I still do not understand how you label the dogs, cats, and pandas. Could you explain the labeling to me?
Adrian Rosebrock
I manually labeled those images myself. I created a directory for each of the dog, cat, and panda images, then placed each image into its corresponding directory.
Beatrice van Eden
Thank you.
Adrian Rosebrock
No problem, I’m glad you found it helpful!
mary
Hey, this tutorial is awesome. The code for the non-CNN version worked just fine, but when I ran the CNN with SmallVGGNet it gave me the error:
ImportError: No module named ‘pyimagesearch’
How do I resolve this?
Secondly, if I use the line image = cv2.resize(image, (64, 64)), will it resize all my images to 64×64 no matter what the original size? Also, how do I know that the images being fed into the neural network are fine for training? Won’t the larger images be distorted like that (their details unable to be observed during training)?
My last question: for the line inputShape = (width, height, depth) in the SmallVGGNet script, do I write the dimensions I want the image in, or the dimensions the image already has? (In a dataset with many images, how can I tell from just one image?)
Adrian Rosebrock
Hey Mary — make sure you use the “Downloads” section of the code to download the source code. It sounds like you may have copied and pasted which likely caused the error.
Secondly, I would recommend you read Deep Learning for Computer Vision with Python so you can learn the fundamentals of deep learning. That book will help you understand how we preprocess images and better enable you to train your own CNNs.
Chinmaya Panda
Dear Sir,
This is the best literature I have come across on the internet for an ML implementation.
It is exactly what I need for my assigned work.
I started this morning at 9am and finished everything by 8pm.
I understood the concepts, implemented it in a Jupyter Notebook, and got results after a few changes.
The CNN-based model testing is pending, but I will do that with your other blog post.
Such a nice way of explaining, with detailed code, deserves lots of appreciation, so I am dropping this message.
Thanks a lot for your contribution to society and the human race.
Adrian Rosebrock
Thanks Chinmaya, I really appreciate the kind words 🙂 Congrats on training your own NNs and CNNs!
Ali
Hi dear Adrian!
Can you help me train it for two classes only?
Adrian Rosebrock
See this comment thread.
Henrique
Hi Adrian,
Can I use this code to train on only 1 class?
I’m trying to identify an object in a photo. If the object is there I will receive an “ok”, and if it’s not I will receive a “nok”.
Ammu
Hi, how do I download the animals dataset?
I couldn’t find it in the downloads section.
Thanks
Adrian Rosebrock
Download the .zip of the file using the “Downloads” section of the tutorial. You’ll find the “animals” dataset there.
Andres
This was a very detailed tutorial. If I wanted to use Tensorflow 2.0 with the new keras interface, would I need to simply do something like: “import tensorflow.keras as keras” and the rest would work the same?
Thanks
Adrian Rosebrock
You are absolutely correct! Since TensorFlow 2.0 is making big moves to use the “tensorflow.keras” package you can just import all Keras classes/functions directly from “tensorflow.keras”.
Abdullah
Hi Adrian, if I am adding another class, “cow” for example, isn’t it necessary to change the number of epochs?
Adrian Rosebrock
Not necessarily. The number of epochs doesn’t depend on the number of classes, or vice versa. Try training with the same number of epochs. Additionally, you should read Deep Learning for Computer Vision with Python to learn my best practices, tips, and suggestions for training your own deep learning models.
Agnes
Hi Adrian,
I would like to know if there is an explanation for fixing the number of neurons in the first hidden layer at 1024, given the input shape of 3072, on Line 76 of the train_simple_nn.py file. I understand that in every hidden layer the representation is reduced to one half of its previous size; hence from hidden layer 1 to hidden layer 2 the nodes are reduced from 1024 to 512. But how does it go from 3072 to 1024?
Thanks in Advance…..
Srinivas and Mangipudi
Hi I got an error after the training and network evaluation finished. The error was in generating the plot:
Traceback (most recent call last):
File “train_simple_nn.py”, line 111, in
plt.plot(N, H.history["acc"], label="train_acc")
KeyError: ‘acc’
Srinivas and Mangipudi
Hi, I managed to get rid of the error by using metrics=["acc"] in model.compile.
But after training, I notice that the accuracy is below 50%, which means it is performing worse than random chance. In fact, I gave it a cat image to predict, but it predicted dog with 63% confidence.
I don’t understand why it’s doing this.
Adrian Rosebrock
In TensorFlow 2.0 the “acc” key was changed to “accuracy” and “val_accuracy”, respectively.
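A sketch of the updated plotting lines (assuming the tutorial’s H history object and N epoch range):
plt.plot(N, H.history["accuracy"], label="train_acc")
plt.plot(N, H.history["val_accuracy"], label="val_acc")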
Arif
Hi Adrian,
If I would like to implement a face recognition application based on your code, what should I do besides adding face detection?
Thank you
Adrian Rosebrock
You should follow my tutorials on face applications and face recognition.
Aditi
Hi Adrain
Thanks for your post. I am using Keras 2.3.1 and TensorFlow 2.0.0. I read the previous comments and changed “acc” to “accuracy”, and I got my plot as a PNG. But the model file and the pickle file are still not written to the output folder.
Thanks:)
Adrian Rosebrock
Make sure you train your model first. Once the model is trained you can then make predictions using it.
Anja
Hi Adrian,
I created my own model with train_vgg.py and it works great. 🙂
With predict.py I can check individual images. However, I would like to check a live video with the created model.
That’s why I changed the code so that it checks the frames from the webcam (predictVideo.py), or alternatively video files.
Unfortunately, the recognition (labeling) does not work well here, although I use the same model as when checking individual images.
Example:
I extracted individual pictures from a video file and checked them with predict.py.
Result: everything is recognized correctly.
If I then check the video file itself with predictVideo.py, nothing is recognized correctly.
Is it because you cannot use the model for live or video file recognition?
Do I have to train the model differently?
Thanks a lot!
Anja
Adrian Rosebrock
It’s hard to say what the issue is without seeing your code or video, but I would suggest you start with this tutorial to help you learn how to apply a Keras model to a video stream.
Secondly, double and triple-check that your preprocessing steps are the same for inference/prediction as they are for training. A common mistake I see beginners make is forgetting to preprocess their images in the same manner as training.
teimoor
Hi, how do I supply a trained model, since I don’t have any trained model on my disk? It is a required argument in your code.
Adrian Rosebrock
You need to train your model before you serialize it to disk. From there you can use it to classify new input images.
Tharumudu
Hi Adrian,
This is a great tutorial and made everything easy for me, as always. I would like to know a robust way to predict when I have around 50,000 images. I’m currently looping through the images with a tensorflow.keras.backend.clear_session() call after the prediction line.
Is there any way to predict all the images at once and then loop through the results and save them?
Adrian Rosebrock
You mean make predictions on all 50,000 images? Yes, absolutely, just use Keras’ predict_generator function.
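As a sketch, batched prediction over a preprocessed array also works (the array name is illustrative; in newer Keras versions model.predict accepts a batch_size):
# allImages: NumPy array of shape (50000, height, width, 3), already scaled
preds = model.predict(allImages, batch_size=32)
# preds has shape (50000, numClasses); loop over it to save each result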
vikas
Hello Adrian sir,
Thank you very much for the great tutorial. It’s awesome and very easy to understand.
I have implemented it on my own dataset of 3 classes of documents (driving licences). I got good accuracy and good results on unseen images belonging to the classes. But when I try to predict an image outside these 3 classes (e.g., a dog or cat), it still shows a match with one of the classes. Why is this so? Please help.
Adrian Rosebrock
You need to create a 4th class called “ignore” that does not contain any of the documents. That way your model can predict one of the 3 document classes or the 4th “ignore” class.
Luis
What should we change if we are using a binary-class dataset?
Using to_categorical() would change some details in the code; what would they be?
Adrian Rosebrock
Take a look at the comments on this post as I have addressed that question a few times.