In this tutorial, you will learn how to perform fine-tuning with Keras and Deep Learning.
We will take a CNN pre-trained on the ImageNet dataset and fine-tune it to perform image classification and recognize classes it was never trained on.
Today is the final post in our three-part series on fine-tuning:
- Part #1: Transfer learning with Keras and Deep Learning
- Part #2: Feature extraction with on large datasets with Keras and Deep Learning
- Part #3: Fine-tuning with Keras and Deep Learning (today’s post)
I would strongly encourage you to read the previous two tutorials in the series if you haven’t yet — understanding the concept of transfer learning, including performing feature extraction via a pre-trained CNN, will better enable you to understand (and appreciate) fine-tuning.
When performing feature extraction we did not re-train the original CNN. Instead, we treated the CNN as an arbitrary feature extractor and then trained a simple machine learning model on top of the extracted features.
Fine-tuning, on the other hand, requires that we not only update the CNN architecture but also re-train it to learn new object classes.
Fine-tuning is a multi-step process:
- Remove the fully connected nodes at the end of the network (i.e., where the actual class label predictions are made).
- Replace the fully connected nodes with freshly initialized ones.
- Freeze earlier CONV layers earlier in the network (ensuring that any previous robust features learned by the CNN are not destroyed).
- Start training, but only train the FC layer heads.
- Optionally unfreeze some/all of the CONV layers in the network and perform a second pass of training.
If you are new to deep learning and CNNs, I would recommend you stop here and learn how to train your first CNN.
Fine-tuning with Keras is a more advanced technique with plenty of gotchas and pitfalls that will trip you up along the way (for example, it tends to be very easy to overfit a network when performing fine-tuning if you are not careful).
To learn how to perform fine-tuning with Keras and deep learning, just keep reading.
Fine-tuning with Keras and Deep Learning
2020-06-04 Update: This blog post is now TensorFlow 2+ compatible!
Note: Many of the fine-tuning concepts I’ll be covering in this post also appear in my book, Deep Learning for Computer Vision with Python. Inside the book, I go into considerably more detail (and include more of my tips, suggestions, and best practices). If you would like more detail on fine-tuning with Keras after going through this guide, definitely take a look at my book.
In the first part of this tutorial, we’ll discuss the concept of fine-tuning and how we can re-train a neural network to recognize classes it was not originally trained to recognize.
From there we’ll review the dataset we are using for fine-tuning.
I’ll then discuss our project directory structure.
Once we have a good handle on the dataset we’ll then switch to implementing fine-tuning with Keras.
After you have finished going through this tutorial you will be able to:
- Fine-tune networks with Keras.
- Make predictions using the fine-tuned network.
Let’s get started!
What is fine-tuning?

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.
Earlier in this series of posts on transfer learning, we learned how to treat a pre-trained Convolutional Neural Network as a feature extractor.
Using this feature extractor, we forward propagated our dataset of images through the network, extracted the activations at a given layer (treating the activations as a feature vector), and then saved the values to disk.
A standard machine learning classifier (in our case, Logistic Regression), was trained on top of the CNN features, exactly as we would do with hand-engineered features such as SIFT, HOG, LBPs, etc.
This approach to transfer learning is called feature extraction.
But there is another type of transfer learning, one that can actually outperform the feature extraction method. This method is called fine-tuning and requires us to perform “network surgery”.
First, we take a scalpel and cut off the final set of fully connected layers (i.e., the “head” of the network where the class label predictions are returned) from a pre-trained CNN (typically VGG, ResNet, or Inception).
We then replace the head with a new set of fully connected layers with random initializations.
From there, all layers below the head are frozen so their weights cannot be updated (i.e., the backward pass in back propagation does not reach them).
We then train the network using a very small learning rate so the new set of fully connected layers can learn patterns from the previously learned CONV layers earlier in the network — this process is called allowing the FC layers to “warm up”.
Optionally, we may unfreeze the rest of the network and continue training. Applying fine-tuning allows us to utilize pre-trained networks to recognize classes they were not originally trained on.
And furthermore, this method can lead to higher accuracy than transfer learning via feature extraction.
Fine-tuning and network surgery
Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.
As we discussed earlier in this series on transfer learning via feature extraction, pre-trained networks (such as ones trained on the ImageNet dataset) contain rich, discriminative filters. The filters can be used on datasets to predict class labels outside the ones the network has already been trained on.
However, instead of simply applying feature extraction, we are going to perform network surgery and modify the actual architecture so we can re-train parts of the network.
If this sounds like something out of a bad horror movie; don’t worry, there won’t be any blood and gore — but we’ll have some fun and learn a lot about transfer learning via our Dr. Frankenstien-esque network experiments.
To understand how fine-tuning works, consider the following figure:

On the left we have the layers of the VGG16 network.
As we know, the final set of layers (i.e., the “head”) are our fully connected layers along with our softmax classifier.
When performing fine-tuning, we actually sever the head of the network, just as in feature extraction (Figure 2, middle).
However, unlike feature extraction, when we perform fine-tuning we actually build a new fully connected head and place it on top of the original architecture (Figure 2, right).
The new FC layer head is randomly initialized (just like any other layer in a new network) and connected to the body of the original network.
However, there is a problem:
Our CONV layers have already learned rich, discriminative filters while our FC layers are brand new and totally random.
If we allow the gradient to backpropagate from these random values all the way through the network, we risk destroying these powerful features.
To circumvent this problem, we instead let our FC head “warm up” by (ironically) “freezing” all layers in the body of the network (I told you the horror/cadaver analogy works well here) as depicted in Figure 2 (left).

Training data is forward propagated through the network as we usually would; however, the backpropagation is stopped after the FC layers, which allows these layers to start to learn patterns from the highly discriminative CONV layers.
In some cases, we may decide to never unfreeze the body of the network as our new FC head may obtain sufficient accuracy.
However, for some datasets it is often advantageous to allow the original CONV layers to be modified during the fine-tuning process as well (Figure 3, right).
After the FC head has started to learn patterns in our dataset, we can pause training, unfreeze the body, and continue training, but with a very small learning rate — we do not want to alter our CONV filters dramatically.
Training is then allowed to continue until sufficient accuracy is obtained.
Fine-tuning is a super-powerful method to obtain image classifiers on your own custom datasets from pre-trained CNNs (and is even more powerful than transfer learning via feature extraction).
If you’d like to learn more about transfer learning via deep learning, including:
- Deep learning-based feature extraction
- Training models on top of extracted features
- Fine-tuning networks on your own custom datasets
- My personal tips, suggestions, and best practices for transfer learning
…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover these algorithms and techniques in detail.
Configuring your development environment
To configure your system for this tutorial, I first recommend following either of these tutorials:
Either tutorial will help you configure you system with all the necessary software for this blog post in a convenient Python virtual environment.
Please note that PyImageSearch does not recommend or support Windows for CV/DL projects.
2020-06-03 Update: This TensorFlow 2.0 bug affects this blog post: Filling up shuffle buffer (this may take a while)
The Food-11 Dataset

The dataset we’ll be using for fine-tuning is the Food-11 dataset from the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology.
The dataset consists of 16,643 images belonging to 11 major food categories:
- Bread (1724 images)
- Dairy product (721 images)
- Dessert (2,500 images)
- Egg (1,648 images)
- Fried food (1,461images)
- Meat (2,206 images)
- Noodles/pasta (734 images)
- Rice (472 images)
- Seafood (1,505 images)
- Soup (2,500 images)
- Vegetable/fruit (1,172 images)
Using the Food-11 dataset we can train a deep learning model capable of recognizing each major food group — such a model could be used, for example, in a mobile fitness application that automatically tracks estimated food group and caloric intake.
To train such a model, we’ll be utilizing fine-tuning with the Keras deep learning library.
Downloading the Food-11 dataset
Go ahead and grab the zip from the “Downloads” section of this blog post.
Once you’ve downloaded the source code, change directory into fine-tuning-keras :
$ unzip fine-tuning-keras.zip $ cd fine-tuning-keras
2020-06-04 Update: In my experience, I’ve found that downloading the Food-11 dataset is unreliable from its original source. Therefore, I’m providing a direct download link to the Food-11 dataset. The option to use wget or FTP is now removed from this article.
Now, go ahead and download Food-11 using the following link:
After you download the dataset, you can extract it into your project directory (fine-tuning-keras):
$ unzip Food-11.zip
Project structure
Now that we’ve downloaded the project and dataset, go ahead and navigate back to the project root. From there let’s analyze the project structure:
$ cd .. $ tree --dirsfirst --filelimit 10 . ├── Food-11 │ ├── evaluation [3347 entries] │ ├── training [9866 entries] │ └── validation [3430 entries] ├── dataset ├── output │ ├── unfrozen.png │ └── warmup.png ├── pyimagesearch │ ├── __init__.py │ └── config.py ├── build_dataset.py ├── predict.py ├── Food-11.zip └── train.py 7 directories, 8 files
Our project structure is similar to last week’s.
Our original dataset is in the Food-11/ directory. See the direct link above to download the dataset.
Executing build_dataset.py enables us to organize the Food-11 images into the dataset/ directory.
From there, we’ll use train.py to perform fine tuning.
Finally, we’ll use predict.py to make predictions on sample images using our fine-tuned network.
Each of the aforementioned scripts takes advantage of a configuration file named config.py . Let’s go ahead and learn more about the configuration script now.
Understanding our configuration file
Before we can actually fine-tune our network, we first need to create our configuration file to store important variables, including:
- Paths to the input dataset
- Class labels
- Batch size/training parameters
- Output paths, including model files, label encoders, plot histories, etc.
Since there are so many parameters that we need, I’ve opted to use a configuration file to keep our code nice and organized (versus having to utilize many command line arguments).
Our configuration file, config.py, lives in a Python module named pyimagesearch .
We keep the config.py file there for two reasons:
- To ensure we can import the configuration into our own Python scripts
- To keep our code tidy and organized
Note: This config file is similar to the one in last week’s and the prior week’s tutorials.
Let’s fill our config.py file now — open it up in your favorite code editor and insert the following lines:
# import the necessary packages import os # initialize the path to the *original* input directory of images ORIG_INPUT_DATASET = "Food-11" # initialize the base path to the *new* directory that will contain # our images after computing the training and testing split BASE_PATH = "dataset"
First, we import os , enabling us to build file/directory paths directly in this config.
The original dataset path where we extracted the Food-11 dataset is contained in ORIG_INPUT_DATASET (Line 5).
Then we specify the BASE_PATH  where our organized dataset will soon reside.
From there we’ll define the names of our TRAIN , TEST , and VAL  directories:
# define the names of the training, testing, and validation # directories TRAIN = "training" TEST = "evaluation" VAL = "validation"
Followed by listing the eleven CLASSES  of our Food-11 dataset:
# initialize the list of class label names CLASSES = ["Bread", "Dairy product", "Dessert", "Egg", "Fried food", "Meat", "Noodles/Pasta", "Rice", "Seafood", "Soup", "Vegetable/Fruit"]
Finally, we’ll specify our batch size and model + plot paths:
# set the batch size when fine-tuning BATCH_SIZE = 32 # initialize the label encoder file path and the output directory to # where the extracted features (in CSV file format) will be stored LE_PATH = os.path.sep.join(["output", "le.cpickle"]) BASE_CSV_PATH = "output" # set the path to the serialized model after training MODEL_PATH = os.path.sep.join(["output", "food11.model"]) # define the path to the output training history plots UNFROZEN_PLOT_PATH = os.path.sep.join(["output", "unfrozen.png"]) WARMUP_PLOT_PATH = os.path.sep.join(["output", "warmup.png"])
Our BATCH_SIZE of 32 represents the size of the chunks of data that will flow through our CNN.
We’ll eventually store our serialized label encoder as a pickle file (Line 27).
Our fine-tuned serialized Keras model will be exported to the MODEL_PATH (Line 31).
Similarly, we specify the paths where our warmup and unfrozen plot images will be stored (Lines 34 and 35).
Building our image dataset for fine-tuning
If we were to store the entire Food-11 dataset in memory, it would occupy ~10GB of RAM.
Most deep learning rigs should be able to handle that amount of data, but nevertheless, I’ll be showing you how to use the .flow_from_directory function with Keras to only load small batches of data from disk at a time.
However, before we can actually get to fine-tuning and re-training a network, we first must (correctly) organize our dataset of images on disk.
In order to use the .flow_from_directory function, Keras requires that we have our dataset organized using the following template:
dataset_name/class_label/example_of_class_label.jpg
And since the Food-11 dataset also provides pre-supplied data splits, our final directory structure will have the form:
dataset_name/split_name/class_label/example_of_class_label.jpg 
Having the above directory structure ensures that:
- The .flow_from_directoryfunction will properly work.
- Our dataset is organized into a neat, easy to follow directory structure.
In order to take the original Food-11 images and then copy them into our desired directory structure, we need the build_dataset.py script.
Let’s review that script now:
# import the necessary packages
from pyimagesearch import config
from imutils import paths
import shutil
import os
# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
	# grab all image paths in the current split
	print("[INFO] processing '{} split'...".format(split))
	p = os.path.sep.join([config.ORIG_INPUT_DATASET, split])
	imagePaths = list(paths.list_images(p))
	# loop over the image paths
	for imagePath in imagePaths:
		# extract class label from the filename
		filename = imagePath.split(os.path.sep)[-1]
		label = config.CLASSES[int(filename.split("_")[0])]
		# construct the path to the output directory
		dirPath = os.path.sep.join([config.BASE_PATH, split, label])
		# if the output directory does not exist, create it
		if not os.path.exists(dirPath):
			os.makedirs(dirPath)
		# construct the path to the output image file and copy it
		p = os.path.sep.join([dirPath, filename])
		shutil.copy2(imagePath, p)
Lines 2-5 import our necessary packages, in particular, our config .
From there we loop over data splits beginning on Line 8. Inside, we:
- Extract imagePathsand each classlabel(Lines 11-18).
- Create a directory structure for our organized image files (Lines 21-25).
- Copy the image files into the appropriate destination (Lines 28 and 29).
This script has been reviewed in more detail inside the Transfer learning with Keras and deep learning post. If you would like more detail on the inner-workings of build_dataset.py , please refer to the previous tutorial.
Before continuing, make sure you have used the “Downloads” section of the tutorial to download the source code associated with this blog post.
From there, open up a terminal and execute the following command:
$ python build_dataset.py [INFO] processing 'training split'... [INFO] processing 'evaluation split'... [INFO] processing 'validation split'...
If you investigate the dataset/  directory you’ll see three directories, one for each of our respective data splits:
$ ls dataset/ evaluation training validation
Inside each of the data split directories you’ll also find class label subdirectories:
$ ls -l dataset/training/ Bread Dairy product Dessert Egg Fried food Meat Noodles Rice Seafood Soup Vegetable
And inside each of the class label subdirectories you’ll find images associated with that label:
$ ls -l dataset/training/Bread/*.jpg | head -n 5 dataset/training/Bread/0_0.jpg dataset/training/Bread/0_1.jpg dataset/training/Bread/0_10.jpg dataset/training/Bread/0_100.jpg dataset/training/Bread/0_101.jpg
Implementing fine-tuning with Keras
Now that our images are in the proper directory structure, we can perform fine-tuning with Keras.
Let’s implement the fine-tuning script inside train.py :
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from sklearn.metrics import classification_report
from pyimagesearch import config
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os
Lines 2-19 import required packages. Let’s briefly review those that are most important to the fine-tuning concepts in today’s post:
- matplotlib: We’ll be plotting our frozen and unfrozen training efforts. Line 3 sets the backend ensuring that we can save our plots to disk as image files.
- ImageDataGenerator: Allows for data augmentation. Be sure to refer to DL4CV and this blog post for more information on this class.
- VGG16: The seminal network trained on ImageNet that we’ll be slicing and dicing with our scalpel for the purposes of fine-tuning.
- classification_report: Calculates basic statistical information upon evaluation of our model.
- config: Our custom configuration file which we reviewed in the “Understanding our configuration file” section.
Be sure to familiarize yourself with the rest of the imports as well.
With the packages at our fingertips, we’re now ready to move on. Let’s start by defining a function for plotting training history:
def plot_training(H, N, plotPath):
	# construct a plot that plots and saves the training history
	plt.style.use("ggplot")
	plt.figure()
	plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
	plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
	plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
	plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
	plt.title("Training Loss and Accuracy")
	plt.xlabel("Epoch #")
	plt.ylabel("Loss/Accuracy")
	plt.legend(loc="lower left")
	plt.savefig(plotPath)
2020-06-03 Update: In order for this plotting snippet to be TensorFlow 2+ compatible the H.history dictionary keys are updated to fully spell out “accuracy” sans “acc” (i.e., H.history["val_accuracy"] and H.history["accuracy"]). It is semi-confusing that “val” is not spelled out as “validation”; we have to learn to love and live with the API and always remember that it is a work in progress that many developers around the world contribute to.
The plot_training function is defined on Lines 21-33. This helper function will be used to construct and save a plot of our training history.
Let’s determine the total number of images in each of our splits:
# derive the paths to the training, validation, and testing # directories trainPath = os.path.sep.join([config.BASE_PATH, config.TRAIN]) valPath = os.path.sep.join([config.BASE_PATH, config.VAL]) testPath = os.path.sep.join([config.BASE_PATH, config.TEST]) # determine the total number of image paths in training, validation, # and testing directories totalTrain = len(list(paths.list_images(trainPath))) totalVal = len(list(paths.list_images(valPath))) totalTest = len(list(paths.list_images(testPath)))
Lines 37-39 define paths to training, validation, and testing directories, respectively.
Then, we determine the total number of images for each split via Lines 43-45 — these values will enable us to calculate the steps per epoch.
Let’s initialize our data augmentation object and establish our mean subtraction value:
# initialize the training data augmentation object trainAug = ImageDataGenerator( rotation_range=30, zoom_range=0.15, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15, horizontal_flip=True, fill_mode="nearest") # initialize the validation/testing data augmentation object (which # we'll be adding mean subtraction to) valAug = ImageDataGenerator() # define the ImageNet mean subtraction (in RGB order) and set the # the mean subtraction value for each of the data augmentation # objects mean = np.array([123.68, 116.779, 103.939], dtype="float32") trainAug.mean = mean valAug.mean = mean
The process of data augmentation is important for small datasets. In fact, it is nearly always recommended. Lines 48-55 define our training data augmentation object. The parameters specify random rotations, zooms, translations, shears, and flips to the training data as we train.
Note: A common misconception I see about data augmentation is that the random transforms of the images are then added to the original training data — that’s not the case. The random transformations performed by data augmentation are performed in-place, implying that the dataset size does not increase. These transforms are performed in-place, on the fly, during training.
Although our validation data augmentation object (Line 59) uses the same class, we do not supply any parameters (we don’t apply data augmentation to validation or testing data). The validation ImageDataGenerator will only be used for mean subtraction which is why no parameters are needed.
Next, we set the ImageNet mean subtraction values on Line 64. In this pre-processing technique, we perform a pixel-wise subtraction for all images. Mean subtraction is one of several scaling techniques I explain in the Practitioner Bundle of Deep Learning for Computer Vision with Python. In the text, we’ll even build a custom preprocessor to more efficiently accomplish mean subtraction.
Given the pixel-wise subtraction values, we prepare each of our data augmentation objects for mean subtraction (Lines 65 and 66).
Our data augmentation generators will generate data directly from their respective directories:
# initialize the training generator trainGen = trainAug.flow_from_directory( trainPath, class_mode="categorical", target_size=(224, 224), color_mode="rgb", shuffle=True, batch_size=config.BATCH_SIZE) # initialize the validation generator valGen = valAug.flow_from_directory( valPath, class_mode="categorical", target_size=(224, 224), color_mode="rgb", shuffle=False, batch_size=config.BATCH_SIZE) # initialize the testing generator testGen = valAug.flow_from_directory( testPath, class_mode="categorical", target_size=(224, 224), color_mode="rgb", shuffle=False, batch_size=config.BATCH_SIZE)
Lines 69-93 define generators that will load batches of images from their respective, training, validation, and testing splits.
Using these generators ensures that our machine will not run out of RAM by trying to load all of the data at once.
Let’s go ahead and perform network surgery:
# load the VGG16 network, ensuring the head FC layer sets are left # off baseModel = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3))) # construct the head of the model that will be placed on top of the # the base model headModel = baseModel.output headModel = Flatten(name="flatten")(headModel) headModel = Dense(512, activation="relu")(headModel) headModel = Dropout(0.5)(headModel) headModel = Dense(len(config.CLASSES), activation="softmax")(headModel) # place the head FC model on top of the base model (this will become # the actual model we will train) model = Model(inputs=baseModel.input, outputs=headModel)
First, we’ll load the VGG16 architecture (with pre-trained ImageNet weights) from disk, leaving off the fully connected layers (Lines 97 and 98). By omitting the fully connected layers, we have effectively put the network in a guillotine to behead our network as in Figure 2.
From there, we define a new fully connected layer head (Lines 102-106).
Note: If you are unfamiliar with the contents on Lines 102-106, I recommend that you read my Keras tutorial or CNN tutorial. And if you would like to immerse yourself completely into the world of deep learning, be sure to check out my highly rated deep learning book.
On Line 110 we place the new FC layer head on top of the VGG16 base network. You can think of this as adding sutures to sew the head back on to the network body after surgery.
Take the time to review the above code block carefully as it is where the heart of fine-tuning with Keras begins.
Continuing on with fine-tuning, let’s freeze all of the CONV layers in the body of VGG16:
# loop over all layers in the base model and freeze them so they will # *not* be updated during the first training process for layer in baseModel.layers: layer.trainable = False
Lines 114 and 115 freeze all CONV layers in the VGG16 base model.
Given that the base is now frozen, we’ll go ahead and train our network (only the head weights will be updated):
# compile our model (this needs to be done after our setting our
# layers to being non-trainable
print("[INFO] compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
# train the head of the network for a few epochs (all other layers
# are frozen) -- this will allow the new FC layers to start to become
# initialized with actual "learned" values versus pure random
print("[INFO] training head...")
H = model.fit(
	x=trainGen,
	steps_per_epoch=totalTrain // config.BATCH_SIZE,
	validation_data=valGen,
	validation_steps=totalVal // config.BATCH_SIZE,
	epochs=50)
# reset the testing generator and evaluate the network after
# fine-tuning just the network head
print("[INFO] evaluating after fine-tuning network head...")
testGen.reset()
predIdxs = model.predict(x=testGen,
	steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testGen.classes, predIdxs,
	target_names=testGen.class_indices.keys()))
plot_training(H, 50, config.WARMUP_PLOT_PATH)
2020-06-03 Update: Per TensorFlow 2.0+, we no-longer use the .fit_generator method; it is replaced with .fit and has the same function signature (i.e., the first argument can be a Python generator object). Similarly, .predict_generator is replaced with .predict.
In this block, we train our model , keeping in mind that no weight updates will occur in the base. Only the head of the network will be tuned at this point.
In this code block, we:
- Compile the model(Lines 120-122). We use"categorical_crossentropy"for ourlossfunction. If you are performing classification with only two classes, be sure to use"binary_crossentropy".
- Train our network while applying data augmentation, only updating the weights for the head of the network (Lines 128-133)
- Reset our testing generator (Line 138).
- Evaluate our network on our testing data (Lines 139-141). We’ll print classification statistics in our terminal via Lines 142 and 143.
- Plot the training history via our plot_trainingfunction (Line 144).
Now let’s proceed to unfreeze the final set of CONV layers in the base model layers:
# reset our data generators
trainGen.reset()
valGen.reset()
# now that the head FC layers have been trained/initialized, lets
# unfreeze the final set of CONV layers and make them trainable
for layer in baseModel.layers[15:]:
	layer.trainable = True
# loop over the layers in the model and show which ones are trainable
# or not
for layer in baseModel.layers:
	print("{}: {}".format(layer, layer.trainable))
We start by resetting our training and validation generators (Lines 147 and 148).
We then unfreeze the final CONV layer block in VGG16 (Lines 152 and 153). Again, only the final CONV block of VGG16 is unfrozen (not the rest of the network).
Just so there is no confusion about what is going on in our network, Lines 157 and 158 will show us which layers are frozen/not frozen (i.e., trainable). The information will print out in our terminal.
Continuing on, let’s fine-tune both the final set of CONV layers and our set of FC layers:
# for the changes to the model to take affect we need to recompile
# the model, this time using SGD with a *very* small learning rate
print("[INFO] re-compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
# train the model again, this time fine-tuning *both* the final set
# of CONV layers along with our set of FC layers
H = model.fit(
	x=trainGen,
	steps_per_epoch=totalTrain // config.BATCH_SIZE,
	validation_data=valGen,
	validation_steps=totalVal // config.BATCH_SIZE,
	epochs=20)
2020-06-03 Update: Per TensorFlow 2.0+, we no-longer use the .fit_generator method; it is replaced with .fit and has the same function signature (i.e., the first argument can be a Python generator object).
Since we’ve unfrozen additional layers, we must re-compile the model (Lines 163-165).
We then train the model again, this time fine-tuning both the FC layer head and the final CONV block (Lines 169-174).
Wrapping up, let’s evaluate the network once more:
# reset the testing generator and then use our trained model to
# make predictions on the data
print("[INFO] evaluating after fine-tuning network...")
testGen.reset()
predIdxs = model.predict(x=testGen,
	steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testGen.classes, predIdxs,
	target_names=testGen.class_indices.keys()))
plot_training(H, 20, config.UNFROZEN_PLOT_PATH)
# serialize the model to disk
print("[INFO] serializing network...")
model.save(config.MODEL_PATH, save_format="h5")
2020-06-03 Update: Per TensorFlow 2.0+, we no-longer use the .predict_generator; it is replaced with .predict. and has the same function signature (i.e., the first argument can be a Python generator object).
Here we:
- Make predictions on the testing data (Lines 179-182).
- Print a new classification report (Lines 183 and 184).
- Generate the unfrozen training and save it to disk (Line 185).
- And serialize the model to disk, allowing us to recall the model in our predict.pyscript (Line 189).
Great job sticking with me on our fine-tuning journey. We’re going to put our script to work next!
Training a network via fine-tuning with Keras
Now that we’ve implemented our Python script to perform fine-tuning, let’s give it a try and see what happens.
Make sure you’ve used the “Downloads” section of this tutorial to download the source code to this post, and from there, execute the following command:
$ python train.py
Using TensorFlow backend.
Found 9866 images belonging to 11 classes.
Found 3430 images belonging to 11 classes.
Found 3347 images belonging to 11 classes.
[INFO] compiling model...
[INFO] training head...
Epoch 1/50
308/308 [==============================] - 91s 294ms/step - loss: 4.5022 - accuracy: 0.2844 - val_loss: 1.7583 - val_accuracy: 0.4238
Epoch 2/50
308/308 [==============================] - 89s 289ms/step - loss: 1.9905 - accuracy: 0.3551 - val_loss: 1.4969 - val_accuracy: 0.4909
Epoch 3/50 
308/308 [==============================] - 90s 294ms/step - loss: 1.8038 - accuracy: 0.3991 - val_loss: 1.4215 - val_accuracy: 0.5301
...
Epoch 48/50
308/308 [==============================] - 88s 285ms/step - loss: 0.9199 - accuracy: 0.6909 - val_loss: 0.7735 - val_accuracy: 0.7614
Epoch 49/50
308/308 [==============================] - 87s 284ms/step - loss: 0.8974 - accuracy: 0.6977 - val_loss: 0.7749 - val_accuracy: 0.7602
Epoch 50/50
308/308 [==============================] - 88s 287ms/step - loss: 0.8930 - accuracy: 0.7004 - val_loss: 0.7764 - val_accuracy: 0.7646
[INFO] evaluating after fine-tuning network head...
               precision    recall  f1-score   support
        Bread       0.72      0.58      0.64       368
Dairy product       0.75      0.56      0.64       148
      Dessert       0.72      0.68      0.70       500
          Egg       0.66      0.72      0.69       335
   Fried food       0.72      0.70      0.71       287
         Meat       0.75      0.84      0.79       432
      Noodles       0.97      0.95      0.96       147
         Rice       0.85      0.88      0.86        96
      Seafood       0.79      0.81      0.80       303
         Soup       0.89      0.95      0.92       500
    Vegetable       0.86      0.91      0.88       231
     accuracy                           0.78      3347
    macro avg       0.79      0.78      0.78      3347
 weighted avg       0.78      0.78      0.77      3347

After fine-tuning just our newly initialized FC layer head and allowing the FC Layers to warm up, we are obtaining ~76% accuracy which is quite respectable.
Next, we see that we have unfrozen the final block of CONV layers in VGG16 while leaving the rest of the network weights frozen:
<tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f2baa5f6eb8>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c21583198>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c8075ffd0>: False <tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f2c80725630>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c807254a8>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c8072f828>: False <tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f2c807399b0>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c80739828>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804c0b70>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804c52b0>: False <tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f2c804cd0f0>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804cdac8>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804d4dd8>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804d94a8>: False <tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f2c804de3c8>: False <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804e2cf8>: True <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804e2e10>: True <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f2c804ed828>: True <tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f2c804f2550>: True
Once we’ve unfrozen the final CONV block, we resume fine-tuning:
[INFO] re-compiling model...
Epoch 1/20
308/308 [==============================] - 88s 285ms/step - loss: 0.9120 - accuracy: 0.6978 - val_loss: 0.7333 - val_accuracy: 0.7818
Epoch 2/20
308/308 [==============================] - 88s 285ms/step - loss: 0.8302 - accuracy: 0.7252 - val_loss: 0.6916 - val_accuracy: 0.7941
Epoch 3/20
308/308 [==============================] - 88s 285ms/step - loss: 0.7882 - accuracy: 0.7401 - val_loss: 0.6982 - val_accuracy: 0.7953
Epoch 4/20
308/308 [==============================] - 88s 285ms/step - loss: 0.7418 - accuracy: 0.7557 - val_loss: 0.6813 - val_accuracy: 0.8110
Epoch 5/20
308/308 [==============================] - 88s 285ms/step - loss: 0.6744 - accuracy: 0.7808 - val_loss: 0.6259 - val_accuracy: 0.8195
...
Epoch 16/20
308/308 [==============================] - 89s 289ms/step - loss: 0.4534 - accuracy: 0.8537 - val_loss: 0.5958 - val_accuracy: 0.8516
Epoch 17/20
308/308 [==============================] - 88s 285ms/step - loss: 0.4332 - accuracy: 0.8608 - val_loss: 0.5809 - val_accuracy: 0.8537
Epoch 18/20
308/308 [==============================] - 88s 285ms/step - loss: 0.4188 - accuracy: 0.8646 - val_loss: 0.6200 - val_accuracy: 0.8490
Epoch 19/20
308/308 [==============================] - 88s 285ms/step - loss: 0.4115 - accuracy: 0.8641 - val_loss: 0.5874 - val_accuracy: 0.8493
Epoch 20/20
308/308 [==============================] - 88s 285ms/step - loss: 0.3807 - accuracy: 0.8721 - val_loss: 0.5939 - val_accuracy: 0.8581
[INFO] evaluating after fine-tuning network...
               precision    recall  f1-score   support
        Bread       0.84      0.79      0.81       368
Dairy product       0.85      0.70      0.77       148
      Dessert       0.82      0.82      0.82       500
          Egg       0.84      0.84      0.84       335
   Fried food       0.82      0.86      0.84       287
         Meat       0.87      0.92      0.89       432
      Noodles       1.00      0.97      0.98       147
         Rice       0.91      0.97      0.94        96
      Seafood       0.91      0.90      0.91       303
         Soup       0.96      0.96      0.96       500
    Vegetable       0.92      0.95      0.94       231
     accuracy                           0.88      3347
    macro avg       0.89      0.88      0.88      3347
 weighted avg       0.88      0.88      0.88      3347
[INFO] serializing network...

I decided to not train past epoch 20 for fear of overfitting. If you take a look at Figure 7 you can start to see our training and validation loss start to rapidly divide. When you see training loss falling quickly while validation loss stagnates or even increases, you know you are overfitting.
That said, at the end of our fine-tuning process, we are now obtaining 88% accuracy, a significant increase from just fine-tuning the FC layer heads alone!
Making predictions with fine-tuning and Keras
Now that we’ve fine-tuned our Keras model, let’s see how we can use it to make predictions on images outside the training/testing set (i.e., our own custom images).
Open up predict.py  and insert the following code:
# import the necessary packages
from tensorflow.keras.models import load_model
from pyimagesearch import config
import numpy as np
import argparse
import imutils
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
	help="path to our input image")
args = vars(ap.parse_args())
Lines 2-7 import our required packages. We’re going to use load_model  to recall our Keras fine-tuned model from disk and make predictions. This is also the first time today that we will use OpenCV (cv2 ).
On Lines 10-13 we parse our command line argument. The --image  argument allows us to supply any image from our terminal at runtime with no modifications to the code. It makes sense to take advantage of a command line argument rather than hard-coding the value here or in our config.
Let’s go ahead and load that image from disk and preprocess it:
# load the input image and then clone it so we can draw on it later
image = cv2.imread(args["image"])
output = image.copy()
output = imutils.resize(output, width=400)
# our model was trained on RGB ordered images but OpenCV represents
# images in BGR order, so swap the channels, and then resize to
# 224x224 (the input dimensions for VGG16)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))
# convert the image to a floating point data type and perform mean
# subtraction
image = image.astype("float32")
mean = np.array([123.68, 116.779, 103.939][::-1], dtype="float32")
image -= mean
Lines 16-30 load and preprocess our image . The preprocessing steps are identical to training and include:
- Making a copyof the image and resizing it foroutputpurposes (Lines 17 and 18).
- Swapping color channels since we trained with RGB images and OpenCV loaded this imagein BGR order (Line 23).
- Resizing the imageto 224×224 pixels for inference (Line 24).
- Converting the imageto floating point (Line 28).
- Performing mean subtraction (Lines 29 and 30).
Note: When we perform inference using a custom prediction script, if the results are unsatisfactory nine times out of ten it is due to improper preprocessing. Typically having color channels in the wrong order or forgetting to perform mean subtraction altogether will lead to unfavorable results. Keep this in mind when writing your own scripts.
Now that our image is ready, let’s predict its class label:
# load the trained model from disk
print("[INFO] loading model...")
model = load_model(config.MODEL_PATH)
# pass the image through the network to obtain our predictions
preds = model.predict(np.expand_dims(image, axis=0))[0]
i = np.argmax(preds)
label = config.CLASSES[i]
# draw the prediction on the output image
text = "{}: {:.2f}%".format(label, preds[i] * 100)
cv2.putText(output, text, (3, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
	(0, 255, 0), 2)
# show the output image
cv2.imshow("Output", output)
cv2.waitKey(0)
We load our fine-tuned model  via Line 34 and then perform inference. The top prediction class label  is extracted on Lines 37-39.
Finally, we annotate the output  image and display it on screen (Lines 42-48). The text  annotation contains the highest prediction along with its associated confidence.
On to the fun part — testing our script on food! I’m hungry just thinking about it and I bet you may be too.
Keras fine-tuning results
To see our fine-tuned Keras model in action, make sure you use the “Downloads” section of this tutorial to download the source code and example images.
From there, open up a terminal and execute the following command:
$ python predict.py --image dataset/evaluation/Seafood/8_186.jpg

As you can see from Figure 7, we have correctly classified the input image as “Seafood”.
Let’s try another example:
$ python predict.py --image dataset/evaluation/Meat/5_293.jpg

Our fine-tuned network has labeled the image as “Fried food” despite it being in the “Meat” class in our dataset.
Chicken wings are typically fried and these ones clearly are. They are both “Meat” and “Fried food” which is why we are pulled in two directions. Therefore, I’m still declaring it as a “correct” classification. A fun experiment would be to apply fine-tuning with multi-label classification. I’ll leave that as an exercise to you to implement.
Below I have included a few additional results from my fine-tuning experiments:

What's next? We recommend PyImageSearch University.
86+ total classes • 115+ hours hours of on-demand code walkthrough videos • Last updated: October 2025
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial, you learned how to perform fine-tuning with Keras and deep learning.
To perform fine-tuning, we:
- Loaded the VGG16 network architecture from disk with weights pre-trained on ImageNet.
- Ensured the original fully connected layer heads were removed (i.e., where the output predictions from the network are made).
- Replaced the originally fully connected layers with brand new, freshly initialized ones.
- Froze all CONV layers in VGG16.
- Trained only the fully connected layer heads.
- Unfroze the final set of CONV layer blocks in VGG16.
- Continued training.
Overall, we were able to obtain 88% accuracy on the Food-11 dataset.
Further accuracy can be obtained by applying additional data augmentation and adjusting the parameters to our optimizer and number of FC layer nodes.
If you’re interested in learning more about fine-tuning with Keras, including my tips, suggestions, and best practices, be sure to take a look at Deep Learning for Computer Vision with Python where I cover fine-tuning in more detail.
I hope you enjoyed today’s tutorial on fine-tuning!
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
 
        
 
    
Great post, Adrian,
I have read and re-read your sections on transfer learning in the DL4CV Practitioner Bundle. This blog post is a great introduction to a powerful technique.
It would be interesting to see an example of fine-tuning that used a network architecture other than VGGNET. In particular, there is at least some evidence (and if you take Stack Overflow questions as “evidence” a lot more than “some evidence”) that Batch Normalization Layers can cause subtle problems with fine tuning. I’d also love to see examples of fine tuning on networks that don’t use a fully connected head, such as SqueezeNet.
Another thing is that the PyTorch/fastai world has a different approach on fine tuning. They will unfreeze the whole network but have it learn at a very very low rate. If you were training the new head at a learn rate of R, they would train the top third of the network at 0.1R and the rest of the network at 0.01R. Back in the day the Caffe people used a similar approach. It seems to work for them and at least somewhat mitigates this Batch Normalization problem.
Great comment, thanks David.
Rest assured — this is not the final post you’ll see on fine-tuning and transfer learning. We’ll be diving into some of those concepts in future posts 🙂
Very very great and amazing tutorial in fine tuning published on net ever. thanks a lot.
Thanks Farshad, I’m glad you enjoyed it!
Great Article Dr. Adrian, Thank you very much indeed for your effort.
Its explained in details in your deep learning practitioner book chapter 2-5 I have tried on animals and flowers dataset, it boosts the accuracy to new levels.
I’m still not familiar with the surgery of placing the head FC model on top of the base model -this will become the actual model we will train. then freezing and unfreezing the layers is not easy tasks.
I hope i can practice more samples and things should be more clear to me.
Best Regards
I would suggest doing:
You’ll see each of the layers in each of the respective networks and how they are combined into the final
model.Does this method allow the newly trained model to predict the original classifications (1000 original classes) as well as the new ones (food categories) or just the new food categories? I believe the other method of transfer learning in your last two tutorials was only able to classify the new categories.
No, just the food categories. The entire point of transfer learning (both feature extraction and fine-tuning) is to allow your model to predict classes it was never trained on, but utilizing the weights from a pre-trained model (such as the ImageNet dataset).
for info: I was able to download the file when using ipv4
wget –prefer-family=ipv4 –ftp-user FoodImage@grebvm2.epfl.ch –ftp-password Cahc1moo ftp://tremplin.epfl.ch/Food-11.zip
Thanks for sharing, Lode!
Dear Adrian ,
Thanks for your great post. I read the three posts. In the third post , you teach the “flow_from_directory” function . I think the function could also be applied in the feature_extract in the second post by just removing the “ImageDataGenerator” and doing some small modification . Am I right ?
Great post. Question on this …
“Note: A common misconception I see about data augmentation is that the random transforms of the images are then added to the original training data — that’s not the case. The random transformations performed by data augmentation are performed in-place, implying that the dataset size does not increase. These transforms are performed in-place, on the fly, during training.”
Is that ALWAYS the case. I have been working on some projects and we had a tool that generated more images for us doing all of this stuff, but it seemed to be they were persisted. Would that make it so it could train faster.
How does above work … just keeps the image in memory I guess and applies it … or makes copies of it … loads it up, so you have virtual 10 copies or something like that.
I could see a benefit for storing those images .. but I’m a newbie .. is this a best practice to not persist the stuff
Great question, Paul. I’ve actually decided to write a dedicated blog post on image augmentation as it seems like that question is a common misconception (perhaps more misunderstood than I originally thought). That tutorial will be publishing in a couple of weeks.
Hey Adrian,
As always, a great post. I just want to clarify certain fundamental doubts:
1. How is algorithms such as SSD, YOLO or Mask RCNN different from pre-trained NN architectures such as VGG16/VGG19, Inception? What is the fundamental difference?
2. In the learning curve, we saw that the train_loss and val_loss curves crosses with each other at epoch 3-4, hence we should be using EarlyStopping to prevent overfitting. My question is, what is the minimum value of patience (epoch count) should we use?
I don’t know but I feel the learning curve looks not so like a perfectly fit model, yet surprised to see that you got 87% mAP. I am hoping to use InceptionV3 since it has relatively less number of parameters as compared to VGG16. Will this prevent the overfitting at an early stage than what we witnessed in your example?
Any good alternative suggestion(s) is appreciated.
1. SSDs, YOLO, and Mask R-CNN utilize a “backbone” network such as VGG, Inception, or ResNet. The backbone can be trained from scratch in conjunction with the SSD, etc. but is more typically pre-trained on ImageNet.
2. I don’t typically use the EarlyStopping class. I prefer to let my model train and then examine the validation/loss curves as its training. This gives me MUCH more control over the training process.
I go into more detail regarding both of these points inside Deep Learning for Computer Vision with Python so definitely consider going through the text.
Hi Mr., Thank you for the helpful post. I would like to ask you some questions if you allow me..
– Can I use the source code of this post with my face dataset or the pretrained model specialized for food? If I can, what is the pretrained model for face dataset, can I use exactly the same code with only dataset change?
– I have a limited training data dataset and as I know transfer learning by finetuning used to address limited training data that I trying to do, but the food-11 dataset is rich, so what is the benefit of using finetuning for training it when its data is dequate for training.
1. For face recognition we use an entirely different NN architecture. You could fine-tune a FaceNet, for example.
2. Sorry, I don’t understand your question. Perhaps you can elaborate.
Hi Adrian,
Great post! You are saying multiple times “Fine tuning is used to learn new object classes which the network having trained on”.
Let me know if you disagree but for me it is not always the case. It would be only for the first time you used already trained network on ImageNet for example. But what about the next time? If you add more training data on your food/not food set, you would use again fine tuning but not from the ImageNet trained network but rather the network from your previous fine tuning on food/ not food data set.
You could do that but you would want to ensure you are also including the previous classes/images, otherwise you are performing online/incremental learning.
Yes I am always reusing the previous classes/images plus the new ones. So in that case this also fine tuning and it’s not learning new objects classes.
I fine-tuned VGG-16 to classify dog breeds using the Stanford Dog dataset.
I removed the FC layers and added a Dense(4096) and Dense(120, softmax). First, for a few epochs, I froze the layers of base vgg-16 to warm up FC layers.
After that, I unfroze the last block of Conv layers and trained the model.But still,I am not able to achieve a decent acc/val_acc. Is there anything that I can fix? or should I consider fine-tuning other models like resnet etc?
I would suggest you refer to Deep Learning for Computer Vision with Python which includes my best practices, tips, and suggestions when training and fine-tuning networks.
Hi Adrian,
I have a small doubt, it might sound silly but I wanted to know the difference between .h5 extension (used to store keras model) and .model extension that you are using, is there any?
Is the model file in hdf5 format? I am trying to convert the model into tensorflow .pb file, I’ll be grateful if you could guide me on this. And the Keras doc does not recommend the use of pickle or cpickle but you are using it, why so?
There isn’t any, it’s just a model serialized in HDF5 format.
Hi, great blog!
I would like to ask you about that “mean” extraction part. Why we need to extract it?
I think you meant to say “mean subtraction”. The values 122.68, 116.778, and 103.939 are the average RGB pixel intensties of the ImageNet dataset (what the CNN used here was originally trained on).
Hi!
What version of python is used in this link?
I used Python 3 but it’s also compatible with Python 2.7.
Hi Pyimage ! how can make my own costum images training with Faster RCNN ? i do classification of make model car , i did the labelling and annotation , jpg and xml files
Hey there, I cover how to train your own custom Faster R-CNN models inside Deep Learning for Computer Vision with Python. I suggest you start there.
Hi, is it possible to use the pre-training model for medical image classification?
Yes, absolutely. Deep Learning for Computer Vision with Python will show you how to do exactly that.
I’m trying to do a finetuning from your ” Deep Learning for Computer Vision with Python” on Jetson Nano, but it runs out of memory… When I try to convert the code to use a HDF5 format, it works, but acc is lowering after 3 epochs in network’s head-warm-up… I don’t think its a overfitting problem… Probably the code is wrong. Could you please make a tutorial like that?
Hey Niko — you don’t want to actually train the model on the Nano if you can avoid it. Instead, train the model on a laptop, desktop, or GPU machine. Then prepare the model for inference on the Nano. My book, Raspberry Pi for Computer Vision, covers how to do that in detail.
Hi Adrian!! I loved your fine tuning series. It’s an amazing learning material. I was wondering if fine tuning could also be applied to object identification, and if you are planning on making (or have already made) a tutorial about it.
Best regards from Argentina!
Sorry, I meant object *detection*
Yes, absolutely. My book, Deep Learning for Computer Vision with Python covers how to both (1) train from scratch and (2) fine-tune a model for object detection.
Hi Adrian, Great Explanation!!!!
I do have some question, Let’s Say I have trained my deep learning model initially with 5 classes now I want to add another class without training the whole model over again for those 5 classes. How can I do that?
You can’t. You would need to fine-tune the model on the original 5 classes plus the 1 brand new one.
The only solution would be to apply incremental learning via feature extraction.
Hi, Adrian. What if the head network is change to ResNet50? Will this method or approaches acceptable in Deep Learning?
Hi Adrian,
What are the benefits/drawbacks of using fine tuning versus transfer learning via feature extraction? How do you pick which to use? For example, I am training a classifier to identify two different objects, which technique would be best?
Thanks!
Take a look at Deep Learning for Computer Vision with Python which discusses when to use transfer learning versus fine-tuning.
Hi Adrian, thanks for explanation,
i would like ask question about mean, in the lines 49-56 there was no preprocess function and you didn’t subtract the mean in training.
However at the prediction script, there was a mean substraction. Isn’t substraction also need to be in the training with preprocess function?
Mean subtraction is performed. See Lines 65-67 of the training script.