Today’s blog post is inspired by PyImageSearch reader Ezekiel, who emailed me last week and asked:
Hey Adrian,
I went through your previous blog post on deep learning object detection along
with the followup tutorial for real-time deep learning object detection. Thanks for those. I’ve been using your source code in my example projects but I’m having two issues:
- How do I filter/ignore classes that I am uninterested in?
- How can I add new classes to my object detector? Is that even possible?
I would really appreciate it if you could cover this in a blog post.
Thanks.
Ezekiel isn’t the only reader with those questions. In fact, if you go through the comments section of my two most recent posts on deep learning object detection (linked above), you’ll find that one of the most common questions is typically (paraphrased):
How do I modify your source code to include my own object classes?
Since this appears to be such a common question, and ultimately a misunderstanding on how neural networks/deep learning object detectors actually work, I decided to revisit the topic of deep learning object detection in today’s blog post.
Specifically, in this post you will learn:
- The differences between image classification and object detection
- The components of a deep learning object detector including the differences between an object detection framework and the base model itself
- How to perform deep learning object detection with a pre-trained model
- How you can filter and ignore predicted classes from a deep learning model
- Common misconceptions and misunderstandings when adding or removing classes from a deep neural network
To learn more about deep learning object detection, and perhaps even debunk a few misconceptions or misunderstandings you may have with deep learning-based object detection, just keep reading.
A gentle guide to deep learning object detection
Today’s blog post is meant to be a gentle introduction to deep learning-based object detection.
I’ve done my best to provide a review of the components of deep learning object detectors, including OpenCV + Python source code to perform deep learning using a pre-trained object detector.
Use this guide to help you get started with deep learning object detection, but also realize that object detection is highly nuanced and detailed. I could not possibly include every detail of deep learning object detection in a single blog post.
That said, we’ll start today’s blog post by discussing the fundamental differences between image classification and object detection, including if a network trained for image classification can be used for object detection (and under what circumstances).
Once we understand what object detection is, we’ll review the core components of a deep learning object detector, including the object detection framework along with the base model, two key components that readers new to object detection tend to misunderstand.
From there, we’ll implement real-time deep learning object detection using OpenCV.
I’ll also demonstrate how you can ignore and filter object classes you are not interested in without having to modify the network architecture or retrain the model.
Finally, we’ll wrap up today’s blog post by discussing how you can add or remove classes from a deep learning object detector, including my recommended resources to help you get started.
Let’s go ahead and dive in to deep learning object detection!
The difference between image classification and object detection
When performing standard image classification, given an input image, we present it to our neural network, and we obtain a single class label and perhaps a probability associated with the class label as well.
This class label is meant to characterize the contents of the entire image, or at least the most dominant, visible contents of the image.
For example, given the input image in Figure 1 above (left) our CNN has labeled the image as “beagle”.
We can thus think of image classification as:
- One image in
- And one class label out
Object detection, regardless of whether performed via deep learning or other computer vision techniques, builds on image classification and seeks to localize exactly where in the image each object appears.
When performing object detection, given an input image, we wish to obtain:
- A list of bounding boxes, or the (x, y)-coordinates for each object in an image
- The class label associated with each bounding box
- The probability/confidence score associated with each bounding box and class label
Figure 1 (right) demonstrates an example of performing deep learning object detection. Notice how both the person and the dog are localized with their bounding boxes and class labels predicted.
Therefore, object detection allows us to:
- Present one image to the network
- And obtain multiple bounding boxes and class labels out
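To make the contrast concrete, here is a minimal sketch of what the two kinds of output might look like as Python data. The `Classification` and `Detection` names, and every value in them, are hypothetical and exist only for illustration; they are not part of OpenCV or of the code used later in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classification:
    """Output of an image classifier: one label for the whole image."""
    label: str
    probability: float


@dataclass
class Detection:
    """One detected object: where it is, what it is, and how confident we are."""
    box: Tuple[int, int, int, int]  # (startX, startY, endX, endY) in pixels
    label: str
    confidence: float


# hypothetical outputs for a photo of a person walking a dog
classification = Classification(label="beagle", probability=0.93)
detections: List[Detection] = [
    Detection(box=(40, 30, 180, 310), label="person", confidence=0.98),
    Detection(box=(190, 200, 320, 300), label="dog", confidence=0.91),
]

print(classification.label)            # one label for the entire image
print([d.label for d in detections])   # one label per bounding box
```

The classifier gives a single answer per image, while the detector gives a list whose length depends on how many objects it finds.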
Can a deep learning image classifier be used for object detection?
Okay, so at this point you understand the fundamental difference between image classification and object detection:
- When performing image classification, we present one input image to the network and obtain one class label out.
- But when performing object detection, we can present one input image and obtain multiple bounding boxes and class labels out.
That motivates the question:
Can we take a network already trained for classification and use it for object detection instead?
The answer is a bit tricky as it’s technically “Yes”, but for reasons not so obvious.
The solutions involve:
- Applying standard, computer-vision based object detection methods (i.e., non-deep learning methods) such as sliding windows and image pyramids — this method is typically used in your HOG + Linear SVM-based object detectors.
- Taking the pre-trained network and using it as a base network in a deep learning object detection framework (e.g., Faster R-CNN, SSD, YOLO).
Method #1: The traditional object detection pipeline
The first method is not a pure end-to-end deep learning object detector.
We instead utilize:
- Fixed size sliding windows, which slide from left-to-right and top-to-bottom to localize objects at different locations
- An image pyramid to detect objects at varying scales
- Classification via a pre-trained (classification) Convolutional Neural Network
At each stop of the sliding window + image pyramid, we extract the ROI, feed it into a CNN, and obtain the output classification for the ROI.
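The sliding window and image pyramid steps can be sketched in a few lines of plain Python. This is only an illustration of where the ROIs come from: it works on image sizes rather than real pixels, and the window size, step, and scale values are assumptions I picked for the example. In a real pipeline each ROI would be cropped from the resized image and passed to the classification CNN.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(64, 64)):
    """Yield the (width, height) of each layer of an image pyramid."""
    while width >= min_size[0] and height >= min_size[1]:
        yield (width, height)
        width = int(width / scale)
        height = int(height / scale)


def sliding_window(width, height, step, window):
    """Yield the (x, y) top-left corner of every window that fits in the layer."""
    win_w, win_h = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y)


def candidate_rois(width, height, window=(64, 64), step=32, scale=1.5):
    """Return every ROI as (startX, startY, endX, endY) in original-image pixels."""
    rois = []
    for (layer_w, layer_h) in pyramid_sizes(width, height, scale, window):
        # factor that maps layer coordinates back to the original image
        ratio = width / float(layer_w)
        for (x, y) in sliding_window(layer_w, layer_h, step, window):
            rois.append((int(x * ratio), int(y * ratio),
                         int((x + window[0]) * ratio),
                         int((y + window[1]) * ratio)))
    return rois


rois = candidate_rois(width=256, height=128)
print(len(rois), rois[0], rois[-1])
```

Even for this tiny 256x128 example the classifier would have to run on every one of these ROIs, which is a big part of why this approach tends to be slow.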
If the classification probability of label L is higher than some threshold T, we mark the bounding box of the ROI with the label L. Repeating this process for every stop of the sliding window and image pyramid, we obtain the output object detections. Finally, we apply non-maxima suppression to the bounding boxes, yielding our final output detections.
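Non-maxima suppression is easier to follow with a small example. Below is a minimal greedy sketch in plain Python, assuming boxes in (startX, startY, endX, endY) form and an overlap threshold of 0.3 that I chose for illustration. Optimized implementations are usually vectorized with NumPy, so treat this as a teaching version rather than the one you would ship.

```python
def box_iou(a, b):
    """Intersection over Union of two (startX, startY, endX, endY) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it."""
    # indexes sorted from highest to lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # retain only the boxes that do not overlap the chosen box too much
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) <= overlap_thresh]
    return keep


# two heavily overlapping boxes around one object, plus one separate box
boxes = [(10, 10, 110, 110), (15, 12, 112, 108), (200, 50, 260, 120)]
scores = [0.90, 0.75, 0.80]
kept = non_max_suppression(boxes, scores)
print([boxes[i] for i in kept])
```

The second box is suppressed because it overlaps the higher-scoring first box, while the third box survives since it covers a different region.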
This method can work in some specific use cases, but in general it’s slow, tedious, and a bit error-prone.
However, it’s worth learning how to apply this method as it can turn an arbitrary image classification network into an object detector, avoiding the need to explicitly train an end-to-end deep learning object detector. This method could save you a ton of time and effort depending on your use case.
If you’re interested in this object detection method and want to learn more about the sliding window + image pyramid + image classification approach to object detection, please refer to my book, Deep Learning for Computer Vision with Python.
Method #2: Base network of an object detection framework
The second method to deep learning object detection allows you to treat your pre-trained classification network as a base network in a deep learning object detection framework (such as Faster R-CNN, SSD, or YOLO).
The benefit here is that you can create a complete end-to-end deep learning-based object detector.
The downside is that it requires a bit of intimate knowledge on how deep learning object detectors work — we’ll discuss this more in the following section.
The components of a deep learning object detector
There are many components, sub-components, and sub-sub-components of a deep learning object detector, but the two we are going to focus on today are the two that most readers new to deep learning object detection often confuse:
- The object detection framework (ex. Faster R-CNN, SSD, YOLO).
- The base network which fits into the object detection framework.
The base network you are likely already familiar with (you just haven’t heard it referenced as a “base network” before).
Base networks are your common (classification) CNN architectures, including:
- VGGNet
- ResNet
- MobileNet
- DenseNet
Typically these networks are pre-trained to perform classification on a large image dataset, such as ImageNet, to learn a rich set of discerning, discriminating filters.
Object detection frameworks consist of many components and sub-components.
For example, the Faster R-CNN framework includes:
- The Region Proposal Network (RPN)
- A set of anchors
- The Region of Interest (ROI) pooling module
- The final Region-based Convolutional Neural Network
When using Single Shot Detectors (SSDs) you have components and sub-components such as:
- MultiBox
- Priors
- Fixed priors
Keep in mind that the base network is just one of the many components that fit into the overall deep learning object detection framework — Figure 4 at the top of this section depicts the VGG16 base network inside the SSD framework.
Typically, “network surgery” is performed on the base network. This modification:
- Forms it to be fully-convolutional (i.e., accept arbitrary input dimensions).
- Eliminates CONV/POOL layers deeper in the base network architecture and replaces them with a series of new layers (SSD), new modules (Faster R-CNN), or some combination of the two.
The term “network surgery” is a colloquial way of saying we remove some of the original layers of the base network architecture and supplant them with new layers.
You’ve likely seen low budget horror movies where the killer, likely carrying an ax or large knife, attacks their victim and unceremoniously hacks at them.
Network surgery is more precise and exacting than the typical B horror film killer.
Network surgery is also very tactical: we remove parts of the network we do not need and replace them with a new set of components.
Then, when we go to train our framework to perform object detection, both the weights of the (1) new layers/modules and (2) base network are modified.
Again, a complete review of how various deep learning object detection frameworks work (including the role the base network plays) is outside the scope of this blog post.
If you’re interested in a complete review of deep learning object detection, including theory and implementation, please refer to my book, Deep Learning for Computer Vision with Python.
How do I measure the accuracy of a deep learning object detector?
When evaluating object detector performance we use an evaluation metric called mean Average Precision (mAP) which is based on the Intersection over Union (IoU) across all classes in our dataset.
Intersection over Union (IoU)
You’ll typically find IoU and mAP used to evaluate the performance of HOG + Linear SVM detectors, Haar cascades, and deep learning-based methods; however, keep in mind that the actual algorithm used to generate the predicted bounding boxes does not matter.
Any algorithm that provides predicted bounding boxes (and optionally class labels) as output can be evaluated using IoU. More formally, in order to apply IoU to evaluate an arbitrary object detector, we need:
- The ground-truth bounding boxes (i.e., the hand-labeled bounding boxes from our testing set that specify where in an image our object is).
- The predicted bounding boxes from our model.
- If you want to compute recall along with precision, you’ll also need the ground-truth class labels and predicted class labels.
In Figure 5 (left) I have included a visual example of a ground-truth bounding box (green) versus a predicted bounding box (red). IoU can be computed using the equation illustrated in Figure 5 (right).
Examining this equation you can see that IoU is simply a ratio.
In the numerator, we compute the area of overlap between the predicted bounding box and the ground-truth bounding box.
The denominator is the area of the union, or more simply, the area encompassed by both the predicted bounding box and the ground-truth bounding box.
Dividing the area of overlap by the area of union yields a final score — the Intersection over Union.
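Here is a minimal sketch of that ratio in plain Python. It assumes boxes are given as (startX, startY, endX, endY) and treats coordinates as continuous values; some implementations add 1 to widths and heights to count pixels inclusively, so small differences between libraries are normal.

```python
def intersection_over_union(box_a, box_b):
    """Compute IoU for two boxes in (startX, startY, endX, endY) form."""
    # corners of the intersection rectangle
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # numerator: area of overlap (zero when the boxes do not touch)
    overlap = max(0, x_right - x_left) * max(0, y_bottom - y_top)

    # denominator: area of union = both areas minus the shared part
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap

    return overlap / union if union > 0 else 0.0


ground_truth = (50, 50, 150, 150)
predicted = (100, 100, 200, 200)
print(round(intersection_over_union(ground_truth, predicted), 3))
```

A perfect prediction scores 1.0 and a prediction that misses the object entirely scores 0.0, so the value is easy to threshold.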
mean Average Precision (mAP)
Note: I decided to edit this section from its original form. I wanted to keep the discussion of mAP higher level and avoid some of the more confusing recall calculations but as a couple commenters pointed out this section wasn’t technically correct. Because of that I decided to update the post.
Since this is a gentle introduction to deep learning-based object detection I’m going to keep the explanation of mAP on the simplified side just so you understand the fundamentals.
Readers and practitioners new to object detection can be confused by the mAP calculation. This is partially due to the fact that mAP is a more complicated evaluation metric. It’s also because the definition and calculation of mAP can vary from one object detection challenge to another (when I say “object detection challenge” I’m referring to competitions such as COCO, PASCAL VOC, etc.).
Computing the Average Precision (AP) for a particular object detection pipeline is essentially a three step process:
- Compute the precision, which is the proportion of predicted detections that are true positives.
- Compute the recall which is the proportion of true positives out of all possible positives.
- Average together the maximum precision value across all recall levels in steps of size s.
To compute the precision we first apply our object detection algorithm to an input image. The bounding box scores are then sorted in descending order by their confidence.
We know from a priori knowledge (i.e., it’s a validation/testing example and we therefore know the total number of objects in the image) there are 4 objects in this image. We seek to determine how many “correct” detections our network made. A “correct” prediction here is one where we have a minimum IoU of 0.5 (this value is tunable depending on the challenge but 0.5 is a standard value).
Here is where the calculation starts to become a bit more complicated. We need to compute the precision at different recall values (also called “recall levels” or “recall steps”).
For example, let’s pretend we are computing the precision and recall values for the top-3 predictions. Out of the top-3 predictions from our deep learning object detector, we made 2 correct. Our precision is then the proportion of true positives: 2/3 = 0.667. Our recall is the proportion of the true positives out of all the possible positives in the image: 2 / 4 = 0.5. We repeat this process for (typically) the top-1 to top-10 predictions. This process yields a list of precision values.
The next step is to compute the average for all your top-N values, hence the term Average Precision (AP). We loop over all recall values r, find the maximum precision p that we can obtain with our recall > r and then compute the average. We now have our average precision for a single evaluation image.
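The worked numbers above can be reproduced with a short sketch. I am assuming the common 11-point interpolation here (recall levels 0.0, 0.1, ..., 1.0, and the best precision among points whose recall is at least that level); as mentioned, individual challenges define the details differently, so this is one reasonable variant and not the definitive formula. The list of correct/incorrect flags is made up for the example.

```python
def precision_recall_at_k(is_correct, num_ground_truth):
    """Precision and recall after each of the top-1, top-2, ... predictions."""
    precisions, recalls = [], []
    true_positives = 0
    for k, correct in enumerate(is_correct, start=1):
        true_positives += int(correct)
        precisions.append(true_positives / k)
        recalls.append(true_positives / num_ground_truth)
    return precisions, recalls


def average_precision(precisions, recalls, num_levels=11):
    """Interpolated AP: mean of the best precision at each recall level."""
    total = 0.0
    for step in range(num_levels):
        level = step / (num_levels - 1)
        # best precision among the points whose recall reaches this level
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / num_levels


# predictions sorted by confidence; True means the box matched a
# ground-truth object with IoU >= 0.5 (hypothetical values)
is_correct = [True, False, True, False, False]
precisions, recalls = precision_recall_at_k(is_correct, num_ground_truth=4)

# top-3 predictions: 2 correct out of 3, and 2 found out of 4 objects
print(round(precisions[2], 3), recalls[2])
print(round(average_precision(precisions, recalls), 3))
```

Notice that recall levels the detector never reaches contribute a precision of zero, which is how missed objects pull the AP down.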
Once we have computed the average precision for all images in our testing/validation set we perform two more calculations:
- Compute the mean of the APs for each class, giving us a mAP for each individual class (for many datasets/challenges you’ll want to examine the mAP class-wise so you can spot if your deep learning object detector is struggling with a specific class)
- Take the mAPs for each individual class and then average them together, yielding the final mAP for the dataset
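These two averaging steps are simple enough to sketch directly. The AP values below are invented for illustration, and the class names are just examples.

```python
from statistics import mean

# hypothetical AP values from individual evaluation images, grouped by class
ap_per_class = {
    "dog": [0.80, 0.60, 0.70],
    "person": [0.90, 0.85],
    "bicycle": [0.40, 0.50, 0.30],
}

# step 1: average the APs within each class
class_scores = {label: mean(aps) for label, aps in ap_per_class.items()}

# step 2: average the class-wise scores to get the dataset-level mAP
dataset_map = mean(list(class_scores.values()))

# the weakest class is often the most useful number to look at
worst_class = min(class_scores, key=class_scores.get)
print(class_scores, round(dataset_map, 3), worst_class)
```

Looking at `class_scores` before collapsing everything into one number is what lets you spot a class the detector is struggling with.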
Again, mAP is more complicated than traditional accuracy so don’t be frustrated if you don’t understand it on the first pass. This is an evaluation metric you’ll want to study multiple times before you fully understand it. The good news is that deep learning object detection implementations handle computing mAP for you.
Deep learning-based object detection with OpenCV
We’ve discussed deep learning and object detection on this blog in previous posts; however, let’s review actual source code in this post as a matter of completeness.
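Our example includes the Single Shot Detector (framework) with a MobileNet base model. The model was trained by GitHub user chuanqi305, who pre-trained it on the Common Objects in Context (COCO) dataset and then fine-tuned it on PASCAL VOC, which is why it predicts the 20 PASCAL VOC classes plus a background class.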
For additional detail, check out my previous post where I introduced chuanqi305’s model with pertinent background information.
Let’s loop back to Ezekiel’s first question from the top of this post:
- How do I filter/ignore classes that I am uninterested in?
I’m going to answer that very question in the following example script.
But first you need to prepare your system:
- You need a minimum of OpenCV 3.3 installed in your Python virtual environment (provided you are using Python virtual environments). OpenCV 3.3+ includes the DNN module required to run the following code. Be sure to use one of the OpenCV installation tutorials on the following page while paying extra attention to which version of OpenCV you download + install.
- You should also install my imutils package. To install/update imutils in your Python virtual environment, simply use pip:

    pip install --upgrade imutils
When you’re ready, go ahead and create a new file named filter_object_detection.py and let’s begin:

    # import the necessary packages
    from imutils.video import VideoStream
    from imutils.video import FPS
    import numpy as np
    import argparse
    import imutils
    import time
    import cv2
On Lines 2-8 we import our required packages and modules, notably imutils and OpenCV. We will be using my VideoStream class to handle capturing frames from a webcam.
We’re armed with the necessary tools, so let’s continue by parsing command line arguments:

    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
        help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
        help="path to Caffe pre-trained model")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
        help="minimum probability to filter weak detections")
    args = vars(ap.parse_args())
Our script requires two command line arguments at runtime:
- --prototxt: The path to the Caffe prototxt file which defines the model definition.
- --model: Our CNN model weights file path.
Optionally you may specify --confidence, a threshold to filter weak detections.
Our model can predict 21 object classes:

    # initialize the list of class labels MobileNet SSD was trained to
    # detect, then generate a set of bounding box colors for each class
    CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
        "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
        "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
        "sofa", "train", "tvmonitor"]
The CLASSES list contains all class labels the network was trained on (i.e., the 20 PASCAL VOC labels plus a background class).
A common misconception of the CLASSES list is that you can:
- Add a new class label to the list
- Or remove a class label from the list
…and have the network automatically “know” what you are trying to accomplish.
That is not the case.
You cannot simply modify a list of text labels and have the network automatically modify itself to learn, add, or remove patterns on data it was never trained on. That is not how neural networks work.
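A quick way to convince yourself is to remember that the network only ever outputs an integer class index, and the list of labels is just a lookup table applied afterwards. The sketch below uses no network at all; the predicted index of 15 and the "giraffe" label are hypothetical values chosen to show what goes wrong.

```python
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

# pretend the network reported class index 15 for some detection
predicted_idx = 15
print(CLASSES[predicted_idx])

# "removing" a class by editing the list only shifts the lookup table,
# so the same network output is now reported under the wrong name
edited = [label for label in CLASSES if label != "person"]
print(edited[predicted_idx])

# "adding" a class creates an index the network has no output for:
# it was trained with 21 outputs (indexes 0 through 20)
extended = CLASSES + ["giraffe"]
new_idx = extended.index("giraffe")
print(new_idx, new_idx < len(CLASSES))
```

In the first case people get silently mislabeled, and in the second the new label can never be predicted, because nothing inside the network changed.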
That said, there is a quick hack you can use to filter and ignore predictions you are uninterested in.
The solution is to:
- Define a set of IGNORE labels (i.e., the list of class labels the network was trained on that you want to filter and ignore).
- Make a prediction on an input image/video frame.
- Ignore any predictions where the class label exists in the IGNORE set.
Implemented in Python, the IGNORE set looks like this:

    IGNORE = set(["person"])
Here we’ll be ignoring all predicted objects with class label "person" (the if statement used for filtering will be covered later in this code review).
You can easily add additional elements (class labels from the CLASSES list) to the set to ignore them as well.
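If you want to see the filtering idea on its own, without a camera or a model, here is a minimal sketch. The `filter_detections` helper and the list of (class index, confidence) pairs are hypothetical stand-ins for what the network returns; the two checks mirror the confidence threshold and the ignore set described above.

```python
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
IGNORE = {"person"}


def filter_detections(detections, classes, ignore, min_confidence=0.2):
    """Keep detections that are confident enough and not in the ignore set."""
    kept = []
    for (class_idx, confidence) in detections:
        # drop weak detections first
        if confidence <= min_confidence:
            continue
        # then drop any class we are not interested in
        if classes[class_idx] in ignore:
            continue
        kept.append((classes[class_idx], confidence))
    return kept


# hypothetical (class index, confidence) pairs from one frame
raw = [(15, 0.99), (12, 0.87), (9, 0.10), (7, 0.55)]
print(filter_detections(raw, CLASSES, IGNORE))
```

The network still produced the "person" detection; we simply chose not to act on it.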
Next, we’ll generate random label/box colors, load our model, and start the video stream:

    COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

    # load our serialized model from disk
    print("[INFO] loading model...")
    net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

    # initialize the video stream, allow the camera sensor to warmup,
    # and initialize the FPS counter
    print("[INFO] starting video stream...")
    vs = VideoStream(src=0).start()
    time.sleep(2.0)
    fps = FPS().start()
On Line 27 a random array of COLORS is generated to correspond to each of the 21 CLASSES. We’ll use these colors later for display purposes.
Our Caffe model is loaded on Line 31 using the cv2.dnn.readNetFromCaffe function and both of our required command line arguments passed as parameters.
Then we instantiate the VideoStream object as vs, and start our fps counter (Lines 36-38). The 2-second sleep allows our camera plenty of time to warm up.
At this point we’re ready to loop over the incoming frames from the camera and send them through our CNN object detector:

    # loop over the frames from the video stream
    while True:
        # grab the frame from the threaded video stream and resize it
        # to have a maximum width of 400 pixels
        frame = vs.read()
        frame = imutils.resize(frame, width=400)

        # grab the frame dimensions and convert it to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
            0.007843, (300, 300), 127.5)

        # pass the blob through the network and obtain the detections and
        # predictions
        net.setInput(blob)
        detections = net.forward()
On Line 44 we grab a frame and then resize while preserving aspect ratio for display (Line 45).
From there, we extract the height and width as we’ll need these values later (Line 48).
Lines 48 and 49 generate a blob from our frame. To learn more about a blob and how it’s constructed using the cv2.dnn.blobFromImage function, refer to this previous post for all the details.
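The two magic numbers passed to cv2.dnn.blobFromImage are easier to remember once you see the arithmetic. As I understand the function, it subtracts the mean (127.5) from each pixel and then multiplies by the scale factor (0.007843, roughly 1/127.5), which maps 8-bit pixel values into approximately the range [-1, 1]. The sketch below reproduces just that arithmetic in plain Python; the `preprocess_pixel` helper is hypothetical and is not an OpenCV function.

```python
SCALE_FACTOR = 0.007843   # roughly 1 / 127.5
MEAN = 127.5


def preprocess_pixel(value):
    """Mean-subtract, then scale, a single 8-bit channel value."""
    return (value - MEAN) * SCALE_FACTOR


# darkest, middle, and brightest possible channel values
for value in (0, 127.5, 255):
    print(value, round(preprocess_pixel(value), 4))

# for one 300x300 color frame the resulting blob is laid out as
# (batch size, channels, height, width)
blob_shape = (1, 3, 300, 300)
print(blob_shape)
```

These values need to match whatever preprocessing was used when the model was trained, which is why they are hardcoded and should not be tuned freely.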
Next, we send that blob through our neural net to detect objects (Lines 54 and 55).
Let’s loop over the detections:

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated with
            # the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by ensuring the `confidence` is
            # greater than the minimum confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # `detections`
                idx = int(detections[0, 0, i, 1])

                # if the predicted class label is in the set of classes
                # we want to ignore then skip the detection
                if CLASSES[idx] in IGNORE:
                    continue
On Line 58 we begin our detections loop.
For each detection, we extract the confidence (Line 61) followed by comparing it to our confidence threshold (Line 65).
In the case that our confidence surpasses the minimum (the default of 0.2 can be changed via the optional command line argument), we’ll consider the detection a positive, valid detection and continue processing it.
First, we extract the index of the class label from detections (Line 68).
Then, going back to Ezekiel’s first question, we can ignore classes in the IGNORE set on Lines 72 and 73. If the class is to be ignored, we simply continue back to the top of the detections loop (and we don’t display labels or boxes for this class). This fulfills our “quick hack” solution.
Otherwise, we’ve detected an object in the whitelist and we need to display the class label and rectangle on the frame:

                # compute the (x, y)-coordinates of the bounding box for
                # the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # draw the prediction on the frame
                label = "{}: {:.2f}%".format(CLASSES[idx],
                    confidence * 100)
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    COLORS[idx], 2)
                y = startY - 15 if startY - 15 > 15 else startY + 15
                cv2.putText(frame, label, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
In this code block, we are extracting bounding box coordinates (Lines 77 and 78) followed by drawing a label and rectangle on the frame (Lines 81-87).
The color of the label + rectangle will be the same for each unique class; objects of the same class will have the same color (i.e. all "boats" in the video would have the same color label and box).
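The indexing into `detections` can look cryptic at first, so here is a plain-Python sketch of decoding a single detection row. The code above confirms that position 1 holds the class index, position 2 the confidence, and positions 3 through 6 the box relative to the frame size; treating position 0 as an image id is my assumption about the layout. The `decode_detection` and `label_y` helpers and the row values are hypothetical.

```python
def decode_detection(row, frame_w, frame_h):
    """Turn one 7-value detection row into (class index, confidence, pixel box)."""
    class_idx = int(row[1])
    confidence = float(row[2])
    # the box is stored relative to the frame size, so scale it back up
    sizes = (frame_w, frame_h, frame_w, frame_h)
    box = tuple(int(coord * size) for coord, size in zip(row[3:7], sizes))
    return class_idx, confidence, box


def label_y(start_y, offset=15):
    """Put the label above the box unless that would run off the top of the frame."""
    return start_y - offset if start_y - offset > offset else start_y + offset


# hypothetical row: [image id, class index, confidence, startX, startY, endX, endY]
row = [0.0, 15.0, 0.75, 0.25, 0.125, 0.75, 0.875]
class_idx, confidence, box = decode_detection(row, frame_w=400, frame_h=300)
label = "{}: {:.2f}%".format("person", confidence * 100)
print(class_idx, box, label, label_y(box[1]))
```

Because the box is relative to the frame, the same detection can be drawn on a frame of any size simply by changing the width and height used for scaling.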
Finally, still in our while loop, we’ll display our hard work on our screen:

        # show the output frame
        cv2.imshow("Frame", frame)
        key = cv2.waitKey(1) & 0xFF

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break

        # update the FPS counter
        fps.update()

    # stop the timer and display FPS information
    fps.stop()
    print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
    print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

    # do a bit of cleanup
    cv2.destroyAllWindows()
    vs.stop()
We display the frame and capture keypresses on Lines 90 and 91.
If the "q" key is pressed, we quit by breaking out of the loop (Lines 94 and 95).
Otherwise, we proceed to update our fps counter (Line 98) and continue grabbing and processing frames.
On the remaining lines, when the loop breaks, we display time + frames per second metrics and cleanup.
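If you are curious what a frames per second counter boils down to, here is a minimal stand-in written with only the standard library. To be clear, `SimpleFPS` is a hypothetical class I wrote for this sketch and is not the imutils implementation; it just follows the same start/update/stop pattern, and the short sleep stands in for the work of processing one frame.

```python
import time


class SimpleFPS:
    """Minimal frames-per-second counter (a stand-in, not the imutils class)."""

    def __init__(self):
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self

    def update(self):
        # call once per processed frame
        self._frames += 1

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        return self._end - self._start

    def fps(self):
        return self._frames / self.elapsed()


fps = SimpleFPS().start()
for _ in range(5):
    time.sleep(0.01)  # stand-in for grabbing and processing one frame
    fps.update()
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
```

The reported number is an average over the whole run, so a slow start (such as camera warmup inside the loop) would drag it down.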
Running your deep learning object detector
In order to run today’s script, you’ll need to grab the files by scrolling to the “Downloads” section below.
Once you’ve extracted the files, open a terminal and navigate to the downloaded code + model. From there, execute the following command:

    $ python filter_object_detection.py --prototxt MobileNetSSD_deploy.prototxt.txt \
        --model MobileNetSSD_deploy.caffemodel
    [INFO] loading model...
    [INFO] starting video stream...
    [INFO] elapsed time: 24.05
    [INFO] approx. FPS: 13.18
In the GIF above you can see on the left that the “person” class is detected. This is due to me having an empty IGNORE set. On the right you can see that I am not detected. This behavior is due to me adding the “person” class to the IGNORE set.
While our deep learning object detector is still technically detecting the “person” class, our post-processing code is able to filter it out.
Perhaps you encountered an error running the deep learning object detector?
Troubleshooting step one would be to verify that you have a webcam hooked up. If that’s not the problem, maybe you saw the following error message in your terminal:

    $ python filter_object_detection.py
    usage: filter_object_detection.py [-h] -p PROTOTXT -m MODEL [-c CONFIDENCE]
    filter_object_detection.py: error: the following arguments are required: -p/--prototxt, -m/--model
If you see this message, then you didn’t pass “command line arguments” to the program. This is a common problem PyImageSearch readers have if they aren’t familiar with Python, argparse, and command line arguments. Check out the link if you are having trouble.
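If command line arguments are new to you, it can help to see what argparse produces without touching a terminal. The sketch below rebuilds the same three arguments used in this script and passes an explicit list to `parse_args`, which is a standard argparse feature. The `build_parser` helper is a hypothetical wrapper for the example, and the file names are simply the ones used in the command above.

```python
import argparse


def build_parser():
    """Recreate the argument parser used by the detection script."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
                    help="path to Caffe pre-trained model")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap


# passing a list here is equivalent to typing the same words in a terminal
args = vars(build_parser().parse_args([
    "--prototxt", "MobileNetSSD_deploy.prototxt.txt",
    "--model", "MobileNetSSD_deploy.caffemodel",
]))
print(args)
```

Leaving out either of the two required arguments is what triggers the usage error, while `--confidence` quietly falls back to its default.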
Here is the full version of the video with commentary:
How can I add or remove classes to my deep learning object detector?
As I mentioned earlier in this guide, you cannot simply add or remove class labels from the CLASSES list; the underlying network itself has not changed.
All you have done, at best, is modify a text file that lists out the class labels.
Instead, if you want to explicitly add or remove classes from a neural network you will need to either:
- Train from scratch
- Perform fine-tuning
Training from scratch tends to be a time consuming, expensive operation so we try to avoid it when we can — but in some cases this is completely unavoidable.
The other option is to perform fine-tuning.
Fine-tuning is a form of transfer learning and is the process of:
- Removing the fully-connected layer responsible for classification/labeling
- Replacing it with a brand new, freshly and randomly initialized fully-connected layer
We may optionally modify other layers in the network as well (including freezing the weights of some layers and unfreezing them during the training process).
Exactly how to train your own custom deep learning object detector (including both fine-tuning and training from scratch) are advanced topics outside the scope of this blog post, but see the section below to help you get started.
Summary
In today’s blog post you were gently introduced to some of the intricacies involved in deep learning object detection. We started by reviewing the fundamental differences between image classification and object detection, including how we can use a network trained for image classification for object detection.
We then reviewed the core components of a deep learning object detector:
- The framework
- The base model
The base model is typically a pre-trained (classification) network, normally trained on a large image dataset such as ImageNet to learn a robust set of discerning filters.
We can also train the base network from scratch but this usually takes a significantly longer amount of time for the object detector to reach reasonable accuracy.
You should, in most situations, start with a pre-trained base model instead of trying to train from scratch.
Once we acquired a solid understanding of deep learning object detectors, we implemented an object detector capable of running in real-time in OpenCV.
I also demonstrated how you can filter and ignore class labels that you are uninterested in.
Finally, we learned that actually adding or removing a class to a deep learning object detector is not as simple as adding/removing a label from the hardcoded class labels list.
The neural network itself doesn’t care if you modify a list of class labels — instead, you would need to either:
- Modify the network architecture itself by removing the fully-connected class prediction layer and fine-tuning
- Or train the object detection framework from scratch
For most deep learning object detection projects you will start with a deep learning object detector pre-trained on an object detection dataset, such as COCO. You then perform fine-tuning on the model to obtain your own detector.
Training an end-to-end custom deep learning object detector is outside the scope of this blog post, so if you’re interested in discovering how to train your own deep learning object detectors, please refer to my book, Deep Learning for Computer Vision with Python.
Inside the book, I have included a number of deep learning object detection examples, including training your own object detectors to:
- Detect traffic signs, such as stop signs, pedestrian crossing signs, etc.
- Along with the front and rear views of vehicles
To learn more about my deep learning book, just click here!
If you enjoyed today’s blog post, be sure to enter your email address in the form below to be notified when future tutorials are published here on PyImageSearch!
Anirban
Really good blog post, and with the YouTube video it is even better. I am really happy that I purchased your deep learning for CV with Python; in a few months I have learnt so much about DL for CV that I now feel confident that I can apply for a DL in CV post.
Disclaimer: I am a banker by profession. I have not coded in the last ten years, and this is my honest review.
Adrian Rosebrock
Thanks so much for the kind words, Anirban! 😀 I’m so incredibly happy for you and your transition from banking to CV practitioner. Keep up the great work!
Raym
Thanks for the clarification!!!
Adrian Rosebrock
Thanks Raym, I’m glad it helped 🙂
MImranKhan
But how can we use our own model that we trained ourselves rather than picking a pre-trained model?
Adrian Rosebrock
You would typically take a network pre-trained on ImageNet and then fine-tune it to your own dataset. You could train your own base network first and then fine-tune but whether or not that works better really depends on your dataset and project. I would suggest running experiments for both.
Vijin
I think mAP computation mentioned in this blog is wrong.
Adrian Rosebrock
Hey Vijin — what specifically regarding the mAP computation do you think is incorrect?
UPDATE: I went back and updated the mAP computation. I was trying to keep it simplistic but after reading (1) Ye Hu’s comment and (2) reviewing the post itself a few times I decided to go back and include the full calculation.
Tiri
very interesting article! hope to see soon new posts on object detection 🙂
in which bundle of your books do you do the object detection topic and examples like traffic signs?
Adrian Rosebrock
Hi Tiri, there will certainly be more posts on object detection. The Practitioner Bundle of Deep Learning for Computer Vision with Python discusses the traditional sliding window + image pyramid method for object detection, including how to use a CNN trained for classification as an object detector. The ImageNet Bundle includes all examples on training Faster R-CNNs and SSDs for traffic sign and front/rear view vehicle detection.
camp
nice. thank you
Nikhil
Hi Adrian, Why am I getting this error?
$ python3 filter_object_detection.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel
…
AttributeError: module 'cv2' has no attribute 'dnn'
Adrian Rosebrock
Make sure you have at least OpenCV 3.3 installed (see the blog post for more details as I discuss why and how you can install OpenCV 3.3+).
Ye Hu
So do I. The mAP involves the precision-recall curve.
Adrian Rosebrock
In the context of object detection the precision would be the proportion of our true positives (TP) for each image. The recall would be the proportion of the TP out of all the possible positives for each image. The average precision is then the average of maximum precision values at varying recall steps. I didn’t include the step value for the precision/recall calculation as this is meant to be an introductory blog post to object detection. It’s also not an exhaustive example of how to compute mAP for object detection either (although that could make for a good tutorial).
If anyone finds the mAP explanation too simplified (or even too complicated) let me know and I will consider rewriting it.
UPDATE: I decided to go back and update the blog post to describe the full calculation. Trying to explain the entire mAP calculation is too much for this already lengthy blog post. I’ll cover a detailed computation of mAP in a future tutorial.
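For readers who want to see "the average of maximum precision values at varying recall steps" spelled out, here is a rough sketch of one common variant, the Pascal VOC-style 11-point interpolation. The function names and the toy precision/recall values are illustrative assumptions, not the exact computation used in the post.

```python
def eleven_point_ap(recalls, precisions):
    """Average, over recall steps 0.0, 0.1, ..., 1.0, of the best
    precision observed at any recall greater than or equal to the step."""
    total = 0.0
    for step in range(11):
        threshold = step / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        # If no operating point reaches this recall, the step contributes 0.
        total += max(candidates) if candidates else 0.0
    return total / 11.0


def mean_average_precision(per_class_ap):
    """mAP is simply the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Toy example: 2 ground-truth objects, 3 ranked detections (TP, FP, TP).
recalls = [0.5, 0.5, 1.0]
precisions = [1.0, 0.5, 2.0 / 3.0]
ap = eleven_point_ap(recalls, precisions)
print(round(ap, 4))
```

Other benchmarks use different interpolation schemes (for example, all-point interpolation), so the numbers from this sketch are only comparable with results computed the same way.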
Chandramouleeswar
Hello Adrian,
Can you give me a suggestion for image recognition in videos? I am looking forward to implementing Mask-R CNN using Resnet as a base network for recognising persons, vehicles, traffic signals on roads from a video Dataset. What is the better Dataset for my choice?
Adrian Rosebrock
Just to clarify, are you looking to perform segmentation on each frame in the dataset which is essentially treating it like working with a set of images? Or are you trying to do activity recognition within the dataset as well where sequences of frames are important?
Elain
Can i get a link to the wallpaper?
Adrian Rosebrock
Which wallpaper are you referring to?
Gilad
I would like to understand how we can get 7fps.
When I trained a CNN for face detection and used Haar-cascade to detect the face itself, on the same computer I got ~7fps.
If I understand correctly, under the hood, the algorithm is running thousands of inferences on each box and calculating what it found. How can we reach 7fps?
Thx for very very interesting post.
G
Adrian Rosebrock
The deep learning face detector in this post will already get you over 7 FPS on the CPU. Haar cascades will run many times faster (but will likely be less accurate depending on your project). Are you using your own CNN trained for face detection? If so consider pushing the computation to the GPU for faster inference.
Gilad
I would like to understand what is under the hood of the network in your post. Is it indeed doing inference thousands of times for each picture as your post suggest?
Adrian Rosebrock
Be careful with the term “inference” here. Typically we use the term inference to refer to a prediction from the model as it’s inferring from the data. In the context of neural networks, an inference is a single forward pass which returns the prediction.
Perhaps you mean to say the network is performing thousands of computations for each input image? If so, that statement is correct.
Siladittya Manna
This post cleared a lot of confusion I had regarding implementation of object detection and image classification. Thanks a lot!!
Adrian Rosebrock
Thanks Siladittya, I’m happy to hear you found it helpful 🙂
Gilad
Thx Adrian again
https://youtu.be/ULE40CgDrwo
Adrian Rosebrock
Thanks so much for sharing your demo Gilad, great job! 🙂
Zubair Ahmed
Nice blog post and of course I learned this and more from your book. To all the readers, if you like this post make sure you get Adrian’s book
Adrian Rosebrock
Thanks Zubair! 😀
Zubair Ahmed
Well to top it off another tutorial to do Object Counting would be an awesome addition to this series 🙂
Suresh Kumar
Suresh Kumar:
#1
You have ignored, Human from this object detection.
How do I include, Human?
#2
I would like to add one object like watch or mobile to be detected, How do I add to the Caffe Model File?
Adrian Rosebrock
1. You could set the IGNORE set to be empty or you could modify the code to use a KEEP class that includes only the specified set of classes.
2. Please read the blog post as I discuss the answer to your question. You’ll want to apply fine-tuning/transfer learning.
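The two options in point 1 can be sketched as follows. This is a simplified, standalone illustration: `filter_labels` and the `detected` list are hypothetical, and in a real script the same membership check would sit inside the loop over the network's detections.

```python
def filter_labels(labels, ignore=None, keep=None):
    """Drop labels found in `ignore`; if `keep` is given, also drop
    every label that is not explicitly listed in it."""
    ignore = set(ignore or [])
    kept = []
    for label in labels:
        if label in ignore:
            continue
        if keep is not None and label not in keep:
            continue
        kept.append(label)
    return kept


# Hypothetical labels predicted for a single frame.
detected = ["person", "dog", "car", "person"]

print(filter_labels(detected, ignore={"person"}))  # people are skipped
print(filter_labels(detected, ignore=set()))       # empty IGNORE: nothing skipped
print(filter_labels(detected, keep={"person"}))    # KEEP: only people remain
```

Either way the network still computes predictions for every class it was trained on; the filter only decides which results are shown.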
Ken Rubio
Hi, Adrian.
I’m just a bit confused on the difference of fine-tuning, training OD framework from scratch, and transfer learning.
What I understand is that you will freeze the weights when retraining the model. But is it retraining the model with old classes but additional images, or is it retraining the model with new classes? How does this differ from training OD framework from scratch and transfer learning. This will be a great help for me. Thank you!
Adrian Rosebrock
It’s retraining the model using the new classes only but the frozen weights were learned from the original set of classes. If you need help regarding training from scratch, transfer learning, and fine-tuning be sure to read Deep Learning for Computer Vision with Python where I cover the topic in detail.
Dave A
Excellent post again. I’m really enjoying these. In a matter of weeks I’ve modified your code to communicate to some Node-Red flows I have sending me snapshots of motion, faces or certain classes of objects when detected on a Raspberry Pi 3b. (And not to be ‘that guy’, but you may want to look over your figure numbering and the references within the text.)
You make it almost too easy. Thank you!
Adrian Rosebrock
Congrats on the progress Dave, that’s fantastic!
Suresh Kumar
Yes I have added the person, by excluding the lines of IGNORE.. Thank Sir..
#3
I need a log file to be created after stopping the program, showing how many objects were detected and what the prediction percentage of each object is.
How can I do that Sir ?
Adrian Rosebrock
You should read up on basic file I/O operations using the Python programming language. I’m happy to help but please take the time to do your proper research and read online. There are many Python tutorials available that teach you the fundamentals of the language.
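As a starting point for that research, here is a minimal sketch using only the Python standard library. It assumes you collect a `(label, confidence)` pair for every detection while the script runs; the function names, the column names, and the sample values are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize(detections):
    """Group (label, confidence) pairs by label and return one row per
    label: (label, number of detections, mean confidence)."""
    stats = defaultdict(list)
    for label, confidence in detections:
        stats[label].append(confidence)
    return [(label, len(c), sum(c) / len(c)) for label, c in sorted(stats.items())]


def write_log(rows, fileobj):
    """Write the summary rows as CSV to any writable text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["label", "count", "mean_confidence"])
    writer.writerows(rows)


# Hypothetical detections gathered during a run.
collected = [("dog", 0.75), ("car", 0.5), ("dog", 0.25)]
rows = summarize(collected)

# An in-memory buffer stands in for a real log file here.
buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

To write an actual file when the program stops, the buffer could be swapped for something like `open("detections_log.csv", "w", newline="")`, where the file name is only an example.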
Carlos
Hello Adrian,
Do you think SSD is better than YOLO for object detection? I noticed you implement SSD in the ImageNet Bundle, and not YOLO. Why is that?
Another question, for detecting targets like airplanes and military targets from satellite images, which one would recommend?
Loving your 2nd book from dl4cv. When finish this, surely will buy the 3rd!
Thanks
Adrian Rosebrock
While YOLO is fast it’s not as accurate as SSDs or Faster R-CNNs. A general rule of thumb is that if you want pure speed and can sacrifice accuracy, use YOLO. If you need to detect tiny objects use Faster R-CNN. If you need a balance, use SSD.
As far as your second question goes, I assume those objects would appear to be pretty tiny. In that case, Faster R-CNN.
Carlos
Thanks for the answer!
I will try to study more about them, as I want to work in this area in the future.
Have a nice weekend!
sophia
quick question to clear up some confusion about comparison between YOLO and SSD. let’s say we need real-time inference, so we rule out any RCNN variant.
The YOLO-v3 paper points out that YOLO-v3’s accuracy is comparable to that of SSD. they also highlight that you can reduce the fps to improve accuracy such that it is faster than SSD and more accurate. Is there more to this aspect of comparing the accuracy-speed trade-off ?
Regarding accuracy on detecting small objects. can you give me some indication of what object would be considered small in an image, for which SSD might be more accurate?
really appreciate all of the work you’ve put in on this blog. looking forward to your reply,
Adrian Rosebrock
Keep in mind that there is a very real difference between the claims of a publication and what is actually obtained when used by practitioners and engineers. I have no doubt that the YOLO results in the publication are correct and that for their tests it matched SSD. However, the vast majority of times I’ve used YOLO for my own projects and trained it from scratch the results are not as good as SSD. It very rarely warrants the FPS increase.
Speaking of FPS increase, the YOLO model running inside OpenCV is actually slower than running a SSD. I’ll be doing a blog post on that soon as well.
All that said — there is no one true “best” object detector. You need to try them on your own projects and let the empirical results guide you.
sophia
thanks so much, Adrian. I look forward to more posts from you!
One last related general question on this topic:
would it be possible to train an SSD model to distinguish between a person’s different poses in an image? right now, SSD detects a bounding box around a person. what if we trained an SSD model on images of a person sitting and person standing? could we then get SSD to distinguish between a person sitting and a person standing?
i’m looking to combine object detection and human pose estimation in one model!
Any guidance on doing this will be greatly appreciated! Thanks again.
Adrian Rosebrock
You can technically do that, yes. Each pose would have its own label. However, you might get better performance out of a model dedicated to “human activity recognition”.
Carlos
Dear Adrian,
On IMAGENET BUNDLE (Faster R-CNNs and Single Shot Detectors (SSDs)) you show how to train these architectures for object detection from my own dataset?
I am trying to identify cars, people and airplanes from aerial images (satellite, drones, UAV).
I finished the Convolutional Neural Networks course from Coursera (Andrew Ng) and we implemented YOLO using YAD2K package, but I have no idea (yet) about how to train deep learning architectures for detect my own targets.
In which book (and chapter) I will find these answers?
Thanks for the attention.
Adrian Rosebrock
Hey Carlos — you are correct, the ImageNet Bundle of Deep Learning for Computer Vision with Python will show you how to train Faster R-CNNs and SSDs on your own custom datasets. You will find all chapters on how to perform object detection in the ImageNet Bundle of the book.
Lluis
Hi Adrian,
Thanks for your detailed tutorials, they are a big help to start with deep learning. What I want to accomplish is to train a network to detect objects (not only classify). The images are in FITS format, used for astronomy images. I was able to train a model to classify the object (I followed one of your tutorials, Santa/not Santa), but object detection is not so easy. All the examples or tutorials start with a pretrained network, but I need to start from scratch. Do you have any advice or source that I could follow to accomplish my goal?
Thanks in advance!
Adrian Rosebrock
Hi Lluis — I have a number of chapters inside Deep Learning for Computer Vision with Python that demonstrate how to train an object detector model from scratch. That would be my recommended starting point for you to achieve your goal.
Lluis
Hi Adrian,
thanks, I will take a look, and let you know with the result.
Thanks and regards.
Márcio
Hello Adrian, do you have raspberry sdcard .iso with that project?
Adrian Rosebrock
I do. My Raspbian .img file with OpenCV pre-configured and pre-installed is included in the Quickstart Bundle and Hardcopy Bundle of Practical Python and OpenCV.
Benya Jamiu
Dear Dr.
In fact, I have yet to buy the book or enroll in any of your courses, but you have made most of my days. I’m just looking for a place to practice it right, and I have applied for an MSc in AI here in Paris to specialize in Computer Vision. Very soon I will buy both of your books, but right now I’m practising all your examples online.
You are great. Without leaving my room, I’m moving closer to being a …..GURU specialist in Computer Vision; even with a lot of stress I’m still practising, sometimes sleeping at 12-02:00 am.
Adrian Rosebrock
Thank you for the kind words, Benya. I’m so happy to hear you are enjoying the blog and will one day pick up a copy of my books. Keep practicing, you’re doing great! 🙂
Dillon Wells
I am a big fan of the prevalence of your Beagle in these blogs. Truly a wholesome meme.
Adrian Rosebrock
Thanks Dillon 🙂
Charles
Hello Adrian,
I was wondering where to find already provided functions for evaluating an object detector. Is there any package for evaluation of common metrics in an object detection context with train/validation/tests sets? Or should I write them myself?
Thanks for your answer.
Adrian Rosebrock
It actually is fully dependent on the dataset you are using. Some datasets, like COCO or VOC, have very strictly defined training, validation, and testing sets, along with the metrics you are expected to report. Almost all datasets use some form of Intersection over Union and mean Average Precision (mAP).
Carlos
Dear Adrian,
I am learning about Faster RCNN with your book, and now I am practicing in Kaggle challenges.
I have a doubt about the TensorFlow API and DICOM images. In your book, you explain how to initialize the annotation object used to store information regarding the bounding box, and write all information in TFRecords.
Will tfAnnot.encoding work with the ‘.dcm’ filetype (DICOM images)?
The examples in your book are for png and jpeg, and now I am wondering if the script works directly with this medical type of image or need some kind of adjustment.
Thanks for your support!
Adrian Rosebrock
Hey Carlos — I’ve never tried using the TensorFlow Object Detection API with DICOM images so I’m honestly not sure.
Charles
So even for determining if a detection is for instance a true positive? At least the basics?
Adrian Rosebrock
Sorry, I’m not sure what you are asking Charles. Could you elaborate?
Charles
Sure. Let’s say I have run a detector which gives me detections as confidence scores and bounding boxes. I would like to have a function that, given the ground truth, can characterize each detection as either a TP or FP (true or false positive for correct or wrong detections), and also give me the false negatives (when the ground truth is not detected).
I am dealing with an example where I have one class but many objects to detect within an image. It would help me somehow evaluate pre-trained detectors.
Adrian Rosebrock
You would need Intersection Over Union along with mean Average Precision (mAP).
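To connect Intersection over Union to the TP/FP/FN labels asked about above, here is a rough sketch for a single class in a single image. It assumes boxes are `(x1, y1, x2, y2)` tuples in continuous coordinates and uses one simple greedy matching rule with an IoU threshold of 0.5; the function names are hypothetical and benchmark toolkits differ in the exact matching details.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (confidence, box). Returns (flags, false_negatives),
    where flags[i] is True (TP) or False (FP) for the i-th most confident
    detection, and false_negatives counts unmatched ground-truth boxes."""
    matched = set()
    flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box may be claimed only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            flags.append(True)
        else:
            flags.append(False)
    return flags, len(ground_truth) - len(matched)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.3, (50, 50, 60, 60))]
flags, false_negatives = match_detections(dets, gt)
print(flags, false_negatives)
```

The cumulative counts of true and false positives in confidence order are what feed the precision/recall curve, and from there the average precision.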
Charles
Yes I implemented functions to compute the precision, the recall, and the average precision (11-points and all-points interpolation). But say I am using a pre-trained detector and use its predictions on a new set of images, should I use a train-test set approach to find good values of thresholds for the confidence and Intersection over Union for the non-maximum suppression technique?
Also, can a detector have a good average precision and recall but a very low precision ?
Andreas
Hi Adrian,
Thank you for this tutorial. How do you test this on images instead of the videostream?
Adrian Rosebrock
Take a look at this tutorial where I cover deep learning object detection in images.
andreas
Thank you for your reply. Please keep the tutorials coming they are both inspirational and highly useful.
Adrian Rosebrock
Thanks Andreas, I have no plans to stop writing tutorials.
Chetan Mahajan
Hi Sir, I have one question, How to Create a .caffemodel?
Adrian Rosebrock
The model was trained and created using the Caffe deep learning library.
vipul sonar
How to Apply this for offline video??
can you tell me the changes?
Adrian Rosebrock
You can use the cv2.VideoCapture function to load a video and loop through the frames. You would then apply the object detector to each individual frame. See this tutorial for an example.
mario
Hello. I would like to track/rotoscope/cover the movements of an actress located inside a series of pictures (or inside a video) with the same exact movements of a 3D character that I have created with Blender. The camera is also moving all around the people in the footage. Can you tell me if this kind of job is doable with your script? Until now you have talked about object classification and detection, but what is the practical use of these? Or better, what is the next step? For example, in my specific scenario I could use the deepfake approach to swap the faces of the actors with the faces of the 3D characters, but I see that it needs a high level of computational power that I don’t have on my PC, so I can’t use it. I’m here because I hope that I can swap the real human figures (and their movements) with the fake/3D human figures of characters created in Blender in an easy and fast way. Is that possible with your script? How? Thanks.
Adrian Rosebrock
Hey Mario — I don’t have any experience with Blender so I unfortunately cannot provide any guidance there. I would suggest you look into “human pose estimation”. That technique will give you a set of keypoints mapping to various parts of the body which you should then be able to ideally transfer to your model.
Gaurav
thanks Adrian.
I need to detect and count the cars coming down the road. How can I do it?
please guide me.
Adrian Rosebrock
See my reply to Sandip.
Sandip
Hi Adrian This is the Great Work.
I need to track on the cars like the faces which are successfully tracked by the code which was given by you..
so please tell me how can i track on cars can you guide me please?
Adrian Rosebrock
Try using this tutorial but swap out the “person” class for the car/truck/bus classes.
YuhwanPark
Hi
I saw good posting.
And the implementation shows high detection rate.
I have a question.
Can you answer?
The current configuration is detection for various objects.
But I want to detect only people.
What do i do to detect only people?
If I only detect people, can I get a higher frame rate than before?
If not, do you have a posting that can detect only people?
Are people detection in the posting showing a high detection rate?
Waiting for an answer.
thanks!
Adrian Rosebrock
Trying to detect only a single object class isn’t going to improve the frame rate of the model (as I demonstrated in this post). Perhaps I’m not understanding your question?
david
Thanks for the tutorial. I have a question that I want the program to run in the background of an existing video, what should I do. Please help me.
Adrian Rosebrock
I’m not sure what you mean by running in the background of an existing video — could you clarify?
AP
Hi Adrian, I am using MobileNet-SSD Model for detecting vehicles. Although, the model is able to detect object quickly it needs to be more accurate to be feasible for traffic detection. It doesn’t perform that well with small images of cars in different frames. Can you suggest some ways to increase accuracy and enhance the performance of the model?
Adrian Rosebrock
One of the best methods is to take the model and fine-tune it on your own example images, thereby increasing accuracy. I cover how to fine-tune object detectors, including how to improve accuracy, inside Deep Learning for Computer Vision with Python.
AP
It would be time-consuming and would take a whole lot of effort for compiling a dataset of cars and I can’t see how it would significantly increase the accuracy of the model as it has been trained on car dataset already. Could you suggest some other pre-trained models for the same which might have better accuracy?
Adrian Rosebrock
No ML, DL, or CV algorithm is perfect. Just because a model is trained on a dataset with car examples doesn’t mean it will magically be able to detect all other examples of cars. It could be the images the model was trained on were high quality and perhaps yours are lower quality images.
The point is this:
A model is only as good as its training data, operating under the assumption that the data it was trained on mimics where it is to be deployed.
If that assumption doesn’t hold then you cannot expect good performance from your model.
AP
Thanks, Adrian! Your tutorials have helped a lot throughout. Will try to fine tune my model as to the specific needs.
Jing Lu
Could you explain how NMS works?
Adrian Rosebrock
See this tutorial on NMS.
Eric T
Minor typo in paragraph:
In Figure 4 (left) I have included a visual example of a ground-truth bounding box…
This refers to Figure 5.
Adrian Rosebrock
Thanks Eric!
Umesh S
Hi Adrian, Thanks for great tutorial. I am trying to build object detection model on custom dataset. Can you please let me know which tool you used for label annotation of images?
Adrian Rosebrock
I provide my suggested image annotation tools inside my book, Deep Learning for Computer Vision with Python.
Bart
Great stuff Adrian! Thanks for these very useful guides.
I’m not a python expert, nor do I intend to become one, but it’s cool to use what you put together and try to add my own logic to it. I have some minor PHP experience so I’m not a total stranger to scripting and I’m learning more about Python while playing with it.
I’m trying to make something that detects birds, zoom my camera towards them and make a picture. I’m focussing on working with persons and cars first because I live on a very busy street with almost every few seconds something comes by, mostly people and cars, so it’s easier and more fun to play with it for now. But maybe later I can even use these images to further train a bird model.
So far so good. I’ve tried your guides on detecting object in images. Really cool that it works perfectly on my own images too, but of course this is the point. Then on to videos and then went on to interpreting my webcam stream.
Now playing around printing variables like detections, idx, label, box and seeing what they are and how they work. My next challenge is to being able to make arrays per frame of only unique persons and their centre coordinates. Then I’ll have enough overview in the code (and I guess my head) to play with them and calculate distance between them for instance.
Adrian Rosebrock
Thanks for the comment Bart, I’m glad you’re enjoying the guides!
All the best to you.
Simo
Hi Adrian, Thank you for the great post.
Wanted to know how the approach with faster R-CNN deals with small objects and also clustered objects ? since you mentioned those as downside in the YOLO blog post.
Thank you
Adrian Rosebrock
Faster R-CNN will indeed work better with smaller objects, including clustered objects. You should refer to Deep Learning for Computer Vision with Python if you want to learn how to train your own Faster R-CNNs.
Charlotte
Thanks for the great tutorial Adrian, but I have some problems. When I use this code it works, but I want one box around the object like in your video, and instead I get many, many boxes around the object. I don’t understand this problem. How can I fix this? Maybe you can help. Thanks for your help.
rajmeet
Hello Adrian
I want to detect some object in a frame from a camera mounted on the robot, and then have the robot move towards that object. Is this possible? If yes, then kindly help me. I am using a Raspberry Pi 3 + Raspberry Pi camera.
Adrian Rosebrock
That exact project is covered inside Raspberry Pi for Computer Vision.
Andre
Thanks this was a great tutorial… I’m studying CS in college and your blogs and books are better than everything else I’ve seen
Adrian Rosebrock
Thanks Andre, I really appreciate that 🙂 Good luck with your studies!
Hakim
Hi, Adrian.
First, thank you for this tutorial and the many other tutorials that you have made. It has really helped me with my studies and creating content that I could place within my portfolio. Plus, it was a lot of fun to make these things using the raspberry pi camera.
I followed what you did for preventing certain kinds of objects from being detected, and it worked once. However, future runs of the program led to the terminal freezing up, or just the application not reaching the stage where the camera would be activated. I know that it is something to do with the statement that checks the list. I’m not sure about how to get around this, or if there really is a way around this, without having to make a whole new training set.
Also, I was wondering about your personal take on having multiple kinds of detections happening at the same time. Currently, i’m working with a team that is trying to build a robot. We have the edge detection down, but we also need object recognition as well. I tried to get both running on the same camera at the same time, but the result was rather bad ( slow, buggy, inconsistent). How would you handle multiple processes like that on a camera and on the raspberry pi?
Likui
HI,
You have good tutorial for deep learning.
Are you planning to do any tutorial on 3D object detection using Lidar data?
If you do, it would be very greatly helpful.
regards
Adrian Rosebrock
I don’t have any plans on that topic but I’ll consider it for the future.
Jasmeet
Hi Adrian, thank you for sharing knowledge with all of us without a price :).
I would like to ask about the above gif (Figure 8: Real-time deep learning object detection for front and rear views of vehicles.)
This is exactly what i’m looking for, is this done on this same code by ignoring other classes or this is completely different code for real time car tracking, mind sharing ?
Adrian Rosebrock
That code (and pre-trained model) is contained within Deep Learning for Computer Vision with Python. I would suggest you start there if you’re interested in the project.
Arvind Chandel
Hi Adrian, I have a query related to Tensorflow object detection API below:
Suppose I train TensorFlow faster Rcnn_inception on custom data having 10 classes like ball, bottle, Coca etc., and it performs quite well. Later I get some new data for 10 more classes like Paperboat, Thums up etc. and want my model to be trained on these too. Is there any method to retrain my generated model on these 10 new classes, so that it is upgraded to 20 classes, rather than starting training from scratch?
Adrian Rosebrock
Hey Arvind — if you need help using the TensorFlow Object Detection API, including how to train from scratch and fine-tune, I recommend you read Deep Learning for Computer Vision with Python.
Saurabh
Hi Adrian,
Thanks for sharing the interesting blog!
I have trained object detection using ssd (mobilenet-v1) on custom dataset. The dataset consist of uno playing card images (skip, reverse, and draw four). On all these cards, model performs pretty well as I have trained model only on these 3 card (around 278 images with 829 bounding boxes collected using mobile phone).
However, I haven’t trained model on any other card but still it detects other cards (inference using webcam).
How can I fix this? Should I also collect other class images (anything other than skip, reverse and draw four cards) and ignore this class in operation? So that model sees this class images during training and doesn’t put any label during inference.
Please share your views and feel free to correct me!
Thanking you!
Adrian Rosebrock
I would suggest you read Deep Learning for Computer Vision with Python which covers my tips, suggestions, and best practices on training your own custom object detectors.
Salma
I want to run object detection algorithm to detect only Human
on video frames and then re-assemble them into another video by cropped the human detected section only in which Human do some activity , can you please suggest me…
Dhruv
Hi, First of all, thanks for these beautiful blogs.
I have one doubt on this please help me clear that when I’m trying to run this same code for a video file why the video is playing so fast?
Adrian Rosebrock
Because OpenCV is processing the frames as fast as it possibly can. It has no concept of a frame rate — it just wants to process frames as quickly as possible.
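If you want playback to look closer to the video's native speed, one option is to wait out the remainder of each frame's time slot after processing it. The sketch below is pure Python and only illustrates the timing logic; the helper names are hypothetical, and the OpenCV usage in the trailing comment is an assumption about how you might wire it into your own loop.

```python
import time


def frame_interval(fps, default_fps=30.0):
    """Seconds each frame should last; falls back to `default_fps`
    when the source frame rate is unknown or reported as 0."""
    if fps is None or fps <= 0:
        fps = default_fps
    return 1.0 / fps


def remaining_delay(interval, elapsed):
    """Time left in the current frame slot after `elapsed` seconds of work."""
    return max(0.0, interval - elapsed)


def paced(frames, fps, clock=time.perf_counter, sleep=time.sleep):
    """Yield frames no faster than `fps` frames per second."""
    interval = frame_interval(fps)
    for frame in frames:
        start = clock()
        yield frame  # the caller runs detection and display here
        sleep(remaining_delay(interval, clock() - start))


# Possible usage with OpenCV (not executed here; file name is an example):
#   capture = cv2.VideoCapture("example.mp4")
#   fps = capture.get(cv2.CAP_PROP_FPS)
#   then read frames in a loop and wrap that loop with paced(..., fps)
```

If detection on a frame takes longer than the frame's time slot, the remaining delay is zero and playback simply runs as fast as the detector allows.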