In this tutorial, you’ll learn how to use the YOLO object detector to detect objects in both images and video streams using Deep Learning, OpenCV, and Python.
By applying object detection, you’ll not only be able to determine what is in an image but also where a given object resides!
We’ll start with a brief discussion of the YOLO object detector, including how the object detector works.
From there we’ll use OpenCV, Python, and deep learning to:
- Apply the YOLO object detector to images
- Apply YOLO to video streams
We’ll wrap up the tutorial by discussing some of the limitations and drawbacks of the YOLO object detector, including some of my personal tips and suggestions.
A dataset with annotated objects is critical for understanding and implementing YOLO object detection. It aids in building a model that can detect and classify various objects in images or videos.
Roboflow has free tools for each stage of the computer vision pipeline that will streamline your workflows and supercharge your productivity.
Sign up or log in to your Roboflow account to access state-of-the-art dataset libraries and revolutionize your computer vision pipeline.
You can start by choosing your own datasets or using PyImageSearch’s assorted library of useful datasets.
Bring data in any of 40+ formats to Roboflow, train using any state-of-the-art model architectures, deploy across multiple platforms (API, NVIDIA, browser, iOS, etc), and connect to applications or 3rd party tools.
To learn how to use YOLO for object detection with OpenCV, just keep reading!
- Update July 2021: Added section on YOLO v4 and YOLO v5, including how they can be incorporated into OpenCV and PyTorch projects.
YOLO Object detection with OpenCV
In the rest of this tutorial we’ll:
- Discuss the YOLO object detector model and architecture
- Utilize YOLO to detect objects in images
- Apply YOLO to detect objects in video streams
- Discuss some of the limitations and drawbacks of the YOLO object detector
Let’s dive in!
Note: This post was last updated on February 5th, 2022 to update images, references, and formatting. Enjoy!
What is the YOLO object detector?
When it comes to deep learning-based object detection, there are three primary object detectors you’ll encounter:
- R-CNN and its variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN
- Single Shot Detector (SSDs)
- YOLO
R-CNNs are one of the first deep learning-based object detectors and are an example of a two-stage detector.
- In the first R-CNN publication, Rich feature hierarchies for accurate object detection and semantic segmentation (2013), Girshick et al. proposed an object detector that required an algorithm such as Selective Search (or equivalent) to propose candidate bounding boxes that could contain objects.
- These regions were then passed into a CNN for classification, ultimately leading to one of the first deep learning-based object detectors.
The problem with the standard R-CNN method was that it was painfully slow and not a complete end-to-end object detector.
Girshick et al. published a second paper in 2015, entitled Fast R-CNN. The Fast R-CNN algorithm made considerable improvements to the original R-CNN, namely increasing accuracy and reducing the time it took to perform a forward pass; however, the model still relied on an external region proposal algorithm.
It wasn’t until Girshick et al.’s follow-up 2015 paper, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, that R-CNNs became a true end-to-end deep learning object detector by removing the Selective Search requirement and instead relying on a Region Proposal Network (RPN) that is (1) fully convolutional and (2) can predict the object bounding boxes and “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object). The outputs of the RPNs are then passed into the R-CNN component for final classification and labeling.
While R-CNNs tend to be very accurate, the biggest problem with the R-CNN family of networks is their speed — they were incredibly slow, obtaining only 5 FPS on a GPU.
To help increase the speed of deep learning-based object detectors, both Single Shot Detectors (SSDs) and YOLO use a one-stage detector strategy.
These algorithms treat object detection as a regression problem, taking a given input image and simultaneously learning bounding box coordinates and corresponding class label probabilities.
In general, single-stage detectors tend to be less accurate than two-stage detectors but are significantly faster.
YOLO is a great example of a single-stage detector.
Redmon et al. first introduced YOLO in 2015; their paper, You Only Look Once: Unified, Real-Time Object Detection, details an object detector capable of super real-time object detection, obtaining 45 FPS on a GPU.
Note: A smaller variant of their model called “Fast YOLO” claims to achieve 155 FPS on a GPU.
YOLO has gone through a number of different iterations, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), capable of detecting over 9,000 object classes.
Redmon and Farhadi are able to achieve such a large number of object detections by performing joint training for both object detection and classification. Using joint training the authors trained YOLO9000 simultaneously on both the ImageNet classification dataset and COCO detection dataset. The result is a YOLO model, called YOLO9000, that can predict detections for object classes that don’t have labeled detection data.
While interesting and novel, YOLOv2’s performance was a bit underwhelming given the title and abstract of the paper.
On the 156 object classes for which it had no detection training data, YOLO9000 achieved only 16% mean Average Precision (mAP); and yes, while YOLO can detect 9,000 separate classes, the accuracy is not quite what we would desire.
Redmon and Farhadi recently published a new YOLO paper, YOLOv3: An Incremental Improvement (2018). YOLOv3 is significantly larger than previous models but is, in my opinion, the best one yet out of the YOLO family of object detectors.
We’ll be using YOLOv3 in this blog post, in particular, YOLO trained on the COCO dataset.
The COCO dataset consists of 80 labels, including, but not limited to:
- People
- Bicycles
- Cars and trucks
- Airplanes
- Stop signs and fire hydrants
- Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
- Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
- … and much more!
You can find a full list of what YOLO trained on the COCO dataset can detect using this link.
I’ll wrap up this section by saying that any academic needs to read Redmon’s YOLO papers and tech reports — not only are they novel and insightful, they are incredibly entertaining as well.
But seriously, if you do nothing else today, read the YOLOv3 tech report.
It’s only 6 pages and one of those pages is just references/citations.
Furthermore, the tech report is honest in a way that academic papers rarely, if ever, are.
Project structure
Let’s take a look at today’s project layout. You can use your OS’s GUI (Finder for OSX, Nautilus for Ubuntu), but you may find it easier and faster to use the tree command in your terminal:
$ tree
.
├── images
│   ├── baggage_claim.jpg
│   ├── dining_table.jpg
│   ├── living_room.jpg
│   └── soccer.jpg
├── output
│   ├── airport_output.avi
│   ├── car_chase_01_output.avi
│   ├── car_chase_02_output.avi
│   └── overpass_output.avi
├── videos
│   ├── airport.mp4
│   ├── car_chase_01.mp4
│   ├── car_chase_02.mp4
│   └── overpass.mp4
├── yolo-coco
│   ├── coco.names
│   ├── yolov3.cfg
│   └── yolov3.weights
├── yolo.py
└── yolo_video.py

4 directories, 19 files
Our project today consists of four directories and two Python scripts.
The directories (in order of importance) are:
- yolo-coco/: The YOLOv3 object detector pre-trained (on the COCO dataset) model files. These were trained by the Darknet team.
- images/: This folder contains four static images which we’ll perform object detection on for testing and evaluation purposes.
- videos/: After performing object detection with YOLO on images, we’ll process videos in real time. This directory contains four sample videos for you to test with.
- output/: Output videos that have been processed by YOLO and annotated with bounding boxes and class names can go in this folder.
We’re reviewing two Python scripts: yolo.py and yolo_video.py. The first script is for images; then we’ll take what we learn and apply it to video in the second script.
Are you ready?
YOLO object detection in images
Let’s get started applying the YOLO object detector to images!
Open up the yolo.py file in your project and insert the following code:
# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
ap.add_argument("-y", "--yolo", required=True,
	help="base path to YOLO directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
	help="threshold when applying non-maxima suppression")
args = vars(ap.parse_args())
All you need installed for this script is OpenCV 3.4.2+ with Python bindings. You can find my OpenCV installation tutorials here; just keep in mind that OpenCV 4 is in beta right now, so you may run into issues installing or running certain scripts since it’s not an official release. For the time being, I recommend going for OpenCV 3.4.2+. You can actually be up and running in less than 5 minutes with pip as well.
First, we import our required packages — as long as OpenCV and NumPy are installed, your interpreter will breeze past these lines.
Now let’s parse four command line arguments. Command line arguments are processed at runtime and allow us to change the inputs to our script from the terminal. If you aren’t familiar with them, I encourage you to read more in my previous tutorial. Our command line arguments include (an example invocation follows the list):

- --image: The path to the input image. We’ll detect objects in this image using YOLO.
- --yolo: The base path to the YOLO directory. Our script will then load the required YOLO files in order to perform object detection on the image.
- --confidence: Minimum probability to filter weak detections. I’ve given this a default value of 50% (0.5), but you should feel free to experiment with this value.
- --threshold: This is our non-maxima suppression threshold with a default value of 0.3. You can read more about non-maxima suppression here.
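For example, here’s a hypothetical invocation that overrides both defaults (the image is just one of the bundled examples):

$ python yolo.py --image images/soccer.jpg --yolo yolo-coco \
	--confidence 0.6 --threshold 0.4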
After parsing, the args variable is now a dictionary containing the key-value pairs for the command line arguments. You’ll see args a number of times in the rest of this script.
Let’s load our class labels and set random colors for each:
# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([args["yolo"], "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
	dtype="uint8")
Here we load all of our class LABELS (notice the first command line argument, args["yolo"], being used) on Lines 21 and 22. Random COLORS are then assigned to each label on Lines 25-27.
Let’s derive the paths to the YOLO weights and configuration files followed by loading YOLO from disk:
# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([args["yolo"], "yolov3.weights"])
configPath = os.path.sep.join([args["yolo"], "yolov3.cfg"])

# load our YOLO object detector trained on COCO dataset (80 classes)
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
To load YOLO from disk on Line 35, we’ll take advantage of OpenCV’s DNN function, cv2.dnn.readNetFromDarknet. This function requires both a configPath and a weightsPath, which are established via command line arguments on Lines 30 and 31.
I cannot stress this enough: you’ll need at least OpenCV 3.4.2 to run this code, as it has the updated dnn module required to load YOLO.
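If you’d rather fail fast than hit a cryptic dnn error later, you can guard the script with a quick version check. This is my suggestion rather than part of the original yolo.py, and the parsing is a rough sketch that assumes a conventional version string:

import re
import cv2

# crude version check: pull the leading numbers out of the version string
# and make sure we're on at least 3.4.2
nums = tuple(int(n) for n in re.findall(r"\d+", cv2.__version__)[:3])
if nums < (3, 4, 2):
	raise RuntimeError("OpenCV >= 3.4.2 is required to load YOLO via cv2.dnn")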
Let’s load the image and send it through the network:
# load our input image and grab its spatial dimensions
image = cv2.imread(args["image"])
(H, W) = image.shape[:2]

# determine only the *output* layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# construct a blob from the input image and then perform a forward
# pass of the YOLO object detector, giving us our bounding boxes and
# associated probabilities
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
	swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
layerOutputs = net.forward(ln)
end = time.time()

# show timing information on YOLO
print("[INFO] YOLO took {:.6f} seconds".format(end - start))
In this block we:
- Load the input image and extract its dimensions (Lines 38 and 39).
- Determine the output layer names from the YOLO model (Lines 42 and 43).
- Construct a blob from the image (Lines 48 and 49). Are you confused about what a blob is or what cv2.dnn.blobFromImage does? Give this blog post a read. (A quick way to sanity-check the blob follows below.)
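If you’re curious, you can also inspect the blob right after constructing it. This is purely a sanity check, not part of yolo.py:

# blobFromImage resizes to 416x416, scales pixel values to [0, 1],
# swaps BGR -> RGB, and reorders to NCHW, the layout Darknet models expect
print(blob.shape)              # (1, 3, 416, 416)
print(blob.min(), blob.max())  # values should fall within [0.0, 1.0]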
Now that our blob is prepared, we’ll:
- Perform a forward pass through our YOLO network (Lines 50 and 52; see the compatibility note below)
- Show the inference time for YOLO (Line 56)
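One compatibility note: in newer OpenCV releases (4.5.4 and later, as best I can tell), getUnconnectedOutLayers() returns scalar indices rather than one-element arrays, which breaks the ln[i[0] - 1] indexing above. A variant like the following should handle both layouts, though treat it as a sketch:

# works whether getUnconnectedOutLayers() returns [[200], [227], [254]]
# (older OpenCV) or [200, 227, 254] (newer OpenCV)
ln = net.getLayerNames()
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers().flatten()]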
What good is object detection unless we visualize our results? Let’s take steps now to filter and visualize our results.
But first, let’s initialize some lists we’ll need in the process of doing so:
# initialize our lists of detected bounding boxes, confidences, and
# class IDs, respectively
boxes = []
confidences = []
classIDs = []
These lists include:
- boxes: Our bounding boxes around the objects.
- confidences: The confidence value that YOLO assigns to an object. Lower confidence values indicate that the object might not be what the network thinks it is. Remember from our command line arguments above that we’ll filter out objects that don’t meet the 0.5 threshold.
- classIDs: The detected object’s class label.
Let’s begin populating these lists with data from our YOLO layerOutputs:
# loop over each of the layer outputs
for output in layerOutputs:
	# loop over each of the detections
	for detection in output:
		# extract the class ID and confidence (i.e., probability) of
		# the current object detection
		scores = detection[5:]
		classID = np.argmax(scores)
		confidence = scores[classID]

		# filter out weak predictions by ensuring the detected
		# probability is greater than the minimum probability
		if confidence > args["confidence"]:
			# scale the bounding box coordinates back relative to the
			# size of the image, keeping in mind that YOLO actually
			# returns the center (x, y)-coordinates of the bounding
			# box followed by the boxes' width and height
			box = detection[0:4] * np.array([W, H, W, H])
			(centerX, centerY, width, height) = box.astype("int")

			# use the center (x, y)-coordinates to derive the top
			# and left corner of the bounding box
			x = int(centerX - (width / 2))
			y = int(centerY - (height / 2))

			# update our list of bounding box coordinates, confidences,
			# and class IDs
			boxes.append([x, y, int(width), int(height)])
			confidences.append(float(confidence))
			classIDs.append(classID)
There’s a lot here in this code block — let’s break it down.
In this block, we:
- Loop over each of the layerOutputs (beginning on Line 65).
- Loop over each detection in output (a nested loop beginning on Line 67).
- Extract the classID and confidence (Lines 70-72).
- Use the confidence to filter out weak detections (Line 76).
Now that we’ve filtered out unwanted detections, we’re going to:
- Scale bounding box coordinates so we can display them properly on our original image (Line 81).
- Extract the coordinates and dimensions of the bounding box (Line 82). YOLO returns bounding box coordinates in the form (centerX, centerY, width, height).
- Use this information to derive the top-left (x, y)-coordinates of the bounding box (Lines 86 and 87).
- Update the boxes, confidences, and classIDs lists (Lines 91-93). (A toy example decoding one detection vector follows below.)
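To make the box math concrete, here is a toy example with made-up numbers that decodes a single 85-element COCO detection vector the same way Lines 81-87 do:

import numpy as np

# hypothetical detection vector: [centerX, centerY, width, height,
# objectness, 80 class scores], with box values normalized to [0, 1]
detection = np.zeros(85)
detection[0:4] = (0.5, 0.5, 0.2, 0.4)
detection[5] = 0.9  # pretend class 0 ("person") scored highest

(W, H) = (640, 480)  # pretend image dimensions
box = detection[0:4] * np.array([W, H, W, H])
(centerX, centerY, width, height) = box.astype("int")
x = int(centerX - (width / 2))
y = int(centerY - (height / 2))
print(x, y, int(width), int(height))  # -> 256 144 128 192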
With this data, we’re now going to apply what is called “non-maxima suppression”:
# apply non-maxima suppression to suppress weak, overlapping bounding
# boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],
	args["threshold"])
YOLO does not apply non-maxima suppression for us, so we need to explicitly apply it.
Applying non-maxima suppression suppresses significantly overlapping bounding boxes, keeping only the most confident ones.
NMS also ensures that we do not have any redundant or extraneous bounding boxes.
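If you have never implemented NMS yourself, the overlap measure it keys on is Intersection over Union (IoU). Here is a minimal sketch of an IoU function using the same [x, y, w, h] box format as our boxes list; it’s for intuition only, since cv2.dnn.NMSBoxes does the real work for us:

def iou(boxA, boxB):
	# convert [x, y, w, h] boxes to corner coordinates
	(ax1, ay1, ax2, ay2) = (boxA[0], boxA[1], boxA[0] + boxA[2], boxA[1] + boxA[3])
	(bx1, by1, bx2, by2) = (boxB[0], boxB[1], boxB[0] + boxB[2], boxB[1] + boxB[3])

	# area of the intersection rectangle (zero if the boxes are disjoint)
	interW = max(0, min(ax2, bx2) - max(ax1, bx1))
	interH = max(0, min(ay2, by2) - max(ay1, by1))
	inter = interW * interH

	# IoU = intersection area / union area
	union = boxA[2] * boxA[3] + boxB[2] * boxB[3] - inter
	return inter / float(union) if union > 0 else 0.0

print(iou([0, 0, 100, 100], [50, 50, 100, 100]))  # ~0.1429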
Taking advantage of OpenCV’s built-in DNN module implementation of NMS, we perform non-maxima suppression on Lines 97 and 98. All that is required is that we submit our bounding boxes and confidences, along with our confidence threshold and NMS threshold.
If you’ve been reading this blog, you might be wondering why we didn’t use my imutils implementation of NMS. The primary reason is that the NMSBoxes function now works in OpenCV. Previously it failed for some inputs and resulted in an error message; now that it’s fixed, we can use it in our own scripts.
Let’s draw the boxes and class text on the image!
# ensure at least one detection exists
if len(idxs) > 0:
	# loop over the indexes we are keeping
	for i in idxs.flatten():
		# extract the bounding box coordinates
		(x, y) = (boxes[i][0], boxes[i][1])
		(w, h) = (boxes[i][2], boxes[i][3])

		# draw a bounding box rectangle and label on the image
		color = [int(c) for c in COLORS[classIDs[i]]]
		cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
		text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
		cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
			0.5, color, 2)

# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
Assuming at least one detection exists (Line 101), we proceed to loop over the idxs determined by non-maxima suppression. Then, we simply draw the bounding box and text on the image using our random class colors (Lines 105-113).
Finally, we display our resulting image until the user presses any key on their keyboard (ensuring the window opened by OpenCV is selected and focused).
To follow along with this guide, make sure you use the “Downloads” section of this tutorial to download the source code, YOLO model, and example images.
From there, open up a terminal and execute the following command:
$ python yolo.py --image images/baggage_claim.jpg --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] YOLO took 0.347815 seconds
Here you can see that YOLO has not only detected each person in the input image, but the suitcases as well!
Furthermore, if you take a look at the right corner of the image you’ll see that YOLO has also detected the handbag on the lady’s shoulder.
Let’s try another example:
$ python yolo.py --image images/living_room.jpg --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] YOLO took 0.340221 seconds
The image above contains a person (myself) and a dog (Jemma, the family beagle).
YOLO also detects the TV monitor and a chair. I’m particularly impressed that YOLO was able to detect the chair given that it’s a handmade, old-fashioned “baby high chair.”
Interestingly, YOLO thinks there is a “remote” in my hand. It’s actually not a remote — it’s the reflection of glass on a VHS tape; however, if you stare at the region it actually does look like it could be a remote.
The following example image demonstrates a limitation and weakness of the YOLO object detector:
$ python yolo.py --image images/dining_table.jpg --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] YOLO took 0.362369 seconds
While the wine bottle, dining table, and vase are all correctly detected by YOLO, only one of the two wine glasses is properly detected.
We discuss why YOLO struggles with objects close together in the “Limitations and drawbacks of the YOLO object detector” section below.
Let’s try one final image:
$ python yolo.py --image images/soccer.jpg --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] YOLO took 0.345656 seconds
YOLO is able to correctly detect each of the players on the pitch, as well as the soccer ball itself. Notice the person in the background who is detected despite the area being highly blurred and partially obscured.
YOLO object detection in video streams
Now that we’ve learned how to apply the YOLO object detector to single images, let’s utilize YOLO to perform object detection in input video files as well.
Open up the yolo_video.py file and insert the following code:
# import the necessary packages
import numpy as np
import argparse
import imutils
import time
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
	help="path to input video")
ap.add_argument("-o", "--output", required=True,
	help="path to output video")
ap.add_argument("-y", "--yolo", required=True,
	help="base path to YOLO directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
	help="threshold when applying non-maxima suppression")
args = vars(ap.parse_args())
We begin with our imports and command line arguments.
Notice that this script doesn’t have the --image argument as before. In its place, we now have two video-related arguments:

- --input: The path to the input video file.
- --output: The path to the output video file.
Given these arguments, you can now use videos that you record of scenes with your smartphone or videos you find online. You can then process the video file, producing an annotated output video. Of course, if you want to use your webcam to process a live video stream, that is possible too: just find examples on PyImageSearch where the VideoStream class from imutils.video is utilized and make some minor changes (a sketch follows below).
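For reference, a minimal webcam variant might look like the sketch below. It is not part of yolo_video.py; note that VideoStream.read() returns the frame directly rather than a (grabbed, frame) tuple, so the end-of-stream check changes as well:

# hypothetical webcam swap for the cv2.VideoCapture file reader used below
from imutils.video import VideoStream
import time

vs = VideoStream(src=0).start()  # src=0 selects the default webcam
time.sleep(2.0)  # give the camera sensor a moment to warm up

frame = vs.read()  # no `grabbed` flag; frame is None if the read failed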
Moving on, the next block is identical to the block from the YOLO image processing script:
# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([args["yolo"], "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
	dtype="uint8")

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([args["yolo"], "yolov3.weights"])
configPath = os.path.sep.join([args["yolo"], "yolov3.cfg"])

# load our YOLO object detector trained on COCO dataset (80 classes)
# and determine only the *output* layer names that we need from YOLO
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
Here we load labels and generate colors followed by loading our YOLO model and determining output layer names.
Next, we’ll take care of some video-specific tasks:
# initialize the video stream, pointer to output video file, and
# frame dimensions
vs = cv2.VideoCapture(args["input"])
writer = None
(W, H) = (None, None)

# try to determine the total number of frames in the video file
try:
	prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
		else cv2.CAP_PROP_FRAME_COUNT
	total = int(vs.get(prop))
	print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total
# number of frames in the video file
except:
	print("[INFO] could not determine # of frames in video")
	print("[INFO] no approx. completion time can be provided")
	total = -1
In this block, we:
- Open a file pointer to the video file for reading frames in the upcoming loop (Line 45).
- Initialize our video writer and frame dimensions (Lines 46 and 47).
- Try to determine the total number of frames in the video file so we can estimate how long processing the entire video will take (Lines 50-61).
Now we’re ready to start processing frames one by one:
# loop over frames from the video file stream
while True:
	# read the next frame from the file
	(grabbed, frame) = vs.read()

	# if the frame was not grabbed, then we have reached the end
	# of the stream
	if not grabbed:
		break

	# if the frame dimensions are empty, grab them
	if W is None or H is None:
		(H, W) = frame.shape[:2]
We define a while loop (Line 64) and then grab our first frame (Line 66).
We make a check to see if it is the last frame of the video. If so, we break from the while loop (Lines 70 and 71).
Next, we grab the frame dimensions if they haven’t been grabbed yet (Lines 74 and 75).
Next, let’s perform a forward pass of YOLO, using our current frame as the input:
	# construct a blob from the input frame and then perform a forward
	# pass of the YOLO object detector, giving us our bounding boxes
	# and associated probabilities
	blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
		swapRB=True, crop=False)
	net.setInput(blob)
	start = time.time()
	layerOutputs = net.forward(ln)
	end = time.time()

	# initialize our lists of detected bounding boxes, confidences,
	# and class IDs, respectively
	boxes = []
	confidences = []
	classIDs = []
Here we construct a blob and pass it through the network, obtaining predictions. I’ve surrounded the forward pass operation with time stamps so we can calculate the elapsed time to make predictions on one frame; this will help us estimate the time needed to process the entire video.
We’ll then go ahead and initialize the same three lists we used in our previous script: boxes, confidences, and classIDs.
This next block is, again, identical to our previous script:
	# loop over each of the layer outputs
	for output in layerOutputs:
		# loop over each of the detections
		for detection in output:
			# extract the class ID and confidence (i.e., probability)
			# of the current object detection
			scores = detection[5:]
			classID = np.argmax(scores)
			confidence = scores[classID]

			# filter out weak predictions by ensuring the detected
			# probability is greater than the minimum probability
			if confidence > args["confidence"]:
				# scale the bounding box coordinates back relative to
				# the size of the image, keeping in mind that YOLO
				# actually returns the center (x, y)-coordinates of
				# the bounding box followed by the boxes' width and
				# height
				box = detection[0:4] * np.array([W, H, W, H])
				(centerX, centerY, width, height) = box.astype("int")

				# use the center (x, y)-coordinates to derive the top
				# and left corner of the bounding box
				x = int(centerX - (width / 2))
				y = int(centerY - (height / 2))

				# update our list of bounding box coordinates,
				# confidences, and class IDs
				boxes.append([x, y, int(width), int(height)])
				confidences.append(float(confidence))
				classIDs.append(classID)
In this code block, we:
- Loop over output layers and detections (Lines 94-96).
- Extract the classID and filter out weak predictions (Lines 99-105).
- Compute bounding box coordinates (Lines 111-117).
- Update our respective lists (Lines 121-123).
Next, we’ll apply non-maxima suppression and begin annotating the frame:
	# apply non-maxima suppression to suppress weak, overlapping
	# bounding boxes
	idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],
		args["threshold"])

	# ensure at least one detection exists
	if len(idxs) > 0:
		# loop over the indexes we are keeping
		for i in idxs.flatten():
			# extract the bounding box coordinates
			(x, y) = (boxes[i][0], boxes[i][1])
			(w, h) = (boxes[i][2], boxes[i][3])

			# draw a bounding box rectangle and label on the frame
			color = [int(c) for c in COLORS[classIDs[i]]]
			cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
			text = "{}: {:.4f}".format(LABELS[classIDs[i]],
				confidences[i])
			cv2.putText(frame, text, (x, y - 5),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
You should recognize these lines as well. Here we:
- Apply NMS using the cv2.dnn.NMSBoxes function (Lines 127 and 128) to suppress weak, overlapping bounding boxes. You can read more about non-maxima suppression here.
- Loop over the idxs calculated by NMS and draw the corresponding bounding boxes + labels (Lines 131-144).
Let’s finish out the script:
	# check if the video writer is None
	if writer is None:
		# initialize our video writer
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(frame.shape[1], frame.shape[0]), True)

		# some information on processing single frame
		if total > 0:
			elap = (end - start)
			print("[INFO] single frame took {:.4f} seconds".format(elap))
			print("[INFO] estimated total time to finish: {:.4f}".format(
				elap * total))

	# write the output frame to disk
	writer.write(frame)

# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()
To wrap up, we simply:
- Initialize our video writer if necessary (Lines 147-151). The writer will be initialized on the first iteration of the loop. (See the optional FPS tweak below.)
- Print out our estimates of how long it will take to process the video (Lines 154-158).
- Write the frame to the output video file (Line 161).
- Clean up and release pointers (Lines 165 and 166).
To apply YOLO object detection to video streams, make sure you use the “Downloads” section of this blog post to download the source, YOLO object detector, and example videos.
From there, open up a terminal and execute the following command:
$ python yolo_video.py --input videos/car_chase_01.mp4 \
	--output output/car_chase_01.avi --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] 583 total frames in video
[INFO] single frame took 0.3500 seconds
[INFO] estimated total time to finish: 204.0238
[INFO] cleaning up...
When you execute the command above, you will see a GIF excerpt from a car chase video I found on YouTube. Why not share it with me on Twitter @pyimagesearch?
In the video/GIF, you can see that not only are the vehicles detected, but people and traffic lights are detected too!
The YOLO object detector is performing quite well here. Let’s try a different video clip from the same car chase video:
$ python yolo_video.py --input videos/car_chase_02.mp4 \
	--output output/car_chase_02.avi --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] 3132 total frames in video
[INFO] single frame took 0.3455 seconds
[INFO] estimated total time to finish: 1082.0806
[INFO] cleaning up...
When you run the command above, you will see a suspect on the run; we have used OpenCV and YOLO object detection to find the person! Share your output with me again on Twitter @pyimagesearch.
YOLO is once again able to detect people.
At one point the suspect is actually able to make it back to their car and continue the chase — let’s see how YOLO performs there as well:
$ python yolo_video.py --input videos/car_chase_03.mp4 \
	--output output/car_chase_03.avi --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] 749 total frames in video
[INFO] single frame took 0.3442 seconds
[INFO] estimated total time to finish: 257.8418
[INFO] cleaning up...
As a final example, run the code above and you’ll see that you can use YOLO as a starting point for building a traffic counter. Can you think of a great application that uses traffic counting?
I can: how about a peak traffic detector that uses traffic counting to notify people when to start their commute? Another option is to use traffic counting to measure the number of inbound customers at a brick-and-mortar business.
Bingo! There’s a business waiting to be launched right there. Be sure to run the script below and see the output.
$ python yolo_video.py --input videos/overpass.mp4 \
	--output output/overpass.avi --yolo yolo-coco
[INFO] loading YOLO from disk...
[INFO] 812 total frames in video
[INFO] single frame took 0.3534 seconds
[INFO] estimated total time to finish: 286.9583
[INFO] cleaning up...
I’ve put together a full video of YOLO object detection examples below:
Credits for video and audio:
- Car chase video posted on YouTube by Quaker Oats.
- Overpass video on YouTube by Vlad Kiraly.
- “White Crow” on the FreeMusicArchive by XTaKeRuX.
Limitations and drawbacks of the YOLO object detector
Arguably the two largest limitations and drawbacks of the YOLO object detector are that:
- It does not always handle small objects well
- It especially does not handle objects grouped close together
This limitation is due to the YOLO algorithm itself:
- The YOLO object detector divides an input image into an SxS grid where each cell in the grid predicts only a single object.
- If multiple small objects appear in a single cell, YOLO will be unable to detect them all, ultimately leading to missed object detections. (See the back-of-the-envelope arithmetic below.)
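Some quick arithmetic makes this concrete. Assuming a 416x416 input and YOLOv3’s coarsest 13x13 grid (the network also predicts at 26x26 and 52x52):

# each cell of the coarsest 13x13 grid covers a 32x32 pixel region of a
# 416x416 input; small objects whose centers land in the same cell end up
# competing for that cell's predictions
input_size = 416
grid_size = 13
print(input_size / grid_size)  # 32.0 pixels per cell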
Therefore, if you know your dataset consists of many small objects grouped close together then you should not use the YOLO object detector.
In terms of small objects, Faster R-CNN tends to work the best; however, it’s also the slowest.
SSDs can also be used here; however, SSDs can also struggle with smaller objects (but not as much as YOLO).
SSDs often give a nice tradeoff in terms of speed and accuracy as well.
It’s also worth noting that YOLO ran slower than SSDs in this tutorial. In my previous tutorial on OpenCV object detection, we utilized an SSD — a single forward pass of the SSD took ~0.03 seconds.
However, from this tutorial, we know that a forward pass of the YOLO object detector took approximately 0.3 seconds, an order of magnitude slower!
If you’re using the pre-trained deep learning object detectors OpenCV supplies you may want to consider using SSDs over YOLO. From my personal experience, I’ve rarely encountered situations where I needed to use YOLO over SSDs:
- I have found SSDs much easier to train and their performance in terms of accuracy almost always outperforms YOLO (at least for the datasets I’ve worked with).
- YOLO may have excellent results on the COCO dataset; however, I have not found that same level of accuracy for my own tasks.
I, therefore, tend to use the following guidelines when picking an object detector for a given problem:
- If I know I need to detect small objects and speed is not a concern, I tend to use Faster R-CNN.
- If speed is absolutely paramount, I use YOLO.
- If I need a middle ground, I tend to go with SSDs.
In most of my situations I end up using SSDs or RetinaNet; both strike a great balance between YOLO and Faster R-CNN.
Alternative YOLO object detection models
We utilized YOLO v3 inside this tutorial to perform YOLO object detection with OpenCV.
Joseph Redmon, the creator of the YOLO object detector, has ceased working on YOLO due to privacy concerns and misuse in military applications; however, other researchers in the computer vision and deep learning community have continued his work.
While these are not “official” YOLO models (in the sense that Joseph Redmon did not create them nor does he endorse them), you will find publications and references to both YOLO v4 and YOLO v5 online.
If you use the PyTorch deep learning library, then definitely check out YOLO v5 — the library makes it super easy to train custom YOLO models; however, the output YOLO v5 models are not directly compatible with OpenCV (i.e., you’ll need to write additional code to make predictions on images/frames if you’re using OpenCV and YOLO v5 together).
YOLO v4, on the other hand, is compatible with OpenCV using the same code provided in this tutorial. All you need to do is provide the YOLO v4 weights and configuration files. This tutorial will help you get started using YOLO v4 with OpenCV.
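In other words, swapping in YOLO v4 should amount to pointing the same loader at different files. A sketch, with filenames that are assumptions about where you saved the downloads:

# point the same Darknet loader at the YOLO v4 files instead of YOLO v3
net = cv2.dnn.readNetFromDarknet("yolo-coco/yolov4.cfg",
	"yolo-coco/yolov4.weights")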
What's next? We recommend PyImageSearch University.
86 total classes • 115+ hours of on-demand code walkthrough videos • Last updated: October 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86 courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial we learned how to perform YOLO object detection using Deep Learning, OpenCV, and Python.
We then briefly discussed the YOLO architecture followed by implementing Python code to:
- Apply YOLO object detection to single images
- Apply the YOLO object detector to video streams
On my machine with a 3GHz Intel Xeon W processor, a single forward pass of YOLO took approximately 0.3 seconds; however, using a Single Shot Detector (SSD) from a previous tutorial resulted in only approximately 0.03 seconds per detection, an order of magnitude faster!
For real-time deep learning-based object detection on your CPU with OpenCV and Python, you may want to consider using the SSD.
If you are interested in training your own deep learning object detectors on your own custom datasets, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I provide detailed guides on how to successfully train your own detectors.
I hope you enjoyed today’s YOLO object detection tutorial!
To download the source code to today’s post, and be notified when future PyImageSearch blog posts are published, just enter your email address in the form below.
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Naser
Great tutorial, thanks Adrian.
Can you make a tutorial explaining how to train YOLO on our own custom dataset?
Thank you.
Adrian Rosebrock
I actually cover how to train your own custom object detectors inside Deep Learning for Computer Vision with Python. I would suggest starting there.
Gary
Hi Adrian,
do you show in your book how to train custom object detectors with different frameworks like YOLO/YOLOv3, TensorFlow, MXNet, and Caffe, comparing Faster R-CNN vs. SSD?
If not, it would be great to see which framework has the best multi-object detector for small, closely grouped objects. Hope you will think about this.
Thanks a lot for all your great tutorials.
Adrian Rosebrock
Inside the book I focus on Faster R-CNN, SSDs, and RetinaNet. Per my suggestions in this blog post I don’t tend to use YOLO that often.
yan
When I was using a Raspberry Pi 3B+, I encountered the error AttributeError: ‘NoneType’ object has no attribute ‘shape’, but I don’t know how to fix it. I hope I can get your guidance.
Adrian Rosebrock
Your path to the input image is invalid and cv2.imread is returning None. Double-check the path to your input image. Also read this tutorial on NoneType errors.

Yonten
I have trained my dataset on Darknet and I am using your code to detect objects in my images, but I cannot see the bounding box. When I run it in Darknet, I can clearly see the output with the bounding box. Can you tell me which code I should edit?
Adrian Rosebrock
I would raise that question with the OpenCV developers. Your architecture may be different or some additional model conversion may need to take place.
issaiass
I had a similar error in my PC. There were two major issues:
1 – Something with Python 2.7 (not sure)
2 – Something with OpenCV below 3.4.2
Solutions:
1 – Anaconda3, Python version 3.6.x
2 – OpenCV over 3.4.2
Halil Gorkem
Thanks for the tutorial. However, OpenCV does not make use of the GPU, so video processing takes time.
aiwen
so good!
Adrian Rosebrock
Thanks Aiwen, I’m glad you liked it!
ShivaGuntuku
Hi Adrian, thank you for the tutorial,
Although I am getting this error in ubuntu18 , python3.6 and cv2 version ‘3.4.0’
…
error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile,
Please help me out.
Adrian Rosebrock
You need at least OpenCV 3.4.2 for this tutorial. OpenCV 4 would work as well.
ShivaGuntuku
Thanks Adrian, it worked. Nice
Adrian Rosebrock
Awesome, I’m glad that worked.
Richard Wiseman
Might be worth updating the article to say 3.4.2 rather than 3.4 as it currently does. This caught me out too.
Adrian Rosebrock
Thanks for catching that Richard. I’ve updated the post 😉
John Kang
I got the same error on Windows. I have OpenCV-Python 3.4.0 installed. How do I install opencv-python 3.4.2 on Windows?
thanks in advance
Adrian Rosebrock
I’m sorry to hear about the error message. You would indeed need to install OpenCV 3.4.2 or higher. That said, I do not officially support Windows here on the PyImageSearch blog (I haven’t even used a Windows machine in 11+ years now). When it comes to computer vision and deep learning I highly recommend you use a Unix-based machine such as Ubuntu or macOS. I have a number of OpenCV install tutorials for those operating systems. If you need help with Windows I would need to refer you to the official OpenCV website.
Ulrich
Hello John,
I use Python 2.7 and OpenCV 2.4.11 on a Windows 10 system. I updated my OpenCV using “pip install opencv-contrib-python” and OpenCV 3.4.5 was installed in a few minutes.
The code is running well!
Sourav
can it be implemented on a pi 3B?
Adrian Rosebrock
Yes, but it would be extremely slow, under 1 FPS (at least for the OpenCV + YOLO version). The Movidius NCS does have a YOLO model that supposedly works but I have never tried it; that would likely get you a few FPS.
wally kulecz
If you are talking about this TinyYolo model for the Movidius from the appzoo:
https://github.com/movidius/ncappzoo/tree/master/caffe/TinyYolo
It’s not the same model used in this tutorial.
I’ve played with it and it was really poor at detecting people and really good at finding people in shadows (false positives) so it was useless for my purposes.
The YOLOv3 model used here has performed admirably on the test images where the TinyYolo model from the NCS appzoo (linked above) failed miserably.
If there is a Movidius version of this YOLOv3 model, point me to it and I’ll give it a try and report back.
Adrian Rosebrock
That was the one I was thinking of, thanks Wally. I’m not aware of a YOLOv3 model for the Movidius though.
wally kulecz
Looks like a Movidius NCS2 using the Myriad X is available; the splash page suggests “up to 8X faster” than the Movidius:
https://software.intel.com/en-us/neural-compute-stick
They are available, I just ordered one from Mouser for $99 + tax and shipping.
No mention of Raspberry Pi support, for now. It looks like the OpenVINO toolkit will be required to use it. They are supporting Windows 10 for this one. It’s a free download:
https://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-linux
The Pi is where this improved device could really help, but it looks like it needs USB3 and a specific driver which may explain the lack of Pi support.
I’m expecting a challenge getting the toolkit installed.
Adrian Rosebrock
Intel sent me a NCS2 but I must admit that I never unboxed it 🙁 I’ve been too busy releasing the 2nd edition of DL4CV. I’ll have to carve out some time and play with it as well 🙂 Thanks for the motivation, Wally.
wally kulecz
Perhaps a bit more motivation.
I installed the openVINO SDK on that old i3 system that I mentioned in another reply (it failed with library version errors on Ubuntu 18.04, so I installed 16.04 to the free space on the drive and dual-boot).
Running their C++ interactive_facedetection_demo sample code with a USB WebCam I get these results:
NCS: face detection ~16 fps, face analysis ~3 fps
NCS2: face detection ~42 fps, face analysis ~10 fps
CPU: face detection ~17 fps, face analysis ~5.4 fps
Note that the CPU needed an FP32 model where the NCS used FP16. As with my MobileNet-SSD Python code, the CPU on the i3 is about the same as the Movidius NCS, but the NCS2 shows very worthwhile improvement.
The SDK auto-detects NCS vs NCS2, so it was just a matter of unplugging the NCS and plugging in the NCS2 to get these numbers from the live openCV overlay.
The SDK compiles openCV v4.0.0-pre. It appears to support Python Virtual Environment, although I didn’t use one.
The GPU support seems not to work on this old i3-i915 motherboard.
There is a C++ example for YOLOv3 object detection in the installed sample code.
But my first task will be to see if I can re-write my Python code to use the openVINO Python support as from my limited test it looks like one NCS2 might be able to exceed the fps I get with three NCS sticks.
Devin
Hi Doctor Adrian, very glad to read your blog. I have a project that should recognize and detect objects in video on a Raspberry Pi 3B+. My boss wants me to use a deep learning method, such as ResNet, SSD, YOLOv3, etc., but from your blog I know it’s difficult to achieve real time. What should I do? Could you please give me some advice?
thanks!
Adrian Rosebrock
Hey Devin — I cover how to train your own custom object detectors (Faster R-CNN, SSDs, RetinaNet, etc.) inside my book, Deep Learning for Computer Vision with Python. I also discuss and demonstrate how to obtain real-time performance and which model is suitable for various tasks. I would suggest you start there.
Irwin
What an elegant post, and very understandable. When I try this on a video file that has a pick-up truck, sometimes it detects the truck in one frame and detects the same truck as a car in another frame. Do you have any quick suggestions on how to correct this? I would still like to detect both “trucks” and “cars” from the same video.
Adrian Rosebrock
You could do a rolling average over time. Keep track of the top 2-3 predictions over consecutive frames, average them, and pick the label with the highest average probability.
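A minimal sketch of that rolling-average idea (my illustration, with an arbitrary window size):

import numpy as np
from collections import deque

# rolling window of per-class probability vectors for one tracked object
history = deque(maxlen=15)  # window size is arbitrary; tune per application

def smoothed_label(scores, labels):
	# scores: the length-80 class probability vector for the current frame
	history.append(scores)
	return labels[np.argmax(np.mean(history, axis=0))]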
wally kulecz
The downloaded tutorial code runs fine on my Pi3B+ with python3 and openCV 3.4.2, but it takes 14 seconds to process an image. Can’t imagine how this could be of any use beyond a demo.
sset
Thanks for great article.
How do we train it on a custom dataset?
Adrian Rosebrock
I provide code and discuss how to train your own custom object detectors on your own datasets inside my book, Deep Learning for Computer Vision with Python.
Alex
Hello Adrian, which GPU did you use to achieve this performance?
Adrian Rosebrock
I did not use a GPU, it was CPU only. OpenCV’s “dnn” module does not yet support many GPUs.
Cenk Camkoy
This is really very cool. Thanks for sharing all these together with your valuable benchmarks. By the way, out of my curiosity, do you know what type of object detector is used in Google’s autonomous cars? SSD or other?
Adrian Rosebrock
Hm, no, I don’t know what Google is using in their autonomous cars. SSDs are rooted in Google research though so that would likely be my guess.
JBeale
YOLO may not win on real-world metrics, but it is clearly #1 in readability of the associated papers.
Adrian Rosebrock
Agreed 🙂
Javier
Hi Adrian, great post!!! Do image tagging systems such as Imagga work as object detectors, or by image similarity?
Max
Hi,
Is it possible to get it up and running on a GPU?
Adrian Rosebrock
That depends. OpenCV’s “dnn” module currently does not support NVIDIA GPUs. It does work with some Intel GPUs though.
Bob Estes
To be clear, your performance numbers for YOLO and SSD are for a CPU version, not a GPU version, right? Thanks.
Adrian Rosebrock
That is correct. YOLO can run 40+ FPS on a GPU. Tiny-YOLO can reportedly get past 100+ FPS.
julio
If OpenCV is built with CUDA support, can you achieve 30 FPS in real time? I mean:
1. YOLOv3 + dnn module + CPU is very slow
2. YOLOv3 + dnn module + GPU: what FPS could it reach for real-time applications?
How could I use YOLO in real time on a laptop GPU like Asus’ GeForce 930MX?
Adrian Rosebrock
See my replies to the other comments in this post — OpenCV does not yet support NVIDIA GPUs for their “dnn” module (hopefully soon though). That said, YOLO by itself can achieve 40+ FPS when ran on a GPU.
kelemu
Hi Adrian, I have been waiting for tutorials like this, and now I am lucky to get one from you. Really, thanks a lot. How do I train YOLO with our own datasets?
Adrian Rosebrock
I don’t have any tutorials for training YOLO from scratch. Typically I recommend using SSDs or RetinaNet, both of which (and Faster R-CNNs), are covered inside Deep Learning for Computer Vision with Python.
Sam
Thanks Adrian.. great post.
Can I use it with Movidius NCS with custom dataset?
Adrian Rosebrock
Take a look at Wally’s comment.
Robert
Thanks for suggesting to read the Yolo v3 research paper, that’s easily the most entertaining and honest research paper I’ve ever read, all the way to the last line!
Adrian Rosebrock
Awesome, I’m glad you enjoyed it Robert!
Hemant
Hey Adrian, nice article and very useful. I tried it on a Pi 3 and, as you stated, it is very slow. I am getting an object detection rate of 1 frame per 16 seconds. Processing airport.mp4 took a little less than 4 hours. Looking forward to the second edition of your book.
Adrian Rosebrock
Thank you for checking YOLO performance on the Pi, Hemant!
wally kulecz
Nice timing on this, I just finished installing Ubuntu-Mate 18.04 on an i3 system. The installation of the Movidius v.1 SDK pulled in openCV 3.4.3 (presumably from PyPi) so I grabbed this sample code and gave it a try.
YOLO is taking ~1.47 seconds per image.
This is not a powerful machine (1.8 GHz if I remember right), but I’m getting about 10 fps with MobileNet-SSD (from a previous tutorial) and one NCS stick handling 4 cameras (round-robin sampling), and near-linear speedup with multiple sticks: 19.5 fps with 2 sticks, 29 fps with 3 sticks. This is heavily threaded Python code with one main thread, one thread for each NCS stick, and one thread for each Onvif network camera. A 4th NCS (9 threads) may be too much of a good thing, as it drops to 24.6 fps. Although I had to have two sticks on a powered hub when I added the 4th stick for lack of ports, this may be a bit of a bottleneck, as re-running the 3-stick test with two of them on a hub dropped about 2 fps.
I hope one of the AI gurus can compile this yolo model for the NCS, although I realize this may not be possible.
Does your Xeon system use GPU (CUDA) acceleration? If so, how many CUDA cores?
My i7 Desktop has a GTX-950 with 2GB ram and 768 cuda cores, so I’m wondering if its worth the trouble to try and enable it. I need to update its openCV from 3.3.0 to 3.4.3 before I can run this tutorial, so this could be a good time for me to try and activate cuda.
Adrian Rosebrock
I love your multi-Movidius NCS setup, Wally! I would love to learn more about it and how you are using it.
As for my Xeon system, no, there is no CUDA acceleration. Although my iMac does have a Vega GPU so I suppose I could look into trying out the Intel + OpenCV + dnn drivers.
In your case don’t bother with it. OpenCV doesn’t yet support NVIDIA GPUs with their “dnn” module (hopefully soon though!)
wally kulecz
Thanks for the most useful info about openCV and CUDA, maybe for openCV 4.x.x it’ll be worth revisiting. I really appreciate shared experience that saves me from a dead end!
My multi-Movidius Python code uses NCSDK API v.1 and has been tested with Python 3.6 and 2.7 on Ubuntu-Mate 18.04, Raspbian Stretch on a Pi3B+ with Python 2.7 and 3.5, and Ubuntu-Mate 16.04 with Python 3.5 virtual environment (I never setup the virtual environment for python 2.7). If no Movidius are found, it drops down to using your Caffe version of Mobilenet-SSD on the CPU with one thread per camera.
On my i7 with four cameras and three NCS I’m getting ~30 fps (8 threads), and with no NCS I’m getting about the same ~30 fps (9 threads). In each case there is evidence that the AI spends significant time waiting for images.
On an i3 (same four cameras) it’s getting ~29 fps with three NCS, but it falls apart with no NCS, only getting ~8 fps, and it’s clear the camera threads are waiting for the AI threads. Just not enough cores for the CPU AI.
On a Pi3B+ with three cameras it’s getting ~6.7 fps with one NCS (5 threads), ~11 fps with two NCS (6 threads), and ~13 fps with three NCS (7 threads). Two NCS seems to spend significant time waiting on the AI, while three NCS appears to spend significant time waiting on images, based on summary counts in the threads that the camera thread would block on queue.put() and the NCS thread would block on queue.get().
Right now its only supported input is Onvif netcameras via their “snapshot” URL. The single stick version used your imutils to optionally use USB cameras or the PiCamera module, but I ripped this support out of the multi-stick version as few USB cameras work with IR illumination and only one PiCamera module can be used on a Pi as far as I know.
I need four cameras minimum, my use is for a video security system where a commercial “security DVR” provides 24/7 video recording while the AI provides near zero false positive rate high priority “push” notifications when it is armed in “not home mode”, audio alerts (via espeak-ng) if armed in “at home mode”, and nothing when in “idle mode”.
The Python code does the AI, node-red does the controlling and notifications, and MQTT glues it all together. The basic system has been running since early July and it works extremely well. It continues to evolve, mostly to improve the frame rate and reduce the detection latency — think “bad guys” marshaling on your property for a “home invasion”. But we love it for when the mailman comes or a package is delivered 🙂
I’d be happy to send you the Python code if you are interested, in fact I’d like to see if it works on a Mac. The CPU only part runs on Windows 10 and 7 (no NCS support without way more effort than I’m willing to apply) in limited testing with the single stick (AI thread) version (I’ve removed the Windows support from the multi-stick code). I’ve totally given up on Windows since I retired, but a couple of Windows only friends were interested early on (hence the Win7 and Win10 tests), and I must say that this was by far the best cross-platform development experience I’ve ever had! Python has really impressed me!
I plan to put it up on GitHub eventually, the Ubuntu 18.04 and PyPi openCV install was so easy I finally think I could write a README.md (in a reasonable amount of time) that someone could actually use from a fresh install of Raspbian or Ubuntu.
Adrian Rosebrock
Thanks for the detailed writeup, Wally! Let me know when you publish it on GitHub and I’ll take a look 🙂
faurog
Hi, Dr. Adrian. It would be nice if you tried using an Intel iGPU + OpenCV + dnn module. My laptop has a Nvidia GPU (not well supported yet) and an integrated Intel GPU, but I couldn’t make it work (net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)). Anyway, if you try something, let us know. I would like to know if it indeed improves the performance. Thank you for another incredible post. Cheers.
Adrian Rosebrock
I unfortunately do not have an Intel GPU right now. I hope to try it in the future though. Perhaps another reader can share their experience.
blank
always cool tutorial, keep it up, have a great day! 🙂
Adrian Rosebrock
Thanks, you too 🙂
Balaji
Hi,
Nice tutorial for Yolo and valid comparsion with other object detection models.
I want to detect small objects, so I am more interested in Faster R-CNN ResNet models. In this blog I can see you have mentioned they run at roughly 5 FPS. I am using the Faster R-CNN ResNet101 model on a 1080 GPU, but I am getting only 1.5 fps.
Can you please suggest how to improve the speed.
And as a user, I want to ask: when can we expect a blog post on Faster R-CNN models and their advantages with custom training?
Thank You
Adrian Rosebrock
Hey Balaji — I actually show you how to train your own custom Faster R-CNN models on your own datasets inside my book, Deep Learning for Computer Vision with Python. I also provide you with my tips, best practices, and suggestions on how to improve your model performance and speed. Be sure to take a look, I think it will really help you out.
Jacob
What performance do you expect when running with a Tesla V100 GPU on 608×608 images? With darknet, I can process images with YOLO at 80-90 FPS. YOLO is typically much slower when implemented in Python. Does this OpenCV implementation also have a significant reduction in performance compared to darknet?
Adrian Rosebrock
OpenCV doesn’t yet support NVIDIA GPUs with their “dnn” module so we cannot yet obtain that benchmark. NVIDIA GPU support is coming soon but it’s not quite there yet.
adam_Viz
Oh Adrian!!! Awesome, I implemented it successfully without any hassle. Thanks for your contribution.
Adrian Rosebrock
Thanks Adam — and thank you for being a PyImageSearch reader.
Alexander
Hello, Adrian!
What do you think about the problem of real-time video from web cameras? In our project (online detection of cars and people), when we used OpenCV 3 with real-time video we got a big delay between frames… We solved this problem, but now we don't use real-time video streams from OpenCV.
Do you have a sample with a real-time stream, not mp4 or avi files?
Best wishes, Alexander,
Russia, Novosibirsk.
Adrian Rosebrock
Keep in mind that deep learning models will run significantly faster on a GPU. You might want to refactor your code to use pure Keras, TensorFlow, Caffe, or whatever your model was trained with, enabling you to access your GPU. More GPU support with OpenCV is coming soon but it’s not quite there yet.
TAYFUN ARABACI
very very nice Adrian :=)
Adrian Rosebrock
Thanks Tayfun!
Riad
Great tutorial! But I noticed that the code doesn't work with grayscale images. Are there some parameters I can tweak to make it work?
Adrian Rosebrock
YOLO expects three-channel RGB input images. If you have a grayscale input image, just stack it to create a "faux" RGB/grayscale image:
import numpy as np
image = np.dstack([gray] * 3)
Anusha
Hey Adrian, this is a great post and I really liked the way you put everything in sequential order. I have a question though. I was wondering how can I replace the YOLO model for this object detection with Faster RCNN to suit my purposes as I have fairly small objects in my videos which I need to detect. I mean is there a deploy model and prototxt available for Faster RCNN?
Adrian Rosebrock
Yes, you would:
1. Train your Faster R-CNN on whatever dataset you are using
2. Then take the prototxt and Caffe model weights and swap them in
Keep in mind that loading Faster R-CNN models is not yet 100% supported by OpenCV. It's partially supported, but it can be a bit of a pain.
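As a minimal sketch of step 2, assuming you exported a Caffe prototxt and weights from your training (the file names here are hypothetical placeholders):
import cv2
# swap in your own exported files
net = cv2.dnn.readNetFromCaffe("faster_rcnn.prototxt", "faster_rcnn.caffemodel")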
Sophia
yet another amazingly informative tutorial! how does the speed-accuracy tradeoff of SSD compare with that of RetinaNet? thanks,
Adrian Rosebrock
In my experience RetinaNet tends to be slightly slower but also (1) slightly more accurate and (2) a bit easier to train.
sophia
thank you for replying, Adrian. that’s helpful information.
joeSIX
This is a great tutorial, can’t thank you enough.
unfortunately, I was unable to test it on my own (macbook pro, anaconda environment, opencv 3.4.2):
error: (-215:Assertion failed) ifile.is_open() in function 'ReadDarknetFromWeightsFile'
Adrian Rosebrock
Double-check your path to the input weights and configuration file. It sounds like your paths may be incorrect.
Adrian Rosebrock
Keep in mind that the YOLO model is not accessing your GPU here. The YOLO + OpenCV implementation is running on your CPU which is why it’s taking a long time for inference.
Jason
Adrian, as always, you have a nice tutorial. Thanks a lot.
You can speed up the YOLO model on the CPU by using OpenMP. Open the Makefile and set AVX=1 and OPENMP=1.
Adrian Rosebrock
Thanks Jason. How much of a speed increase are you seeing with that change?
Jason
I have not had the chance to download your code yet. I am currently using my own data to train YOLOv3. It takes a lot of time to prepare the images for training because you have to draw a bounding box for each object in each image. Once I finish the training, I will let you know the speed difference between turning OpenMP on and off during prediction.
By the way, you can also toggle OpenCV support on and off in YOLO's Makefile.
git-scientist
Hi Jason, could you give some detailed info about OpenMP? How should one make use of it? And where does that Makefile reside?
Yurii
Hi Adrian,
Is there a way to specify particular objects to detect? For instance, only cars and stop signs. It should speed up the process, I suppose, as resources are not wasted on recognizing other objects.
Adrian Rosebrock
You can fine-tune the model to remove the classes you're not interested in, but keep in mind the number of classes isn't going to dramatically slow down or speed up the network; all the computation happens earlier in the network.
Ramkumar
Hi Adrian,
You explained the article very well for a beginner. Can we implement smoke detection from images using YOLO, or is it only for solid objects?
Adrian Rosebrock
Object detectors work best for objects that have some sort of "form". Smoke, like water, doesn't have a true rigid form, hence YOLO and other object detectors would not work well for smoke detection.
Marcelo Mota
Thanks for another great tutorial, Adrian!
Could you please explain lines 41 to 43 in more detail? Why do you get the layer names and the unconnected layers? And why that "- 1"? Code below:
# determine only the *output* layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
And also line 52: why do you need to do a forward pass on just the "ln" layers? Code below:
layerOutputs = net.forward(ln)
thank you!
Marcelo
Adrian Rosebrock
The YOLO model is trained via the Darknet framework. We need to explicitly supply the output layer names into the call to “.forward()”. It’s a requirement when using Darknet models with OpenCV.
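For readers following along, here is a rough sketch of that flow (the paths are placeholders; note that newer OpenCV releases return a flat array from getUnconnectedOutLayers(), in which case the index is simply i - 1):
import cv2
# load the Darknet-trained model (hypothetical paths)
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
# getUnconnectedOutLayers() returns 1-based indices, hence the "- 1"
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
# the forward pass then only needs to run up to (and return) these output layers
# layerOutputs = net.forward(ln)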
tauno
Is it so that the function getUnconnectedOutLayers() is used to obtain the indexes of the unconnected output layers in order to find out how far the function forward() must run through the network?
I don't understand why these output layers are denoted as unconnected. Does that mean that in certain cases we wouldn't run our data through the whole network?
If so, why? Another thing bugging me about the line in which we use the getUnconnectedOutLayers() function is the ln[i[0] - 1] part of it. I believe that is some way of traversing the ln array in reverse, but I don't fully understand it.
JC
To my understanding, the value i is an index representing which layer it is in the model, starting from 1. There are a total of 254 layers in the model (you can check the length of ln to find out). So you need i[0] - 1 to match Python indexing, which starts from 0.
You can print out the results to have a clearer understanding
for i in net.getUnconnectedOutLayers():
print(i)
print(i[0])
print(ln[i[0] - 1])
Taha
I'm getting 0.5 FPS on a 1.7GHz 4th-gen Intel Core i3 processor. Is that an okay speed for this model and system?
Adrian Rosebrock
Given that the model is running on the CPU, yes, those results seem accurate.
Andrew
Hello Adrian,
Nice tutorial…have you tried running YOLOv3 in C, given that it was originally written in C?
I think there are some python wrappers out there for the datatypes
Adrian Rosebrock
I haven’t tried in C but I know there is the darknetpy wrapper which can be used to run YOLO on a GPU.
Thanks a lot!
Hi, can you please tell me whether I can run this code on Windows or not? I am stuck on Windows and cannot find a comprehensive tutorial for YOLO on Windows. Please help.
Adrian Rosebrock
This tutorial will work on Windows provided you:
1. Use the “Downloads” section of the tutorial to download the code + trained YOLO model
2. Have OpenCV 3.4.2 or higher installed
Sunny
Hi Adrian,
If I want to only detect the red car inside the car chasing video by using YOLO, any suggestions on fulfilling the goal? Thank you
Adrian Rosebrock
You would:
1. Filter on the “car” class, ignoring all other non-car detections
2. Determine the object color
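A minimal sketch of step 1, assuming the boxes, confidences, and classIDs lists and the LABELS array are populated exactly as in yolo.py:
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)
if len(idxs) > 0:
    for i in idxs.flatten():
        # step 1: skip every detection that is not a "car"
        if LABELS[classIDs[i]] != "car":
            continue
        # step 2: crop the ROI and inspect its color
        (x, y, w, h) = boxes[i]
        roi = image[max(0, y):y + h, max(0, x):x + w]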
moxran
Hi,
Since this is a YOLO detector with OpenCV, it is not using the GPU, right? I'm getting 1 FPS on an Intel Core i7 2.2GHz processor, which is really slow. Any reason you can see? Thanks!
Adrian Rosebrock
Correct, the YOLO detector is running on the CPU, not the GPU. Please see the other comments on this page where I’ve addressed OpenCV’s GPU capabilities.
Dheeraj
Can we count people using a YOLO-based approach? If so, what changes should be made to the code, or do I need to use my own dataset to train the model?
Adrian Rosebrock
I actually describe how to build a people counter in this tutorial.
Jelo
Hi Adrian,
First of all, let me thank you for all your posts; they are really useful for everyone.
I would like to ask: can we use deep learning to estimate the detected object's position? Can you share some links?
Thank you again
Adrian Rosebrock
Could you elaborate on what you mean by “object position”? What specifically are you trying to measure?
Oscar Mejia
Hi Adrian, first of all, thanks for your help and the time you put into these kinds of tutorials. I really appreciate your help.
Adrian, I would like to know if you recommend these algorithms for a project identifying, tracking, and counting people in real time. If not, what technique would you recommend?
Thanks in advance.
Oscar Mejia
I forgot to mention that the project is meant to run on a Raspberry Pi 3 B+.
Thanks.
Adrian Rosebrock
Take a look at my tutorial on building an OpenCV people counter. I include suggestions on how to adapt it to the Raspberry Pi.
Steve
Hi,
After running yolo_video.py, it doesn't display the video window. Why?
Adrian Rosebrock
The YOLO video script does not display the frames on your screen; it just writes them to disk. To display the frames on your screen you can use cv2.imshow.
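For example, as a sketch, inside the frame-processing loop of yolo_video.py you could add:
# display the frame with the detections drawn on it
cv2.imshow("Frame", frame)
# press "q" to quit early
if cv2.waitKey(1) & 0xFF == ord("q"):
    break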
Aiwenj
Hello, Adrian. Thank you for your post! I have a question: I want to know the number of people in an image. How can I do that using YOLO?
Adrian Rosebrock
You would loop over the number of detected objects, use an “if” statement to check if it’s a person, and then increment a counter.
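As a minimal sketch, assuming idxs, classIDs, and LABELS come from the detection code in yolo.py:
# count the "person" detections that survived non-maxima suppression
people = 0
if len(idxs) > 0:
    for i in idxs.flatten():
        if LABELS[classIDs[i]] == "person":
            people += 1
print("[INFO] {} person(s) detected".format(people))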
Abhijeet
Hi, I am getting this error while running your code: No such file or directory: 'yolo-coco\\coco.names'. Please reply.
Adrian Rosebrock
Make sure you are using the “Downloads” section of this blog post to download the source code and example models. It sounds like you don’t have them downloaded on your system yet.
Abhijeet
Thanks, man, the code is really understandable. Please tell me the purpose of the unconnected output layers (ln) in the code.
Abhishek
Does this work with GIFs? Also, I'm getting "error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'". Any fixes? Maybe the input image is empty, but I'm not sure about it.
Adrian Rosebrock
No, OpenCV does not support loading GIFs. You’ll want to convert the GIF to a series of JPEG or PNG frames, then feed them through the YOLO object detector.
frank
Hi Adrian, thanks for your great tutorials. I have a question about line 70 of the source code in yolo.py. The length of detection is 85: detection[0:4] represents the coordinates, width, and height; detection[5:] represents the probabilities for the 80 classes. I find that detection[4] is not used, so I want to know what detection[4] stands for.
Daniele Bagni
EXCELLENT TUTORIAL, Adrian as usual from you. Thank you very much for sharing your knowledge!
Adrian Rosebrock
Thanks so much Daniele!
Guille Lopez
Hi Adrian. Great tutorial! Extending your code I’ve been able to add the SORT algorithm to create a first approach to a traffic counter. I was thinking on open sourcing the code on Github (link removed by spam filter). Is it ok for you if I do that? I will cite your tutorial, pointing back to this page.
Adrian Rosebrock
Hi Guille — congratulations on building a traffic counter, awesome job! Yes, feel free to open source the project, please just link back to the PyImageSearch blog from the GitHub readme page. Thank you!
Dnyaneshwar
hi Adrian ,
I am new to deep learning and computer vision. I have downloaded the source code from your site.
Please let me know what setup is required and the steps to run the program on Windows.
Adrian Rosebrock
Please note that I only support Linux and macOS on this blog. I do not officially support Windows nor do I provide Windows install tutorials. I would suggest you follow my OpenCV install guides on either Linux or macOS to get up and running.
Dnyaneshwar
Hi Adrian,
I am new to deep learning and computer vision. Can you help me get started building an application for object detection using the Intel OpenVINO toolkit? Please provide the steps to create the application and run it.
Adrian Rosebrock
At this time I do not have any tutorials on Intel’s OpenVINO toolkit. I will consider it for the future but I cannot guarantee if/when I may write about it.
Nahael
Hi Adrian, thank you for the tutorial. I've been following your posts for a while now. I would like to know if it's possible to train on my own dataset to detect violent scenes in videos. It would be very kind if you could help us with that. Thank you again.
Adrian Rosebrock
What you are referring to is called “activity recognition”. I don’t have any tutorials regarding activity recognition (yet) but I do have a chapter inside Deep Learning for Computer Vision with Python which does show you how to detect and recognize weapons in images and video. That may be a good starting point for your project.
Chris
Cheers for this Adrian, it’s been exactly what I needed.
I’ve now adapted the code to work with my home CCTV!
Before, my CCTV would FTP a short video to my server when motion was detected, and then I would use Python to split the video into frames and email pictures of the frames from 1 sec, 3 sec, and 5 sec to my email address, so wherever I am in the world I get an image of what triggered the motion sensor.
Problem is, that I kept getting images of cats, birds, heavy rain etc.
What I've done now is edit your code, so when I get the images from 1, 3, and 5 secs, I run them through the code, check if the label is "person", "car", "truck", etc., and if so attach the images to the email and send it.
No more false alerts!!
Thanks again and I love the Guru course too, having some real fun with that
Adrian Rosebrock
Awesome, congratulations on adapting the code to your own project Chris!
Jammula
hello,
Adrian Rosebrock,
Thank you for the great tutorial. I am facing a problem while executing yolo.py for object detection in images through the terminal in a Jupyter notebook.
Issue:
I am getting a "cannot connect to X SERVER" error.
My server details:
I am using an Nvidia GeForce GTX 1080 Ti with 11173 MiB of memory.
Thank You in advance.
Adrian Rosebrock
What line of code is throwing that error?
Kunal Gupta
Thanks for the post Adrian!
I'd like to know: it's said that YOLO is faster than SSDs, so technically, on a real-time video feed, it should outperform them?
But when I ran it, I got 15 FPS with MobileNet SSD and around 3 FPS with YOLO.
What could be the issue?
Thanks!
Adrian Rosebrock
You are correct that YOLO should be faster than SSD but as you found out and as I noted in the “Limitations and drawbacks of the YOLO object detector” section of the guide YOLO appears to be slower. I’m not sure why that is.
Ramar
adrian
How do I download the YOLO model files, and how do I open them on a Windows OS system?
Adrian Rosebrock
You can use the “Downloads” section of the blog post to download the YOLO model. The code will work on Windows.
Sanjay Swami
Hello Mr. Adrian Rosebrock
I am very glad that I found your blog and I have started on the tutorials you've written.
I am using this tutorial in my project. My project is to find a ball and track it. So will this program work on a Raspberry Pi?
Thank you in advance.
Adrian Rosebrock
No, the object detector will run far, far too slow on the Raspberry Pi. Is your ball a colored one? If so, follow this tutorial on simple color thresholding and you’ll be able to complete your project.
Sanjay Swami
Thank you for your information.
Mike
Hi Adrian,
You have mentioned that the Raspberry Pi is too slow for object detection… does the newer Pi hardware perform any better? I've thought about using your tutorial to build a weapon detector using a Pi B+ with your pre-trained model.
Would this be too underpowered?
Thank you,
Mike
Adrian Rosebrock
No, the new hardware is still too underpowered. You should look into using the Movidius NCS.
Mansoor alam
Hello sir, I hope you are well.
Sir, when I run this code I always get the error below:
yolo.py: error: the following arguments are required: -i/--image, -y/--yolo
Sir, could you please tell me what I am doing wrong and what the solution to this error is?
Adrian Rosebrock
If you are new to command line arguments, no worries, just make sure you read this tutorial first.
Mansoor alam
thank you sir. it worked 🙂
Mansoor alam
Hello sir, I hope you are well.
Sir, I used your code, which is OK and working for images, but when I run the code for video it works OK and all the results are displayed (like total time and frame time, etc.); at the end, however, the video in which the detections occur is not displayed.
So, what should I do?
Adrian Rosebrock
The output of the video is not displayed to your screen, it’s instead written to disk as an output video file. Check your output video file.
smalldroid
Hi Mr. Adrian, thanks for your great tutorial about YOLO! Are you going to make an additional tutorial about how to train a YOLOv3 model on the COCO dataset (using Keras or PyTorch)?
Adrian Rosebrock
I actually show you how to train your own custom object detectors inside Deep Learning for Computer Vision with Python.
Aqsa
Hi Adrian,
Firstly, amazing tutorial. Great help (y)
I want to run YOLO on the ImageNet dataset. I downloaded the weights and configuration files for the ImageNet YOLO from the darknet project website. Then I plugged them into the code (weightsPath and configPath, lines 30 and 31, respectively). I also edited line 21 as required.
But it did not work for me. Is there anything additional I should be doing?
Thank you in advance.
Daniel Spencer
Hey Adrian! ,
Great work, I just want to ask something real quick. It appears that you've trained YOLO to detect a wide variety of objects, which is amazing, but it takes a somewhat long time to run on my computer! The problem is, I'm only interested in detecting cats and dogs in my project (using the COCO dataset as well), so what would the procedure be to train a YOLO model similar to yours for these two specific classes?
Thanks!
Adrian Rosebrock
Instead of training your network from scratch I would instead recommend performing fine-tuning. I cover how to train your own custom object detectors, including fine-tuning them, inside my book, Deep Learning for Computer Vision with Python.
Shivam
Hello Sir,
This is a great tutorial. Can you please elaborate more on Real time Object Detection using my own laptop’s webcam?
Adrian Rosebrock
Sure, see this tutorial.
Sannan
Hi Adrian, it's a good tutorial. I want to detect objects from my webcam (live video). How can I implement this code?
Adrian Rosebrock
This method will be too slow to detect objects in real time on a CPU. For real-time object detection on a CPU I would recommend you follow this tutorial.
RJ
Why does this code not display the output video but only print the results for the video? Can you help me with how I can get the output video on screen with bounding boxes?
Adrian Rosebrock
This code takes the output detections and writes them to a video file. You can use cv2.imshow to display the results on your screen.
OzgurG
Hi Adrian,
Do you think there is a way to calculate the speed of vehicles from a fixed camera using YOLO or other modern CNN algorithms while doing vehicle tracking?
It would be like replacing traditional traffic speed cameras with a CNN…
Adrian Rosebrock
Yes, I will be covering speed calculation in my upcoming Raspberry Pi for Computer Vision book. Stay tuned!
Bruce Dai
Great post, I never knew OpenCV had a dnn module that is so useful. I tried this script on some test images, but it seems that YOLO is doing poorly at recognizing cars. It mislabeled some of the cars as cell phones in my images. 🙁
Adrian Rosebrock
Hey Bruce, you may want to try a more accurate object detector such as Single Shot Detectors or Faster R-CNNs. In practice I've found that both SSDs and Faster R-CNNs perform better "in the wild".
Maning
Hello! Thanks for the tutorial! I followed it but used a model that I trained on yolov3-tiny-obj.cfg instead. However, the results are different compared to the results I get when I run the detector using the command line. Any idea what could be the cause of it?
adel
I use PyCharm, Anaconda, Visual Studio, and Google Colab; I need all of them.
In VS and PyCharm, where should I set the parameters?
--image baggage_claim.jpg --yolo yolo-coco
I can only run the code in PyCharm; I get errors in the other environments.
Thanks for your great content.
Adrian Rosebrock
You’ll need to set the command line arguments via PyCharm. Make sure you read this tutorial which includes an example.
Eduardo
Hi! This was a great tutorial! I was wondering if there is some way to use the same code but with the YOLO9000 .cfg and .names files. I have already tried on my own but the program crashes.
Shaon
Is YOLO object detection available with OpenCV.js ?
Mohammd
Hi Adrian,
I ran your Python script and ./darknet detect …. from the original YOLO website, but their results are not the same. Why? The results from the original YOLO website command are very accurate.
Why?
Adrian Rosebrock
Are you sure you’re using the same YOLO model versions? If so, it could be a difference in the NMS parameters. I’m not sure what the default DarkNet NMS parameters are, you may need to refer to the documentation/source code.
Jean-Michel
Hello,
I'm comparing the results obtained with (1) yolov3 built from https://github.com/jaskarannagi19/yolov3 and (2) the results obtained with your code.
I've noted that the results are not the same at all (and not only the probabilities). For example, in some cases the first yolov3 detects a car while the "dnn" yolov3 detects nothing.
For the 1st case, the command is:
./darknet detect cfg/yolov3.cfg yolov3.weights data/test.jpg -i 0 -thresh 0.25
For the 2nd case, the command is:
python yolo.py --image images/test.jpg --yolo yolo-coco --confidence 0.25
It should be noted that the config files (yolov3.cfg) are strictly the same. The weights files (yolov3.weights) too.
Best Regards.
sidrah
I am getting a NoneType attribute error.
Adrian Rosebrock
Your path to the input image is incorrect and "cv2.imread" is returning "None". Double-check your input image path. You can read more about NoneType errors, including how to resolve them, here.
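A quick way to fail fast on this, as a sketch (the path is hypothetical):
import cv2
image = cv2.imread("images/test.jpg")
# cv2.imread returns None instead of raising an error on a bad path
if image is None:
    raise ValueError("could not read image -- check the input path")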
karan
Hi Adrian. detection[0:4] returns centerX, centerY, width, and height. detection[5:] returns the probability score for each class. What does detection[4] mean?
Adrian Rosebrock
The "detection[4]" is the box's objectness (confidence) score; the height is detection[3].
leo
Hi Adrian, isn't detection[0:4] the centerX, centerY, w, h, and detection[4] the 'confidence' (objectness) of that box? Thanks!
yoming
How can I use the centroid tracker with YOLO detection? The coordinates are always wrong. I modified the code from "simple object tracking".
Edouard
Hi Adrian and all,
I am training a DenseNet model on X-ray pictures. I would like to localize some indications on these X-ray pictures. Do you have hints on adapting YOLO, or any other advice?
Thanks
Adrian Rosebrock
That’s much more of a complicated problem but absolutely doable. I would recommend referring to Deep Learning for Computer Vision with Python where I suggest how to train your own custom deep learning object detectors and instance segmentation networks.
I would also be very curious to know which dataset you are using. I like medical image datasets 🙂
Manohar Sonwan
Hi Adrian,
This tutorial was very helpful.
I have been working on an object collision problem where I want to detect the collision between a tennis racket and a ball. I used a "YOLO.h5" model to detect those two objects and I have achieved the detection part in my code, but I want to detect the collision and pause the video whenever a collision happens.
Adrian Rosebrock
I would recommend you instead perform instance segmentation with Mask R-CNN. You can then check and see if the two masks overlap (use bitwise operations for that). If you’re new to image processing make sure you read through Practical Python and OpenCV so you can learn the basics first, including bitwise operations.
Elena
Hello Adrian,
Thank you for the post. I have tested this method on my video, but as you mentioned, this method does not use the GPU and it is not real time.
My project is about detecting objects from a live streaming camera. I should use YOLOv3 for that. I think OpenCV would be useless for my project since it does not support the GPU.
What frameworks should I use to get this project done?
Could you please guide me? I am quite desperate.
Adrian Rosebrock
Take a look at “darknet”, the author’s implementation of YOLO. It is capable of running on a GPU.
Elena
Thanks for the reply. I think I did not ask my question clearly.
My question is:
Does the "dnn" module of OpenCV support GPUs now? How can we run OpenCV on a GPU?
Adrian Rosebrock
It really depends on what type of GPU you are using. The most popular types of GPUs are NVIDIA GPUs which OpenCV’s “dnn” module does not yet support (hopefully soon though!)
Elena
I am a bit confused. I ran your method, giving a 30-sec video as input, and the output (processed video) is also 30 sec. I expected to get a longer video as output since the code is running on the CPU. How is it possible that the motion in the video is not sluggish and the video is smooth?
Adrian Rosebrock
The output video doesn't care how long it takes for a new frame to be added to it. We just specify the output video's FPS, and that is the rate the video plays back at.
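As a sketch, the writer is created once with a fixed playback FPS (W and H being the frame dimensions), so per-frame processing time never enters into it:
fourcc = cv2.VideoWriter_fourcc(*"MJPG")
# 30 is the playback FPS of the output file, not the processing rate
writer = cv2.VideoWriter("output.avi", fourcc, 30, (W, H), True)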
Rishabh Kachhwaha
Hey Adrian,
My yolo_video.py file runs without any errors, but there is a bug: the output file is only a few milliseconds long, instead of being the full length of the original video.
Help needed.
Akbar
How can I implement the video stream using Flask, so the video stream will play on the web and not just from the command line?
Thanks.
Adrian Rosebrock
I’ll be showing how to stream the output of the YOLO object detector to the web using Flask in my upcoming Computer Vision + Raspberry Pi book, stay tuned!
mhadi
Hi
Could you please explain, or point to a reference for, the YOLO architecture?
Adrian Rosebrock
I provide links to the YOLO papers in the guide, you can refer to them for details on the YOLO architecture.
Vasilis
Hello,
Thank you for this tutorial but unfortunately when I run it in terminal I get this error:
writer.release()
AttributeError: 'NoneType' object has no attribute 'release'
I saw some other people had a similar problem on this forum, so I went to the 'NoneType' tutorial you suggested, but the causes described there do not match my issue.
I downloaded the source code, so the input file path should not be a problem.
The webcam is accessible via OpenCV because I tried it in other scripts.
'Not having the proper video codecs installed'? I'm not sure what this is, but I successfully ran your face detection tutorial, so maybe that is not the case either. Any ideas, please?
Kind regards
Amir
Hi
This issue appears when the OpenCV packages were not installed correctly.
Therefore, you have to reinstall OpenCV and all its corresponding packages, more carefully this time.
Khushboo Katariya
Can we apply this code to live video captured by a camera in real time?
Adrian Rosebrock
Technically yes, but the FPS is going to be pretty low (on the order of 1-5 FPS on a CPU). OpenCV's "dnn" module doesn't yet support NVIDIA GPUs.
Amir
Hi
Can you point to some tutorials and code on live video capture?
Also, a tutorial for running this code on NVIDIA GPUs?
Thanks.
Adrian Rosebrock
I cover how to perform object detection in real-time, including using NVIDIA GPUs, inside my book, Deep Learning for Computer Vision with Python. I would suggest starting there.
Elena
Hello Adrian,
I was wondering if you could tell me why the original YOLOv3 (from the original website) is capable of detecting more objects than the Keras and OpenCV implementations of YOLOv3 with the same weights?
And could you please explain why there is no training in the method you posted here?
Thanks in advance.
Adrian Rosebrock
1. Do you mean capable of detecting more object classes?
2. The model we are using here is pre-trained. Since it’s pre-trained on the COCO dataset we do not have to train it ourselves.
pavan
This project is very helpful for me
Thanks a lot
Can I have the documentation for this project?
Adrian Rosebrock
I’m not sure what you mean by “documentation”. The entire post and code is documented.
student1
hi,
Nice work; I successfully ran it on several different videos.
It detects very accurately.
One question: is it possible for yolo_video.py to also count the vehicles and pedestrians?
best regards.
Adrian Rosebrock
Absolutely. See this tutorial.
James Adams
Brilliant post, Adrian. Thanks for all the effort that goes into making your tutorials so easy to understand and put into practice.
My understanding is that this model (or any other object detection model) can be trained against any (properly labeled and formatted) dataset in order to detect objects not detected by the available pre-trained models, such as those trained against ImageNet or COCO. If so then is the training process fundamentally different from the type of training that’s described in the [tutorial for facial recognition](https://pyimagesearch.com/2018/09/24/opencv-face-recognition/)?
Also is there a way to incorporate [fastai](https://www.fast.ai/) into the process? I ask because fastai seems to be next-level over something like Keras, but I don’t know enough about it yet to know if it’s very relevant or necessary for a model training effort like this.
Kind regards, and thanks again.
Adrian Rosebrock
I’ve only briefly played around with the fast.ai library. I felt that it abstracted Keras/TensorFlow a bit too much. I think Keras is the perfect level to sit on top of TensorFlow. Plus, if you need any additional TF functionality you can drop down into it from Keras.
As for your training question, yes, provided you have the bounding boxes + labels you can train an object detection network. However, exactly how that network is trained can be very different based on the (1) the network itself and (2) the library that has implemented it.
If you or anyone else is interested in learning more about how to train your own custom deep learning-based object detectors you should definitely read through Deep Learning for Computer Vision with Python where I cover them in detail (including code).
CVee
Hi Adrian,
this is an awesome tutorial! Both the image and video implementations work great.
One thing I’m not overly excited about though is the speed issue, especially when the video file you’re testing is long.
Talking about the speed issue, I have a few questions to ask you:
1) What do I need to change in the code to make it work for real-time object detection on CPU without too much of a delay?
2) If I want to detect just a few objects instead of all the 80 classes the model is trained on, can I just replace the yolov3.weights with a pre-trained weight file from other sources?
I’m going to get your book because I want to learn more about computer vision.
Adrian Rosebrock
You can’t use this method of OpenCV + YOLO for true real-time performance on a CPU. Follow this tutorial which uses a SSD with OpenCV to achieve real-time performance.
Nicolò
Hi Adrian,
Thanks for the tutorial; I successfully ran it on some pics and videos.
I don't understand how you are parsing the output of net.forward().
What is the 'detection' object?
Where can I find the documentation for this output?
Is the format the same for every model, or does this case refer only to YOLO?
I searched the OpenCV reference but I can't find it.
Thanks
Adrian Rosebrock
The format of the output volume is dependent on what model you are using. If you take a look at the post + code I have documented what the output variables are. Please give it another read as that should clarify your questions.
med
Hi, thanks for the code, but it takes a long time for video; I have a Raspberry Pi 3 B+.
Adrian Rosebrock
The Raspberry Pi 3B+ is not suitable for running YOLO as-is. I’ll be covering how to run YOLO on the Pi in my upcoming book, stay tuned!
Dina
Your tutorial is very good.
I tried it and it worked.
But I have a problem: when will "[INFO] cleaning up" show up?
Will an OpenCV window show up, or must we open the output file in the output folder?
I tried writing cv2.imshow('frame', frame) after the writer is released, but I haven't gotten "[INFO] cleaning up" in my terminal, so my OpenCV window never shows up.
Could you please help me?
Adrian Rosebrock
The “cleaning up” is only executed after the script has finished processing all frames in the video file.
chow
I am trying to learn. Can you please help with making a counter for when a car passes, and also with showing its speed at the top of the frame?
Adrian Rosebrock
I’ll be covering that in my upcoming Computer Vision + Raspberry Pi book, stay tuned!
Emma
Hi, Adrian,
I've read the article "Python, argparse, and command line arguments" but still have problems executing this in a Jupyter notebook.
Can you give me a hint, please?
Thanks and have a nice day.
Adrian Rosebrock
There is a section in that post that shows you how to modify the code to work with Jupyter Notebooks. Have you given it a try?
sidra
Thank you, sir, for sharing the code with a proper guide. Can you please tell me how to run YOLO with OpenCV in real time? It would be a great support.
Adrian Rosebrock
Please refer to the post and the comments in the post. I have already discussed what would be required to run YOLO in real-time.
ArxivfromFrance
Hi,
Thank you, I have two questions:
- Where can I find the official documentation for YOLO with OpenCV?
- Is your code open source? I mean, can I use it for my projects at my company?
Thank you and have a nice day 🙂
Erdal.J
Hi Adrian, thanks for sharing these kinds of things. I know that we can count the objects detected in a video, but I want to know: how can we tell in which seconds of the video which objects are detected? I need your suggestion, thanks 🙂
Samuel Leong
Hi! This is a great article. I’d like to make some observations:
1) In terms of speed, YOLOv2 is almost 3 times faster than YOLOv3 (~0.15 seconds/frame vs. ~0.6 seconds/frame). The difference in accuracy (at least in my testing) isn't that great, so use YOLOv2 if you cannot take the slow speed. (Of course, the YOLOv2/v3-tiny versions are the fastest, but also super inaccurate.)
2) From the comments above, it's uncertain whether the latest versions of OpenCV or dlib have GPU support now, but I actually suspect that the answer is YES.
Running YOLOv2 on darknet's repo (supposed to use CUDA/GPU) gave me images processed in around 0.15 seconds. When I ran it with no GPU support, it was almost 10s. This Python code gave images processed in about 0.2 seconds, so I think the GPU is supported now. (Not sure, though.)
Adrian Rosebrock
The dlib library does have GPU support but I’m not sure why you’re mentioning that in this particular post?
OpenCV's "dnn" module supports some GPUs but not NVIDIA ones. Hopefully that will be changing this summer.
kenneth
“On my machine with a 3GHz Intel Xeon W processor, a single forward pass of YOLO took ~0.3 seconds; however, using a Single Shot Detector (SSD) from a previous tutorial, resulted in only 0.03 second detection, an order of magnitude faster!”
But YOLO claims it has a very high FPS (capable of processing 40-90 FPS on a Titan X GPU; the super fast variant of YOLO can even get up to 155 FPS).
Why does it take 0.3s (about 3 FPS) here?
Thanks….
Adrian Rosebrock
Correct and I discuss that in the post — I believe it’s an issue related to the OpenCV implementation.
Mik
Hi Adrian! Thank you for this article; the YOLO object detection in images works perfectly fine.
But I am having a problem with the YOLO object detection in video streams. I am encountering an error on line 168:
writer.release()
AttributeError: 'NoneType' object has no attribute 'release'
Can you provide an explanation?
Adrian Rosebrock
Did you use the “Downloads” section of this post to download the source code? Or did you copy and paste? It sounds like it may be a copy and paste or accidental error inserted into the code. If you’re not writing to video then the “writer” object should be “None”. In this case, somehow, the “writer” object thinks it was instantiated. Go back and check any edits you have made to the code.
Wahyu Istanto Bram W
Mr. Rosebrock, do you know a Python module to draw directly on a laptop/PC screen?
Imagine if we open a laptop, open a browser, and watch YouTube, and then… the line shows up and people think it comes from YouTube.
Or while watching a movie in full-screen mode the line shows up too; people think it is from the movie, but in truth it comes from the screen overlay. So we don't need to capture and edit the video, only capture the screen: everything shown on the laptop/PC screen would be detected (like streaming a webcam) and the line drawn on the screen.
Combine that with YOLO…
There are so many ways to implement it: CCTV and many others.
The most powerful would be if we could make a super-mini projector that projects a mini line/rectangle on glass. That would be greater than Google Glass, I think.
Adrian Rosebrock
I would suggest trying to use this method.
Elena
Hi, Adrian! Many thanks – great tutorial!
Just a couple of questions: I get YOLOv3 output images with blue dominating (everything looks dark + blue). I understand that OpenCV transforms images; could you maybe give a hint on this?
Another question: is it possible to increase the output window size?
Thanks a lot in advance!
Elena
Actually, I found the answers to my questions 🙂 Writing them below, as they may be useful for someone:
I am working in Jupyter notebooks, using matplotlib to display output images:
1. To convert BGR to RGB, one more line of code is needed:
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
2. To resize the image display window in a Jupyter notebook:
plt.rcParams['figure.figsize'] = [10, 5]  # 10, 5 as an example only
Thank you, Adrian, for the great tutorial!
Adrian Rosebrock
Congrats on resolving the issue, Elena!
ibrahim
Hi Adrian,
thanks for this post and its thorough explanation, along with your other blog posts! 🙂
I want to ask whether cv2.dnn.NMSBoxes() applies Intersection over Union when selecting the bounding boxes, or whether it is applied anywhere in this implementation. If not, do you think applying it would give better results for the bounding boxes, and what would be the easiest way to do it?
Palbha
Hi Adrian ,
I am using only a CPU and have integrated face detection along with object detection, and I know it's bound to be slow. Is there any way I can limit the objects and possibly increase the FPS? Please suggest anything that might help increase the output FPS. Also, I want the video to be normal sized, like frame = imutils.resize(frame, width=800).
Thanks in advance ,
Palbha
Adrian Rosebrock
If you're using a CPU, the only thing you can really do is resize your image dimensions and make them smaller. Otherwise you should look into using a GPU.
AYOUNI aymen
hello
Can you help me with these tasks:
How can we count vehicles and classify them using YOLO?
How can we measure vehicle speed using YOLO?
Adrian Rosebrock
I’m covering how to detect and compute vehicle speed inside Raspberry Pi for Computer Vision. I would definitely suggest starting there.
AYOUNI aymen
What about counting traffic vehicles?
I found the OpenCV People Counter on your blog, but there's no more information about vehicles.
Can you help me, please? I really need this.
Adrian Rosebrock
All you need to do is change the people counter class from “person” to “vehicle”. Then you can track vehicles.
Adam
Hello
Great work. I used your sample code and I am able to run detection on my video. However, do you have some sample code, or what do I need to add, so it does tracking? I have an intersection with cars and I just want to give every car that enters, moves, and exits its own ID. I saw your people counter example, but it does not use YOLO, and I was wondering if perhaps you have some code around for that or know how I can do it.
I am a Java person, so Python is not my strength 🙂 THANK you.
Adrian Rosebrock
The people counter example you’re referring to would work here. Just change the class you want to detect from “person” to “vehicle”.
Ayush Sahay
Hey!!
I ran the video code and it completes successfully, but nothing appears on the screen. Basically, I wait 5-6 minutes for the code to finish, but after that it terminates and nothing appears on the screen, and I go back to the command line (Ubuntu). Not really sure what's going on here.
Adrian Rosebrock
Are you running it on a video file? If so, nothing is supposed to display to your screen. It will generate an output video file for you.
student
Hello Adrian
I want to use two cameras. I want to detect an object with the first camera and then, when the object passes the second camera, I want YOLO to know this object is the same object (i.e., that it matches the first object).
How can I do that?
I heard that I need to add temporal info. Do you know any references I can read about this?
Adrian Rosebrock
Sorry, I don’t have any tutorials for object tracking across multiple cameras. I’ll consider it for a future tutorial but cannot guarantee if/when I will cover it.
rakib
How can I use this in real time with a Raspberry Pi camera?
Adrian Rosebrock
No, the RPi is too underpowered to run YOLO object detection in real-time.
Hanna
Hi Adrian, thank you for providing this tutorial. It is really helpful for a newbie like me. I have a request: could you show a tutorial on how to generate captions from traffic images using Python?
Adrian Rosebrock
Thanks for the suggestion. I’ll consider it but I cannot guarantee if/when I will cover it.
mayur
Using YOLO on video, is it possible to get the label of the detected object?
Please let me know if anyone knows a way to solve this.
mayur
in text format.
Adrian Rosebrock
The code already shows you how to do that so perhaps I’m misunderstanding your question?
Edi
Dear Adrian
I've implemented your great work on my Raspberry Pi 3 B+,
but it's extremely slow; it takes nearly 40s to process a single frame.
Adrian Rosebrock
Yes, the RPi is far too underpowered for YOLO. You would want to use a Movidius NCS or Google Coral USB Accelerator to speed up inference.
Yashu
Hey, how good is YOLO for custom object detection? For example, if I want to build an apparel detection model, will it be able to perform well?
Adrian Rosebrock
In my opinion, I would suggest you use SSD or RetinaNet in place of YOLO. YOLO, while fast on a GPU, can be harder to train and is more prone to false-positive detections. If speed is not an issue then Faster R-CNN would be a good choice as well.
I cover how to train each of those networks inside Deep Learning for Computer Vision with Python.
Sambhavi
Hello Adrian,
As usual, another great blog post. I learn so much about computer vision from your blogs and books. Thank you so much.
I was able to successfully detect objects in images and videos using your code. To take it further, I custom-trained a few other objects that are not among YOLO's pre-trained classes. My need is, in any given image, to detect the objects that I trained as well as a few categories already trained by YOLO. Is that possible at all? They are 2 different weights files; in such a case, how should I go about it? I did some searching and couldn't find any straightforward answer. So I was wondering whether I should re-train those categories which were already trained by YOLO along with my custom objects so that my requirement is handled. Do you think that makes sense?
Just to make my question clearer: apples and oranges are pre-trained. I trained tomatoes and melons. Now, in an image, if I want my detector to detect all 4, what would be the better approach?
Thanks in advance.
Adrian Rosebrock
What you want to do is fine-tune the YOLO model. I don’t have any tutorials on that, but I cover how to fine-tune object detectors to recognize classes they were not originally trained on inside Deep Learning for Computer Vision with Python.
Jeff Xu
Hi Adrian,
I read the fine-tuning chapter in your book once; I will read it more times later. I also have the same question: if the pre-trained model can detect 100 classes, after fine-tuning it on my own data (adding two classes), can it detect 102 classes or just the new 2 classes? Thank you.
Adrian Rosebrock
No, it would only detect the 2 new classes.
Rheza Aditya
Hello Adrian.
This is a great tutorial. However, how can I integrate live video from my webcam with this method? Thank you in advance!
Adrian Rosebrock
Technically yes, but YOLO is not going to run in real-time using OpenCV’s “dnn” module (it’s just too computationally expensive).
Phat Dao
Thank you very much Adrian <3
Adrian Rosebrock
You are welcome!
Jeff Xu
Hi Adrian,
Lots of thanks for your great posts as well as your patient replies! Here I have 2 questions that I hope you can help answer.
Q1: From the comments, I learned that yolov2 is faster than yolov3, so I downloaded the yolov2 cfg and weights (https://pjreddie.com/darknet/yolov2/). However, when I swapped them in for the yolov3 cfg and weights, although the speed is faster, the labels are wrong (for instance, a person is detected as a bird). Why?
Q2: When I run yolo_video.py, it consumes 95% of the GPU memory while occupying 99% of the CPUs. Why? (I added cv2.imshow() and removed the writer() function.)
Thanks in advance!
Adrian Rosebrock
1. Sorry, I’m not sure about that.
2. Make sure you’re reading the tutorial and the comments. I’ve mentioned that OpenCV’s “dnn” module does not support NVIDIA GPUs (yet). Hopefully support for them is coming this summer.
student
Hi Adrian,
Thank you so much for this tutorial.
I am trying to take this approach further by creating a warning system for when objects are in a certain area of the screen (e.g., creating a region of interest for the left and right halves of the screen and outputting that an object is detected on the respective side). I'm unsure where/how in the code I can implement this.
Any help would be greatly appreciated!
Adrian Rosebrock
Hey there — I actually cover how to build that exact project inside the PyImageSearch Gurus course. I would definitely suggest starting there.
student
Hello Adrian,
I want to do detection and tracking in video/images. I will train the model on my own dataset. My project is about a surveillance system. I want to put a camera outside and run detection on the camera data. I heard that I would need a Raspberry Pi or other tools, but I also heard that the Raspberry Pi is not powerful enough to run YOLO in real time. Do you think I need both an embedded device and a PC for training?
I really need an expert to tell me what to buy so that it meets my needs. I would appreciate it if you could tell me what is best to buy.
What would be your suggestion for choosing devices for my project?
Thanks in advance
Adrian Rosebrock
Hi there — I cover your exact question inside Raspberry Pi for Computer Vision.
Bai
hacker bundle?
Adrian Rosebrock
Both the Hacker Bundle and Complete Bundle of Raspberry Pi for Computer Vision cover the topic.
William
Hello Adrian.
This is a great tutorial! Thanks for your blogs which bring me a brand new world!
For the object detection, I have read your source code and found that:
In a video, each object has a bounding box at each frame, so the object appears to be tracked all the time. In other words, an object has many bounding boxes across the video.
So here comes my question:
How can I count the number of objects appearing in a video?
Thanks so much!!
Adrian Rosebrock
You can use a basic centroid tracking algorithm.
student
I have a couple of questions and I was wondering if you could answer them.
I have a bunch of images of cars, side view only. I want to train the model with those images. My objects of interest are 3 types of trucks that have different trailers. I rarely see two target objects in one image (maybe 1/2 in every 1000 images). However, I do see other types of cars that I do not want to detect.
My questions are:
Do you think I should tackle this problem as a detection task or a classification task? (For example, should I consider multi-label classification, or omit those pictures?)
Should I also include the other vehicles that I do not want to detect in my training dataset? Let's say I do not assign bounding boxes to them but include them in training just to make the system robust. (I trained YOLO with 200 images; sometimes the trained model got confused and detected a wrong object that is not in any of the classes. Does this still happen when training with 2000 images per class? Is this due to the small dataset, or because of not including those images with no bounding boxes?)
Thank you in advance!
Adrian Rosebrock
I’d start by running two experiments:
1. Label a small subset of your data (~10%) for classification and then label that same 10% for detection
2. Train a simple classifier on that data.
3. Then train a YOLO detector.
Evaluate both and double-down on the most promising method.
If you need help training your own custom object detectors be sure to refer to Deep Learning for Computer Vision with Python.
sahar Pordeli Behrouz
Hi. How can I use this code to get detections for 100 images at once, not just 1 image?
Adrian Rosebrock
Not sure I totally understand what you mean. You could either batch your images or loop over each of your 100 images, one at a time.
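As a minimal sketch of the looping option (the directory name is hypothetical):
from imutils import paths
import cv2
for imagePath in paths.list_images("images"):
    image = cv2.imread(imagePath)
    # ...run the same blobFromImage + net.forward() code from yolo.py...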
Rico Aditya
Hi Adrian, great tutorial. Thanks for sharing!
I want to ask a question: can I detect just one kind of object in an image/video? E.g., I want to detect vehicles on the road.
Thanks 🙂
Adrian Rosebrock
Yes. See this tutorial.
veeerho
Hi Adrian, does the GPU matter only for training, or does it also matter when we run real-time detection?
Will I get the same performance if I run detection with a model that was trained on a powerful GPU, but on another PC that has no powerful GPU?
Adrian Rosebrock
The GPU will have an impact on both. A GPU will lead to faster training and faster object detection.
Debal
hi Adrian
thanks for the nice and well explained tutorial.
I would like to know if a similar model can be trained to separate mannequin faces from real human faces (only faces, since I don't think bodies can be distinguished).
In a given image containing both mannequins and humans, the above model identifies both as persons.
Adrian Rosebrock
It’s 100% possible. I would start with liveness detection and go from there.
student
Hello Adrian,
Thank you for the useful blog post.
I want to detect the wheels of a vehicle and measure the distance from the wheel to the lane. To detect the wheels I am using the Hough algorithm. However, wheels are not circles, they are more like ellipses, and that is why Hough is not working perfectly.
Could you suggest a better method for detecting circles that are more like ellipses than circles?
Adrian Rosebrock
I would suggest training your own custom wheel detector. I show you how to train your own object detectors inside Deep Learning for Computer Vision with Python.
tabarka
hello,
I was searching for a model that can detect objects like buildings, etc. Can I use this technique for building detection as well?
Adrian Rosebrock
Take a look at semantic segmentation.
student
Hello again!
Yes, I used your blog posts and books to learn how to train on a custom dataset.
I trained the YOLO model to detect wheels, and I am now able to detect wheels by drawing bounding boxes.
But I do not know how to find the point where the wheels touch the ground; I want to measure the distance between the wheels and the lane marking on the road.
Could you please advise me?
Adrian Rosebrock
Hmm, if you can detect the wheels then why not just compute the lowest (x, y)-coordinate of the wheel bounding box or mask? Assuming the car isn’t “floating” you will know the point where the tire is touching the road.
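As a tiny sketch, given a detected wheel box in (x, y, w, h) format (wheelBox being a hypothetical detection from your wheel detector):
(x, y, w, h) = wheelBox
# the tire/road contact point is roughly the bottom-center of the box
contact = (x + w // 2, y + h)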
Omkar
Hi Adrian, I'm getting only one frame of output for the video.
Shubam Manhas
Hey Adrian, nice tutorial. By the way, I want to ask whether we can do this object detection using a camera. If yes, how can we do it in an efficient way?
I want to know what prerequisites are required.
I appreciate your help.
Adrian Rosebrock
Yes, see this tutorial.
Muhammad Jawwad
Hey Adrian, I really appreciate your level of enthusiasm for this field and you have taught me many things through this medium.
I am facing a problem with this tutorial and I am hoping you can guide me in the right direction.
yolo.py is working perfectly, but when I run yolo_video.py I get stuck after:
[INFO] loading YOLO from disk…
[INFO] 350 total frames in video
[INFO] single frame took 5.9135 seconds
[INFO] estimated total time to finish: 2069.7227
After this nothing happens. NOTHING.
Need your expert opinion.
Adrian Rosebrock
The script is running. It will take approximately 2000 seconds to complete. Wait for the script to finish running.
Manoj
Hi Adrian
I want to know the minimum hardware requirements for implementing YOLO and other object detection algorithms for both images and videos.
student
Hey Adrian, is this code running with the default settings, i.e., on the processor, or is it running on the GPU?
If it is running on the CPU, how can I change it to the GPU?
Thank you.
Adrian Rosebrock
I’ve already answered this question in the other comments, kindly give them a read.
Karthika
Hi,
How can I use this model to detect only 1 class? (Say, person, for pedestrian detection.)
In your opinion, what would be a good research prospect in this area? Other than improving accuracy, what other research has scope in detection?
I need some directions to think about for my project.
Adrian Rosebrock
I’ve discussed how to only detect a single class in this post.
MUHAMMAD KK
On my system it's taking too much time to process. How can I reduce that problem?
Adrian Rosebrock
Try using a GPU instead of your CPU.
Luiz
How would I set it up to be processed on the GPU? Do I need to change anything in the code?
Adrian Rosebrock
Refer to this tutorial.
Sharon
Hi Adrian. Thank you for the excellent tutorial – much appreciated. Just one question – is it correct that the blob output resolution is 416 x 416 when yolov3.cfg is expecting 608 x 608?
Thanks!
Ricky
Hi, I love what you do! and thanks for sharing it with all of us!
I was wondering if you could tell me the steps to run the yolo_video.py program with the GPU in order to speed up the operation (can I do that? would you have any other suggestions?)
Thank you,
Greetings and keep it up!
Adrian Rosebrock
Hey Ricky — see the comments of this post, I’ve addressed that question a handful of times already. Thank you!
Crea
Hello, Adrian.
Does OpenCV’s “dnn” module support Jetson Nano?
I want to use “yolo_video.py” with Jetson Nano.
I also tried to display the video using “cv2.imshow()” on the CPU.
However, only a black screen was displayed.
What should I do?
raviraj
How do I get the bounding box coordinates of the detected objects in YOLO object detection?
Adrian Rosebrock
That code already shows you how to do that — kindly read the post and the associated source code.
Fabio
Hi,
Great tutorial and really great work with this website! I'm studying YOLO as an object detection tool on my 2020 RTX Super and it runs super fast!
In my case there's a problem that I don't understand: I'm trying to detect objects from a camera stream. The stream is in full HD (1080p, 25 fps) and the detection works well, but the x, y, w, h are shaking too much from one frame to the next. I would like to find a way to stabilize these coordinates. I already tried investigating YOLO's parameters, but nothing…
From what I see, you have the same issue in your video examples: the width/height of the bounding boxes shake more than normal.
Do you have any input?
Thank you again and again 🙂
Adrian Rosebrock
Take a look at tracking-based algorithms, including optical flow; those will help stabilize the boxes. You may also want to consider a simple rolling average of the bounding box coordinates.
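A minimal sketch of the rolling-average idea for a single tracked object:
from collections import deque
import numpy as np
# keep the last 10 boxes for this object
history = deque(maxlen=10)
def smooth(box):
    # box is the (x, y, w, h) detected in the current frame
    history.append(box)
    return np.mean(history, axis=0).astype("int")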
Aditya Bhatia
Will I be able to use the above yolo_video.py code for real-time object detection? I want to use my webcam as the input. How would I implement that?
Adrian Rosebrock
Hey Aditya, I’ve addressed that question multiple times in other comments. Kindly give them a read.
eddo
Hi Adrian,
Great article, as ever… thanks. I did have a couple of questions on blobFromImage as applied to YOLO:
1) The mean subtraction parameters are all 0, so I’m assuming YOLO doesn’t require mean subtraction?
2) The scale factor is set to 1/255, is that simply so that pixel values are set between 0 and 1?
3) Are we then preprocessing validation images in the same way YOLO preprocesses training images?
4) If I follow the instructions on the YOLO website and train COCO using a 608×608 network (this seems to be the setting in the cfg file when you clone from GitHub), would I then need to specify the resize (in blobFromImage) to be (608, 608)?
Thanks for helping!
Adrian Rosebrock
You are correct across the board.
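For concreteness, a sketch of that preprocessing; the file paths are placeholders for your local YOLO files:

```python
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
image = cv2.imread("example.jpg")

# scale pixels to [0, 1] (scalefactor=1/255), perform no mean
# subtraction (the mean defaults to zero), and resize to 608x608 to
# match a network trained at that resolution
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (608, 608),
                             swapRB=True, crop=False)
net.setInput(blob)
```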
Nushaine Ferdinand
Why doesn’t YOLO require mean subtraction?
Yogesh
Hi Adrian, is it possible to use this code with yolov3-tiny.cfg? It produces an error if I do that. Can the code run only with yolov3.cfg?
Chandan Yadav
Hi Adrian,
Loved this tutorial on YOLO. It's very informative and very easy to follow from a beginner's perspective. I would like to know how we can use YOLO with a live-streaming IP camera as the input. I wish to detect humans from a live IP camera.
Adrian Rosebrock
I would recommend using ImageZMQ.
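If your camera exposes an RTSP endpoint, a plain cv2.VideoCapture loop can also work as a simpler sketch (the URL below is a placeholder for your camera's address):

```python
import cv2

stream = cv2.VideoCapture("rtsp://user:password@192.168.1.64:554/stream1")

while True:
    (grabbed, frame) = stream.read()
    if not grabbed:
        break
    # run the post's YOLO detection code on `frame` here
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

stream.release()
cv2.destroyAllWindows()
```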
Peter Cunha
Adrian,
Thank you so much for your invaluable work on this article! Using what I learned from you, I was able to build a CS:GO aimbot powered by neural networks. If you want to check it out, here it is: https://github.com/petercunha/Pine
I added your name and a link to your blog in the “Special Thanks” section of the readme. PyImageSearch is the gift that keeps on giving — I always learn something new from your posts. Thanks for all your hard work.
Best,
Peter
Adrian Rosebrock
This is so cool, thank you for sharing Peter! I also really appreciate the link/credit in the special thanks section 🙂
Jyoti Prakash
Hello, Adrian! This algorithm works really well. I have a small doubt: I am combining YOLOv3 with SORT, but I want my code to run on the GPU. How can I achieve this?
chris
Hi Mr. Adrian, thank you very much for your tutorial. I just have one question: yolo_video.py can successfully detect all the labels in the dataset. Can you teach me how to fine-tune your code to detect only a specific object?
Adrian Rosebrock
Absolutely! I show you how to fine-tune object detection networks to recognize your own objects inside Deep Learning for Computer Vision with Python.
Bindu
Hi Adrian,
Thanks for the beautiful write-up. I was running your code on an image where there is just a hand on a tap, with no body at all, and it is detecting it as a person. Is there any way to correct this so a person is only detected when the whole body (or at least the upper body) is visible, instead of just hands and legs being detected as human?
Adithya Raj
Thanks Adrian for the great blog post.
To run the code for custom object detection (from the last checkpoint saved while training), what changes do I have to make inside the script?
Krishna Chamarthi
Hi Adrian, it was a really nice article and it was helpful for my studies. Is it possible to detect just one of the categories from the whole list of labels provided in the COCO dataset?
I tried modifying yolov3.names to one category and, in the cfg file, set classes=1 and changed filters from 255 to 18, but was not able to get results. Can you please help?
Adrian Rosebrock
Yes, refer to this post.
Dominic Pritham
Hi Adrian:
Which bundle do you cover custom object detectors in?
Is it the ImageNet Bundle? I am interested in using deep learning for custom object detection.
Adrian Rosebrock
Correct, the ImageNet Bundle of Deep Learning for Computer Vision with Python covers how to train custom object detectors.
Nasar Khan
Can we use the YOLO algorithm for text detection instead of object detection?
Adrian Rosebrock
I would suggest instead using the EAST text detection model.
Rariwa
Hi Adrian,
Amazing work. I wonder how to return information about the frames, such as the number of objects detected, the count for each object, etc. I tried to modify the code above but it is still not working.
thank you
Adrian Rosebrock
1. Initialize a dictionary
2. Loop over the objects detected
3. Grab the label for the particular object
4. Lookup the count in the dictionary
5. Increment the count
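In code, those five steps reduce to a small function; this sketch assumes `classIDs` holds one entry per detection kept after non-maxima suppression and that the labels list comes from coco.names, as in the post:

```python
def count_objects(classIDs, labels):
    # build a {label: count} dictionary from the detected class IDs
    counts = {}
    for classID in classIDs:
        label = labels[classID]                    # grab the label
        counts[label] = counts.get(label, 0) + 1   # look up and increment
    return counts
```

For example, count_objects(classIDs, LABELS) might return {'car': 4, 'person': 2}.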
Subham
How can we count the total number of vehicles (car + bus + truck + …) and display it in a video stream?
Adrian Rosebrock
Refer to the previous comment as I’ve already addressed that question.
Rariwa
Thank you so much. It works!
Adrian Rosebrock
Awesome, I’m glad it worked for you Rariwa 🙂
Federico
Amazing tutorial! It's really helping me out with an exercise.
I just wanted to know if it is possible to implement this code in C++,
because I already have a piece of code with a Kalman filter and other centroid-tracking functions in C++, and I have some difficulty translating them into Python.
Do you know a way to port your code to C++?
Thank You
Federico
Ok, I did it!!!
For future questions about this, I suggest looking at these links:
https://docs.opencv.org/3.4/d6/d0f/group__dnn.html
https://docs.opencv.org/3.4/db/d30/classcv_1_1dnn_1_1Net.html
https://docs.opencv.org/3.4/d4/db9/samples_2dnn_2object_detection_8cpp-example.html
They helped me a lot!
You just need to follow them step by step (calmly) and try to understand what the functions do.
Anyway… thank you Adrian for this tutorial! You have probably saved me a lot of time and pain.
Have a nice day! 😉
Adrian Rosebrock
Thanks Federico 🙂
thao nguyen
Hi Adrian.
I'm doing detection on video, but when the camera moves too fast the images are blurry and the detection results are not good. Can you suggest how to ignore these images?
Hanumant
Very nice tutorial. I have a question about how I would track the path of an object, as in the tutorial above. I want to trace a suspect's running path, so how would I draw a line while the object is moving?
Adrian Rosebrock
You mean something like this?
HANUMANT
Yes, like that. How can I do this in the above code?
Adrian Rosebrock
Refer to the tutorial I linked you to in my previous comment. It will show you how to draw the contrail.
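For readers who want the gist without leaving this page, a minimal sketch of the contrail idea (not the linked tutorial's exact code): keep the last N box centers in a deque and connect consecutive points.

```python
from collections import deque
import cv2

pts = deque(maxlen=64)  # must persist across frames

def draw_trail(frame, box, pts):
    # append the center of the current (x, y, w, h) box, then connect
    # consecutive centers to draw the trail
    (x, y, w, h) = box
    pts.appendleft((x + w // 2, y + h // 2))
    for i in range(1, len(pts)):
        cv2.line(frame, pts[i - 1], pts[i], (0, 0, 255), 2)
```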
Saurabh
Hello Adrian,
Thanks for the blog. I am looking for an image labeling tool (I know about labelImg). The problem with labelImg is that it always draws axis-aligned rectangles. In my case the objects are not axis-aligned, so labelImg is not helpful. I also looked into labelme, which allows you to draw a polygon around an object, but that is limited to image segmentation problems.
Could you please guide me or provide pointer on this?
Thanking you!
Adrian Rosebrock
I really like VGG’s VIA.
Saurabh
Thanks for the pointer!
Nr
Hi Adrian, you mentioned VGG's VIA; I took a look at it, and it has different types of annotations, such as circles, rather than only rectangles. I was wondering if you could please tell me whether it is possible to train based on circle labels rather than rectangles? If yes, what types of networks are capable of this?
I have always seen rectangle and segmentation annotations; I have never seen circle annotation and training.
Please advise!
Thanks
Adrian Rosebrock
No, VIA does have rectangle annotations. Make sure you read the documentation associated with VIA.
Nr
Why does VIA also offer circle annotation? What would be the use of that circle annotation?
Adrian Rosebrock
Instance segmentation or semantic segmentation — you would use the circle or polygon tool to generate masks.
Nr
Hi Adrian,
I heard OpenCV has added a CUDA backend to speed up inference time.
Is that correct?
Can we run the DNN module on the GPU now?
Adrian Rosebrock
You are correct. I’ll be covering that exact topic in next week’s blog post 🙂
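For readers finding this comment later, the relevant calls look like the following; this requires OpenCV 4.2+ compiled with CUDA support, and the file paths are placeholders:

```python
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
# ask the dnn module to run inference on the GPU
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
```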
Nr
Thanks Adrian, I am looking forward to reading your new post. 🙂
ronit
Is real-time object detection possible using YOLO as our object detector?
Adrian Rosebrock
Yes, provided you compile OpenCV with GPU support.
khaled
Hi Dr Adrian,
I am new to the topic, so apologies if it is a dumb question: is there any quick way to change the code to detect only one YOLO class, for example "person", and speed up the process, or does it require completely new training?
Thank you so much in advance !
Adrian Rosebrock
Yes. Read this tutorial to learn how.
Sean Tan
Hi there, thank you for this superb tutorial on YOLO. May I know if it is possible to modify this system into a vehicle density counting system (a vehicle counter)? Thank you.
Sean Tan
Is it possible to convert those bounding boxes into numbers? I am building a traffic light vehicle density analysis system. I hope those bounding boxes can be converted into counts, which I will use to change the traffic light's timer. Thank you very much in advance.
Adrian Rosebrock
Yes, you can simply loop over the total number of bounding boxes and increment a counter for each object that is labeled as a vehicle.
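A minimal sketch of that idea; the vehicle class names assume the darknet coco.names file, and `classIDs`, `LABELS`, and `frame` follow the tutorial's variable names:

```python
import cv2

VEHICLES = {"car", "bus", "truck", "motorbike"}

def draw_vehicle_count(frame, classIDs, labels):
    # sum only the detections whose label is a vehicle class, then
    # draw the running total in the top-left corner of the frame
    total = sum(1 for c in classIDs if labels[c] in VEHICLES)
    cv2.putText(frame, "Vehicles: {}".format(total), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    return total
```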
Arush Jain
Hii Adrian
I am new to OpenCV and currently working on object detection. Is it possible to customize the code so that it detects the object in an image and returns its name as the label?
Also, since I am working on Windows, are there any additional requirements to use YOLO or any other deep learning technique?
Mirwan Hakim
Hi, I'm trying to record automatically when the camera detects people using YOLO, but I have run into many problems while trying it. Can you help me? I'm trying to finish my schooling.
Adrian Rosebrock
It sounds like you need my key event clip writer.
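The clip writer saves a short video around the detection event. A heavily simplified stand-in (not the actual KeyClipWriter, and without the pre-event frame buffering it provides) might look like this, where `person_detected` comes from your YOLO loop:

```python
import cv2

writer = None
fourcc = cv2.VideoWriter_fourcc(*"MJPG")

def record(frame, person_detected, path="output.avi", fps=20.0):
    # open a writer on the first frame containing a person, keep writing
    # while the person is visible, and close the file once they leave
    global writer
    if person_detected:
        if writer is None:
            (h, w) = frame.shape[:2]
            writer = cv2.VideoWriter(path, fourcc, fps, (w, h), True)
        writer.write(frame)
    elif writer is not None:
        writer.release()
        writer = None
```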
hemanth
Have you used MobileNets or SSD algorithms in this project?
Adrian Rosebrock
I’ve worked with both MobileNets and SSDs, and in fact, both are covered inside Deep Learning for Computer Vision with Python.
Roni
Adrian, please help me; I believe that only you can. My YOLOv3 model detects the object in video when the training images and the video both have a resolution of 416×416.
But now I want to use a resolution of 64×64 for both the video and the training images. I trained on 64×64 images, but without results. Do you think YOLOv3 can work with a 64×64 resolution or not?
Juan
Hi Adrian.
Thanks for sharing this nice tutorial. I filtered on the classIDs to detect just the soccer ball in the soccer image and it works very well. How can I make the code lighter so it detects just the soccer ball at 30 FPS on a Raspberry Pi or similar?
Adrian Rosebrock
I would suggest you take a look at Raspberry Pi for Computer Vision which shows you my tips, suggestions, and best practices to get object detectors to run in real-time or as close to as real-time as possible on the RPi.
Umer
Hello Adrian,
Thanks for the post. It is very helpful. For my project, I want to detect and track a small object (Golf Ball). Would this work for detecting a small object or should I use some other techniques?
Adrian Rosebrock
YOLO isn’t the best for tracking small objects. I would suggest trying Faster R-CNN.
Sai J
Hello, thank you very much for this tutorial. I'm new to implementing deep learning, and this tutorial helped a lot. If it's okay, can you please tell me how I can add new classes to the model? Is it a function I have to add to one of the two provided scripts? Sorry if it's a really basic question. Thank you!
Sai J
Sorry Adrian, I only noticed the rules afterwards. I got a close answer in one of the comments. Again, sorry!
Adrian Rosebrock
No worries and thank you for reading the comments!
Student
Hi Adrian,
Is there a way to store the coordinates of the obtained bounding boxes in a file for further processing?
Thanks in advance!
Adrian Rosebrock
Yes, you could use a simple text file, CSV file, JSON, etc. I would suggest you read more about basic Python file I/O.
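For example, a sketch that dumps each detection kept after NMS to a CSV file, with variable names following the tutorial's:

```python
import csv

def save_detections(path, boxes, confidences, classIDs, labels):
    # one row per detection: label, confidence, and the (x, y, w, h) box
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "confidence", "x", "y", "w", "h"])
        for (box, conf, classID) in zip(boxes, confidences, classIDs):
            (x, y, w, h) = box
            writer.writerow([labels[classID], conf, x, y, w, h])
```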
Utkarsh
Thanks, Adrian for this tutorial,
My yolo_video.py file runs without any errors, but there is a bug: the output file is only a few milliseconds long instead of the full length of the original video.
Please help me!
Adrian Rosebrock
Working with video can be a bit of a pain with OpenCV. I suggest following this tutorial to help you get started.
Lotfi
Hello,
I tested your Python program and it works fine!
I'm looking to detect road signs. How can I add these objects?
Regards
Adrian Rosebrock
Take a look at the other comments on this post as I have addressed that question multiple times.