Last week we discussed how to use OpenCV and Python to perform pedestrian detection.
To accomplish this, we leveraged the built-in HOG + Linear SVM detector that OpenCV ships with, allowing us to detect people in images.
However, one aspect of the HOG person detector we did not discuss in detail is the detectMultiScale function; specifically, how the parameters of this function can:
- Increase the number of false-positive detections (i.e., reporting that a location in an image contains a person, but when in reality it does not).
- Result in missing a detection entirely.
- Dramatically affect the speed of the detection process.
In the remainder of this blog post I am going to break down each of the detectMultiScale parameters to the Histogram of Oriented Gradients descriptor and SVM detector.
I’ll also explain the trade-off between speed and accuracy that we must make if we want our pedestrian detector to run in real-time. This trade-off is especially important if you want to run the pedestrian detector in real-time on resource-constrained devices such as the Raspberry Pi.
Accessing the HOG detectMultiScale parameters
To view the parameters to the detectMultiScale function, just fire up a shell, import OpenCV, and use the help function:
$ python
>>> import cv2
>>> help(cv2.HOGDescriptor().detectMultiScale)
You can use the built-in Python help function on any OpenCV function to get a full listing of its parameters and return values.
HOG detectMultiScale parameters explained
Before we can explore the detectMultiScale
parameters, let’s first create a simple Python script (based on our pedestrian detector from last week) that will allow us to easily experiment:
# import the necessary packages
from __future__ import print_function
import argparse
import datetime
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-w", "--win-stride", type=str, default="(8, 8)",
    help="window stride")
ap.add_argument("-p", "--padding", type=str, default="(16, 16)",
    help="object padding")
ap.add_argument("-s", "--scale", type=float, default=1.05,
    help="image pyramid scale")
ap.add_argument("-m", "--mean-shift", type=int, default=-1,
    help="whether or not mean shift grouping should be used")
args = vars(ap.parse_args())
Since most of this script is based on last week’s post, I’ll just give a quick overview of the code.
Lines 9-20 handle parsing our command line arguments. The --image switch is the path to the input image in which we want to detect pedestrians. The --win-stride switch is the step size in the x and y direction of our sliding window. The --padding switch controls the number of pixels the ROI is padded with prior to HOG feature vector extraction and SVM classification. To control the scale of the image pyramid (allowing us to detect people in images at multiple scales), we can use the --scale argument. And finally, --mean-shift can be specified if we want to apply mean-shift grouping to the detected bounding boxes.
# evaluate the command line arguments (using the eval function like
# this is not good form, but let's tolerate it for the example)
winStride = eval(args["win_stride"])
padding = eval(args["padding"])
meanShift = True if args["mean_shift"] > 0 else False

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# load the image and resize it
image = cv2.imread(args["image"])
image = imutils.resize(image, width=min(400, image.shape[1]))
Now that we have our command line arguments parsed, we extract their tuple and boolean values respectively on Lines 24-26. Using the eval function, especially on command line arguments, is not good practice, but let’s tolerate it for the sake of this example (and for the ease of allowing us to play with different --win-stride and --padding values).
Lines 29 and 30 initialize the Histogram of Oriented Gradients detector and set the Support Vector Machine detector to be the default pedestrian detector included with OpenCV.
From there, Lines 33 and 34 load our image and resize it to have a maximum width of 400 pixels — the smaller our image is, the faster it will be to process and detect people in it.
# detect people in the image
start = datetime.datetime.now()
(rects, weights) = hog.detectMultiScale(image, winStride=winStride,
    padding=padding, scale=args["scale"], useMeanshiftGrouping=meanShift)
print("[INFO] detection took: {}s".format(
    (datetime.datetime.now() - start).total_seconds()))

# draw the original bounding boxes
for (x, y, w, h) in rects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# show the output image
cv2.imshow("Detections", image)
cv2.waitKey(0)
Lines 37-41 detect pedestrians in our image using the detectMultiScale function and the parameters we supplied via command line arguments. We start and stop a timer on Lines 37 and 41, allowing us to determine how long it takes to process a single image for a given set of parameters.
Finally, Lines 44-49 draw the bounding box detections on our image and display the output to our screen.
To get a default baseline in terms of object detection timing, just execute the following command:
$ python detectmultiscale.py --image images/person_010.bmp
On my MacBook Pro, the detection process takes a total of 0.09s, implying that I can process approximately 10 images per second:
In the rest of this lesson we’ll explore the parameters to detectMultiScale in detail, along with the implications these parameters have on detection timing.
img (required)
This parameter is pretty obvious — it’s the image in which we want to detect objects (in this case, people). This is the only required argument to the detectMultiScale function. The image we pass in can be either color or grayscale.
hitThreshold (optional)
The hitThreshold parameter is optional and is not used by default in the detectMultiScale function.
When I looked at the OpenCV documentation for this function, the only description I could find for this parameter was: “Threshold for the distance between features and SVM classifying plane”.
Given the sparse documentation of the parameter (and the strange behavior I observed when playing around with it for pedestrian detection), I believe this parameter controls the maximum Euclidean distance between the input HOG features and the classifying plane of the SVM. If the Euclidean distance exceeds this threshold, the detection is rejected; if the distance is below this threshold, the detection is accepted.
My personal opinion is that you shouldn’t bother playing around with this parameter unless you are seeing an extremely high rate of false-positive detections in your image. In that case, it might be worth trying to set it. Otherwise, just let non-maxima suppression take care of any overlapping bounding boxes, as we did in the previous lesson.
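If you do decide to experiment, hitThreshold can be supplied as a keyword argument to detectMultiScale. A quick sketch — the 0.2 value below is an arbitrary starting point for experimentation, not a recommendation:

# hypothetical value: raise it to prune weak detections, lower it
# (even into negative values) to allow more detections through
(rects, weights) = hog.detectMultiScale(image, hitThreshold=0.2,
    winStride=(8, 8), padding=(16, 16), scale=1.05)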
winStride (optional)
The winStride parameter is a 2-tuple that dictates the “step size” in both the x and y location of the sliding window.
Both winStride and scale are extremely important parameters that need to be set properly. These parameters have tremendous implications not only for the accuracy of your detector, but also for the speed at which it runs.
In the context of object detection, a sliding window is a rectangular region of fixed width and height that “slides” across an image, just like in the following figure:
At each stop of the sliding window (and for each level of the image pyramid, discussed in the scale section below), we (1) extract HOG features and (2) pass these features on to our Linear SVM for classification. The process of feature extraction and classifier decision is an expensive one, so we would prefer to evaluate as few windows as possible if our intention is to run our Python script in near real-time.
The smaller winStride is, the more windows need to be evaluated (which can quickly turn into quite the computational burden):
$ python detectmultiscale.py --image images/person_010.bmp --win-stride="(4, 4)"
Here we can see that decreasing winStride to (4, 4) has substantially increased our detection time to 0.27s.
Similarly, the larger winStride is, the fewer windows need to be evaluated (allowing us to dramatically speed up our detector). However, if winStride gets too large, then we can easily miss out on detections entirely:
$ python detectmultiscale.py --image images/person_010.bmp --win-stride="(16, 16)"
I tend to start off using a winStride value of (4, 4) and increase the value until I obtain a reasonable trade-off between speed and detection accuracy.
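To make the trade-off concrete, here is a back-of-the-envelope window count for a single pyramid layer, assuming a hypothetical 400×300 layer and the default 64×128 pedestrian detection window:

# rough window count per pyramid layer for several strides
win_w, win_h = 64, 128
img_w, img_h = 400, 300
for (sx, sy) in [(4, 4), (8, 8), (16, 16)]:
    nx = (img_w - win_w) // sx + 1
    ny = (img_h - win_h) // sy + 1
    print((sx, sy), nx * ny)
# (4, 4) -> 3740 windows; (8, 8) -> 946; (16, 16) -> 242 -- per layer!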
padding (optional)
The padding parameter is a 2-tuple that indicates the number of pixels in both the x and y direction by which the sliding window ROI is “padded” prior to HOG feature extraction.
As suggested by Dalal and Triggs in their 2005 CVPR paper, Histogram of Oriented Gradients for Human Detection, adding a bit of padding surrounding the image ROI prior to HOG feature extraction and classification can actually increase the accuracy of your detector.
Typical values for padding include (8, 8), (16, 16), (24, 24), and (32, 32).
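If you want to see how padding affects your own images, a quick sweep over these typical values (reusing the hog, image, and datetime imports from the script above) might look something like this sketch:

# time each of the typical padding values on the current image
for pad in [(8, 8), (16, 16), (24, 24), (32, 32)]:
    start = datetime.datetime.now()
    (rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
        padding=pad, scale=1.05)
    delta = (datetime.datetime.now() - start).total_seconds()
    print("padding={}: {} boxes in {:.3f}s".format(pad, len(rects), delta))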
scale (optional)
An image pyramid is a multi-scale representation of an image:
At each layer of the image pyramid the image is downsized and (optionally) smoothed via a Gaussian filter.
The scale parameter controls the factor by which our image is resized at each layer of the image pyramid, ultimately influencing the number of levels in the pyramid.
A smaller scale will increase the number of layers in the image pyramid and increase the amount of time it takes to process your image:
$ python detectmultiscale.py --image images/person_010.bmp --scale 1.01
The amount of time it takes to process our image has significantly jumped to 0.3s. We also now have an issue of overlapping bounding boxes. However, that issue can be easily remedied using non-maxima suppression.
Meanwhile a larger scale will decrease the number of layers in the pyramid as well as decrease the amount of time it takes to detect objects in an image:
$ python detectmultiscale.py --image images/person_010.bmp --scale 1.5
Here we can see that we performed pedestrian detection in only 0.02s, implying that we can process nearly 50 images per second. However, this comes at the expense of missing some detections, as evidenced by the figure above.
Finally, if you decrease both winStride and scale at the same time, you’ll dramatically increase the amount of time it takes to perform object detection:
$ python detectmultiscale.py --image images/person_010.bmp --scale 1.03 \
    --win-stride="(4, 4)"
We are able to detect both people in the image — but it’s taken almost half a second to perform this detection, which is absolutely not suitable for real-time applications.
Keep in mind that for each layer of the pyramid, a sliding window with winStride steps is moved across the entire layer. While it’s important to evaluate multiple layers of the image pyramid, allowing us to find objects in our image at different scales, it also adds a significant computational burden: each layer implies another series of sliding windows, HOG feature extractions, and decisions by our SVM.
Typical values for scale are normally in the range [1.01, 1.5]. If you intend on running detectMultiScale in real-time, this value should be as large as possible without significantly sacrificing detection accuracy.
Again, along with winStride, scale is the most important parameter for you to tune in terms of detection speed.
finalThreshold (optional)
I honestly can’t even find finalThreshold inside the OpenCV documentation (specifically for the Python bindings) and I have no idea what it does. I assume it has some relation to hitThreshold, allowing us to apply a “final threshold” to the potential hits, weeding out potential false-positives, but again, that’s simply speculation based on the argument name.
If anyone knows what this parameter controls, please leave a comment at the bottom of this post.
useMeanShiftGrouping (optional)
The useMeanShiftGrouping parameter is a boolean indicating whether or not mean-shift grouping should be performed to handle potential overlapping bounding boxes. This value defaults to False and, in my opinion, should never be set to True — use non-maxima suppression instead; you’ll get much better results.
When using HOG + Linear SVM object detectors you will undoubtedly run into the issue of multiple, overlapping bounding boxes, where the detector has fired numerous times in regions surrounding the object we are trying to detect:
To suppress these multiple bounding boxes, Dalal suggested using mean shift (Slide 18). However, in my experience mean shift performs sub-optimally and should not be used as a method of bounding box suppression, as evidenced by the image below:
Instead, utilize non-maxima suppression (NMS). Not only is NMS faster, but it obtains much more accurate final detections:
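For completeness, here is the same NMS routine we applied in the previous lesson, using the non_max_suppression helper from imutils (drop this in after the detectMultiScale call in the script above):

import numpy as np
from imutils.object_detection import non_max_suppression

# convert from (x, y, w, h) to (startX, startY, endX, endY), then
# apply non-maxima suppression with a 65% overlap threshold
boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
pick = non_max_suppression(boxes, probs=None, overlapThresh=0.65)
for (xA, yA, xB, yB) in pick:
    cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)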
Tips on speeding up the object detection process
Whether you’re batch processing a dataset of images or looking to get your HOG detector to run in real-time (or as close to real-time as feasible), these three tips should help you milk as much performance out of your detector as possible:
- Resize your image or frame to be as small as possible without sacrificing detection accuracy. Prior to calling the detectMultiScale function, reduce the width and height of your image. The smaller your image is, the less data there is to process, and thus the faster the detector will run.
- Tune your scale and winStride parameters. These two arguments have a tremendous impact on your object detector’s speed. Both scale and winStride should be as large as possible, again, without sacrificing detector accuracy (a combined sketch follows this list).
- If your detector still is not fast enough… you might want to look into re-implementing your program in C/C++. Python is great and you can do a lot with it. But sometimes you need the compiled binary speed of C or C++ — this is especially true for resource-constrained environments.
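Putting those three tips together, a reasonable near real-time starting configuration might look like the following sketch; the exact values are assumptions to tune against your own hardware and footage, not recommendations:

# small frame + relatively coarse stride + aggressive scale
image = imutils.resize(image, width=min(400, image.shape[1]))
(rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
    padding=(8, 8), scale=1.2)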
Summary
In this lesson we reviewed the parameters to the detectMultiScale function of the HOG descriptor and SVM detector. Specifically, we examined these parameter values in the context of pedestrian detection. We also discussed the speed and accuracy trade-offs you must consider when utilizing HOG detectors.
If your goal is to apply HOG + Linear SVM in (near) real-time applications, you’ll first want to start by resizing your image to be as small as possible without sacrificing detection accuracy: the smaller the image is, the less data there is to process. You can always keep track of your resizing factor and multiply the returned bounding boxes by this factor to obtain the bounding box sizes in relation to the original image size.
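A minimal sketch of that bookkeeping (the ratio logic is the only new piece; everything else mirrors the script above):

# detect on the small image, then map boxes back to the original
orig = cv2.imread(args["image"])
image = imutils.resize(orig, width=min(400, orig.shape[1]))
ratio = orig.shape[1] / float(image.shape[1])
(rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
    padding=(16, 16), scale=1.05)
for (x, y, w, h) in rects:
    (x, y, w, h) = [int(v * ratio) for v in (x, y, w, h)]
    cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 255, 0), 2)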
Secondly, be sure to play with your scale and winStride parameters. These values can dramatically affect the detection accuracy (as well as the false-positive rate) of your detector.
Finally, if you still are not obtaining your desired frames per second (assuming you are working on a real-time application), you might want to consider re-implementing your program in C/C++. While Python is very fast (all things considered), there are times you cannot beat the speed of a binary executable.
Anuj Pahuja
Hi Adrian,
Thanks again for the informative blog post. I had to use HOGDescriptor in OpenCV for one of my projects and it was a pain to use because of the lack of clear documentation. So this was much needed.
The ‘finalThreshold’ parameter is mainly used to select the clusters that have at least ‘finalThreshold + 1’ rectangles. This parameter is passed as an argument to the groupRectangles() or groupRectangles_meanShift() (when mean shift is enabled) function, which rejects the small clusters containing fewer than or equal to ‘finalThreshold’ rectangles, computes the average rectangle size for the rest of the accepted clusters, and adds those to the output rectangle list.
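In practice, finalThreshold can be passed directly as a keyword argument to detectMultiScale; a sketch (the value 2 below is just an example to experiment with, not a recommendation):

# keep only clusters containing at least finalThreshold + 1 raw hits
(rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
    padding=(16, 16), scale=1.05, finalThreshold=2)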
These should help:
1. http://code.opencv.org/projects/opencv/repository/entry/modules/objdetect/src/hog.cpp?rev=2.4.9#L1057
2. http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html#void%20groupRectangles%28vector%3CRect%3E&%20rectList,%20int%20groupThreshold,%20double%20eps%29
Cheers,
Anuj
Adrian Rosebrock
Thanks so much for sharing the extra details Anuj!
Nrupatunga
Dear Adrian,
Very informative post on the HOG detectMultiScale parameters. In fact, I appreciate this post very much; I couldn’t find such a detailed post with such examples anywhere on the net.
Recently, I trained HOG features (90×160) manually using SVMlight. I had a hard time making detectMultiScale work with these parameters.
I would like to share a few observations from my experimentation:
1. Doing hard-negative training reduced my false positives.
2. finalThreshold and useMeanShiftGrouping: setting useMeanShiftGrouping to false gave me a good detection bounding box around the person in the image, and increasing finalThreshold reduced the number of detections (number of bounding boxes).
I am still working on improving the detection rate; I have many images where I still couldn’t detect the person in the image. I have reduced false positives, but I want to increase my detection rate as well.
Any inputs on this? I would really appreciate them.
Thanks a lot for this post.
Correction: if I am not mistaken, I think there should be a modification in the code
“python detectmultiscale.py --image images/person_010.bmp --scale 1.03”
after the statement
“Meanwhile a larger scale will decrease the number of layers in the pyramid as well as decrease the amount of time it takes to detect objects in an image:”
Adrian Rosebrock
Hey Nrupatunga, thanks for the comment and all the added details, I appreciate it. If you ended up using mean shift grouping, I would suggest applying non-maxima suppression instead; you’ll likely end up getting even better results.
In order to improve your detection rate, be sure to check the ‘C’ parameter of your SVM. Normally this value should be very small, such as C=0.01. This will create a “soft classifier” and help with your detection rate.
Another neat little trick you can do to create more training data is to “mirror” your training images. I’m not sure about your particular case, but for pedestrian detection, the horizontal mirror of an image is still a person, so you can use that as additional training data as well.
ngapweitham
Thanks for the brilliant explanations. It is much easier to understand than the OpenCV documentation.
Has anyone tried to use dlib to do pedestrian detection? There is a video showing the results (https://www.youtube.com/watch?v=wpmY_5gNbEY), but I cannot tell whether the result is good or bad with my knowledge.
Adrian Rosebrock
Thanks for sharing Tham!
Sebastian
Hi Adrian, thanks for the post
I’m trying to do HOG detection in real time from a video camera. I’m working on a Raspberry Pi 2 board and the code works, but the frame rate is too slow.
How can I make the process faster?
Do you think it is possible to get good results working with the Raspberry Pi 2?
Thanks
Adrian Rosebrock
Hey Sebastian — please see the “Tips on speeding up the object detection process” section. This section of the post details tricks you can use to speed up the detection process, especially related to the Raspberry Pi 2.
Cam
Hi Sebastian,
Could you please help me with people detection in real time? I’ve been trying but it doesn’t work. Can you give me some ideas for the code, or send me that part of the code? I’d really appreciate that.
Thank you.
Adrian Rosebrock
Hey Camilo — please see this followup blog post on tuning detectMultiScale parameters for real-time detection.
Rish
Hey Adrian – the link leads to the same post. I’m really trying hard to do real time detection. I’m hoping to achieve 20fps (or 25fps if I can get really lucky). I’ve implemented a tracking algorithm that helps quite a bit. However, any tips to speed up the detectMultiScale function as such would be really helpful.
As mentioned in the blog post, changing scale from 1.20 to 1.05 increases the time per 640×480 frame from 55ms to 98ms; however, accuracy reduces significantly.
Adrian Rosebrock
Just to clarify, you are trying to obtain 20-25 FPS on the Raspberry Pi?
CodeNinja
Hi Rish. You can pass alternate frames to the classifier: pass the 0th frame to the classifier; for the 1st frame do only tracking (on the output obtained from the classifier) and skip the classifier; pass the 2nd frame to the classifier… and the cycle repeats.
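A rough sketch of that alternate-frame pattern (camera is assumed to be a cv2.VideoCapture; real tracking is elided here and the last detections are simply reused):

frame_num = 0
last_rects = []
while True:
    (grabbed, frame) = camera.read()
    if not grabbed:
        break
    frame = imutils.resize(frame, width=400)
    # run the expensive detector on every other frame only
    if frame_num % 2 == 0:
        (last_rects, _) = hog.detectMultiScale(frame, winStride=(8, 8),
            padding=(16, 16), scale=1.05)
    for (x, y, w, h) in last_rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    frame_num += 1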
Vivek
Adrian,
The scale parameter also interacts with another input, nLevels.
The way it works is this: the image size is decreased up to nLevels times. If nLevels = 16 and scale = 1.05, the loop runs 16 times, each time shrinking the image by a running factor that is multiplied by scale (starting at 1).
So nLevels defines the number of loops, not the scale.
Adrian Rosebrock
Thanks for sharing Vivek. So just to clarify, nLevel is used to control the maximum number of layers of the image pyramid? Also, does nLevel work for the Python + OpenCV bindings or just for C++?
Vivek
Hi Adrian,
If you look at the hog.cpp file, it will show how nlevels is used.
Our tests in Python show that it does work the way it is defined.
If you set nlevels too low, say 4, with scale 1.01, you will see that no small figures are detected.
Experiment by changing nlevels and scale to see how it works.
(I looked at a code snippet from hog.cpp; I did not see any default value, so the loop will continue until the size of the image becomes smaller than the window.)
Adrian Rosebrock
Thanks for the clarification! I’ll be sure to play with this parameter as well. It seems like a nice way to complement the scale.
Bob Estes
I don’t see that nlevels is exposed at the Python level.
If so, you’d have to recompile OpenCV to change it, right?
Ulrich
Hello Adrian,
Thanks a lot for this blog. I read it carefully and tried out your code with my own pictures and videos. It works great!
Do you also have a Python HOG implementation that is rotation invariant? I am trying to detect pedestrians who are not upright (due to a moving/tilted camera).
Adrian Rosebrock
By definition, HOG is not meant to be rotation invariant. If you know how the camera is tilted, simply deskew the frame by rotating it by theta degrees.
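A sketch of that deskewing step (theta is a placeholder for your known camera tilt, in degrees; note that imutils.rotate clips the corners of the rotated frame):

# rotate the frame back to upright before running the detector
theta = 15  # hypothetical tilt angle
deskewed = imutils.rotate(image, angle=theta)
(rects, weights) = hog.detectMultiScale(deskewed, winStride=(8, 8),
    padding=(16, 16), scale=1.05)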
Maya2
Please, how do I apply this code to videos?
Ulrich
Hello Adrian,
Thanks for your answer; this is a good idea and helps in many cases. But in some special cases I do not know what the angle theta is, due to sliding camera motion.
I guess another possibility could be to train the SVM with tilted example images. But for that I think a square window would be better than the normally used upright 128:64 window.
Is it possible to change the window size in the OpenCV SVM database?
Is it possible to add HOG descriptors extracted from tilted examples (pedestrians) to the existing SVM database? Or is it necessary to create your own SVM?
Do you have a blog post on how to expand an SVM or how to create your own SVM?
Is this possibly described in one of your books?
It would be great if you could give me some answers and hints regarding my problem.
Best regards, Ulrich
Adrian Rosebrock
If you decide to create additional data samples by tilting your images, then you’ll need to train a HOG detector for each set of rotations. Keep in mind that HOG, by definition, is sensitive to rotation.
The pedestrian detector that OpenCV ships with is pre-trained, so you can’t adjust it. In your case, you would need to train your own SVM.
I detail the steps involved in how to train a custom object detector in this post. You can find the source code implementation of the HOG + Linear SVM detector inside the PyImageSearch Gurus course.
Anthony
Hi Adrian, I really need to thank you for all those amazing posts, it really is a great job!
Because I liked your posts, I tried to reproduce it at home, using a Raspberry Pi 2 with Python 2.7.9 and OpenCV 3.1.0.
I’m doing real-time person detection with this RPi and it’s working well. My problem is that I can’t find a way to count the number of people when performing hog.detectMultiScale. The return values of this function give us the locations and weights of the detected people, but not the exact number of people.
Do you have any idea of implementing it?
Adrian Rosebrock
The value returned by hog.detectMultiScale is just a list of bounding boxes. This list represents the people detected in the image. Therefore, to get the total number of people, just take len(rects).
Anthony
Thank you very much, this helped me a lot!
Keep updating this blog, it’s wonderful!
Arpit Solanki
Thank you for this great post. With your approach I observed a lot of false positives. One example: I tested it on a photo with a man and a dog (front view) and it detected both of them as people. Can you please help me solve this kind of issue?
Adrian Rosebrock
Since the classifier is pre-trained, you unfortunately cannot apply hard-negative mining as in the HOG + Linear SVM pipeline. Instead, you’ll need to try tuning the parameters of detectMultiScale. To start, I would work with the scale factor and try to get it as large as possible without hurting true-positive detection.
Imran
Just wondering how to incorporate detection of different postures (sitting, crawling, etc.) in the framework of HOG descriptors. Has any work been done in this regard?
Adrian Rosebrock
You would essentially need to train a separate HOG + Linear SVM detector for each of the postures you wanted to detect.
Paulo
Hi Adrian,
Based on your experience, what technique would you recommend to detect the pattern of heads and shoulders from a top view?
Best regards Paulo
Adrian Rosebrock
It really depends on your dataset. I would first examine the images in the dataset and determine how much variance in appearance the heads and shoulders have. Depending on how much variance there is, I’d make a decision. For similar images with low variance I’d likely use HOG + Linear SVM. For lots of variance I’d start to consider more advanced approaches like CNNs.
Paulo
Hi Adrian, Thanks for answering.
In my dataset, between 2 and 6 people walk together to pass through a door with a width of 60 in (1.5m). I tested the Hough transform to detect heads, but the result was not satisfactory. If I use a CNN (Convolutional Neural Network), which of your posts do you recommend starting with?
Best regards Paulo
Adrian Rosebrock
Is your camera fixed and non-moving? And I assume all images are coming from the same camera sensor? If so, I think HOG + Linear SVM would likely be enough here.
Paulo
Thanks Adrian!
My camera is fixed.
From this code, how do I adapt it for offline training? What should I change?
Thanks…
Adrian Rosebrock
I demonstrate how to train HOG + Linear SVM detectors from scratch inside the PyImageSearch Gurus course. I would suggest starting there.
Wanderson
Dear Adrian,
I would like to ask a newbie question. Whenever I see talk about the HOG detector, the SVM classifier is involved. Does the HOG descriptor always have to be linked to a classifier? Or can I detect foreground objects with blob analysis?
Thanks,
Wanderson
Adrian Rosebrock
You typically see SVMs, in particular Linear SVMs, because they are very fast. You could certainly use a different classifier if you wished.
James Brown
Hello, Adrian.
Your post is really amazing.
I have a question about HOG.
Is it possible to extract the human features by removing the background in a selected rectangular area?
I am researching the human body recognition project and I really hope you guide me.
Thank you very much.
You are super!
Adrian Rosebrock
You would actually train a custom object detector using the HOG + Linear SVM method. OpenCV actually ships with a pre-trained human detector that you can try out.
John Beale
Thank you for this great blog. In your examples showing the foreground woman and background child, the green bounding box cuts through the woman’s forehead, so I’m assuming the HOG detector found her legs and torso, but missed her head(?) In other cases, the box is well centered and completely contains all the pixels showing the human figure, but includes considerable extra background also. Is this algorithm intrinsically that “fuzzy” about the precise outline, or can it be tuned to more closely match the actual boundaries of the person? Thanks again!
Adrian Rosebrock
The algorithm itself isn’t “fuzzy”, it’s simply the step size of the sliding window and the size of the image pyramid. I would suggest reading this post on the HOG + Linear SVM detector to better understand how the detector works.
Yonatan
A comment regarding the hitThreshold parameter:
It should represent the minimum Euclidean distance between the input HOG features and the classifying plane of the SVM, meaning that only if the SVM result exceeds this threshold, the detection is positive. (and if you set this threshold to small negative values, you get a lot of false positive windows).
Saeed
Hi Adrian,
I read “perform pedestrian detection” and current posts and there is one point that I cannot understand.
Your sliding window’s size is fixed at 128×64 and all the features are obtained from this window at any scale. However, when the targets are detected, the boxes have different sizes. I believe all the boxes should be 128×64, but they are not. Could you please describe what causes this?
Thank you in advance for your comment.
Adrian Rosebrock
You can have different sized bounding boxes in scale space due to image pyramids. Image pyramids allow you to detect objects at varying scales of the image, but as you’ll notice, each bounding box has the same aspect ratio.
leo
Hi, can you please explain [ -i ] and [ --images ] to me? I’m new in this area.
Please help me.
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())
Adrian Rosebrock
Hi Leo — I would highly suggest that you spend some time reading up on command line arguments and how they work.
Ashutosh
Dear Adrian,
Very useful blog — thank you for drafting the content precisely.
I was checking the GPU version of the detectMultiScale at
http://docs.opencv.org/2.4/modules/gpu/doc/object_detection.html#gpu-hogdescriptor-detectmultiscale
But I could not understand why padding must be (0,0).
“padding – Mock parameter to keep the CPU interface compatibility. It must be (0,0).”
In that case, how do we detect pedestrians at the edge of the frame?
Thanks in advance.
Adrian Rosebrock
The GPU functionality of OpenCV is (unfortunately) only for C/C++. There are no Python bindings for it, so I unfortunately haven’t had a chance to play around with the GPU functions and do not have any insight there.
Rob
Hi Adrian,
If I want to use HOG + SVM as a traffic sign detector, how should I do it?
Should I train a detector for each sign, or should I build a general detector for all signs and then distinguish the signs with another method? I want to do it in real time; do more detectors increase the computational effort proportionally?
Thanks in advance.
Adrian Rosebrock
You would want to train a detector for each sign. More detectors will increase the amount of time it takes to classify a given image, but the benefit is that your detections will be more accurate. Please see the PyImageSearch Gurus course for an example of building a traffic sign detector.
Rob
Is there a way to use detectMultiScale to distinguish between several object classes? I also tried to implement my own scaling and sliding windows and then use predict() to recognize the object. For this purpose I trained a linear SVM and labeled the data. It’s working fine, but it’s so much slower than detectMultiScale.
Adrian Rosebrock
HOG + Linear SVM detectors work best when they are binary (i.e., detecting one class label for each detector). The detectMultiScale function in OpenCV only works with one class. You can implement your own method (as you’ve done), but it will be much slower in Python.
Rob
Okay, thank you, I’m trying that. When I run detectMultiScale or predict, the code uses only 25% of my processor (on an RPi 3). Is multiprocessing possible with these methods? How can I achieve this?
Adrian Rosebrock
As far as detectMultiScale goes, unfortunately there aren’t many optimizations on the Python side of things. If you wanted to code in C++, you could access the GPU via detectMultiScale for added speed.
Nashwan
Hi Adrian;
when I run this code it gives me this error:
detectmultiscale.py: error: argument -i/--image is required
I use OpenCV 3.0 and Python 2.7 on Windows 10.
I’m waiting for your help…
Adrian Rosebrock
Please read up on command line arguments and how to use them before you continue.
mukesh
hi Adrian,
I tried
(rects, weights) = hog.detectMultiScale(image, winStride=winStride,
    padding=padding, scale=args["scale"], useMeanshiftGrouping=meanShift)
and when I printed rects and weights I got empty tuples.
I’m a beginner and need some help.
Waiting for your help.
Adrian Rosebrock
If you did not obtain any bounding boxes, then the parameters to detectMultiScale need some tuning. Your image might also contain poses that are not suitable for the pre-trained pedestrian detector provided with OpenCV.
ramdan
Hi Adrian
How do I train the HOG descriptor?
Adrian Rosebrock
I detail the steps to train a HOG + Linear SVM detector here. I then demonstrate how to implement the HOG + Linear SVM detector inside the PyImageSearch Gurus course.
Sunil
Nice effort put into the article, Adrian. Is there any relation between the minimum and maximum size possible to detect with the parameters of the HOG/SVM detector of OpenCV?
Adrian Rosebrock
I’m not sure what you mean by minimum/maximum size. Are you referring to the object you’re trying to detect? The HOG window? Keep in mind that we use image pyramids to find objects at varying scales in an image. You might need to upscale your image before applying the image pyramid + sliding window to detect very small objects in the background.
Sunil
Yeah, sorry for not being clear. I was wondering if there is a relation between the max/min object size that can be detected in a given image and the size of the HOG window used. Actually, I am trying to see whether increasing the resolution by a factor of two in each dimension has a positive effect on object detection.
Adrian Rosebrock
Increasing the resolution will enable you to detect objects that would otherwise be too small for the sliding window to capture. The downside is that the HOG + Linear SVM detector now has more data to process, thus making it substantially slower.
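A sketch of that upscale-then-detect idea (the 2x factor is the commenter’s example; the returned boxes are divided by the same factor to map them back to the original image):

# double the resolution so the fixed 64x128 window can reach
# smaller people -- at a substantial speed cost
upscaled = cv2.resize(image, (0, 0), fx=2.0, fy=2.0)
(rects, weights) = hog.detectMultiScale(upscaled, winStride=(8, 8),
    padding=(16, 16), scale=1.05)
rects = [(x // 2, y // 2, w // 2, h // 2) for (x, y, w, h) in rects]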
alberto
Hello,
I’ve trained my own HOG detector using the command “opencv_traincascade” with the “-featureType HOG” flag on, and it successfully generated an .xml file as a HOG detector.
How can I use my own XML file with the functions cv2.HOGDescriptor() and hog.setSVMDetector()? I want to test my HOG detector in action.
I have only found working examples of the default people detector, cv2.HOGDescriptor_getDefaultPeopleDetector().
Thanks,
Alberto
Claude
Hi Alberto,
I have my SVM in yml, and then use
hog = cv2.HOGDescriptor(
    IMAGE_SIZE, BLOCK_SIZE, BLOCK_STRIDE, CELL_SIZE, NR_BINS)
svm = cv2.ml.SVM_load("trained_svm2.yml")
hog.setSVMDetector(svm.getSupportVectors())
Maybe it will “just work” with an xml file as well.
Mauro
Hi Adrian, nice job!
I’ve used your script to detect humans in pictures saved by Motion on Debian.
It works very well, but sometimes HOG detects a cat in the image; I’ve tried some combinations of scale, padding and winStride values, but without success.
Is there a way to tell hog.detectMultiScale to ignore objects smaller than a given size?
I’ve played with hitThreshold and finalThreshold, but they don’t do exactly that.
Thanks a lot,
Best Regards
Mauro
Adrian Rosebrock
I would suggest looping over the weights and rects together and discarding any weight scores that are too low (you’ll likely have to define what “too small” is yourself).
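A minimal sketch of that filtering idea (the 0.5 cutoff is arbitrary and should be tuned on your own data):

# keep only detections whose SVM weight clears the cutoff
MIN_WEIGHT = 0.5
keep = [(box, float(w)) for (box, w) in zip(rects, weights)
    if float(w) > MIN_WEIGHT]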
ziyi
Hi, I am running the HOG descriptor on the exact same image as in your example, with the default params and the default people detector, but my speed is very slow: 800+ ms per frame.
My PC is an i7 with 4 cores; is there anything wrong I am doing? I can’t see why yours took under 0.09 seconds.
Adrian Rosebrock
It sounds like you may have compiled and installed OpenCV without optimization libraries such as BLAS. How did you install OpenCV on your system? Did you follow one of my tutorials here on PyImageSearch?
Irum Zahra
Hi Adrian! This tutorial is so helpful. You are doing a wonderful job. But I have a question. Can you guide me on how to increase the distance from which it can detect pedestrians? What parameters should I change? I want to detect pedestrians from 30 feet.
Tjado
Hey Adrian, very nice tutorial, thanks!
I tried to create a function for locating faces, which calls the detectMultiScale method and returns the result.
Now it should be callable with different parameters, but if I try to hand over the minSize tuple as a default parameter, the following exception occurs:
TypeError: Argument given by name (‘flags’) and position (4)
def locate_faces(image, scaleFactor=2, minNeighbors=4, minSize=(50, 50)):
    face_one = faceDet.detectMultiScale(image, scaleFactor, minNeighbors, minSize,
        flags=cv2.CASCADE_SCALE_IMAGE)
    return face_one
Why?
Roberto O
Hi Adrian, first of all, thanks for sharing your expertise and knowledge with everyone. In my own experience, learning about computer vision and opencv can be quite a challenge and very frustrating when you can’t find useful information about some topic. Once again: Thanks a lot!
So… I have a question about real-time implementation. I’ve been “playing around” with the winStride and scale parameters and I managed to get real-time video feedback; nevertheless, I can’t get ANY detection in any frame. I think I have hit a wall and can’t figure out how to get this working. If you could give me some tips on tuning those parameters so that pedestrian detection can be accomplished in my real-time application, I would appreciate it A LOT. Thanks in advance. See ya!
Adrian Rosebrock
Hey Roberto — thanks for the comment, I’m glad you’re finding the PyImageSearch blog helpful!
To address your question:
Are you using the OpenCV HOG pedestrian detector covered in this post? Or are you using your own model?
Keep in mind that the window stride and scale are important not only for speed, but for obtaining the actual detections as well. You may need to sacrifice speed for accuracy.
GabriellaK
Hi, your posts are very helpful for me.
I have a question. I’m trying to detect people in real time from a webcam and I’m using this code https://programtalk.com/vs2/python/3138/football-stats/test_scripts/people_detect.py/ , but it’s not as good as you said 🙁
But what do you mean by “resizing your image to be as small as possible”? What function do I have to use?
imutils.resize()?
Hope you will answer soon 😀
Adrian Rosebrock
You can use either imutils.resize or cv2.resize to resize your image.
Jason
Hi all,
I’ve learned a lot from these posts and I’ve spent some time trying to write my own implementations of SVM and HOG to gain a better understanding of the concepts. Some parts failed, some work but are slow, and some actually ended up being better than the reference I was using. So I’d like to share with you the “better” part: a vectorized implementation of HOG feature extraction using only numpy + scipy.
To put it short: it returns the HOG feature vectors for all sliding windows on an image in one go. Tested on a 512×512 image with a window size of 200×200, this is 20-30x faster than a naive sliding window + skimage’s hog function. For a single image, one can just set the window size to the image size, and there is still a 20-30% speed gain over skimage.
The link to the git repo: https://github.com/Xunius/HOG_python.
Any feedback is appreciated.
Adrian Rosebrock
Thanks for sharing, Jason!
David Wilson Romero Guzman
Hey Adrian,
Thank you very much for your post! Really good 🙂
I have a problem with the detectMultiScale() function. I am developing an object recognizer for multiple objects. For this, I trained n binary SVMs (object_i/no_object_i). On the test set (with patches of the same size) I get an accuracy of around 90%, which is quite OK. However, when I use them to detect objects in a bigger image (i.e. with detectMultiScale()), regardless of the model I use, I get a window right in the middle of the image :/
– Do you have any idea of what could be the issue here?
– I used the detect() function and in that case I get the exact opposite situation: squares everywhere.
Best Regards,
David
Adrian Rosebrock
Congrats on training your own model David, that’s great. However, I’m not sure why your detector would be falsely reporting a detection in the middle of the image each and every time. That may be an issue with your training data but unfortunately I’m not sure what the root cause is.
Alex
Hi Adrian,
I’ve tried it with an mp4 video, but it doesn’t work. What should I change to detect people in video? Maybe I should do something with imutils.resize()?
Adrian Rosebrock
Hey Alex — could you be a bit more specific when you say “it doesn’t work”? What specifically does not work? Are people not detected? Does the code throw an error?
Alex
People are not detected.
Adrian Rosebrock
Haar cascades and HOG + Linear SVM detectors are not very good at handling changes to rotation and orientation. My guess is that your input images/frames differ significantly from what the model is trained on. You may want to try a deep learning based object detector.
kritesh
Here, the code works well when the person is vertical, but when the person is lying down or horizontal it doesn’t work and shows zero detections.
What algorithm satisfies both cases?
Adrian Rosebrock
There is no such thing as a “perfect detection algorithm”. Deep learning-based object detectors may help you though.
Vikran
Hi Adrian,
I am trying to detect objects using the TensorFlow API by training the models, and once an object is detected, I am trying to blur that particular detected part. Can we pass the detected object as input to cv2.CascadeClassifier?
Adrian Rosebrock
Hey Vikran — I’m a bit confused by your comment. If you’re using the TensorFlow Object Detection API to detect an object why would you want to further classify it? You already have the detected object bounding box and labels.
roja
hi Adrian,
I have some code that works well if I resize the image; if I don’t resize it, it does not work properly. Is it necessary to resize the image before giving it to HOG?
Adrian Rosebrock
Yes, you must resize the image prior to passing it into HOG. HOG requires a fixed image/ROI dimensions, otherwise your output feature vector dimensions could be different (depending on your implementation).
silver
Is there any way to improve accuracy when the person is very small in the image? It works very well with a clear, reasonably sized person; however, my image has low quality and the person is very tiny.
Levi
Is there a way to combine HOG with the last layers of a YOLO network to perform object detection?
Adrian Rosebrock
No, and there’s not really a reason to do that either. Use one or the other.