Last updated on July 8, 2021.
In this tutorial you will learn how to build a “people counter” with OpenCV and Python. Using OpenCV, we’ll count the number of people who are heading “in” or “out” of a department store in real-time.
Building a person counter with OpenCV has been one of the most-requested topics here on PyImageSearch, and I’ve been meaning to do a blog post on people counting for a year now — I’m incredibly thrilled to be publishing it and sharing it with you today.
Enjoy the tutorial and let me know what you think in the comments section at the bottom of the post!
To get started building a people counter with OpenCV, just keep reading!
- Update July 2021: Added section on how to improve the efficiency, speed, and FPS throughput rate of the people counter by using multi-object tracking spread across multiple processes/cores.
OpenCV People Counter with Python
In the first part of today’s blog post, we’ll be discussing the required Python packages you’ll need to build our people counter.
From there I’ll provide a brief discussion on the difference between object detection and object tracking, along with how we can leverage both to create a more accurate people counter.
Afterwards, we’ll review the directory structure for the project and then implement the entire person counting project.
Finally, we’ll examine the results of applying people counting with OpenCV to actual videos.
Required Python libraries for people counting
In order to build our people counting application, we’ll need a number of different Python libraries, including NumPy, OpenCV, dlib, and imutils.
Additionally, you’ll also want to access the “Downloads” section of this blog post to retrieve my source code which includes:
- My special `pyimagesearch` module which we’ll implement and use later in this post
- The Python driver script used to start the people counter
- All example videos used here in the post
I’m going to assume you already have NumPy, OpenCV, and dlib installed on your system.
If you don’t have OpenCV installed, you’ll want to head to my OpenCV install page and follow the relevant tutorial for your particular operating system.
If you need to install dlib, you can use this guide.
Finally, you can install/upgrade your imutils via the following command:
$ pip install --upgrade imutils
Understanding object detection vs. object tracking
There is a fundamental difference between object detection and object tracking that you must understand before we proceed with the rest of this tutorial.
When we apply object detection we are determining where in an image/frame an object is. An object detector is also typically more computationally expensive, and therefore slower, than an object tracking algorithm. Examples of object detection algorithms include Haar cascades, HOG + Linear SVM, and deep learning-based object detectors such as Faster R-CNNs, YOLO, and Single Shot Detectors (SSDs).
An object tracker, on the other hand, will accept the input (x, y)-coordinates of where an object is in an image and will:
- Assign a unique ID to that particular object
- Track the object as it moves around a video stream, predicting the new object location in the next frame based on various attributes of the frame (gradient, optical flow, etc.)
Examples of object tracking algorithms include MedianFlow, MOSSE, GOTURN, kernelized correlation filters, and discriminative correlation filters, to name a few.
If you’re interested in learning more about the object tracking algorithms built into OpenCV, be sure to refer to this blog post.
Combining both object detection and object tracking
Highly accurate object trackers will combine the concept of object detection and object tracking into a single algorithm, typically divided into two phases:
- Phase 1 — Detecting: During the detection phase we are running our computationally more expensive object detector to (1) detect if new objects have entered our view, and (2) see if we can find objects that were “lost” during the tracking phase. For each detected object we create or update an object tracker with the new bounding box coordinates. Since our object detector is more computationally expensive we only run this phase once every N frames.
- Phase 2 — Tracking: When we are not in the “detecting” phase we are in the “tracking” phase. For each of our detected objects, we create an object tracker to track the object as it moves around the frame. Our object tracker should be faster and more efficient than the object detector. We’ll continue tracking until we’ve reached the N-th frame and then re-run our object detector. The entire process then repeats.
The benefit of this hybrid approach is that we can apply highly accurate object detection methods without as much of the computational burden. We will be implementing such a tracking system to build our people counter.
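To make the pattern concrete before we write any real code, here is a rough sketch of the detect-every-N-frames loop described above. The `grab_next_frame`, `detect_people`, `create_tracker`, and `update_tracker` helpers are hypothetical placeholders, not functions from this project or any library:

# a minimal sketch of the two-phase detect/track loop; the helper
# functions below are hypothetical placeholders
SKIP_FRAMES = 30            # run the expensive detector once every N frames
trackers = []               # one lightweight tracker per detected person
totalFrames = 0

while True:
    frame = grab_next_frame()       # placeholder for your video source
    if frame is None:
        break

    if totalFrames % SKIP_FRAMES == 0:
        # Phase 1 -- detection: expensive, so run it only every N frames
        boxes = detect_people(frame)
        trackers = [create_tracker(frame, box) for box in boxes]
    else:
        # Phase 2 -- tracking: cheap updates on the frames in between
        boxes = [update_tracker(t, frame) for t in trackers]

    # hand the boxes off to the centroid tracker / counting logic here
    totalFrames += 1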
Project structure
Let’s review the project structure for today’s blog post. Once you’ve grabbed the code from the “Downloads” section, you can inspect the directory structure with the `tree` command:
$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   ├── centroidtracker.py
│   └── trackableobject.py
├── mobilenet_ssd
│   ├── MobileNetSSD_deploy.caffemodel
│   └── MobileNetSSD_deploy.prototxt
├── videos
│   ├── example_01.mp4
│   └── example_02.mp4
├── output
│   ├── output_01.avi
│   └── output_02.avi
└── people_counter.py
4 directories, 10 files
Zeroing in on the two most important directories, we have:

- `pyimagesearch/`: This module contains the centroid tracking algorithm. The centroid tracking algorithm is covered in the “Combining object tracking algorithms” section, but the code is not. For a review of the centroid tracking code (`centroidtracker.py`) you should refer to the first post in the series.
- `mobilenet_ssd/`: Contains the Caffe deep learning model files. We’ll be using a MobileNet Single Shot Detector (SSD), which is covered in the “Single Shot Detectors for object detection” section referenced at the top of this blog post.

The heart of today’s project is contained within the `people_counter.py` script — that’s where we’ll spend most of our time. We’ll also review the `trackableobject.py` script today.
Combining object tracking algorithms
To implement our people counter we’ll be using both OpenCV and dlib. We’ll use OpenCV for standard computer vision/image processing functions, along with the deep learning object detector for people counting.
We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.
I’ll be including a deep dive into dlib’s object tracking algorithm in next week’s post.
Along with dlib’s object tracking implementation, we’ll also be using my implementation of centroid tracking from a few weeks ago. Reviewing the entire centroid tracking algorithm is outside the scope of this blog post, but I’ve included a brief overview below.
At Step #1 we accept a set of bounding boxes and compute their corresponding centroids (i.e., the center of the bounding boxes):
The bounding boxes themselves can be provided by either:
- An object detector (such as HOG + Linear SVM, Faster R-CNN, SSDs, etc.)
- Or an object tracker (such as correlation filters)
In the above image you can see that we have two objects to track in this initial iteration of the algorithm.
During Step #2 we compute the Euclidean distance between any new centroids (yellow) and existing centroids (purple):
The centroid tracking algorithm makes the assumption that pairs of centroids with minimum Euclidean distance between them must be the same object ID.
In the example image above we have two existing centroids (purple) and three new centroids (yellow), implying that a new object has been detected (since there is one more new centroid vs. old centroid).
The arrows then represent computing the Euclidean distances between all purple centroids and all yellow centroids.
Once we have the Euclidean distances we attempt to associate object IDs in Step #3:
In Figure 3 you can see that our centroid tracker has chosen to associate centroids that minimize their respective Euclidean distances.
But what about the point in the bottom-left?
It didn’t get associated with anything — what do we do?
To answer that question we need to perform Step #4, registering new objects:
Registering simply means that we are adding the new object to our list of tracked objects by:
- Assigning it a new object ID
- Storing the centroid of the bounding box coordinates for the new object
In the event that an object has been lost or has left the field of view, we can simply deregister the object (Step #5).
Exactly how you handle when an object is “lost” or is “no longer visible” really depends on your exact application, but for our people counter, we will deregister people IDs when they cannot be matched to any existing person objects for 40 consecutive frames.
Again, this is only a brief overview of the centroid tracking algorithm.
Note: For a more detailed review, including an explanation of the source code used to implement centroid tracking, be sure to refer to this post.
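If you just want a feel for the association step (Steps #2 and #3), here is a small, self-contained illustration using SciPy; the centroid coordinates are made up, and this is only a sketch of the matching logic, not the full CentroidTracker implementation:

import numpy as np
from scipy.spatial import distance as dist

# existing object centroids (purple) and newly detected centroids (yellow)
old_centroids = np.array([[120, 90], [300, 210]])
new_centroids = np.array([[125, 96], [298, 205], [60, 240]])

# pairwise Euclidean distances: rows = existing objects, cols = new detections
D = dist.cdist(old_centroids, new_centroids)

# match each existing object to its closest new centroid
rows = D.min(axis=1).argsort()
cols = D.argmin(axis=1)[rows]
for (row, col) in zip(rows, cols):
    print("object {} -> new centroid {}".format(row, col))

# the leftover new centroid (index 2) would be registered as a new object;
# an existing object left unmatched for too many consecutive frames
# (maxDisappeared) would be deregistered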
Creating a “trackable object”
In order to track and count an object in a video stream, we need an easy way to store information regarding the object itself, including:
- Its object ID
- Its previous centroids (so we can easily compute the direction the object is moving)
- Whether or not the object has already been counted
To accomplish all of these goals we can define an instance of `TrackableObject` — open up the `trackableobject.py` file and insert the following code:
class TrackableObject:
    def __init__(self, objectID, centroid):
        # store the object ID, then initialize a list of centroids
        # using the current centroid
        self.objectID = objectID
        self.centroids = [centroid]

        # initialize a boolean used to indicate if the object has
        # already been counted or not
        self.counted = False
The `TrackableObject` constructor accepts an `objectID` and `centroid` and stores them. The `centroids` variable is a list because it will contain an object’s centroid location history.

The constructor also initializes `counted` as `False`, indicating that the object has not been counted yet.
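As a quick usage sketch (with made-up centroid values), a trackable object is created once and then accumulates centroids over time:

from pyimagesearch.trackableobject import TrackableObject

# create a trackable object for object ID 1 at an initial centroid
to = TrackableObject(1, (250, 110))

# on later frames we append the new centroids so we can infer direction
to.centroids.append((252, 135))
print(to.objectID, to.counted)    # 1 False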
Implementing our people counter with OpenCV + Python
With all of our supporting Python helper tools and classes in place, we are now ready to build our OpenCV people counter.
Open up your `people_counter.py` file and insert the following code:
# import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker
from pyimagesearch.trackableobject import TrackableObject
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2
We begin by importing our necessary packages:
- From the `pyimagesearch` module, we import our custom `CentroidTracker` and `TrackableObject` classes.
- The `VideoStream` and `FPS` modules from `imutils.video` will help us work with a webcam and calculate the estimated frames per second (FPS) throughput rate.
- We need `imutils` for its OpenCV convenience functions.
- The `dlib` library will be used for its correlation tracker implementation.
- OpenCV will be used for deep neural network inference, opening video files, writing video files, and displaying output frames to our screen.
Now that all of the tools are at our fingertips, let’s parse command line arguments:
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-i", "--input", type=str,
    help="path to optional input video file")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.4,
    help="minimum probability to filter weak detections")
ap.add_argument("-s", "--skip-frames", type=int, default=30,
    help="# of skip frames between detections")
args = vars(ap.parse_args())
We have six command line arguments which allow us to pass information to our people counter script from the terminal at runtime:
- `--prototxt`: Path to the Caffe “deploy” prototxt file.
- `--model`: The path to the Caffe pre-trained CNN model.
- `--input`: Optional input video file path. If no path is specified, your webcam will be utilized.
- `--output`: Optional output video path. If no path is specified, a video will not be recorded.
- `--confidence`: With a default value of `0.4`, this is the minimum probability threshold which helps to filter out weak detections.
- `--skip-frames`: The number of frames to skip before running our DNN detector again on the tracked object. Remember, object detection is computationally expensive, but it does help our tracker to reassess objects in the frame. By default we skip `30` frames between detecting objects with the OpenCV DNN module and our CNN single shot detector model.
Now that our script can dynamically handle command line arguments at runtime, let’s prepare our SSD:
# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
First, we’ll initialize `CLASSES` — the list of classes that our SSD supports. This list should not be changed if you’re using the model provided in the “Downloads”. We’re only interested in the “person” class, but you could count other moving objects as well (however, if your “pottedplant”, “sofa”, or “tvmonitor” grows legs and starts moving, you should probably run out of your house screaming rather than worrying about counting them!).
On Line 38 we load our pre-trained MobileNet SSD used to detect objects (but again, we’re just interested in detecting and tracking people, not any other class). To learn more about MobileNet and SSDs, please refer to my previous blog post.
From there we can initialize our video stream:
# if a video path was not supplied, grab a reference to the webcam
if not args.get("input", False):
    print("[INFO] starting video stream...")
    vs = VideoStream(src=0).start()
    time.sleep(2.0)

# otherwise, grab a reference to the video file
else:
    print("[INFO] opening video file...")
    vs = cv2.VideoCapture(args["input"])
First we handle the case where we’re using a webcam video stream (Lines 41-44). Otherwise, we’ll be capturing frames from a video file (Lines 47-49).
We still have a handful of initializations to perform before we begin looping over frames:
# initialize the video writer (we'll instantiate later if need be)
writer = None

# initialize the frame dimensions (we'll set them as soon as we read
# the first frame from the video)
W = None
H = None

# instantiate our centroid tracker, then initialize a list to store
# each of our dlib correlation trackers, followed by a dictionary to
# map each unique object ID to a TrackableObject
ct = CentroidTracker(maxDisappeared=40, maxDistance=50)
trackers = []
trackableObjects = {}

# initialize the total number of frames processed thus far, along
# with the total number of objects that have moved either up or down
totalFrames = 0
totalDown = 0
totalUp = 0

# start the frames per second throughput estimator
fps = FPS().start()
The remaining initializations include:
- `writer`: Our video writer. We’ll instantiate this object later if we are writing to video.
- `W` and `H`: Our frame dimensions. We’ll need to plug these into `cv2.VideoWriter`.
- `ct`: Our `CentroidTracker`. For details on the implementation of `CentroidTracker`, be sure to refer to my blog post from a few weeks ago.
- `trackers`: A list to store the dlib correlation trackers. To learn about dlib correlation tracking, stay tuned for next week’s post.
- `trackableObjects`: A dictionary which maps an `objectID` to a `TrackableObject`.
- `totalFrames`: The total number of frames processed.
- `totalDown` and `totalUp`: The total number of objects/people that have moved either down or up. These variables measure the actual “people counting” results of the script.
- `fps`: Our frames per second estimator for benchmarking.
Note: If you get lost in the `while` loop below, you should refer back to this bulleted listing of important variables.
Now that all of our initializations are taken care of, let’s loop over incoming frames:
# loop over frames from the video stream
while True:
    # grab the next frame and handle if we are reading from either
    # VideoCapture or VideoStream
    frame = vs.read()
    frame = frame[1] if args.get("input", False) else frame

    # if we are viewing a video and we did not grab a frame then we
    # have reached the end of the video
    if args["input"] is not None and frame is None:
        break

    # resize the frame to have a maximum width of 500 pixels (the
    # less data we have, the faster we can process it), then convert
    # the frame from BGR to RGB for dlib
    frame = imutils.resize(frame, width=500)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if the frame dimensions are empty, set them
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (W, H), True)
We begin looping on Line 76. At the top of the loop we grab the next `frame` (Lines 79 and 80). In the event that we’ve reached the end of the video, we’ll `break` out of the loop (Lines 84 and 85).

Preprocessing the `frame` takes place on Lines 90 and 91. This includes resizing and swapping color channels, as dlib requires an `rgb` image.

We grab the dimensions of the `frame` for the video `writer` (Lines 94 and 95).

From there we’ll instantiate the video `writer` if an output path was provided via command line argument (Lines 99-102). To learn more about writing video to disk, be sure to refer to this post.
Now let’s detect people using the SSD:
    # initialize the current status along with our list of bounding
    # box rectangles returned by either (1) our object detector or
    # (2) the correlation trackers
    status = "Waiting"
    rects = []

    # check to see if we should run a more computationally expensive
    # object detection method to aid our tracker
    if totalFrames % args["skip_frames"] == 0:
        # set the status and initialize our new set of object trackers
        status = "Detecting"
        trackers = []

        # convert the frame to a blob and pass the blob through the
        # network and obtain the detections
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
        net.setInput(blob)
        detections = net.forward()
We initialize a `status` as “Waiting” on Line 107. Possible `status` states include:

- Waiting: In this state, we’re waiting on people to be detected and tracked.
- Detecting: We’re actively in the process of detecting people using the MobileNet SSD.
- Tracking: People are being tracked in the frame and we’re counting the `totalUp` and `totalDown`.

Our `rects` list will be populated either via detection or tracking. We go ahead and initialize `rects` on Line 108.
It’s important to understand that deep learning object detectors are very computationally expensive, especially if you are running them on your CPU.
To avoid running our object detector on every frame, and to speed up our tracking pipeline, we’ll skip N frames between detections (set by the command line argument `--skip-frames`, where `30` is the default). Only every N frames will we exercise our SSD for object detection. Otherwise, we’ll simply be tracking moving objects in between.
Using the modulo operator on Line 112 we ensure that we’ll only execute the code in the if-statement every N frames.
Assuming we’ve landed on a multiple of `skip_frames`, we’ll update the `status` to “Detecting” (Line 114). Then we initialize our new list of `trackers` (Line 115).

Next, we’ll perform inference via object detection. We begin by creating a `blob` from the image, followed by passing the `blob` through the net to obtain `detections` (Lines 119-121).
Now we’ll loop over each of the `detections` in hopes of finding objects belonging to the “person” class:
        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by requiring a minimum
            # confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])

                # if the class label is not a person, ignore it
                if CLASSES[idx] != "person":
                    continue
Looping over `detections` on Line 124, we proceed to grab the `confidence` (Line 127) and filter out weak results plus those that don’t belong to the “person” class (Lines 131-138).
Now we can compute a bounding box for each person and begin correlation tracking:
                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
                (startX, startY, endX, endY) = box.astype("int")

                # construct a dlib rectangle object from the bounding
                # box coordinates and then start the dlib correlation
                # tracker
                tracker = dlib.correlation_tracker()
                rect = dlib.rectangle(startX, startY, endX, endY)
                tracker.start_track(rgb, rect)

                # add the tracker to our list of trackers so we can
                # utilize it during skip frames
                trackers.append(tracker)
Computing our bounding `box` takes place on Lines 142 and 143.

Then we instantiate our dlib correlation `tracker` on Line 148, followed by passing the object’s bounding box coordinates to `dlib.rectangle`, storing the result as `rect` (Line 149).

Subsequently, we start tracking on Line 150 and append the `tracker` to the `trackers` list on Line 154.
That’s a wrap for all operations we do every N skip-frames!
Let’s take care of the typical operations where tracking is taking place in the `else` block:
    # otherwise, we should utilize our object *trackers* rather than
    # object *detectors* to obtain a higher frame processing throughput
    else:
        # loop over the trackers
        for tracker in trackers:
            # set the status of our system to be 'tracking' rather
            # than 'waiting' or 'detecting'
            status = "Tracking"

            # update the tracker and grab the updated position
            tracker.update(rgb)
            pos = tracker.get_position()

            # unpack the position object
            startX = int(pos.left())
            startY = int(pos.top())
            endX = int(pos.right())
            endY = int(pos.bottom())

            # add the bounding box coordinates to the rectangles list
            rects.append((startX, startY, endX, endY))
Most of the time, we aren’t landing on a skip-frame multiple. During this time, we’ll utilize our `trackers` to track our objects rather than applying detection.

We begin looping over the available `trackers` on Line 160.

We proceed to update the `status` to “Tracking” (Line 163) and grab the object position (Lines 166 and 167).

From there we extract the position coordinates (Lines 170-173), followed by populating the information in our `rects` list.
Now let’s draw a horizontal visualization line (that people must cross in order to be tracked) and use the centroid tracker to update our object centroids:
    # draw a horizontal line in the center of the frame -- once an
    # object crosses this line we will determine whether they were
    # moving 'up' or 'down'
    cv2.line(frame, (0, H // 2), (W, H // 2), (0, 255, 255), 2)

    # use the centroid tracker to associate the (1) old object
    # centroids with (2) the newly computed object centroids
    objects = ct.update(rects)
On Line 181 we draw the horizontal line which we’ll be using to visualize people “crossing” — once people cross this line, we’ll increment our respective counters.
Then, on Line 185, we utilize our `CentroidTracker` instantiation to accept the list of `rects`, regardless of whether they were generated via object detection or object tracking. Our centroid tracker will associate object IDs with object locations.
In this next block, we’ll review the logic which counts if a person has moved up or down through the frame:
    # loop over the tracked objects
    for (objectID, centroid) in objects.items():
        # check to see if a trackable object exists for the current
        # object ID
        to = trackableObjects.get(objectID, None)

        # if there is no existing trackable object, create one
        if to is None:
            to = TrackableObject(objectID, centroid)

        # otherwise, there is a trackable object so we can utilize it
        # to determine direction
        else:
            # the difference between the y-coordinate of the *current*
            # centroid and the mean of *previous* centroids will tell
            # us in which direction the object is moving (negative for
            # 'up' and positive for 'down')
            y = [c[1] for c in to.centroids]
            direction = centroid[1] - np.mean(y)
            to.centroids.append(centroid)

            # check to see if the object has been counted or not
            if not to.counted:
                # if the direction is negative (indicating the object
                # is moving up) AND the centroid is above the center
                # line, count the object
                if direction < 0 and centroid[1] < H // 2:
                    totalUp += 1
                    to.counted = True

                # if the direction is positive (indicating the object
                # is moving down) AND the centroid is below the
                # center line, count the object
                elif direction > 0 and centroid[1] > H // 2:
                    totalDown += 1
                    to.counted = True

        # store the trackable object in our dictionary
        trackableObjects[objectID] = to
We begin by looping over the object IDs and their updated centroids returned by the centroid tracker (Line 188).
On Line 191 we attempt to fetch a `TrackableObject` for the current `objectID`.

If the `TrackableObject` doesn’t exist for the `objectID`, we create one (Lines 194 and 195).

Otherwise, there is already an existing `TrackableObject`, so we need to figure out if the object (person) is moving up or down.

To do so, we grab the y-coordinate values for all previous centroid locations for the given object (Line 204). Then we compute the `direction` by taking the difference between the current centroid location and the mean of all previous centroid locations (Line 205).
The reason we take the mean is to ensure our direction tracking is more stable. If we stored just the previous centroid location for the person we leave ourselves open to the possibility of false direction counting. Keep in mind that object detection and object tracking algorithms are not “magic” — sometimes they will predict bounding boxes that may be slightly off what you may expect; therefore, by taking the mean, we can make our people counter more accurate.
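As a tiny worked example of this computation (with made-up numbers), suppose an object’s previous centroid y-coordinates were 220, 210, and 205, and its current centroid has a y-coordinate of 190:

import numpy as np

y = [220, 210, 205]        # previous centroid y-coordinates (older frames)
current_y = 190            # current centroid y-coordinate

direction = current_y - np.mean(y)
print(direction)           # approximately -21.67: negative, so moving "up"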
If the `TrackableObject` has not been `counted` (Line 209), we need to determine if it’s ready to be counted yet (Lines 213-222), by:

- Checking if the `direction` is negative (indicating the object is moving up) AND the centroid is above the centerline. In this case we increment `totalUp`.
- Or checking if the `direction` is positive (indicating the object is moving down) AND the centroid is below the centerline. If this is true, we increment `totalDown`.
Finally, we store the `TrackableObject` in our `trackableObjects` dictionary (Line 225) so we can grab and update it when the next frame is captured.
We’re on the home-stretch!
The next three code blocks handle:
- Display (drawing and writing text to the frame)
- Writing frames to a video file on disk (if the `--output` command line argument is present)
- Capturing keypresses
- Cleanup
First we’ll draw some information on the frame for visualization:
        # draw both the ID of the object and the centroid of the
        # object on the output frame
        text = "ID {}".format(objectID)
        cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

    # construct a tuple of information we will be displaying on the
    # frame
    info = [
        ("Up", totalUp),
        ("Down", totalDown),
        ("Status", status),
    ]

    # loop over the info tuples and draw them on our frame
    for (i, (k, v)) in enumerate(info):
        text = "{}: {}".format(k, v)
        cv2.putText(frame, text, (10, H - ((i * 20) + 20)),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
Here we overlay the following data on the frame:
- `objectID`: Each object’s numerical identifier.
- `centroid`: The center of the object will be represented by a “dot” which is created by filling in a circle.
- `info`: Includes `totalUp`, `totalDown`, and `status`.
For a review of drawing operations, be sure to refer to this blog post.
Then we’ll write the `frame` to a video file (if necessary) and handle keypresses:
    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # increment the total number of frames processed thus far and
    # then update the FPS counter
    totalFrames += 1
    fps.update()
In this block we:
- Write the `frame`, if necessary, to the output video file (Lines 249 and 250)
- Display the `frame` and handle keypresses (Lines 253-258). If “q” is pressed, we `break` out of the frame processing loop.
- Update our `fps` counter (Line 263)
We didn’t make too much of a mess, but now it’s time to clean up:
# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# if we are not using a video file, stop the camera video stream
if not args.get("input", False):
    vs.stop()

# otherwise, release the video file pointer
else:
    vs.release()

# close any open windows
cv2.destroyAllWindows()
To finish out the script, we display the FPS info to the terminal, release all pointers, and close any open windows.
Just 283 lines of code later, we are now done!
People counting results
To see our OpenCV people counter in action, make sure you use the “Downloads” section of this blog post to download the source code and example videos.
From there, open up a terminal and execute the following command:
$ python people_counter.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
	--model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
	--input videos/example_01.mp4 --output output/output_01.avi
[INFO] loading model...
[INFO] opening video file...
[INFO] elapsed time: 37.27
[INFO] approx. FPS: 34.42
Here you can see that our person counter is counting the number of people who:
- Are entering the department store (down)
- And the number of people who are leaving (up)
At the end of the first video you’ll see there have been 7 people who entered and 3 people who have left.
Furthermore, examining the terminal output you’ll see that our person counter is capable of running in real-time, obtaining 34 FPS throughput. This is despite the fact that we are using a deep learning object detector for more accurate person detections.

Our 34 FPS throughput rate is made possible through our two-phase process of:
- Detecting people once every 30 frames
- And then applying a faster, more efficient object tracking algorithm in all frames in between.
Another example of people counting with OpenCV can be seen below:
$ python people_counter.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
	--model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
	--input videos/example_02.mp4 --output output/output_02.avi
[INFO] loading model...
[INFO] opening video file...
[INFO] elapsed time: 36.88
[INFO] approx. FPS: 34.79
Here is a full video of the demo:
This time there have been 2 people who have entered the department store and 14 people who have left.
You can see how useful this system would be to a store owner interested in foot traffic analytics.
The same type of system for counting foot traffic with OpenCV can be used to count automobile traffic with OpenCV and I hope to cover that topic in a future blog post.
Additionally, a big thank you to David McDuffee for recording the example videos used here today! David works here with me at PyImageSearch and if you’ve ever emailed PyImageSearch before, you have very likely interacted with him. Thank you for making this post possible, David! Also a thank you to BenSound for providing the music for the video demos included in this post.
Improving our people counter application
In order to build our OpenCV people counter we utilized dlib’s correlation tracker. This method is easy to use and requires very little code.
However, our implementation is a bit inefficient — in order to track multiple objects we need to create multiple instances of the correlation tracker object. And then when we need to compute the location of the object in subsequent frames, we need to loop over all N object trackers and grab the updated position.
All of this computation would take place in the main execution thread of our script which thereby slows down our FPS rate.
An easy way to improve performance would therefore be to use multi-object tracking with dlib. That tutorial covers how to use multiprocessing and queues such that our FPS rate improves by 45%!
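The rough idea behind that approach is to give every dlib correlation tracker its own process and communicate with it over queues, so tracker updates run in parallel instead of serially in the main thread. Here is a simplified sketch of that pattern (the worker function and queue handling here are illustrative assumptions, not the exact code from that tutorial):

import multiprocessing
import dlib

def start_tracker(box, rgb, input_queue, output_queue):
    # construct a dlib correlation tracker inside this dedicated process
    tracker = dlib.correlation_tracker()
    rect = dlib.rectangle(box[0], box[1], box[2], box[3])
    tracker.start_track(rgb, rect)

    # loop over RGB frames sent by the main process
    while True:
        rgb = input_queue.get()
        if rgb is None:
            break

        # update the tracker and push the new bounding box back
        tracker.update(rgb)
        pos = tracker.get_position()
        output_queue.put((int(pos.left()), int(pos.top()),
                          int(pos.right()), int(pos.bottom())))

# in the main process, for each detected person box you would spawn:
#   iq, oq = multiprocessing.Queue(), multiprocessing.Queue()
#   p = multiprocessing.Process(target=start_tracker,
#                               args=(box, rgb, iq, oq), daemon=True)
#   p.start()
# then push each new frame into every input queue and read the updated
# boxes from the output queues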
Note: OpenCV also implements multi-object tracking, but not with multiple processes (at least at the time of this writing). OpenCV’s multi-object method is certainly far easier to use, but without the multiprocessing capability, it doesn’t help much in this instance.
Finally, for even higher tracking accuracy (but at the expense of speed without a fast GPU), you can look into deep learning-based object trackers, such as Deep SORT, introduced by Wojke et al. in their paper, Simple Online and Realtime Tracking with a Deep Association Metric.
This method is very popular for deep learning-based object tracking and has been implemented in multiple Python libraries. I would suggest starting with this implementation.
Summary
In today’s blog post we learned how to build a people counter using OpenCV and Python.
Our implementation is:
- Capable of running in real-time on a standard CPU
- Utilizes deep learning object detectors for improved person detection accuracy
- Leverages two separate object tracking algorithms, including both centroid tracking and correlation filters for improved tracking accuracy
- Applies both a “detection” and “tracking” phase, making it capable of (1) detecting new people and (2) picking up people that may have been “lost” during the tracking phase
I hope you enjoyed today’s post on people counting with OpenCV!
To download the code to this blog post (and apply people counting to your own projects), just enter your email address in the form below!
Hi Adrian! The tutorial is really great and it’s very helpful to me. However, I was wondering, can this kind of people counting be implemented on a Raspberry Pi 3?
If you want to use just the Raspberry Pi you need to use a more efficient object detection routine. Possible methods may include:
1. Background subtraction, such as the method used in this post.
2. Haar cascades (which are less accurate, but faster than DL-based object detectors)
3. Leveraging something like the Movidius NCS to help you reach a faster FPS throughput
Additionally, for your object tracking you may want to look into using MOSSE (covered in this post) which is faster than correlation filters. Another option could be to explore using Kalman filters.
I hope that helps!
Thank you so much! Another question: is it possible to combine this people counting algorithm with the method you posted before, which was talking about Raspberry Pi: Deep learning object detection with OpenCV?
Yes, you can, but keep in mind that the FPS throughput rate is going to be very, very low since you’re trying to apply deep learning object detection on the Pi.
Adrian, to get better performance with a Raspberry Pi 3, do you need to use all of these methods, or just a few? For example, can you join background subtraction with a Haar cascade?
Thank you very much!
You can join background subtraction in with a Haar cascade and then only apply the Haar cascade to the ROI regions. But realistically Haar cascades are pretty fast anyway so that may be overkill.
Thank you so much for your work and for sharing it. It’s great.
Could you detail a bit more what we are supposed to do to use the software on a Raspberry Pi? I’m not very used to it, so I don’t understand everything you wrote.
I’ll likely have to write another blog post on Raspberry Pi people counting — it’s too much to detail in a blog post comment.
Seems logic…
Could you give me the URL of a trusted blog that you go to, where I will be able to find information?
I’ve tried the software “Footfall” but it doesn’t work.
And many blogs are just outdated concerning this subject.
Thank you for all 🙂
I don’t know of one, which is why I would have to do one here on PyImageSearch 😉
Looking forward to the Rasberry Pi people counting!
Hi, firstly, thank you for your blog, it’s so awesome! I’m wondering when that Raspberry Pi counter will be posted? Also, can it be adapted to count vehicles? Thank you!
Yes, you can do the same for vehicles, just swap out the “person” class for any other objects you want to detect and track. I’m honestly not sure when the Pi people counter will be posted. I have a number of other blog posts and projects I’m working on though. I hope it will be soon but I cannot guarantee it.
yes please!
Hey Adrian!
So did you write anything related to Pi and Counting algorithms?
I will be covering it in my upcoming Computer Vision and Raspberry Pi book! Make sure you’re on the PyImageSearch email list to be notified when the book goes live in a couple of months.
Hi Adrian. I have a question about Kalman filters. I want to implement a people counter on a Raspberry Pi 3B. I use background subtraction for detection and findContours to enclose the person’s position in a rectangle, and for tracking I need to implement MOSSE or a Kalman filter. But here is my question: how can I track a person with those algorithms? Each of those algorithms needs to receive the position of the object, but I’m detecting multiple objects, so it will be an issue to send the correct coordinates for each object that I need to track.
Can this code deal with live streaming?
Yes, absolutely. Just use my VideoStream class.
Hey Adrian, awesome post. Thank you for sharing and detailing the steps. Is there a raspberry Pi post in the near future? Would love to see your approach. Thanks again, gonna check out your other stuff.
I’ll actually be covering it in my upcoming Computer Vision + Raspberry Pi book 🙂 Stay tuned, I’ll be announcing it soon.
I can hardly wait for the book. Is there a model that will reliably detect people walking in profile (passing by a camera pointed at the sidewalk)? I haven’t found a Haar cascade that does this well. The Caffe model you have does it well, but as you mention, it won’t run well on a Pi.
Is there a Haar cascade that will detect profiles, or low-cost hardware that will run the Caffe model?
Again – looking forward to the book!
I’ll actually be showing you how to use deep learning-based object detectors on the Pi! They will be fast enough to run in real-time and be more accurate than Haar detectors.
Great! Awesome job as always. I was trying to improve my tracking part. This is a good reference point for my application.
Thankyou Adrian!
Hi Adrian,
This is by far my Favorite blog post from you.
I was wondering if you could also do a blog/tutorial on people counting in an image and show the gender of the people. That would make up for a really interesting blog and tutorial.
I really liked your blog lesson. Thanks so much. I’m going to convert the Caffe model to run on the Movidius NCS and take it to my friend’s store. He is going to count people and recognize age, gender, and maybe emotion. I really like your blog. I plan to buy your book. Thanks for the motivation and good practice.
Thank you for the kind words, I’m happy you liked the post. I wish the best of luck to you and your friend implementing your own person counter for their store!
Sir ,
Great Work. Thanks for Sharing.
Regards,
Anirban
Thanks Anirban!
Hi Adrian, is there any specific reason to use the dlib correlation tracker instead of OpenCV’s 8 built-in trackers? Will any of those trackers be more precise than the dlib tracker?
To quote the blog post:
“We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.”
OpenCV’s CSRT tracker may be more accurate but it will be slower. Similarly, OpenCV’s MOSSE tracker will be faster but potentially less accurate.
Loved your post, and with the level of explanation you have provided, hats off to you, sir! I was wondering, what if we have to implement it on multiple cameras, or we have a separate door/separate camera for the entrance and exit? Would like to have your words on these too. Thanks in advance.
This tutorial assumes you’re using a single camera. If you’re using multiple cameras it becomes more challenging. If the viewpoint changes then your object tracker won’t be able to associate the objects. Instead, you might want to look into face recognition or even gate recognition, enabling you to associate a person with a unique ID based on more than just appearance alone.
Yes, the viewpoints do change, as the cameras will be placed in several different places. We would like to tag each person with a face ID and recognize them across all the cameras using face recognition. Thank you once again.
Yeah, if the viewpoints are changing you’ll certainly want to explore face recognition and gait recognition instead.
@Adrian Thanks for the post. I would love to see any blog on gait recognition. I am doing some research on this recently. I have tried a gait recognition paper which uses CASIA-B dataset. But getting a silhouette seems to be a difficult task. I am a little bit off the topic but if you read this one, would love to know your views.
Thanks for the suggestion. I will consider covering it but I cannot guarantee if/when that may be.
I liked the article very much. In the new centers we want to put cameras at all the entrances, and on a computer collect the information, so we know all the people came out and no one is hiding inside.
Thanks for sharing this tutorial – last week I was trying to do something similar – do you think you can make a comment/answer on http://answers.opencv.org/question/197182/tracking-multiobjects-recognized-by-haar/ ?!
You can try to “place” a blank region on already detected car. Since the tracking method gives you location of the object in every frame, you could just move the blank region accordingly. Then you can use it to prevent Haar cascade from finding a car there. If you’re worried about overlapping cars, I suggest you adjust the size of blank region.
Does this algorithm work well with Raspberry Pi based projects? If not, can you suggest an effective algorithm for detecting human presence, sir? I have tried the cascade method but it is not satisfactory.
Thank you sir, I am awaiting your reply.
Make sure you’re reading the comments. I’ve already answered your question in my reply to Jay — it’s actually the very first comment on this post.
Thanks. Good job. How can I improve the code to detect people very close?
Hey Ando — can you clarify what you mean by “very close”? Do you have any examples?
Thank you very much for these tutorials. I am new to this and I seem to be having issues getting webcam video from imutils.video. Can you provide a short test script to open the video stream from the pi camera using imutils?
Just set:
vs = VideoStream(usePiCamera=True).start()
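A minimal test script built around that call might look like the following (this is just a sketch; it assumes imutils and the picamera module are installed on your Pi):

# minimal sketch: display frames from the Raspberry Pi camera module
from imutils.video import VideoStream
import time
import cv2

vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)    # allow the camera sensor to warm up

while True:
    frame = vs.read()
    cv2.imshow("Frame", frame)

    # press "q" to quit
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
vs.stop()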
How do I link your VideoStream class here, and how do I run it?
Is the VideoStream class created in the same file or in another Python file?
You first need to install the “imutils” library:
$ pip install imutils
From there you can import it into your Python scripts via:
from imutils.video import VideoStream
Thanks for the wonderful explanation. It was always a pleasure to read your post. I ran your people-counting tracker but getting some random objectID while detection. For me on 2nd example videos there was 20 people going Up and 2 people coming Down. What do you recommend to remove these ambiguities ?
Hey Rohit — that is indeed strange behavior. What version of OpenCV and dlib are you using?
Running the Downloaded scripts with the default parameter values using the same input videos, I was UNABLE to match the sample output videos. I ran into the same issue as Rohit.
I played around with the confidence values and still could NOT match the results. The code is missing some detections and what looks like overstating (false positive detections?) others? Any ideas???
Nvidia TX1
OpenCV 3.3
Python 3.5 (virtual environment)
dlib==19.15.0
numpy==1.14.3
imutils==0.5.0
The videos can be viewed on my google drive:
Video 1: (My Result = Up 3, Down 8) [Actual (ground truth) Up 3 Down 7]
https://drive.google.com/open?id=1rWp-bD7WFyL39sjqiWzLEiVnwu8mF9dM
Video 2: (My Result = Up 20, Down 2) [Actual (ground truth) Up 14 Down 3]
https://drive.google.com/open?id=1T3Vaslk2UawYYD1RVF5frzHtlT30XO2X
Upgrade from OpenCV 3.3 to OpenCV 3.4 or better and it will work for you 🙂 (which I also believe you found it from other comments but I wanted to make sure)
Comment Updated (4/19): I encountered the same issue using OpenCV 3.3, but after I upgraded to OpenCV 3.4.1, my results now match the video on this blog post. I recommend upgrading to OpenCV 3.4 for anyone encountering similar detection/tracking behavior…
Rohit – I encountered the same issue using OpenCV 3.3, but after I upgraded to OpenCV 3.4.1, my results now match the video on this blog post. I recommend upgrading to OpenCV 3.4…
Hi Adrian,
Thanks for the great post!!!!. I have few questions..
1.Will this people counter work on crowded places like Airport or Railway station’s?? Will it give accurate count??
2.Can we use it for mass(crowd) counting?? Does it consider pet’s and babies??
1. Provided you can detect the full body and there isn’t too much occlusion, yes, the method will work. If you cannot detect the full body you might want to switch to face detection instead.
2. See my note above. You should also read the code to see how we filter out non-people detection 🙂
I have come across some app developers using what looks to be custom trained head detection models. Sometimes, the back of the head can be seen, other times the frontal view can be seen. I think the “head count” approach makes sense since that is how humans think when taking class attendance for example. Is head counting a better method for people counting??? Is this even possible and will the method be accurate for the back of heads???
Examples: (VION VISION)
https://www.youtube.com/watch?v=8XQyw9c23dc
https://www.youtube.com/watch?v=DcusyBUV4do
I’m reluctant to say which is “better” as that’s entirely dependent on your application and what exactly you’re trying to do. You could argue that in dense areas a “head detector” would be better than a “body detector” since the full body might not be visible. But on the other hand, having a full body detected can reduce false positives as well. Again, it’s dependent on your application.
Dear Dr Adrian,
I need a clarification please on object detection. How does the object detector distinguish between human and non-human objects.
Thank you,
Anthony of exciting Sydney
The object detector has been pre-trained to recognize a variety of classes. Line 137 also filters out non-person detections.
Thanks for your sharing.
Hi Adrian,
For the detection part, I wanted to try another network. So I went for the ssd_mobilenet_v2_coco_2018_03_29, tensorFlow version and here: https://github.com/opencv/opencv_extra/tree/master/testdata/dnn ).
Problem is, I had too many detection boxes, so I used an NMS function to help me sort things out, but even after that I had too many results, even with confidence at 0.3 and NMS threshold at 0.2. See an example here: https://www.noelshack.com/2018-33-2-1534241531-00059.png (network detection boxes are in red, NMS output boxes are in green)
Do you know why I got so many results? Is it because I used a TensorFlow model instead of Caffe? Or is it because the network was trained with other parameters? Did something change in SSD MobileNet v2 compared to chuanqi305’s SSD MobileNet?
David
Hey David — I haven’t tested the TensorFlow model you are referring to so I’m honestly not sure why it would be throwing so many false positives like that. Try to increase your minimum confidence threshold to see if that helps resolve the issue.
Hi Adrian,
You write “we utilize our CentroidTracker instantiation to accept the list of rects, regardless of whether they were generated via object detection or object tracking”; however, as far as I can see, in the object detection phase you don’t actually seem to populate the rects[] variable? I’ve downloaded the source as well, and couldn’t find it there either.
Am I missing something?
Very valuable post throughout, looks a lot like what I am trying to achieve for my cat tracker (which you may recall from earlier correspondence).
Hey Roald — we don’t actually have to populate the list during the object detection phase. We simply create the tracker and then allow the tracker to update “rects” during the tracking phase. Perhaps that point was not clear.
I used OpenCV 3.4 for this example. As for using the Raspberry Pi, make sure you read my reply to Jay.
Great article. I have a doubt though; it could potentially be a noob question, so please bear with me.
Say I use this in my shop for tracking foot count, now all the new objects are stored in a dictionary right? If i leave the code running perpetually, wont it cause errors with the memory?
If you left it running perpetually, yes, the dictionary could inflate. It’s up to you to add any “business logic” code to update the dictionary. Some people may want to store that information in a proper database as well — it’s not up to me make those decisions for people. This code is a start point for tracking foot count.
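For example, one hypothetical piece of such business logic would be to periodically prune the dictionary, keeping only objects that the centroid tracker still considers active (this sketch assumes the tracker exposes its active objects as ct.objects, as the implementation in this series does):

# hypothetical cleanup step: drop trackable objects whose IDs the
# centroid tracker has already deregistered
active_ids = set(ct.objects.keys())
trackableObjects = {objectID: to
                    for (objectID, to) in trackableObjects.items()
                    if objectID in active_ids}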
Great blog!!! its amazing how you simplify difficult concepts.
I am working on ways to identify each individual going through the entrance from images captured in real time using a camera (we have their passport-size photos plus other labels, e.g., personal identification number, department, etc.). Kindly advise on how to include these multi-class labels beyond the ID notation you used in the example.
Will you be covering the storage of the counted individuals to the database for later retrieval?
You mean something like face recognition? I discuss how you can perform this in an unsupervised manner inside this post on face clustering.
For those who had the following error when running the script:
Traceback (most recent call last):
File “people_counter.py”, line 160, in
rect = dlib.rectangle(startX, startY, endX, endY)
Boost.Python.ArgumentError: Python argument types in
rectangle.__init__(rectangle, numpy.int32, numpy.int32, numpy.int32, numpy.int32)
did not match C++ signature:
__init__(struct _object * __ptr64, long left, long top, long right, long bottom)
__init__(struct _object * __ptr64)
please update line 160 of people_counter.py to
rect = dlib.rectangle(int(startX), int(startY), int(endX), int(endY))
Thanks for sharing, Juan! Could you let us know which version of dlib you were using as well just so we have it documented for other readers who may run into the problem?
i have the same problem with him and my version of dlib is 19.6.0
my dlib version is 19.8.1
i have the same problem with it and i have tried 19.18.0 and 19.6.0, both of them doesn’t work.
Thanks
thanks 🙂
Hi,i meet the same question,do u solve it?
As Juan said, you change Line 160 to:
rect = dlib.rectangle(int(startX), int(startY), int(endX), int(endY))
Thanks a lot! you saved me 🙂
Hi,
Your wonderful work is priceless text book. Unfortunately, my understanding is still not enough to understand the whole code. I tried to execute python files, but have an error.
Can I know how to solve it. Thank you so much
python people_counter.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
usage: people_counter.py [-h] -p PROTOTXT -m MODEL [-i INPUT] [-o OUTPUT]
[-c CONFIDENCE] [-s SKIP_FRAMES]
people_counter.py: error: argument -m/--model is required
Your error can be solved by properly providing the command line arguments to the script. If you’re new to command line arguments, that’s fine, but you should read up on them first.
Remove the ‘\’ at the end of each line along with the newline, and provide the 3 lines as a single one-line command.
Hi Adrian,
thanks for sharing this great article! It really helps me a lot to understand object tracking.
The CentroidTracker uses two parameters: MaxDisappeared and MaxDistance.
I understand the reason for MaxDistance, but I cannot find the implementation in the source code.
I am running this algorithm on vehicle detection in traffic and the same ID is sometimes jumping between different objects.
How can I implement MaxDistance to avoid that?
Thanks in advance! I really appreciate your work!!
Hey Jan — have you used the “Downloads” section of the blog post to download the source code? If so, take a look at the centroidtracker.py implementation. You will find both variables being used inside the file.
Kindly help me to, Have you resolve the error.
Hi Adrian,
do you think it’s worth to train a deep learning object detector with only the classes I’m interested in (about 15), instead of filtering classes on a pre-trained model, to run it on devices with limited resources(beagleBoard X-15 or similar SBC)?
Thanks
If you train on just the classes you are interested in you may be able to achieve higher accuracy, but keep in mind it’s not going to necessarily improve your inference time that much.
Hi Adrian,
Does this implement the multi-processing you were talking about the week before in https://pyimagesearch.com/2018/08/06/tracking-multiple-objects-with-opencv/ ?
It doesn’t use OpenCV’s implementation of multi-object tracking, but it uses my implementation of how to turn dlib’s object trackers into multi-object trackers.
Thank you very much, dear Adrian, for the best blog post.
This is really nice, thank you.
I have developed a people counter using the dlib tracker and an SSD detector. You skipped 30 frames between detector runs to save computation; in my case the detector and the tracker run on every frame. When there is no detection (the detector loses the object) I try to re-initialize the tracker with the tracker's previous bounding box (but only for two frames). The problem is that when there is no longer any object in the video (the object wasn't lost by the detector, it has simply passed out of view), the tracker bounding box gets stuck on the screen, and that causes problems when another object comes into view. Is there any way to delete the tracker when I need to?
I would suggest applying another layer of tracking, this time via centroid tracking like I do in this guide. If the maximum distance between the old object and new object is greater than N pixels, delete the tracker.
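As a hedged sketch of that idea (the names and the 100-pixel threshold are assumptions for illustration, not code from the post), the pruning step could look something like this:

from scipy.spatial import distance as dist

MAX_JUMP = 100  # pixels; tune this for your own footage

def prune_stale_trackers(trackers, previous_centroids, current_centroids):
    # trackers: dict mapping objectID -> dlib.correlation_tracker
    # previous_centroids / current_centroids: dict mapping objectID -> (x, y)
    for object_id in list(trackers.keys()):
        old = previous_centroids.get(object_id)
        new = current_centroids.get(object_id)
        if old is None or new is None:
            continue
        # if the tracker jumped farther than MAX_JUMP pixels in one step,
        # assume it latched onto something else and delete it
        if dist.euclidean(old, new) > MAX_JUMP:
            del trackers[object_id]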
Hi Adrian
Again, a great tutorial. Can’t praise it enough. I’ve got my current job because of PyImageSearch and that’s what this site means to me.
I was going through the code, and trying to understand –
–If you are running the object detector every 30 frames, how are you ensuring that an *already detected* person with an associated objectID, does not get re-detected in the next iteration of the object detector after the 30 frame wait-time? For example, if we have a person walking really slowly, or if two people are having a conversation within the bounds of our input frame, how are they not getting re-detected?–
Thanks and Regards,
Aditya
They actually are getting re-detected, but our centroid tracker is able to determine whether (1) they are the same object or (2) a brand new object.
Thank you Adrian for another translation of the language of the gods. The combination of graph theory, mathematics, conversion to code and implementation is like ancient Greek and you are the demigod who takes the time to explain it to us mere mortals. Most importantly, you take a stepwise approach. When ‘Rosebrock Media Group’ has more employees, someone in it can even spend more time showing how alternative code snippets behave. In terms of performance, I am just starting to figure out if a CUDA implementation would be of benefit. Of course, there is no ‘Adrian for CUDA coding’. Getting this to run smoothly on a small box would be another interesting project but requires broad knowledge of all the hardware options available – a Xilinx FPGA? an Edison board? a miniiTX pc? a hacked cell phone? (there’s an idea – it’s a camera, a quad core cpu and a gpu in a tidy package but obviously would need a mounting solution and a power source too). Of course to run on an iphone I have to jailbreak the phone and translate the code to swift. But then perhaps it would be better to go to android as the hardware selection is broader and the OS is ‘open’. Do you frequent any specific message boards where someone might pick up this project and get it to work on a cell phone? There are a lot of performance optimizations that could make it work.
Thank you for the kind words, Stefan! Your comment really made my day 🙂 To answer your question — yes, running the object detector on the GPU would dramatically improve performance. While my deep learning books cover that, the OpenCV bindings themselves do not (yet). I'm also admittedly not much of an embedded device user (outside of the Raspberry Pi) so I wouldn't be able to comment on the other hardware. Thanks again!
Hi Adrian, just spotted this…
For information, I have successfully implemented this post on a Jetson TX2, replacing the SSD with one that is optimised for TensorRT. I would refer your readers to the blog of JK Jung for guidance.
Performance-wise, I am finding that all 6 cores are maxed out at 100% and the GPU at around 50%, depending on the balance of SSD/trackers used. The trackers in particular are very CPU intensive and, as you say, the pipeline slows a great deal with multiple objects.
As always, thanks for your huge contribution to the community and congratulations on just getting married!
Cheers, Mike
Awesome, thanks so much for sharing, Mike!
I found out the cause of my issue! It is because I changed skip_frames to 15.
So how do I set an appropriate number of frames to skip? A larger skip_frames value can lead to missed objects, while a smaller value can lead to incorrect assignment of object IDs.
As you noticed, it’s a balance. You need to balance (1) skipping frames to help the pipeline to run faster while (2) ensuring objects are not lost or trackings missed. You’ll need to experiment to find the right value for skip frames.
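To make the trade-off concrete, here is a rough, hedged sketch of the detect-every-N-frames cadence; the detector/tracker callables are passed in as placeholders rather than real implementations:

def count_with_skip_frames(frames, detect, start_trackers, update_trackers,
                           update_centroids, skip_frames=30):
    # run the expensive detector only every `skip_frames` frames and rely
    # on the cheap correlation trackers for the frames in between
    trackers = []
    for (i, frame) in enumerate(frames):
        if i % skip_frames == 0:
            rects = detect(frame)                     # expensive pass
            trackers = start_trackers(frame, rects)
        else:
            rects = update_trackers(frame, trackers)  # cheap pass
        yield update_centroids(rects)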
Hi Adrian,
I’ve recently found your blog and I really like the way you explain things.
I'm building a people counter on a Raspberry Pi, using background subtraction and centroid tracking.
The problem I'm facing is that sometimes object IDs switch, as you mentioned in the "simple object tracking with OpenCV" post. Is there something I can do to minimize these errors?
If you have any recommendations feel free to share.
Thanks in advance.
P.S.: I'd be really interested if you did a post about a people counter on the Raspberry Pi, like you mentioned in the first comment.
Hey Jaime — there isn’t a ton you can do about that besides reduce the maximum distance threshold.
Thanks Adrian for such a nice tutorial. You have released it with perfect timing; I am working on a similar kind of project for tracking the number of people getting in and out of a bus. Somehow I am not getting proper results, but this tutorial is a very good start and helped me understand the logic.
Thanks again. Keep rocking!!!
Best of luck with the project, Nilesh!
Hi Adrian,
Approximately how many meters above the floor is the camera mounted?
Thank you very much!
To be totally honest I’m not sure how many meters above the ground the camera was. I don’t recall.
Thank you Adrian for inspiring me and introducing me to the world of computer vision.
I started with your 1st edition and followed quite a few of your blog projects, with great success.
I was excited to read this blog, as people counting is something I have wanted to pursue.
However... there's a problem.
When I run the script, I get:
[INFO] loading model…
[INFO] opening video file…
the sample video does open up, plays for about 1 second (the lady doesn't reach the line), and then, boom... my computer crashes and Python quits!
I have tried increasing --skip-frames, but it still crashes. I even tried Python 3 (thinking my version 2.7 was old), but no joy!
Is it time to say goodbye to my 11 year old Macbook Pro? or could this be something else?
“It’s important to understand that deep learning object detectors are very computationally expensive, especially if you are running them on your CPU.”
Out of interest is there a ballpark guide to minimum spec machines, when delving into this world of OpenCV?
Best Regards,
update:
Reading your /install-dlib-easy-complete-guide/
I noticed you say to install Xcode.
I had removed Xcode for my Homebrew installation as instructed, as it was an old version.
When I installed dlib, I simply did pip install dlib.
Could this be related?
Cheers
Hey Nik — it sounds like you’re using a very old system and if you’ve installed/uninstalled Xcode before then that could very well be an issue. I would advise you to try to use a newer system if at all possible. Otherwise, it would be any number of problems and it’s far too challenging to diagnose at this point.
Hello, Dr. Adrian, thank you for your great work. I am a beginner in this field and your webpage is really helping me through. I have a question: I've tried to run this code and an error popped up, "people_counter.py: error: the following arguments are required: -p/--prototxt, -m/--model", and I really don't know what to do. I would be grateful if you helped.
Thanks in advance.
If you’re new to the world of Python, that’s okay, but make sure you read up on command line arguments first. From there you’ll be able to execute the script.
Hi Adrian !!
This is the answer you gave to my question! Thank you for that.
August 24, 2018 at 8:56 am
As you noticed, it’s a balance. You need to balance (1) skipping frames to help the pipeline to run faster while (2) ensuring objects are not lost or tracking missed. You’ll need to experiment to find the right value for skip frames.
But that balancing is possible for a video file, because I have it in hand.
What do you suggest for a live camera (where I do not know when an object will appear, so I cannot tune the skip-frames value in advance)?
Hi Adrian !!,
How can we evaluate the counting accuracy of this counter? My mentor asked me for the counting accuracy. Do we need to find some videos as a benchmark, or are there libraries for accuracy evaluation?
Another great post! Thanks so much for your contributions to the community.
One question, I have tried the code provided on a few test videos and it seems like detected people can be counted as moving up or down without having actually crossed the yellow reference line. In the text you mention the fact that people are only counted once they have crossed the line. Is this a behaviour you have seen as well? Is there an approach you would recommend to place a more strict condition that only counts people who have actually crossed from one side of the line to the other? Thanks
Hey Andy — that’s actually a feature, not a bug. For example, say you are tracking someone moving from the bottom to the top of a frame. But, they are not actually detected until they have crossed the actual line. In that instance we still want to track them so we check if they are above the line, and if so, increment the respective counter. If that’s not the behavior you expect/want you will have to update the code.
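If you do want the stricter behavior Andy describes, a hedged sketch of a "must actually cross the line" rule might look like the following (this replaces the post's more lenient direction check, so treat it as an assumption to adapt, not the post's code):

import numpy as np

def crossed_line(previous_ys, current_y, line_y):
    # return "up", "down", or None depending on whether the centroid
    # actually moved from one side of the horizontal line to the other
    if len(previous_ys) == 0:
        return None
    mean_prev = np.mean(previous_ys)
    if mean_prev > line_y and current_y <= line_y:
        return "up"    # was below the line, now at or above it
    if mean_prev < line_y and current_y >= line_y:
        return "down"  # was above the line, now at or below it
    return None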
Hello Adrian,
Thank you for the great post.
I modified the code for horizontal camera as below:
https://youtu.be/BNzTePvbsWE
I noticed the problems below:
1. No response on fast-moving objects
2. Irrelevant centroid noise
3. Repeated counting of the same person
I am trying to solve these problems by introducing face recognition and pose estimation.
Do you have any suggestions/comments on this?
Thanks
Frank
Face recognition would greatly solve this problem but the issue you may have is being unable to identify faces from side/profile views. Pose estimation and gait recognition are actually more accurate than face recognition — they would be worth looking into.
Hi,
Did you solve your problem?
Thanks.
Bruno Bonela.
Hi Adrian. First, I want to say thank you for taking the time to explain each detail of your code. Your blog is incredible (the best of the best!).
I have a doubt about the CentroidTracker: it creates an object ID when a new person appears in the video but never seems to destroy that ID. Would that cause memory trouble in the future if I implemented it on a Raspberry Pi 3? I followed your person counter code with just some modifications to run it on the Pi.
My best regards
Hey Andres — the CentroidTracker actually does destroy the object ID once it has disappeared for a sufficient number of frames.
Thank you very much, Adrian. Another question: I have a problem with the centroid tracker update. When a person leaves the frame and another person comes in almost instantaneously, the algorithm thinks it is the same person, doesn't count it, and assigns the old centroid to the person that came in (I changed maxDisappeared but without success). I checked the code again to understand in which line you use the minimum Euclidean distance to update the position of the old centroid, but I couldn't understand the method you used to achieve that. Can you give me advice on how to solve that problem?
It doesn't happen every time, but I'd like to raise the success rate.
My best regards
That is an edge case you will need to decide how to handle. If you reduce the "maxDisappeared" value too much you could easily register false positives. Keep in mind that footfall applications are meant to be approximations, they are never 100% accurate, even with a human doing the counting. If it doesn't happen very often then I wouldn't worry about it. You will never get 100% accurate footfall counts.
I handled it by modifying the CentroidTracker: I added a condition that if the distance from the old centroid to the new one is more than 200 pixels along the y-axis, the match is skipped. Thanks for the answer.
Somehow I can't run the code.
I always get the error message:
Can't open "mobilenet_ssd/MobilenetSSD_deploy.prototxt" in function 'ReadProtoFromTextFile'
It seems like the program is unable to read the prototxt.
Do you have an idea on how to fix it?
Yes, that does seem to be the problem. Make sure you’ve used the “Downloads” section of the blog post to download the source code + models. From there double-check your paths to the input .prototxt file.
Hi Adrian,
Thank you for a great tutorial. Would it be possible for you to let me know how I can count the people moving from right to left or left to right. I am able to draw the trigger lines but unable to count the objects.
Regards
Harsha J
You’ll need to modify Lines 213 and 220 (the “if” statements) to perform the check based on the width, not the height. You’ll also want to update Line 204 to keep track of the x-coordinates rather than the y-coordinates.
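As a hedged sketch of that left/right variant (illustrative only; adapt it to the actual variable names in people_counter.py):

import numpy as np

def horizontal_direction(previous_xs, current_x, line_x):
    # mirror of the up/down logic, but along the x-axis against a
    # vertical counting line at x = line_x
    if len(previous_xs) == 0:
        return None
    direction = current_x - np.mean(previous_xs)
    if direction < 0 and current_x < line_x:
        return "left"
    if direction > 0 and current_x > line_x:
        return "right"
    return None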
Hi Adrian,
I am a beginner in this field and your webpage is really helping me through.
Could you please give me the code?
I've been confused about how to change it for a while.
Thanks in advance
NT
I am happy to hear you are finding the PyImageSearch blog helpful! However, I do not provide custom modifications to my code. I provide over 300+ free tutorials here on PyImageSearch and I do my best to guide you, again, for free, but if you need custom modification you will need to do that yourself.
Hi Adrian,
I'm wondering what the tracker does when an object doesn't move (i.e., the object stays in the same position for a few frames). I'm not sure if OpenCV's trackers are able to handle this situation.
Thanks in advance.
It will keep tracking the object. If the object is lost the object detector will pick it back up.
Hello Adrian, first I want to say thank you for this amazing project, it helped me understand quite a few things concerning computer vision. I have a question you could help me with: I want to adapt this project to monitor the two doors of my store, and I was wondering what changes I might have to make to use two cameras simultaneously.
P.S.: I've only worked on simple OpenCV programs, since I'm quite the noob, and I tried to use cap0 = cv2.VideoCapture(0)
and cap1 = cv2.VideoCapture(1), however it opens only one camera feed even though the camera indexes are correct!
Thanks again for this project and for taking the time to read my comment.
Follow this guide and you’ll be able to efficiently access both your webcams 🙂
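For reference, a minimal sketch of reading two USB cameras at once with imutils' threaded VideoStream class (the camera indexes and window names are assumptions for your setup):

from imutils.video import VideoStream
import cv2
import time

vs0 = VideoStream(src=0).start()   # camera at the first door
vs1 = VideoStream(src=1).start()   # camera at the second door
time.sleep(2.0)                    # let both sensors warm up

while True:
    frame0 = vs0.read()
    frame1 = vs1.read()
    if frame0 is None or frame1 is None:
        break
    cv2.imshow("Door 0", frame0)
    cv2.imshow("Door 1", frame1)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

vs0.stop()
vs1.stop()
cv2.destroyAllWindows()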
Hi @Adrian. How can we improve object detection accuracy, since your method is completely based on how good the detection is? Is there any other model you recommend for detection?
That’s a complex question. Exactly how you improve object detection accuracy varies on your dataset, your number of images, and the intended use of the model. I would suggest you read this tutorial on the fundamentals of object detection and then read Deep Learning for Computer Vision with Python to help you get up to speed.
One of the purposes of object tracking is to track people when object detection may fail, right? But your tracking algorithm's accuracy, if I understand correctly, is completely based on whether we detect the object in subsequent frames. What if my object only gets detected once, then how should I track him? What modification would be required in your solution?
No, the objects do not need to be detected in subsequent frames — I only apply the object detector every N frames, the object tracker tracks the object in between detections. You only need to detect the object once to perform the tracking.
Hey Adrian, I just downloaded the source code for the "people counter" with OpenCV and Python (the one that counts the number of people heading "in" or "out" of a department store in real time).
But I'm getting the following error:
usage: people_counter.py [-h] -p PROTOTXT -m MODEL [-i INPUT] [-o OUTPUT]
[-c CONFIDENCE] [-s SKIP_FRAMES]
people_counter.py: error: the following arguments are required: -p/--prototxt, -m/--model
If you’re new to command line arguments, that’s fine, but make sure you read this tutorial before continuing. It will help you resolve the error.
Hello Adrian, I've been following your blog for a couple of months now and indeed there is no other blog that offers this much content and practical material. Thanks a lot, man.
Currently, I'm working on a project with the same application, "counting people". I'm using a Raspberry Pi and a Pi camera. Due to some constraints I've settled on an overhead view for the camera, and I'm using computationally less expensive methods: a Haar cascade detector (custom-trained to detect heads from the overhead view). The detector is doing a good job. I have also integrated the tracking and counting methods you provided. At first I encountered low FPS, so I looked around a bit and came across the "imutils" library to speed up my FPS. Now I have achieved a pretty decent FPS throughput. I have also tested the code with a video feed and it is all working well.
BUT.
When I use the live feed from the Pi camera, there is a bit of lag in the detection and the whole system. How do I get this working in real time? Is there a way to do this in real time?
Or is this just the computational limit of a Raspberry Pi?
Thanks in advance Adrian!
Curious and eagerly waiting for your reply!
Hi Bharath — thank you for the kind words, I appreciate it 🙂 And congratulations on getting your Pi People Counter this far along, wonderful job! I’d be curious to know where you found a dataset of overhead views of people? I’d like to play around with such a dataset if you don’t mind sharing.
As far as the lag goes, could you clarify a bit more? Where exactly is this "lag"? If you can be a bit more specific I can try to help, but my guess is that it's a limitation of the Pi itself.
Thanks for the reply Adrian!
The dataset was hand-labeled at my university. Let me know if you need it!
Hey, and by “lag” I mean…
With a pre-captured video file, the Pi was able to achieve about ~160 FPS (15s video).
With the live feed from the Pi camera, it was able to achieve about ~50 FPS (while there was no detection), and once there is a detection the FPS drops to around 20. (This was all possible only after using the "imutils" library.)
When tested without the "imutils" library, the FPS was around 2 to 6.
So, the key inference I would draw is that the system performs with pretty good accuracy when the subject (head) travels at a slower speed (slower than the normal pace at which any human walks).
BUT,
when the head moves at a normal pace (the pace at which anyone normally walks), the system fails to track, even after detection and IDing.
Hope I made myself clear, Adrian!
Please let me know your thought about this!
Hey Bharath, would you mind sending me the dataset? I’d love to play with it!
Also, thanks for the clarification on your question. I think you should read my reply to Jay at the top of this post. You won’t be able to get true real-time performance out of this method using the Pi, it’s just not fast enough.
Hi Bharath, would you mind sharing the overhead image dataset? I am also working on something like your application. Thank you.
Hi Adrian,
Thank you for a great tutorial. I have some questions.
1. Where is the Caffe model from? How can I train my own model?
2. Did you test the situation where people hold up an umbrella? In my tests the model can't detect this situation.
Do you have any ideas?
Thanks very much
Hi Adrian
Can you please make a tutorial with a Kalman filter on top of this: https://github.com/juanlp/experimenting-with-sort 🙂
dlib is not very good with fast-moving objects.
Thank you.
Hey Adrian,
If I want to capture using the Pi camera, what should I do and what will be the command for it?
I would suggest reading this tutorial to help you get started.
Hi @Adrian. I ran this code on a Raspberry Pi, but the FPS is 3, which is very slow. Then I switched
from the Raspberry Pi to an RK3399; the result is not much better, with an FPS of almost 20.
Do you have any ideas to improve the FPS?
Thanks very much.
Make sure you refer to my reply to Jay at the top of the comments section of this post.
Hi Adrian, thanks for this great tutorial! I'm using a Raspberry Pi 3 B+ with Raspbian Stretch and I am getting very slow frame rates of about 5 FPS with the example file. I have tried not writing the output, with the same results. Playing the example file with omxplayer works fine at 30 FPS. I have tried using another SD card to no avail (write speed is about 17 MB/s and read is 22 MB/s, which I think is not that bad). Do you know what could be happening?
Thanks
This method is going to be too slow on the Raspberry Pi. See the first comment on this post (my reply to Jay).
Hi Adrian,
Thanks for this post!
Can you please let me know, the process to use this code to count vehicles?
Thanks
Guru
Hi Guru — I will try to cover vehicle counting in a separate blog post 🙂
Thanks Adrian.
Since I am trying to build one, can you please enlighten me: can I use time in seconds/milliseconds instead of a centroid to count the objects, as time can be a more crucial factor than position? Please let me know your thoughts.
Regards
Guru
Also, please let me know if I can use CAP_PROP_POS_MSEC (via imutils) to count vehicles in a live CCTV stream based on time.
Hi, I’m trying to swap out the dlib tracker for the OpenCV tracker, since the dlib one is pretty inaccurate. However, when I use the OpenCV, eg, CSRT, the new detections accumulate into a new tracking item, instead of updating and replacing the original ID associated with that object. So in the first cycle, I have one bounding box and tracker with it, and in the next cycle, it will detect a person again, however, it will just create a new tracker and then I’ll have 2 bounding boxes representing the same person. And it keeps adding more trackers each time for the same person. Any idea what I did wrong? Thanks!
It sounds like there is a logic error in your code. My guess is that you're creating a new CSRT tracker when you should actually be updating an existing tracker. Keep in mind that we only create the tracker once and from there we only update its position.
Quote from this post:
“We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.”
I think dlib object tracking isn't accurate enough. Also, CSRT isn't fast enough. What do you think about the built-in KCF tracker? Is it more accurate than dlib's object tracking implementation?
Exactly what is “accurate” or not depends on your project. I suggest you try different object trackers, evaluate the results, and choose the one best suited for you that balances speed and accuracy. There is no one “best” tracker. Run experiments and let the empirical results guide you.
Hi Adrian,
thank you so much for this post, it was very useful for my research project. What's the best micro PC (Raspberry Pi, Asus Tinker Board, etc.) to implement a good counter or a machine learning system in general? Thanks.
That really depends on your end goal. The Pi is a nice piece of hardware and is very cheap but if you want more horsepower to run deep learning models I highly recommend NVIDIA’s Jetson TX series.
Hi Adrian!
Thank you for this post. I have a quick question that is confusing me. In an earlier post you mentioned that the size parameter in blob = cv2.dnn.blobFromImage should match the CNN's input dimensions. According to the dims in the prototxt, the size should be 300 x 300, but the W and H supplied to cv2.dnn.blobFromImage in this example are 373 and 500. Does this affect accuracy?
Thank you for your help.
Steve
Object detection networks are typically fully convolutional, implying that any size image dimensions can be used. However, for the face detector specifically I’ve found that 300×300 input blobs tend to obtain the best results, at least for the applications I’ve built here on PyImageSearch.
Hi Adrian, this is an awesome starting point for people like me who are new to algorithms and implementing machine learning! Just curious to ask a few silly questions:
1. Is it possible to track objects left to right and vice versa?
2. Is it possible to implement it for live streaming (it appears you have given that option, but I would like to know more)?
Just thinking of implementing this at one of the maker faires in Mumbai, if possible, to give students an idea about open-source technologies and uses of OpenCV.
1. Yes, see my note in the blog post. You’ll want to update the call to “cv2.line” which actually draws the line but more importantly you’ll want to update the logic in the code that handles direction tracking (Lines 199-222).
2. Yes, use the VideoStream class.
Hi Adrian,
Would it be possible to see if a person exists within a defined space in the video frame, similar to a rectangle of trigger lines? If yes, do let me know how I can go about it.
Regards
Harsha j
Yes, you would want to define the (x, y)-coordinates of your rectangle. If a person is detected within that rectangle you can take whatever action necessary.
Hi Adrian,
I am able to get the rectangle, but recognition runs on the entire frame. Could you please let me know how to restrict detection only within the rectangle?
regards
Harsha J
Please take a look at my other replies to your comments, Harsha. I’ve already addressed your question. You can either (1) monitor the (x, y)-coordinates of a given person/objects bounding box and see if they fall into the range of the rectangle or (2) you can simply crop out the ROI of the rectangle and only perform person detection there.
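A hedged sketch of option (1), with placeholder rectangle coordinates you would replace with your own:

# (startX, startY, endX, endY) of the region of interest -- placeholder values
ROI = (100, 150, 400, 450)

def inside_roi(centroid, roi=ROI):
    (cx, cy) = centroid
    (startX, startY, endX, endY) = roi
    return startX <= cx <= endX and startY <= cy <= endY

# option (2) is simply cropping before detection, e.g.:
# roi_frame = frame[ROI[1]:ROI[3], ROI[0]:ROI[2]]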
Thanks Adrian,
Will try it out.
Regards
Harsha Jagadish
Hi, the code is tracking people. How do I make it only count specific kinds of objects, like a boat in water or a car on a road, etc.?
You’ll want to take a look at my guide on deep learning-based object detection. Give the guide a read, it will give you a better idea on how to detect and track custom objects.
I'm working on a project where, rather than tracking movement, I need to know the local coordinates of each person. The images come from multiple cameras, and each camera (the algorithm running on the PC, actually) should be able to compute the coordinates of each person detected in the corresponding image. My first thought was to train a neural network, but my intuition tells me that would be overkill, and killing a fly with a bazooka sounds disastrous in any context where you have limited resources, which is my case.
By “local coordinates” do you mean just the bounding box coordinates? I’m not sure what you mean here.
I really appreciate the great work from your end.
I am facing an issue while importing dlib. The code runs without importing dlib, but is then unable to track and count people. How do I install and import dlib?
ImportError: No module named dlib
Please figure out this issue for me.
You first need to install dlib on your system.
Hi, I've tried to run people_counter.py, however this error popped up: ImportError: No module named 'pyimagesearch'. How do I solve this?
Make sure you use the “Downloads” section of the blog post to download the source code, videos, and the “pyimagesearch” module. From there you’ll be able to execute the code.
Hi,
Thank you for a great tutorial.
The code is only tracking people, and it does not count people when they are moving at higher speed, so the counting is inaccurate. Even if people come into the region of interest and move back without crossing the line, the counter increments. How can I avoid false detections, and at what minimum height from the ground should the webcam be installed?
Any solution on how to fix this?
You’re getting those results on the videos supplied with the blog post? Or with your own videos?
I am getting those results on my own videos, as fast movement is not detected or counted. I am using a normal C270 webcam, and the count is not accurate.
Hello, thank you for the tutorial. Could you help me? It's not working for me from the beginning, and this is the error message:
Traceback (most recent call last):
…
ModuleNotFoundError: No module named ‘scipy’
You need to install SciPy on your system:
$ pip install scipy
Hello Adrian, I have tried this with a video of passengers entering a bus and the results are not really good. Is there any way I can improve the accuracy? Should I fine-tune the model with my own data? If so, is there any tutorial or material that I can refer to for fine-tuning the model? Thanks a lot.
Hey Gordon — what specifically do you mean by the result not being good? Are people not being detected? Are the trackers themselves failing?
How can I increase the accuracy of the people count and filter out false detections?
I would suggest training your own object detector on overhead views of people. The object detector we are using here was not trained on such images.
@Adrian Thank you very much for your nice tutorial. I have found a GitHub repository where the object detector is trained on overhead views. The thing I am not sure about is how to give the input to the function net = cv2.dnn.readNetFromDarknet(configPath, weightsPath) when the weights are in .weights format. I would appreciate it if you could give some advice. Thank you. The repository I am referring to is: https://github.com/aditya-vora/FCHD-Fully-Convolutional-Head-Detector
@Adrian Sorry, I made a mistake in the previous comment. The thing I am not sure about is how to give the input to the function net = cv2.dnn.readNetFromDarknet(configPath, weightsPath) when the weights are in .h5 format.
Thank you for sharing the model, Prashant. I don’t have any experience with that particular model. I’ll check it out and potentially write a blog post about it.
Hello Adrian, the people are not being detected.
Above all, I would like to thank you for these efforts you have been doing ever since you first started sharing these awesome posts.
Now, to the technical part.
You used MobileNetSSD_deploy.prototxt and MobileNetSSD_deploy.caffemodel as your deep learning "CNN model" and "weights" respectively, right?
And you only considered “person” class among the classes available.
Would it be possible to fine-tune the model you used, say, for objects like glasses (that people wear).
It looks like it is trained in Caffe, so could you share your insights on how to train this model for our custom objects? In that case we would be able to exclude non-trackable objects while fine tuning it. Thanks again!
Yes, you could absolutely fine-tune the model to predict classes it was never trained on. I actually discuss how to fine-tune and train your own custom object detectors inside Deep Learning for Computer Vision with Python.
I have tried this awesome code with my own video. The movement is quite similar to the one you had. In my video, however, I encountered some small errors. Let me post them below, and if you, Adrian, or some of the other folks could help modify particular parts of this code, I'd appreciate it very much.
1. up-counter is incorrectly increased (UP is incremented by 1) as soon as object is detected on the upper half of the horizontal visualization line, and then, if that object moves down, down-counter remains the same (though it should increment by 1).
2. tracker ID is sometimes lost (on the upper edge of the frame when trackable object is moving a bit horizontally on the upper half of the line) even though object is still in the frame. This causes one object being identified twice.
Thank you in advance to all who try 😃
1. That is a feature, not a bug. In the event that a person is missed when they are walking up they are allowed to be counted as walking up after (1) they are detected and (2) they demonstrate they are moving up in two consecutive frames. You can modify that behavior, of course.
2. You may want to try a different object tracker to help increase accuracy. I cover various object trackers in this post.
First of all, thanks a lot for this post. Very helpful. I can run your code with no problem.
Question:
I am trying to connect the people detected at the people detection step with the ID assigned at the tracking step. Any idea on how to do that? A brute-force approach I can think of is to match the centroids of the people detected in the detection step with the centroids of the IDs in the tracking step. Any better solution?
Thanks,
Henry
Thanks for a great post, Adrian!
I have tried to use the people counter on video from an actual store. First, I input the frames as they were, without rotation, and the model performed really poorly. Then I rotated the frames and it performed a little better: https://www.youtube.com/watch?v=nlxKIQWeq2E
Has the model used for detecting people above been fine-tuned with a dataset that contains images from that perspective, or did you use a pre-trained model without additional fine-tuning? I am asking because I can't intuitively find an explanation for such a difference in accuracy when the frames are rotated.
Also, if you have fine-tuned the model (this or any other), it would be helpful if you could provide any info about the size of the fine-tuning dataset. I am planning to fine-tune the model in order to detect people occluded behind the exhibited objects, as they obviously affect the prediction accuracy of the model out of the box.
The model was not fine-tuned at all. It's an off-the-shelf pre-trained model. Better accuracy would come from training a model on a top-down view of people. As far as tips, suggestions, and best practices when it comes to training/fine-tuning your own models, I would suggest taking a look at Deep Learning for Computer Vision with Python where I've included my suggestions to help you successfully train your models.
Hi everyone, can anyone explain how I can determine the minimum system requirements for this particular project? For example, I built an .exe file from the .py file and I am able to run it on any PC now, but I would like to know what the minimum system requirements are.
FYI, the .exe file is ~500 MB, and when I run it, Task Manager shows:
40% CPU;
800 MB memory;
4.1 Mbps disk usage.
(My CPU is an Intel i7-8700 @ 3.20 GHz and RAM is 32 GB.)
Anyway, how could one calculate the system requirements for a project to run seamlessly?
Thank you!
Hi Adrian! thanks for this tutorial!
One noob question.. why do you use np.arange to loop over detections (line 124) instead of range?
No real reason — I just felt like using it.
Hi Adrian, first of all thank you so much.
Do you mind providing extra code to turn on an LED when the numbers of people going up and down are not equal, and turn the LED off when they are equal?
Thanks in advance.
I’m happy to provide the current source code for free so you can learn from it and build your own applications; however, I do not make modifications to it. I request that if I put out the information for free that others build with it, hack with, enjoy it, and engineer their own applications. The code is here for you to use and learn from — now it’s your turn 🙂
Hey Adrian, thanks a lot for the guidance.
If I want to combine your tutorial about QR codes with this counter tutorial and run them as one, do you have any idea how to make this happen? Thanks.
You want to detect people and QR codes in the same application? Could you explain a bit more of what exactly you’re trying to accomplish?
Hey Adrian, thanks for the reply. I am trying to make an entrance system which requires people to scan a QR code, storing the data in a MySQL database whenever they enter a room, while at the same time using a counter to monitor the number of people going in and out of the room.
That's absolutely doable. How much experience do you have programming and using computer vision? Based on your previous comment I think you may just be getting started learning computer vision. If so, make sure you read through Practical Python and OpenCV to help you learn the basics. You'll need basic programming experience though, so if you don't have any experience programming make sure you take the time to learn Python first.
Hey Adrian! How do I link this with my Pi camera for live streaming? How do I use the video class you made?
If you’re new to working with the VideoStream class make sure you read this tutorial.
Hi Adrian thank you for this tutorial!
How can I get ‘pyimagesearch’ module??
Use the “Downloads” section of this post to download the source code (including the “pyimagesearch” module), machine learning models, and example videos.
Dear Adrian
It is working fine if the person's speed is slow, but if the person's speed is fast it is not able to detect the person. Here are my questions:
1. What changes need to be made so it works more accurately?
2. You set the maxDistance variable to 50; what is the use of this variable?
And thanks for sharing your wonderful knowledge, it's really helpful for us.
You would want to run the actual object detector at a faster rate. For speed we only run it occasionally, but for faster-moving objects you'll want to run it more often.
Hi, how do I use this with a network cam?
You can use my VideoStream class which will allow you to access your webcam. I unfortunately do not have any tutorials for using a network cam though.
Thank you Adrian! is it practice to detect people in real time video?
I’m not sure what you mean by “is it practice to detect people”, could you please clarify?
Does this work with the Pi? Is there any tutorial from you on people counting with the Pi?
You can use this tutorial with the Pi but I would recommend swapping in a faster object detector. Deep learning object detectors are too slow on the Pi. HOG + Linear SVM or Haar cascades would be your best bet.
Actually, how can I use a Haar cascade model and classifier with this program?
You can use this pre-trained Haar person detector but I honestly don't think it will work well. You would want to train your own Haar cascade on top-down views of people.
How can I count an object (e.g., balls) from video using this method? Can you please help?
What type of ball? Sports ball? Any arbitrary ball? The more details you can share on the project the more likely it will be that I can point you in the right direction.
Hello! A question: what OpenCV version did you use, and on what operating system?
I used OpenCV 3.4.2 for this tutorial. I use both macOS and Ubuntu.
Hello, I’m having an error running. “ImportError: No Module named scipy.spatial”
You need to install the SciPy library:
$ pip install scipy
Hi, I'm installing it on a Raspberry Pi and the install gets stuck halfway. Is there any way I can install SciPy?
First make sure you’re using my OpenCV install guide to install OpenCV. From there I also suggest compiling with only a single core to ensure your Pi doesn’t get locked up.
Hello Adrian, thank you for the tutorial. Could you help me? It's not working for me from the beginning, and this is the error message:
…
ImportError: No module named scipy.spatial
You need to install the SciPy library:
$ pip install scipy
Hi Adrian,
Love your work. One quick question: the model you used is trained to detect multiple classes. How can I make your code detect multiple classes, like "car" and "bus" movements?
I tried changing the line: if CLASSES[idx] != "person":
But it didn’t work when I put multiple classes.
There are a number of ways you can programmatically achieve the desired change. Try the following:
if CLASSES[idx] not in ("person", "bus"):
Thanks Adrian for good article.
I need to count the number of vehicles crossing a particular boundary post. What would be the best technique?
I would start by considering how you are detecting the boundary post. Is that something you can pre-label and know the coordinates of before you start the script? Or must the boundary post be detected automatically?
Hi, the boundary post can be a check post. As you mentioned:
(a) Needs to be detected automatically – this is correct
(b) Preset co-ordinates – unlikely
Please suggest any suitable approach.
I would suggest training a dedicated object detector such as HOG + Linear SVM or Faster R-CNN, SSD, or YOLO to automatically detect your boundary post. If you’re interested in learning how to train your own custom object detectors you’ll want to refer to my book, Deep Learning for Computer Vision with Python.
Thanks for the articles.
(a) For frame numbers satisfying the skip_frames condition we invoke the object detection model, for instance R-CNN or YOLO. A dlib correlation tracker is then initialized for each detection and the trackers list is populated with rectangles.
(b) For frame numbers not satisfying the skip_frames condition we do a tracker update to get new positions of the objects detected in (a). But what happens if new objects appear in the frames covered by condition (b)? They will not be detected by the object detection model and will not be tracked?
Correct, if a new object appears during the frame skip you will not be able to detect or track them. You need to achieve a balance between the two for your own application.
How would you represent that people have been using a particular path more and another path less using a heatmap? Is it possible?
Dear Adrian
First of all, I have to say Thank You very much for this wonderful tuition.
I am new to the Raspberry Pi, OpenCV, people counting, etc. I had no idea where to start at first,
but your blog helped me a lot. I really appreciate it.
I wish you could give me some advice on the people counting program.
I successfully ran your code on my Windows machine (which has high specs) and a Raspberry Pi 3 Model B.
However, when running on the Raspberry Pi, the video almost freezes; it takes a lot of time to process.
I have read your comments on this page, and it seems that a people counting program
is very difficult to run with good performance on the Raspberry Pi.
I just want to know: is it possible to use the Raspberry Pi (with the Pi camera) to do real-time people counting? I am purchasing the latest model, the Pi 3 Model B+, and will try the test on it.
(I know that the image processing requires high specs; actually, as long as the people counting accuracy can reach around 80% on the Raspberry Pi, I am OK with that.)
Any advice would be appreciated
I don’t have any tutorials dedicated to building a people counter on a Raspberry Pi but I will be covering it in my upcoming Raspberry Pi + Computer Vision book. Stay tuned!
Hey David, can you explain how you ran it on Windows, please?
Hi Adrian,
I want to record and save locally the output video displayed with imshow(), including the two lines, the bounding boxes, and the in/out counts. How do I do that?
See my guide on writing to video with OpenCV.
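As a quick hedged sketch (the output path, FPS, and codec are assumptions to adjust), the annotated frames can be written out with cv2.VideoWriter like so:

import cv2

writer = None

def write_frame(frame, output_path="output.avi", fps=30):
    # lazily create the writer once the first frame (and its size) is known
    global writer
    (H, W) = frame.shape[:2]
    if writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(output_path, fourcc, fps, (W, H), True)
    writer.write(frame)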
If we reduce the number of frames to skip, say to one fourth of the frame rate, then it is highly likely that we reduce the number of objects that go undetected (of course with reduced performance, since object detection is invoked more often).
Even if an object goes undetected during the frame skip, it is likely that the object will be detected in subsequent frames (since its counted flag is still not set). I have observed this because some objects are detected after crossing the ROI line (at least after some frames); please confirm. Thanks.
What happens if an object suddenly stops moving? I guess once it is counted, the flag is set, and even if it is stationary in subsequent frames it will not be counted again, correct?
You are correct.
Hi.
Thanks for the tutorial & code.
but there is an issue:
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
AttributeError: module ‘cv2’ has no attribute ‘dnn’
How can I fix this?
Can I use a live IP camera to capture and process real-time video?
It’s an urgent issue.
It sounds like you are using a very old version of OpenCV. Make sure you follow my OpenCV install guides to obtain a newer OpenCV version.
Hi
I need to know if this code can run well on a Pi Zero with a camera.
Regards
No, the Raspberry Pi Zero is far too underpowered to run the code in this tutorial.
Hi Adrian, I'm new to image processing and OpenCV. I want to create a project that will count the number of cars crossing the line. I changed "person" to "car" and it works perfectly on video with actual cars in it. But for my prototype I want to use toy cars, and it can't detect the toys as cars. Do I need to find another detector?
The model used in this post was not trained to detect toy cars. You will need to train your own custom object detector. I discuss how to train your own custom object detectors (including code) inside Deep Learning for Computer Vision with Python.
Hello, I already solved the "car" problem. Now I want to edit the code to count cars when they cross from left to right instead of up/down. I already edited line 204 to use the x-axis, but it still counts up/down. Did I miss anything?
Hi Adrian,
I also faced the same problem as above; is there any solution? I have edited all the related variables; is there anything I have missed?
Just an add-on to the question: I rotated the video example_02.mp4 given in the output directory so as to detect left-to-right instead of up/down. It is able to detect only one person in total moving from right to left in that video. Looking for help with this issue, thanks in advance.
You are awesome, Adrian. You are definitely a great person with a high degree of proficiency. Very good luck!
Thank you for the kind words, Saeed.
Hi Adrian
Is "maxDistance" the distance between the centroids of two objects (i.e., the Euclidean distance)? Is this distance measured in number of pixels?
Thanks
Correct, that distance is measured in pixels.
Hi Adrian,
Can the concept of 'up'/'down' be used for vehicle lane crossing?
Yes, just change the Line 137 to check for a “vehicle” class rather than “person”.
Thank you Adrian, it's a really amazing article. Thank you very much for your effort and for sharing the code with us.
If you don't mind, I have two questions.
First, what if detected person show gain in view, for example, person with ID=1 , back on same direction to other direction ID change or has the same number, ID=1
Second, can I apply it in real time, and is it slow or fast?
1. I’m not sure what you mean here, could you elaborate?
2. Yes, this algorithm can run in real-time on a laptop/desktop.
Hello Mr. Adrian Rosebrock,
Could you modify this example to be compatible with the Movidius NCS?
Many thanks.
I will be doing exactly that in my upcoming Raspberry Pi + Computer Vision book 🙂
Great news, when will you release that book?
Right now I am targeting Q3 of 2019.
Hi
Brilliant work
I've got a question:
how can I set my video source to an IP camera, say with an RTSP URL?
I tried cv2.VideoCapture; it shows the video frames, but nothing else works.
Sorry, I don’t have any tutorials on RTSP URL streaming but I will be covering it in my upcoming book. Stay tuned!
Another thing:
I have an issue while playing the 1-minute example video. The whole video plays in 35 seconds; it looks like the playback speed is faster than usual. Do you have anything in mind as to why this might be happening?
Thank you very much.
The goal of OpenCV is real-time image and video processing. OpenCV doesn't care what FPS the video was recorded at, it simply wants to process the video frames as fast as possible. OpenCV + the people counting algorithm is fast enough that it's processing the video faster than the original FPS. That's actually a good thing.
Just put your IP address with the username and password like below:
vs = VideoStream(src="rtsp://username:password@ip_address")
AttributeError: module ‘cv2’ has no attribute ‘dnn’
Your OpenCV version is too old. You need OpenCV 3.4 or greater for this tutorial. See this page for my list of OpenCV install guides.
Hey Adrian — thanks for all the tutorials; they’re immensely helpful!
I’m wondering if you have a blog on face detection and tracking using the OpenCV trackers (as opposed to the centroid technique). Thanks!
Sure, see this tutorial.
Can I use YOLOv3?
Yes, just follow my YOLO object detection guide.
Hi Adrian! Excellent post, as always!
One question: is it possible to not increment the counter when the same person first enters, then leaves, then enters back again in the department store? Something like a “unique visitor counter” kind of algorithm. Thanks!
Yes, you could do that but that would require quantifying the face of the person. See this tutorial on face recognition for more info.
For scenarios that imply privacy concerns, quantifying the face of the person is prohibited. Is there a way to quantify the body of the person, basically everything except the face: clothes, accessories, shoes, etc.?
If you cannot quantify the face you should look into “gait recognition” (i.e., how someone walks). Gait recognition is more accurate for person identification than even face recognition!
Hi There
I have been wondering how I can optimize your code's performance. I think using OpenCV for loading videos and getting frames might be quite resource-consuming; do you know any other libraries I can use for this part? FFmpeg is quite awesome for this kind of thing, but there are many wrappers written for Python, and on the other hand we can call FFmpeg commands directly from Python. I am a bit confused; do you have anything in mind about this?
I really appreciate it
Sure, here's a tutorial on how you can use threading to improve your file I/O pipeline.
Hey, I would like to use Faster R-CNN or YOLOv2 to increase the accuracy. Can you provide me the weights and the .pb files and guide me, please?
See this link for the YOLO model.
Hi Adrian!
Thank you for your good tutorial
I tested your code and it seems to run well, but when I adjust skip-frames to a smaller value (5 for example), it seems the counter no longer works well.
Second, in your code, the "counted" variable should be checked earlier (before the else), because once an object is marked as counted we don't need to calculate its direction.
Hi Adrian,
After waiting for an hour for SciPy to install, it ended up with all kinds of errors.
Is there any other way to install SciPy? I followed the tutorial many times, over and over.
I'm using a Raspberry Pi 3 B (2015 model),
Python for OpenCV development,
and OpenCV 4.0 for computer vision.
Try installing SciPy via:
$ pip install scipy --no-cache-dir
It will take awhile to install but it should work.
Hi Adrian.
Thanks for the tutorial.
I tested the code with a Caffe model trained on my own dataset (around 1.5k images, labeled with the heads of bus passengers walking through the entrance door). But with this model the tracking works very poorly, which I assume could be due to the poor detection of the model (approximately 30% average confidence, and it sometimes fails to detect in consecutive frames). I would like to ask: is there any way to improve the accuracy of the model? Should I train with more data, or is there anything else I can do?
Million thanks in advance.
It sounds like a problem with your model itself. With 30% average confidence and many failed detections it’s likely that the detector is the problem. I would suggest working on building a better object detector. If you need help training an object detector I would suggest you refer to Deep Learning for Computer Vision with Python where I discuss how to train accurate object detectors in detail.
Hi Adrian,
First of all thanks a lot for this much needed tutorial on object tracking.
However, I have some queries I thought of discussing with you:
– Is it possible to do object tracking using TensorFlow instead of the Caffe framework?
– I am facing a lot of issues while setting up the Caffe framework on my CPU-based Ubuntu system, and it is very much needed to first set up the framework in order to use it later for training your own objects as per your specific requirements.
– Thus I was wondering if object tracking can be done with TensorFlow models (as I have TensorFlow installed on my system), and if yes, what changes are needed in the codebase you shared in your tutorial.
Your advice would be very helpful for my situation.
Thanks in advance,
Summa
1. Yes, just load your TensorFlow model instead of the Caffe one. It will require updates to the code since the TensorFlow model needs to be loaded differently and a forward pass likely performed differently (depending on how you exported the model) but that’s something you can resolve.
2 and 3. Caffe can be a bit of a pain. Inside Deep Learning for Computer Vision with Python I teach you how to train your own custom Faster R-CNN, SSD, and RetinaNet object detectors using TensorFlow and Keras. It also includes my code base for object detection in images and video. I would suggest starting there.
Hi Adrian
How many training and test samples were used for the Caffe model?
The model was pre-trained on the COCO dataset, you can refer to the COCO dataset for more information on training data.
Hello Adrian,
Hi Adrian,
I have already read some tutorials on your website, but they are only for long distances (when there is a long distance between the camera and people and the camera captures a vast space).
I am looking for code that is suitable for situations where there is a short distance between the camera and people, as the links below do not work well under circumstances where the camera cannot capture a large space (short distance between camera and people).
I am looking forward to your answer.
Best
Sajjad
Hello,
I have a problem with dlib. Pycharm doesn’t let me install it and when I try to download it it always clashes with my Python version or something else. Any ideas?
You should follow my dlib install guide rather than trying to install through PyCharm.
This was amazing, Adrian!
Can you suggest a camera for this project?
The camera you choose should be based on your environment. Do you need auto-focus? Is the camera supposed to work in both day and night? Consider your environment first, then choose a camera.
Hey Adrian,
I have a question
I want to divide the frame into 4 quadrants and locate the detected object; I want to know which quadrant it lies in.
Please guide me in the right direction
Thank you in advance
To start, simply detect the people in the image and compute their bounding box coordinates. Then compute the center of the bounding box. Since you know (1) the center of their bounding box and (2) the dimensions of the frame you can then determine which quadrant they are in.
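A hedged sketch of that idea (a purely illustrative helper, not part of the post's code):

def quadrant_of(box, frame_width, frame_height):
    # box is (startX, startY, endX, endY); return which quadrant the
    # center of the bounding box falls into
    (startX, startY, endX, endY) = box
    cx = (startX + endX) / 2.0
    cy = (startY + endY) / 2.0
    left = cx < frame_width / 2.0
    top = cy < frame_height / 2.0
    if top and left:
        return "top-left"
    if top:
        return "top-right"
    if left:
        return "bottom-left"
    return "bottom-right"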
Hello, Adrian,
I encountered an error while running the program: "argument -p/--prototxt is required". I think the path to the Caffe "deploy" prototxt file was not set. I do not know how to solve the problem so the program runs successfully. I have just started with this, so maybe my question is naive.
The problem is you are not setting your command line arguments properly. Read this tutorial first and then you’ll understand how to use them.
Hey Adrian,
Great post
I have a doubt though:
what if I have to draw the line vertically and then count whether the detected object is on the right-hand side or the left?
What changes do I have to make to the code?
See the comment thread with Harsha. I’ll also be covering that question in detail inside my upcoming Computer Vision + Raspberry Pi book.
Thank you for the reply
I made the changes and it works great.
I have another doubt:
what if I have to count the total objects detected in a frame,
irrespective of whether they cross the line?
You would loop over the detected objects and increment an integer each time an object passes the minimum confidence/probability threshold.
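A hedged sketch of that per-frame count, assuming the SSD-style detections array used in this post:

import numpy as np

def count_people(detections, classes, min_confidence=0.4, label="person"):
    # detections is the 4-D array returned by net.forward() for the SSD
    total = 0
    for i in np.arange(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        idx = int(detections[0, 0, i, 1])
        if confidence > min_confidence and classes[idx] == label:
            total += 1
    return total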
Hello Adrian
This is a really great post. I have one question.
Suppose I want to create a program to measure the capacity of a road in real time; how can I utilize the dlib package? Or should I use something else?
Thank you in advance.
Road capacity here is a percentage, so that I can see at what percentage of its capacity the road is.
You would use semantic segmentation which will return a pixel-wise mask, assigning each pixel of an input image to a class. You can then count the number of pixels that belong to a certain class and derive your percentage.
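A hedged sketch of that last step (the class IDs are placeholders for whatever segmentation model you use):

import numpy as np

def road_occupancy_percentage(class_mask, road_class_id, vehicle_class_ids):
    # class_mask is a 2-D array of per-pixel class IDs from the segmentation
    # model; vehicles occlude the road, so count both as the total road area
    vehicle_pixels = np.isin(class_mask, list(vehicle_class_ids)).sum()
    road_area = np.isin(class_mask, [road_class_id] + list(vehicle_class_ids)).sum()
    if road_area == 0:
        return 0.0
    return 100.0 * vehicle_pixels / road_area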
Hello Adrian
I want help with how I can count people standing in a queue at a shop billing counter.
You can simply apply the object detector covered here. Loop over the detections, check to see if the minimum probability/confidence is reached, and then increment a counter.
Dear Adrian
Once again, fantastic post. My journey with opencv and deep learning has started with you and I am enjoying every moment of it. I would really love to learn to count automobile traffic with OpenCV and I’m looking forward to your next blog post.
Thank you
Lyron
Thanks Lyron, I really appreciate the kind words. I’ll absolutely be covering traffic counting in my upcoming Computer Vision + Raspberry Pi book. Stay tuned!
is there a way to run in windows?
You can certainly run it on Windows, just make sure you have the proper prerequisites installed. OpenCV and dlib should be properly configured and installed. I would suggest starting there.
This is extremely helpful. Is there a way to send an mjpeg stream as the video source instead of a file or the default webcam?
Yes, but you need to specify the IP address of the camera (and potentially other parameters) to the “src” of the VideoStream.
I changed the script to read:
vs = VideoStream(src="http://192.168.10.1/media/?action=stream").start()
and it returned with an error of:
OpenCV(3.4.1) Error: Assertion failed (scn == 3 || scn == 4) in cvtColor
It sounded like that can sometimes mean the libjpeg module wasn't installed at the time of compiling; however, I did "sudo apt-get install libjpeg-dev" before compiling.
Maybe there’s an mjpeg module I’m missing too? Curious if you had any insight.
Thanks!
No, the error is because the object returned from the stream is “NoneType”, implying that a frame could not be successfully read from the stream.
Hi Adrian,
Could you please tell me how can modify the detection part to put bounding boxes and confidence level as long as Id numbers(counting them)?
like this post but with counter
https://pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/
You can use the "cv2.rectangle" and "cv2.putText" functions to draw bounding boxes and integer IDs. If you are new to computer vision and image processing I would recommend reading Practical Python and OpenCV so you can learn the basics.
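A minimal sketch of those two calls, assuming you have the frame, the tracker-assigned object ID, and the bounding box coordinates in hand (the helper name is just for illustration):

import cv2

def annotate(frame, objectID, startX, startY, endX, endY):
    # draw the bounding box around the detected person
    cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 255, 0), 2)

    # draw the integer ID just above the box
    text = "ID {}".format(objectID)
    y = startY - 10 if startY - 10 > 10 else startY + 10
    cv2.putText(frame, text, (startX, y),
        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return frame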
Thank you,
Yes, I am new to OpenCV, just started reading your books.
Could you please also guide me how can I change the detection part to YOLO3? I want confidence level, the name of the object, as well as the bounding boxes.
Sorry, I forgot to mention: I want it to be real-time. The DNN module doesn't have access to the GPU, so how can I count in real time?
https://pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/
While I’m happy to provide these tutorials for free I cannot modify the code for you. The blog posts explain the code rigorously, the code itself is well documented as well. I would suggest you download the code and start playing with it. Building your own projects is the best way to learn. I believe in you Elena, good luck! And if you need more help you can find me inside the PyImageSearch Gurus course.
Hey, thanks for the tutorial. I have a question which is a bit off topic. I am working on a project where I detect animals from a video and I want to be able to identify them so I can track whether they leave a certain area and for how much time they stay out of it. Individual animals are not that important and it is fine for me if IDs are switched. Centroid tracking was fine up to the point I started checking whether the animals are outside of a certain area by checking whether their centroid is in the area. Unfortunately, this is not good enough for me and I would like to also keep track of the startX,startY,endX,endY coordinates of their bounding boxes. Maybe it is obvious but I cannot figure out how to do this with your implementation of Centroid tracking. Can you give me so advice on how to do it if it is possible or additionally is there another way that I can use to identify objects? I can’t seem to find much on the topic of identification on the web.
How is the “area” defined? Is the area within view of the camera? Or is it out of view? It would likely be easier to provide a suggestion if you had an example video of what you’re working with.
I don't know how to send you an image here, but I will try to explain how a frame of the video looks as well as possible. The camera that shot the video is placed on top of a microscope and it films bacteria in a petri dish, so the area I am talking about is circular. I think I managed to save the coordinates of the bounding boxes by registering objects in the centroid tracker with their bounding box coordinates as well as their centroid coordinates, and then I slice inputCentroids and objectCentroids to get the centroid coordinates and compare the distance between them.
I think I understand the problem now. Your goal is to maintain a history of all their bounding box coordinates, correct? If so, all you need is Python’s “deque” data structure. This tutorial will show you how to use it. It uses centroids rather than bounding boxes but you can update it to use the bounding boxes.
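For reference, a tiny sketch of that deque idea applied to bounding boxes rather than centroids (the variable names and history length are assumptions):

from collections import deque

# maps an object ID to a fixed-length history of its bounding boxes
history = {}
MAX_HISTORY = 64

def update_history(objectID, bbox):
    # lazily create a deque for IDs we have not seen before
    if objectID not in history:
        history[objectID] = deque(maxlen=MAX_HISTORY)

    # bbox is a (startX, startY, endX, endY) tuple
    history[objectID].append(bbox)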
Hello, I want to run this code on my desktop, but my Pi camera is set up differently. How can I just send the frames to my desktop? Any suggestions?
I’ll be covering that very question in my upcoming Computer Vision + Raspberry Pi book, stay tuned!
I have a plan to get a copy of your book like i get the deep learning starter bundle…. i expect it all in one
Thanks Amare 🙂
Why does self.maxDisappeared need to be 50? Isn't that too much? It is equivalent to 50 frames. Since the centroid of a lost object stays the same after it is lost, it has little chance of matching the current centroids, or the centroid may match the wrong object, which would be a problem because the wrong object ID gets assigned. Normally it has no effect on the counting; I checked it out.
Hello Adrian, I tried to use your code to count people with a custom video, but I noticed that the video is processed very quickly. Can I reduce the processing speed?
OpenCV’s job is to process the frames as quickly as possible. You can “slow down” the processing by calling “time.sleep” in the body of the “while” loop.
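A quick sketch of what that could look like, assuming a target of roughly 25 FPS (the constant and loop body are placeholders):

import time

TARGET_FPS = 25

while True:
    start = time.time()

    # ... read the frame, run detection/tracking, display the result ...

    # sleep away whatever time is left in this frame's budget
    elapsed = time.time() - start
    time.sleep(max(0, (1.0 / TARGET_FPS) - elapsed))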
Hi if i am to live stream the video from the pi from a remote location to my PC, what would i need to change in the source code to allow me to listen in to the IP of the remote camera that is doing it’s live stream?
How can i configure this? “vs = VideoStream(src=0).start()” to accept the live stream directly and process it remotely?
I’ll be covering remote streaming in my upcoming Computer Vision + Raspberry Pi book, stay tuned!
How can you incorporate timestamps with each person ( the entry time into the frame)?
I would suggest modifying the "TrackableObject" to include an "entry_time" instance variable. That variable should be set to the current timestamp when the object is first detected.
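A minimal sketch of that modification, based on the TrackableObject used in this post; the "entry_time" attribute is the addition, not part of the original class:

import datetime

class TrackableObject:
    def __init__(self, objectID, centroid):
        # store the object ID and initialize the list of centroids
        self.objectID = objectID
        self.centroids = [centroid]

        # indicate whether the object has already been counted
        self.counted = False

        # record when the object was first detected
        self.entry_time = datetime.datetime.now()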
I was able to get it to count people in real time using the example video, but it chokes when the video has more people. Like for instance, this video https://www.youtube.com/watch?v=P0oAN4Cumsc … its about 6 times slower than real-time on a PC.
Is it possible to speed things up even with video of a lot of people?
Try using this post where I show how to distribute the object trackers over multiple processes.
Hello!
Thanks for your sharing.
I’m working with some other videos using your source code, but I’m not sure how I can make it better than now. Do I need to share my videos?
Thanks.
Hello Adrian,
Thanks for the wonderful code. So far everything was going fine,but while running the code when the lady just touches the yellow line the video is getting closed. Is this a problem with the model or something else.
That is quite strange. Is there an error printed to your terminal? Double-check your terminal output.
The current code compares the centroid position to see whether it's above or below the line. How do I get the rectangle (top, left, right, bottom) of the "to" object being compared instead? I'd prefer to check whether the rectangle intersects the line, as the centroid may be above it. Thx
I wouldn’t recommend trying to check if the centroid perfectly intersects the line. Due to quality of the video stream, artifacts, or simply a fast moving person and a slower processing pipeline, you cannot always guarantee the centroid will perfectly intersect the line.
I might have not been clear in my ask. I am actually looking to use centroids to calculate direction, but use “intersection” (I use shapely) of the bounding box of the object to my line that marks in vs. out. I solved the problem by storing the bounding boxes in the tracked objects as well during creation/update. Thanks!
Hi, I might have overlooked it, but it would be great if you could help me with drawing rectangles around each person.
I saw a post about doing this with cv2.rectangle and cv2.putText but I can't figure it out…
Kind Regards
It’s okay if you are new to the world of computer vision and OpenCV but if you need help learning the basics, such as using drawing functions, you’ll definitely want to read through Practical Python and OpenCV where I teach you the fundamentals. Definitely give it a look, I’m absolutely confident it will help you.
Can this be made into a web page? For example, if you click a button on the web page, the output video should be shown on that page.
You mean like stream the output of the script to a web page?
Thank you very much for creating this tutorial, i found it immensely helpful, it would be very helpful if I get to know how to change the writer so that I can create a web application in such a way that the output video is shown on a web page
I am covering that very topic in my Raspberry Pi for Computer Vision book, stay tuned!
Thank you for the post! I'm working with the same model as yours, but I'm getting false detections of people even though the code is the same as yours. Can you help me with the problem?
I only changed dlib.rect() by casting the arguments to 'int', because without that I get a dlib error about argument types.
Hi Adrian,
first of all thanks for this nice implementation for people counter.
Please, I would like to know why you used two object tracking algorithms, and can we use only one of them instead?
thanks again
Sorry, I’m not sure what you mean by using “two object tracking algorithms”?
Dlib’s tracking algorithm and centroid trackers are doing different things.
Briefly:
1. dlib's tracking algorithm gives us bounding boxes (around detected objects).
2. The centroid tracking algorithm's role is to identify which bounding boxes belong to the same objects.
Thanks for jumping in there and clarifying, Bahi!
Hi adrian,
can i use a pi camera for this tutorial?
( slow fps is fine for my application)
Change Line 43 to:
vs = VideoStream(usePiCamera=True).start()
Hello, how do I hook this up to my Pi camera and use it instead of just loading the example videos?
Please refer to the post as I discuss that question already. Change:
vs = VideoStream(src=0).start()
To:
vs = VideoStream(usePiCamera=True).start()
Hi Adrian,
Thank you very much for the great article.
I am facing below mentioned problems to accurately detect the people count:
1. I am just trying this on my camera, and what I am observing is that if the detection of a person fails at the 30th frame, then the count is not accurate. It misses a few people.
2. When more people are moving together in a group, I also see a wrong count, as the detection framework fails to detect all the people in the group.
3. When a person is idle for some time (standing still or sitting) and detection fails intermittently (identifying them sometimes and sometimes not), then the person ID keeps incrementing, resulting in a wrong count.
All these above mentioned problems are related to detection. Do you have any suggestion for them?
Hi I was wondering is there a way in which i can show the video output on a website instead of in the window like produced above
I’m covering that exact topic in my Raspberry Pi for Computer Vision book.
Hi adrian,
Thank you so much for this project. I am currently trying to use this program. However, I got an error from dlib.rectangle; it says something about an argument error. Can you please help me with this?
Sorry, without knowing the exact error I cannot provide a suggestion.
Hi, from what I can see, when detection occurs all trackers are cleared (trackers = []) and we start from scratch. Suppose a person is halfway through the scene when detection starts; how could I persist that person's ID?
Basically, is it possible to match detections with already tracked objects?
Thanks!
Forget it, I did not read the part of defining id based on the centroids distance 😀
my apologies
No problem, Rafael!
Hi Adrian
What a great post!
I would like to know the setup for the recording
of the example videos used here, specifically
which are the height and angle of depression
of the camera.
Bests
Sorry, I don’t recall the exact angles.
Hello Adrian ,
Thank you for a great tutorial.
I tried to change your code to detect left to right instead of up and down.
How do I make changes on line 204 to keep track of the x-coordinates?
Hey Ricky — I’m covering left-to-right tracking (along with vehicle tracking) inside Raspberry Pi for Computer Vision. Definitely take a look.
This is great! I was wondering if there was a way to track them also going from left to right along with up and down?
I wanted it to work alongside the up/down simultaneously! Would that be possible?
Having the horizontal and vertical line at the same time and tracking up, down, left, and right at the same time
Hey James — I’m covering both left/right and up/down tracking inside Raspberry Pi for Computer Vision. The implementation there will help you.
Hi Adrian, can I ask what is the minimum CPU Requirement for this people counter script ? I have run it using Odroid Xu 4 mini PC and it is quite laggy.
Thank you
Martin.
The Odroid XU4 CPU is far too underpowered for that script. You would need a laptop or desktop.
I’m actually covering how to optimize the people counter script for embedded devices, like the Odroid Xu4 and RPi, inside Raspberry Pi for Computer Vision.
Hi Adrian,
Your tutorial and code are truly amazing and have helped me get a jump start on this technology. I have run the code successfully on Ubuntu 16.04 on 2 computers — one quite old and another more modern. I can definitely see a difference in performance and the newer laptop works fairly well. To experiment with performance I have installed Intel’s new “Making Python Fly: Accelerate Performance Without Re-coding”
(https://sourceforge.net/articles/making-python-fly-accelerate-performance-without-re-coding/)
I am encountering some errors that seem to be related to “imutils”
Still checking out the possible error sources, but would be interested in the experience of others and any practical tips.
Thanks,
Alan
Hi again Adrian,
Somehow this quote got omitted from my just submitted reply ….
“Intel® Distribution for Python*, which is absolutely free by the way, uses tried-and-true libraries like the Intel® Math Kernel Library (Intel® MKL) and the Intel® Data Analytics Acceleration Library (Intel® DAAL) to make Python code scream right out of the box – no recoding required. Here’s some of the benefits dev teams can expect:
First, number crunching packages like NumPy, SciPy, and scikit-learn are all executed natively, rather than being interpreted. That’s one huge speed boost.”
Hi Adrian, Thanks for the nice tutorial. Are you going to cover this in your book Raspberry Pi for Computer Vision in detail? Would it be explaining the conditions where this code does not work? The code works perfectly fine in the video you have shared but does not work on my own videos in a similar setup. I have an overhead camera for counting people coming in and out of the room. It gives false counting.
Thanks
Yep! I’ll be covering how to optimize the people counter for the RPi inside Raspberry Pi for Computer Vision. I’ll also be explaining why the modifications are required.
Hello Adrian,
That is a wonderful piece of software you have developed.
I am presently using this on a Raspberry Pi. Can we by any chance adjust the Pi camera resolution from your code? Also, let me know how we can decrease the CPU usage, which is at about 77% when we run the code.
You can adjust the resolution to the RPi camera module via:
vs = VideoStream(usePiCamera=True, resolution=(300, 200)).start()
Or whatever resolution you want to supply.
Hi Adrian;
First of all thank you for that great tutorial. I have two questions. First, can i change the position of the line that you draw to decide people are going up or down. Second, can i use yolov3 tiny for that code?
Thanks and best regards
Yes, you can do both but you would need to implement yourself. I’m happy to provide this code and tutorial for free but I do not offer custom adjustments. If you need help I would suggest you refer to Practical Python and OpenCV so you can learn the fundamentals first before tackling this project.
Hi Adrian,
I have installed the code on a Raspberry Pi, and the Pi is mounted on the ceiling at a height of 10 feet. When people are moving below it, the device is not able to detect them. Should we make any changes in the code? Also, will the performance increase if I install the same code on an Orange Pi Plus 2E, which has 2 GB of RAM?
Need your inputs.
Thanks,
Vamsi
I’m covering how to build a custom people counter for the RPi inside Raspberry Pi for Computer Vision. I would definitely suggest starting there.
Will this be available by August 2019?
Yes it will!
Hi Adrian,
Thank you very much for the tutorial. I was wondering what would have to be changed in the people_counter.py code in order to make a live stream video display canny edge detection rather than a full video where everything is visible. I ask this because I am only interested in the people in the frame, rather than all of the other details. I know I will use an imutils function somewhere, but Im not sure which part to manipulate.
Thanks again for all of the help!
-Charlie
Hey Charlie — have you taken a look at Practical Python and OpenCV? That book will teach you the basics of computer vision, including how to apply the Canny edge detector. I suggest you learn the fundamentals first, then you’ll be able to apply the detector to the code.
Hi, I am new to Python and having difficulty parsing the arguments. I am using a Mac; how should I proceed?
It’s okay if you are new to command line arguments but you should read this tutorial first.
Hi Adrian,
Thank you very much for the precious tutorial. I face a problem in people counting project when I am going to track people though detecting them is not hard.
would you please give me a tutorial about the best tracking methods such as “deep tracking” or other else?
Best
Maryam
thanks upto infinity ….
You are welcome!
Hi Adrian, thanks for the great tutorial it helped me a lot in my traffic monitoring application.
Now to my question: how do I stop the tracking ID from showing during the frames where the object has disappeared? In my application I have a queue and I need maxDisappeared to be high, but I also don't want the disappeared object's ID to just hang on the frame without a detection.
thanks
You could check and see if the ID is close to the boundary of the frame, and if so, delete it from the list of objects to track.
Hi Adrian ,
I have a doubt as to if this can help in counting number of students in a class , and is it possible to count number of children in a photo by headcount without having a large database of children photos availible to train .
Take a look at Deep Learning for Computer Vision with Python. That book will teach you how to label faces based on their age and therefore be able to count the number of children in photos.
Is it possible to run it on Windows? I tried it on Ubuntu on my laptop and it's working.
The code will work on Windows but note that I do not support Windows on the PyImageSearch blog.
Hey,
Using conda for Windows I have issues with dlib and imutils using conflicting versions of scipy. What are the latest versions of all the libraries without dependency issues?
Dlib and imutils do not have conflicting SciPy versions, there’s likely an issue with a different library. Try checking again.
Hi, how can I get the previous centroids of an object given its object ID?
Loop over the previously detected objects and compare the IDs to find the one you’re looking for.
Hi Adrian, Is it possible to run this code on android platform and using the mobile camera ?
You would need to convert the OpenCV + Python code to OpenCV + Java code, but in general yes, it will work on Android.
Thanks Adrian.And do you think is it possible to use this algorithm in real time android application?
Yes, but you would need to implement the algorithm in Java + OpenCV via Android’s bindings (which I don’t have any experience doing).
Hi, how would you create a customized MobileNet model to deploy into this program? Where to start?
You should start with Deep Learning for Computer Vision with Python where I teach you how to train your own custom MobileNet model.
Thanks for the tutorial.
I am actually using this code for a project. The thing is that the output is too slow, which means people don't get counted. I wonder if it's the computational power of the Raspberry Pi 3.
Could you suggest a more powerful processor that would run this code at a higher FPS and accurately count people. Maybe the new Raspberry Pi 4?
I cover how to build a people counter that runs fast on the Raspberry Pi inside Raspberry Pi for Computer Vision. If you need a fast people counter on the RPi, definitely start there.
I did check the website. At the moment the priority is people counter so which bundle do you suggest I go for?
I would suggest you go for the “Hacker Bundle” as that book includes people/object counting not only on the RPi CPU, but also the Movidius NCS as well.
Hi Adrian, I am looking for similar counter feature but with actual LED. I am researching on how to count the LEDs on a device (10 to 20 LEDs like a router). Can I tweak the source code form the post to do LED counter instead?
Hi Adrian, awesome post. Thank you so much. Please tell me How to use this code for custom model? Object i want to detect there is not in Caffe model.
Deep Learning for Computer Vision with Python will teach you how to train your own custom object detector.
Hi Adrian,
This tutorial and the content on deep learning is awesome and really helpful.
I am actually using this code for a project where we need to count every person who enters a retail store from a video clip. But the problem I am facing at the moment is that when 3 or more people enter the store together, the count is not correct. I get a count of 2 instead of 3, so sometimes it skips an object. Can you please help me with this, as I don't have much knowledge of deep learning?
Great tutorial, thanks! Does this procedure work on Windows or do I need to install Linux?
It will work on Windows provided you have OpenCV correctly installed.
Great post Adrian! Is there any way I can also record the clock time every time it detects a person going up or down? Also, how could I write the results to a text file or .csv file (i.e., 1 row per person)? Thank you Adrian!
Hey Kyle — if you need help extending this project I would suggest you refer to Raspberry Pi for Computer Vision with Python where I cover your use case in detail.
Greetings from Colombia. How would this work with an IP camera instead of the webcam?
Hi sir, thanks for this awesome post, it's working perfectly fine. I wanted to count people walking from right to left. Can you please tell me how exactly to implement that?
Can I use cv2.line with endpoints (W // 2, 0) and (W // 2, H)? If yes, will I have to make any changes elsewhere?
See my book, Raspberry Pi for Computer Vision, where I cover other types of tracking and counting (including left-to-right counting).
Hi Adrian, thanks for this wonderful blog post.
I am looking to run this on a saved video file, not in real time, to get historical information. The objective is to get the number of people inside the store at any given time, which is the difference between the up and down counts. My video feed has the current time overlaid on it. Any suggestions on how to do this for a recorded video file and map the count to time? Is there a blog post that covers this, or one you're aware of?
I would recommend reading Raspberry Pi for Computer Vision where that exact use case is covered.
Hi Adrian,
This is super helpful.
Is it also possible to use rtmp url?
Thanks!
I actually recommend ImageZMQ for network streaming.
Hi Adrian !!!
Nice to meet you, I am here from Brazil. Congratulations on your work contributing to the advancement of research and science. I would kindly like to clarify a doubt about the counting of people in video, which I think is amazing: do you have an article about measuring how long someone stays somewhere, for example a person standing on the street and how long they have been at that location? Thank you very much and good luck.
You can easily measure how long a given person/object has been tracked by using the "time" or "datetime" Python packages.
Hi Adrian,
Awesome Post! I was just wondering if the runtime data (Status, Up & Down) can be sent to any Webpage?
Let me know your thoughts and how I can proceed with this?
You mean like this?
Hi Adrian, can I use other trackers such as kcf or csrt instead of centroid in this people counter project?
You need centroid tracking to associate centroids. I think you mean if you can replace dlib’s correlation tracking with KCF or CSRT — if so, yes, you absolutely can.
Hi, is it worth using a stereo camera to build a depth map and then apply object detection?
You don’t need to unless your project specifically requires depth information.
Hi Adrian,
Great tutorial, thank you for passing on such valuable knowledge. I currently have a need to count people crossing a line but only for part of the footage. Is there a way I can make the “cross” line shorter rather than the full width of the image?
Sir, first of all thanks a lot for this blog.
Sir, I want to save history in database that how many persons went through the camera for given duration like during 10:00-11:00 o’clock. means, I want to make it end-to-end project.
You can use the "time" and "datetime" Python packages to determine when a person enters and exits the view of the camera.
Hi Adrian Rosebrock, you are doing great!
Can I convert this caffe model to tensorflow?
Hello Adrian. Can I combine the source code to count people, how long does it take for 1 person to another person to queue?
Hello Adrian, a nice example of a people counter.
However, I face a problem with counting when a pedestrian changes walking direction.
As an example, a pedestrian goes "up" and crosses the line, and the "up" counter adds +1.
While in the second half of the monitored area, the pedestrian suddenly changes direction and goes back down.
How can I catch such an event and fix the counter?
I am trying to use 4 overhead webcams to track the position of people in a square-shaped room, and wondering if the solution lies in combining this tutorial with your two image stitching tutorials.. If so, would I need to somehow stitch together a 2×2 matrix of webcams feeds and then feed that into the code presented here? Not sure where to begin, so any suggestions would be much appreciated! And thank you for all your great work.
Absolutely. Take a look at my image stitching tutorials.
Hi, I'm a student working on a project that counts people who enter and leave public transport. We've recorded some videos, but got low accuracy: it doesn't count when people enter simultaneously or if they move too quickly. Probably, it's a result of the distance between the camera and the people (it's too small). Could you please give any advice on how to increase accuracy? Thanks in advance.
Hi, I'm also a student and I'm developing a people counter for public transport too. The camera's mounting plane and field of view are also important, and obviously the kind of tracker you use. You can write to me and we can exchange a lot of knowledge about this.
Hi Adrain,
Thanks for this tutorial.
Is it possible to save the frame number and coordinates of the object detected bounding boxes of that frame in a JSON file?
Yes, absolutely. I would recommend you read up on basic file I/O first. Also, take a look at Python's "json" module, which makes it super easy to serialize objects to disk in JSON format.
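A tiny sketch of what that might look like, mapping frame numbers to bounding boxes (the structure, helper names, and file name are just examples):

import json

# maps a frame number to a list of [startX, startY, endX, endY] boxes
results = {}

def record(frame_number, boxes):
    results[frame_number] = [list(map(int, box)) for box in boxes]

def save(path="detections.json"):
    # serialize the accumulated detections to disk
    with open(path, "w") as f:
        json.dump(results, f, indent=2)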
Hello, Adrian,
Could you tell me why you use frame[1] in line 80?
line79 frame = vs.read()
line80 frame = frame[1] if args.get(“input”, False) else frame
I think just frame would be correct, not frame[1]. Am I wrong?
The code is correct as-is: it handles whether you're working with either a "VideoStream" or a "cv2.VideoCapture" object.
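For reference, the two lines in question, with the reasoning spelled out in comments:

# cv2.VideoCapture.read() returns a (grabbed, frame) tuple, while the
# imutils VideoStream.read() returns just the frame, so the ternary keeps
# both code paths working depending on whether --input was supplied
frame = vs.read()
frame = frame[1] if args.get("input", False) else frame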
Hi Adrian, thanks for this informative blog post. I have the task of person detection + person tracking + calculating their time inside the frame, and I have achieved it using this post and got quite good accuracy.
I have noticed overlapping bounding boxes for people. I am trying to apply non-maximum suppression from one of your articles: https://pyimagesearch.com/2015/02/16/faster-non-maximum-suppression-python/
Do you think it is possible to apply NMS to this person detection? Any ideas? Thanks
Absolutely. Just grab the bounding box coordinates from the detected objects and pass them into the NMS algorithm. The object detector here technically applies NMS already but if you need to you can certainly apply NMS again.
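A small sketch of that second NMS pass, assuming "boxes" is a list of (startX, startY, endX, endY) tuples collected from the detections that passed the confidence test (the helper name and overlap threshold are placeholders):

import numpy as np
from imutils.object_detection import non_max_suppression

def suppress(boxes, overlap_thresh=0.65):
    # nothing to do if there are no boxes
    if len(boxes) == 0:
        return []

    # non_max_suppression expects a NumPy array of boxes
    boxes = np.array(boxes)
    return non_max_suppression(boxes, probs=None, overlapThresh=overlap_thresh)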
how to do this same using YOLO V3 and SORT
Hi Adrian,
Can you tell me from where you got the dataset for detection and tracking?
David McDuffee from the PyImageSearch team provided the example videos.
Kindly provide the dataset to me
You can find it in the “Downloads” section of this tutorial.
Hii Adrian
Very nice tutorial. I need a little help: how would I do this exercise using YOLO instead of SSD?
Dear Adrian,
Thanks for sharing this OpenCV knowledge. I am trying to save the image of each detected object along with its ID, but it only saves the latest image for that ID. I want to save every image as soon as it is detected.
Please help me figure out how to achieve this.
I’m not sure I understand your question. Could you elaborate a bit?
Hi Adrian, Sorry for not being clear.
Based on the detected object ID, I am drawing a rectangular bounding box and capturing the image inside that rectangle from the frame (using the start x, y and the width, height of the rectangle).
The issue I am facing is that I am unable to get the correct start x, y and end x, y of all the identified objects and map them to the correct object IDs.
Let me know how to do this.
Dear Adrian,
What a wonderful post! I’m looking for a people counter using OpenCV for a non profit event. Nobody here has dev skill to code our own software within the time frame, but seems you made it! Are we allowed to use it?
From my understanding, a Raspberry Pi is not powerful enough to run this. May I ask what kind of hardware you recommend? I have to make it run for 32 hours straight, with a 3-meter-wide line and people crossing the line up and down around 20,000 times each way.
Would a standard x86 Intel desktop computer be fine? Can the program run on multicore CPUs?
Thank you for sharing such a great knowledge!
That’s awesome that you’re interested in using the code! The algorithm provided here won’t run on the RPi. Instead, refer to Raspberry Pi for Computer Vision which includes a people/footfall counter project that can run at 40+ FPS on a RPi 4.
Does this project detect the whole human body or only the head of a person?
It detects the whole human body.
What if I want to save the output to a database? I want to create a vehicle counting web app, and there will be a reporting dashboard showing how many vehicles were counted every hour. Can you give some advice? Thank you very much.
I cover creating a vehicle counting app inside Raspberry Pi for Computer Vision — I suggest you start there.
Hi Adrian Rosebrock
can multiprocessing or parallel processing will improve the FPS on this code?
Yes, you could assign an object tracker to a specific CPU/core.
For the record, Adrian’s following post covers this topic. https://pyimagesearch.com/2018/10/29/multi-object-tracking-with-dlib/
Can this people counter be implemented on Windows?
If yes, how do I install OpenCV and the other libraries on Windows?
Yes, this method will work on Windows.
There is a problem installing the dlib library in PyCharm. Do I have to install Anaconda to include dlib?
No, you do not need to install Anaconda. Secondly, try installing dlib via your command line/terminal.
Hi Adrian, great tutorial. I'd like to know how to count the time for each tracked object.
I'm trying to change the TrackableObject and CentroidTracker, but I'm making a mess of the code. Can you send an example or tell me where I must make changes to count time for each tracker?
Hey Jason, take a look at Raspberry Pi for Computer Vision. In that book I demonstrate how to associate timestamps with each TrackableObject.
Hello Adrian,
Thanks for sharing interesting blog!
I am looking to count people in a gondola lift, so it should count the number of people from a top view. I tried to run MobileNet SSD on a few images but it doesn't perform well.
Could you please guide me or provide me any pointer?
Thanking you,
Saurabh
Hi
How can I use it at closer range? It seems that when the camera is closer it can't identify a "person". Thanks. Great job!!
Hi Adrian, I tried to modify your code to detect 2 different classes, "cow" and "person". For now, I managed to make the code detect both classes, but I am stuck on differentiating the labels shown in the video frame. Where should I adjust the code to display the total number for each class? Can you please help me solve this? Thanks
Refer to the comments section of this post as I’ve addressed that question already.
Hi Adrian
The precision of the people counter is in the low 70% range when run in real time with just one person walking back and forth, i.e., it fails to count 3 out of 10 people when run live.
Precision increases to 90% when the code is used to count people in offline mode i.e., mp4.
Precision drops even further when a group of people (2-3 people) cross the line.
Is this the best precision we can get?
Sometimes it ignores a person (live stream).
I used exactly your code without any changes, from your book (Raspberry Pi for Computer Vision, Hacker Bundle).
What is wrong?
Adnan Fakhar
Hey Adnan — the code should work the same regardless if you are using a live stream or a video file. I think the issue may be the skip frames argument where we skip frames to get the code to run faster on the RPi. Try decreasing your skip frames and have the object detector run more often.
Hi Adrian,
I have a real-time scenario and I don't know how to handle it; anything that comes to mind would be helpful.
We want to track an employee from one IP camera to another IP camera.
For example, on Cam1 we will identify the employee by their face, but on Cam2 we see the back of the employee, so the face is not an option here. We want to verify that an employee who came in front of Cam1 actually went inside the office and did not turn back outside.
Hi Adrian, first of all thank you so much creating this site and explaining these concepts end to end with every detail. I am a big admirer of your writing. I have recently encountered a situation where I need to count different types of vehicles (car, bus, truck etc.) from the video. I have used yolov3 as detection model and centroid algorithm as a tracker (which you have explained in this article). so, the counting starts when you loop over the tracked objects which is basically objectid and centroid (there is no class id here). I have already ensured detection is only for car,bus and truck. but as per the approach you mentioned in this article, I am getting the overall total of these 3 classes. Is there any way we can tweak in the code to give us counts by each classes (Car-6, bus-5, truck -1), I would really appreciate your help on this 🙂
1. Initialize a “counts” dictionary
2. Loop over detected objects
3. Grab count for current object label from “counts” dictionary
4. Increment count
5. Store count for label back in dictionary
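A minimal sketch of those steps (the label values are taken from your example; the helper name and dictionary shape are assumptions):

# maps a class label to its running count
counts = {}

def update_counts(label):
    # grab the current count for this label, increment it, and store it back
    counts[label] = counts.get(label, 0) + 1

# after processing, counts might look like {"car": 6, "bus": 5, "truck": 1}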
Hi Adrian,
I just wanted to check which bundle i should purchase to get the people counter-image from your book Raspberry pi for computer vision.
And also which is the best Raspi 4 to have the image installed – 2gb or 4gb so that the raspi will perform without any issues.
Thanks for all the help
You would want the “Hacker Bundle” or “Complete Bundle” of Raspberry Pi for Computer Vision. Both of those bundles contain the people counter implementation.
Secondly, you can use either the 2GB or 4GB version of the RPi. The 4GB will give the best performance but the 2GB one will still work just fine.
Hey, thank you s much for this library and the blog. Really helped me out. We are looking to develop an industrial application and had few doubts:
1) For a single camera mounted at a suitable height, what is the range of detection? Will I be able to detect a person or a car that is 40-50 meters away?
2) Can you suggest some cameras which will go with Raspberry Pi to implement this?
The code works great in a Jupyter environment.
However, I'm doing my testing on Google Colab, which doesn't directly support ".imshow" or similar calls.
Would you kindly let me know how to update this code for a Google Colab environment (running a Python 3 notebook with a GPU)?
You can use matplotlib's "imshow" function instead of OpenCV's "cv2.imshow"; just be sure to swap the color channels from BGR to RGB.
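A small sketch of that workaround for notebook environments (the helper name and figure size are arbitrary):

import cv2
import matplotlib.pyplot as plt

def show_frame(frame):
    # convert from OpenCV's BGR ordering to RGB before displaying
    plt.figure(figsize=(10, 6))
    plt.axis("off")
    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    plt.show()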
Hello, I have some questions about the parameters you are using, such as the following:
box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
Why are you using [0, 0, i, 3:7] and [W, H, W, H]?
blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
Why are you using .007843 and 127.5 in (frame, 0.007843, (W, H), 127.5)?
Both of those questions can be answered by reading my blobFromImage tutorial as well as my gentle guide to deep learning object detection.
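For quick reference (the linked tutorials cover this in depth): 127.5 is the mean value subtracted from each channel and 0.007843 (roughly 1 / 127.5) is the scale factor, which together normalize pixel values to approximately [-1, 1] as the MobileNet SSD expects. Positions 3 through 6 of each detection hold normalized box coordinates, so multiplying by [W, H, W, H] converts them back to pixels. Annotating the lines from the question:

blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)  # scale factor, blob size, mean
box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])  # normalized coords -> pixel coords
(startX, startY, endX, endY) = box.astype("int")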
Hello Adrian!
Thank you for sharing an amazing tutorial.
I had a question. Can i use another tensorflow pre-trained models rather than mobile net SSD.
Can you help me where to update the weights file so that the model can identify based on that particular learning. Thank you
Yes, you can use a SSD. In fact, any object detector that returns the bounding box coordinates of a person/object will work.
Hi Adrian! I'm a Korean student and I have one question. (My English is not great, sorry.)
I am trying a people counting project with a friend at school.
Can this code run on a Raspberry Pi (with a Pi camera) in real time?
As is, this tutorial would not be sufficient to run the people counter in real-time on the RPi. For that, you would need to refer to my book, Raspberry Pi for Computer Vision, where I show you how to create a people counter that can run in real-time on the RPi.
Is it possible to store the output values in a txt file?
Yes. I would recommend you read up on basic file I/O with Python. CSV files may also be interesting for you to read about.
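A small sketch of one way to do that, assuming the up/down counters from the driver script are available (the helper name, file name, and column layout are arbitrary):

import csv
import datetime

def log_counts(total_up, total_down, path="counts.csv"):
    # append one timestamped row per update
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([datetime.datetime.now().isoformat(),
            total_up, total_down])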
Thanks so much for this series of tutorials.
I replaced the detector with a model running on a google coral USB stick, following the same method presented in the coral USB + raspberry pi tutorial (used the same model here, mobilenet_ssd_v2).
Running just the video detection example on the RPi4+coral alone, I get excellent frame rates (30+ FPS or so).
When I run the coral detector + the dlib tracker, though, I still get not great results on the RPi4 because the tracker and centroid part seems to still drag it down. Do you expect this?
I do have the Rpi Hacker Bundle and it seems you got really good speed using the NCS+dlib on the RPi4 – I would have thought I could get similar results using the coral.
Thanks!
I did some experimentation using edgetpu and *no* object tracking (i.e. detection every frame, no tracker) and *with* centroid tracking enabled I get 14.4 FPS avg, *without* centroid tracking I get 28 FPS avg. (this is with a 4GB RPi 4).
Actually, I figured out that the new version of OpenCV has some kind of bug; I downgraded and it worked.
However, I’m still working through the problem of trying to keep track of how long a person is in the video before they disappear (total time of existence of an object). I’ve been using some of the code from your vehicle speed example, but what I’m not quite getting is how to keep track of the start and end times when the main program is not the program in charge of removing centroids once they have been out of frame long enough. Do you have any suggestions or pointers? 🙂
Is it possible to track people from a side angle who are moving up and down? For example, streaming video from a phone camera or a simple RGB camera mounted on the wall to see who is coming in or going out.
Hey Adrian,
Your code runs on my Raspberry Pi, but when I use my own video as the input, the counter gets messed up and can only count one or two people. Is this code trained to recognize a specific background (like the brick ground in your demo)? And how can I fix it?
If you want to create a people counter that can run on the Raspberry Pi I suggest you read my book, Raspberry Pi for Computer Vision, which will teach you how to do exactly that.
Hello Adrian,
Thank you for sharing this amazing work. I wanted to know how I can run this in real time on my laptop webcam. Can you please give me some insight into it?
Which part of the code do I need to change?
Thanks
This code will run in real-time on your laptop provided it has reasonable specs.
Hi,
Thank you for the awesome template code for people counting. I was wondering if it’s also possible to implement more than one LOI (line of interest) for the sake of counting people in multiple regions (e.g. placing lines in front of door rooms to count people enter/exiting each room).
Looking forward to your reply.
Absolutely. You would want to modify the “TrackableObject” to keep track of which lines were crossed.
Hi Adrian,
Can we use this code for depth camera(depth image/gray scale image) interfaced with ROS?
Thanks
This code assumes you are using a standard 2D camera (i.e., not a depth camera).
Hi Adrian, thanks a lot for such an awesome tutorial. I am using this code to count vehicles on the road and it's working fine. I want to know how I can add a label and bounding box on the detected object; I want to add labels like bus, car, etc.
Waiting for your reply.
You can use the "cv2.rectangle" and "cv2.putText" functions to draw bounding boxes and labels. You can read Practical Python and OpenCV to learn the basics of using OpenCV and these drawing functions if you need further help.
Hey Adrian, great tutorial!
I’ve implemented this for a license plate recognition project. I made a line in the frame and if a car crosses it then the license plate detection and OCR pipeline is activated. But, this process takes up a lot of time. Any ideas on how I can speed it up?
Hi Adrian, thanks for this tutorial! I’m trying to apply your method to an application where I’d really like to be able to save the count output, ideally per frame, in a data structure so that it can be analyzed numerically and not only showing the count in the outputted video. Is there an easy way to do this? Thanks!
The exact data structure is really dependent on your application. Perhaps just use a simple SQL database?
Hey Adrian!
I am trying to implement the same for my license plate recognition project. I’ve made a line near a gate on the frame and if a vehicle crosses it, I want to recognise its license plate. Now, I don’t want the vehicles moving on the road to be tracked as they are not moving towards the gate. Due to all the vehicles being tracked, the whole processing of the frames takes up a lot of time. Is there any way, I can only detect the vehicles moving in the direction of the gate?
Hi ardian
can you use a camera with live feed as input or not ?
greetings Mark
Yes, you certainly can. See this tutorial for more details.
Thanks !
Also, is there a way to rotate the detection so the line is vertical without turning the camera 90°?
We're working on a people counter, but because of cooling we cannot rotate the Raspberry Pi, so we are a bit stuck.
Yes, but it’s more than just changing how the line is drawn, you also need to adjust the logic for the “TrackableObject”. Tracking both vertically and horizontally is covered inside Raspberry Pi for Computer Vision.
Hi Adrian, I was wondering if this code can be used for blob detection and tracking in videos ?
Hi Ardian,
Firstly, thanks a lot for creating such great tutorials. your tutorials really helped me understand a lot with no prior knowledge of opencv in general.
I followed this tutorial chain and tried to make this work for a video (the camera is placed on top of a bus door) to count the number of people entering and leaving a bus or a van, for example.
It failed to detect and track, probably because skip-frames is 30 and it misses new passengers. I reduced the number to 5 and the number of people detected and tracked increased, but it's still not enough. Can't I run both detection and tracking on every second frame?