Table of Contents
- Object Tracking with YOLOv8 and Python
- YOLOv8: Reliable Object Detection and Tracking
- Understanding YOLOv8 Architecture
- Object Detection and Tracking with YOLOv8
- Object Tracking with YOLOv8 on Video Streams
- Configuring Your Development Environment
- Project Structure
- Summary
Object Tracking with YOLOv8 and Python
In this tutorial, you will learn how to perform object detection and tracking with the YOLOv8 model using the Ultralytics Python Software Development Kit (SDK).
To learn how to track objects from video streams and camera footage for monitoring, tracking, and counting (as shown in Figure 1), just keep reading.
Looking for the source code to this post?
Jump Right To The Downloads Section
YOLOv8: Reliable Object Detection and Tracking
In the rapidly advancing field of computer vision, YOLO (You Only Look Once) models have established themselves as a gold standard for real-time object detection. The latest iteration, YOLOv8, brings significant improvements in accuracy and speed, further pushing the boundaries of what’s possible in object detection and tracking. This blog post delves into the architecture of YOLOv8, explains how it achieves its impressive performance, and provides practical examples using the Ultralytics YOLO Application Programming Interface (API).
A custom, annotated image dataset is vital for training the YOLOv8 object detector. It allows us to train the model on specific objects of interest, leading to a detector tailored to our requirements.
Roboflow offers free tools for each stage of the computer vision pipeline, which will streamline your workflows and supercharge your productivity.
Sign up or Log in to your Roboflow account to access state-of-the-art dataset libraries and revolutionize your computer vision pipeline.
You can start by choosing your own datasets or using our PyImageSearch assorted library of useful datasets.
Bring data in any of 40+ formats to Roboflow, train using any state-of-the-art model architectures, deploy across multiple platforms (API, NVIDIA, browser, iOS, etc.), and connect to applications or 3rd-party tools.
Understanding YOLOv8 Architecture
YOLOv8 (architecture shown in Figure 2), Ultralytics’s latest version of the YOLO model, represents a state-of-the-art advancement in computer vision. Building on the success of its predecessors, YOLOv8 introduces new features and improvements that enhance performance, flexibility, and efficiency. This cutting-edge model supports a comprehensive range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification. Its versatility enables users to apply YOLOv8’s powerful capabilities across a wide array of applications and domains.
The main features of YOLOv8 include mosaic data augmentation, anchor-free detection, a coarse-to-fine (C2f) module, a decoupled head, and a modified loss function. Let’s delve into each change in more detail.
Mosaic Data Augmentation
Like YOLOv4, YOLOv8 uses mosaic data augmentation that mixes four images to provide the model with better context information. The change in YOLOv8 is that the augmentation stops in the last 10 training epochs to improve performance.
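If you train with the Ultralytics API, this behavior is exposed through the close_mosaic training argument (the number of final epochs in which mosaic is disabled). Below is a minimal sketch, assuming the pretrained yolov8n.pt weights and the small coco8.yaml sample dataset:

```python
from ultralytics import YOLO

# load a pretrained nano model
model = YOLO("yolov8n.pt")

# close_mosaic=10 disables mosaic augmentation for the final 10 epochs
model.train(data="coco8.yaml", epochs=100, imgsz=640, close_mosaic=10)
```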
Anchor-Free Detection
YOLOv8 switched to anchor-free detection to improve generalization. In anchor-based detection, predefined anchor boxes that do not match the shapes of objects in a custom dataset can slow down learning. Anchor-free detection allows the model to directly predict an object’s center, reducing the number of bounding box predictions. This speeds up Non-Maximum Suppression (NMS), a post-processing step that removes duplicate, overlapping predictions.
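To make the NMS step concrete, here is a minimal NumPy sketch of greedy NMS (an illustration of the general algorithm, not the Ultralytics implementation):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU between the best box and the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # keep only boxes that do not overlap the chosen box too much
        order = order[1:][iou < iou_threshold]
    return keep

# toy usage: the second box heavily overlaps the first and is suppressed
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```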
C2f (Coarse-to-Fine) Module
The model’s backbone now uses a C2f module instead of a C3 module. The key difference is that in C2f, the output of all bottleneck modules is concatenated, while in C3, only the output of the last bottleneck module is used. Bottleneck modules, composed of bottleneck residual blocks, reduce computational costs in deep learning networks, speeding up training and improving gradient flow.
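The snippet below is a simplified PyTorch sketch of the C2f idea, not the exact Ultralytics module: the input is split in two, each bottleneck’s output is kept, and everything is concatenated before a final 1×1 convolution.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Two 3x3 convolutions with a residual connection (simplified)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))


class C2fSketch(nn.Module):
    """Split the input, run n bottlenecks, and concatenate ALL intermediate outputs."""

    def __init__(self, in_channels, out_channels, n=2):
        super().__init__()
        self.hidden = out_channels // 2
        self.conv_split = nn.Conv2d(in_channels, 2 * self.hidden, 1)
        self.blocks = nn.ModuleList(Bottleneck(self.hidden) for _ in range(n))
        # (2 + n) branches are concatenated: the two splits plus every bottleneck output
        self.conv_out = nn.Conv2d((2 + n) * self.hidden, out_channels, 1)

    def forward(self, x):
        y = list(self.conv_split(x).chunk(2, dim=1))  # split into two halves
        for block in self.blocks:
            y.append(block(y[-1]))  # keep every bottleneck output (the C2f difference)
        return self.conv_out(torch.cat(y, dim=1))


x = torch.randn(1, 64, 32, 32)
print(C2fSketch(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```

In a C3-style block, only the last bottleneck’s output would be concatenated with the shortcut branch; keeping every intermediate output gives the head richer gradient flow at little extra cost.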
Decoupled Head
The diagram above (Figure 2) shows that the head no longer performs classification and regression together. Instead, these tasks are now performed separately, which increases model performance.
Loss
- Loss Misalignment: The decoupled head separates classification and regression tasks, potentially causing the model to localize one object while classifying another.
- Solution: Include a task alignment score to help the model identify positive and negative samples. The task alignment score is calculated by multiplying the classification score with the Intersection over Union (IoU) score (a toy sketch of this calculation follows the list).
- IoU Score: Measures the accuracy of a bounding box prediction.
- Based on the alignment score, the model:
  - Selects the top-k positive samples.
  - Computes a classification loss using Binary Cross-Entropy (BCE).
  - Computes a regression loss using Complete IoU (CIoU) and Distributional Focal Loss (DFL).
- BCE Loss: Measures the difference between actual and predicted labels.
- CIoU Loss: Considers the predicted bounding box’s relation to the ground truth in terms of center point and aspect ratio.
- Distributional Focal Loss (DFL): Optimizes the distribution of bounding box boundaries and focuses more on samples that the model misclassifies as false negatives.
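Here is a toy NumPy sketch of the task alignment idea (a simplification of the actual task-aligned assigner): the alignment score is the product of the classification score and the IoU, and the top-k best-aligned predictions are treated as positive samples.

```python
import numpy as np

def task_alignment_scores(cls_scores, ious, top_k=10):
    """Toy task-aligned sample selection.

    cls_scores: (N,) predicted classification scores for the target class
    ious:       (N,) IoU between each predicted box and the ground-truth box
    """
    # alignment is high only when classification AND localization agree
    alignment = cls_scores * ious
    # treat the top-k best-aligned predictions as positive samples
    positive_idx = np.argsort(alignment)[::-1][:top_k]
    return alignment, positive_idx

# example: 5 candidate predictions for one ground-truth object
cls = np.array([0.9, 0.2, 0.8, 0.6, 0.1])
iou = np.array([0.5, 0.9, 0.7, 0.3, 0.2])
scores, positives = task_alignment_scores(cls, iou, top_k=2)
print(scores, positives)
```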
Object Detection and Tracking with YOLOv8
Object detection and tracking are critical tasks in many applications, from autonomous driving to video surveillance. YOLOv8 excels in these areas due to its robust architecture and innovative features.
Object Detection
Object detection involves identifying and localizing objects within an image. YOLOv8 achieves this with high accuracy and speed, as demonstrated by its performance metrics on the Common Objects in Context (COCO) dataset:
| Model   | Size (pixels) | mAP val 50-95 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | Params (M) | FLOPs (B) |
|---------|---------------|---------------|---------------------|--------------------------|------------|-----------|
| YOLOv8n | 640           | 37.3          | 80.4                | 0.99                     | 3.2        | 8.7       |
| YOLOv8s | 640           | 44.9          | 128.4               | 1.20                     | 11.2       | 28.6      |
| YOLOv8m | 640           | 50.2          | 234.7               | 1.83                     | 25.9       | 78.9      |
| YOLOv8l | 640           | 52.9          | 375.2               | 2.39                     | 43.7       | 165.2     |
| YOLOv8x | 640           | 53.9          | 479.1               | 3.53                     | 68.2       | 257.8     |
These metrics highlight YOLOv8’s efficiency and effectiveness in object detection tasks, making it suitable for a wide range of applications.
Object Tracking
Object tracking involves following an object across multiple frames in a video. YOLOv8’s architecture supports high-speed, accurate object detection, which is essential for real-time tracking applications. By combining YOLOv8 with tracking algorithms, it’s possible to maintain consistent identities for objects as they move through video frames.
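For example, the Ultralytics API can attach a tracker configuration (e.g., ByteTrack) directly to a detection model. The sketch below assumes the pretrained yolov8n.pt weights and a hypothetical local video file named people.mp4:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# run detection + tracking; persist=True keeps object IDs consistent across frames
results = model.track(source="people.mp4", tracker="bytetrack.yaml", persist=True)
```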
If you need to train YOLOv8 or any other architecture for object detection and need access to 120K+ images curated and labeled with object bounding boxes to train, explore, and experiment with … for free, then head over to Roboflow and get a free account to start accessing high-quality labeled images.
Practical Examples Using Ultralytics YOLO API
The Ultralytics YOLO API simplifies the process of using YOLOv8 for object detection and tracking. Here are some examples to get you started:
Loading a Pre-Trained Model
```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # Load an official model
model = YOLO("path/to/best.pt")  # Load a custom model
```
Predicting with the Model
```python
# Predict on an image
results = model("https://ultralytics.com/images/bus.jpg")
```
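Each element of results holds the detections for one image. A short sketch of reading the boxes back out via the Results API (class index, confidence, and corner coordinates):

```python
# inspect the first result
result = results[0]
for box in result.boxes:
    cls_id = int(box.cls)                  # class index
    conf = float(box.conf)                 # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
    print(result.names[cls_id], conf, (x1, y1, x2, y2))
```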
Training a Model
Training YOLOv8 on custom datasets is straightforward. Here’s how you can train YOLOv8n on the COCO8 dataset for 100 epochs:
```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # Build a new model from YAML
model = YOLO("yolov8n.pt")  # Load a pretrained model (recommended for training)
model = YOLO("yolov8n.yaml").load("yolov8n.pt")  # Build from YAML and transfer weights

# Train the model
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```
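After training, the same model object can be validated and exported for deployment. A brief sketch (metric attribute and export format names follow the Ultralytics API):

```python
# evaluate the trained model on the validation split
metrics = model.val()
print(metrics.box.map)  # mAP 50-95

# export to ONNX for deployment
path = model.export(format="onnx")
print(f"Exported model saved to: {path}")
```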
Dataset Format
YOLOv8 supports a specific dataset format for object detection. To convert your existing dataset from other formats (e.g., COCO) to YOLO format, you can use the JSON2YOLO tool provided by Ultralytics.
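In the YOLO format, every image has a matching .txt file with one line per object: the class index followed by the normalized box center and size. The hypothetical snippet below writes one such label line from pixel coordinates (file names and values are placeholders):

```python
# format per line: <class_index> <x_center> <y_center> <width> <height>, all normalized to [0, 1]
image_w, image_h = 640, 480
x1, y1, x2, y2 = 100, 120, 300, 360  # box corners in pixels (placeholder values)
class_index = 0

x_center = ((x1 + x2) / 2) / image_w
y_center = ((y1 + y2) / 2) / image_h
width = (x2 - x1) / image_w
height = (y2 - y1) / image_h

# one .txt label file per image, with the same stem as the image file
with open("image_0001.txt", "w") as f:
    f.write(f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
```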
Object Tracking with YOLOv8 on Video Streams
Do you need custom images to train or test this pipeline, or simply measure its effectiveness? Then, head over to Roboflow and get a free account to grab these object-detection-in-the-wild images.
Configuring Your Development Environment
To follow this guide, you need to have the ultralytics library installed on your system.
Luckily, ultralytics is pip-installable:
$ pip install ultralytics
Need Help Configuring Your Development Environment?
All that said, are you:
- Short on time?
- Learning on your employer’s administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
- Ready to run the code immediately on your Windows, macOS, or Linux system?
Then join PyImageSearch University today!
Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.
And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
Project Structure
We first need to review our project directory structure.
Start by accessing this tutorial’s “Downloads” section to retrieve the source code and example images.
From there, take a look at the directory structure:
```
YOLO-VIDEO/
│
├── pyimagesearch/
│   ├── __init__.py
│   └── yolo_tracking.py
│
├── videos/
│   ├── basket-ball.mp4
│   └── output_tracked_video.mp4
│
├── demo.py
└── main.py
```
In this section, we will explore how to set up the video tracking project using YOLOv8 with Python. We will go through three key scripts: main.py, demo.py, and pyimagesearch/yolo_tracking.py. Each script plays a crucial role in processing videos, tracking objects, and setting up a user interface for ease of use.
Set-Up
Our main.py script is the entry point for our video processing. It imports the track_video function from our yolo_tracking module. Here’s the code:
```python
from pyimagesearch.yolo_tracking import track_video

if __name__ == "__main__":
    # get the input video path
    input_video_path = "./videos/basket-ball.mp4"

    # process the video
    output_video_path = track_video(input_video_path)
    print(f"Processed video saved to: {output_video_path}")
```
In this script, we define the input video path as ./videos/basket-ball.mp4 and call the track_video function with this path. The processed video is saved, and the output path is printed.
Creating a Gradio Interface
Next, we set up a Gradio interface in demo.py to provide an easy way to test our video tracking function. Gradio is a handy library for creating web-based interfaces for machine learning models.
The entire section on creating a Gradio space is discussed in detail in the video accompanying this blog post. Alternatively, you can also download the code and see how to set up a Gradio space.
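For reference, here is a minimal sketch of what demo.py could look like (assuming gradio is installed and the track_video function implemented in the next section); the interface built in the video may differ:

```python
import gradio as gr

from pyimagesearch.yolo_tracking import track_video

# a simple web UI: upload a video, get back the tracked/annotated video
demo = gr.Interface(
    fn=track_video,
    inputs=gr.Video(label="Input video"),
    outputs=gr.Video(label="Tracked video"),
    title="Object Tracking with YOLOv8",
)

if __name__ == "__main__":
    demo.launch()
```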
Implementing Video Tracking Functionality
The heart of our project lies in pyimagesearch/yolo_tracking.py, where we implement the core video tracking functionality. Here is how to achieve that in code:
```python
from collections import defaultdict

import cv2
import numpy as np
from ultralytics import YOLO


def track_video(video_path):
    # load the model
    model = YOLO("yolov8n.pt")

    # open the video file
    cap = cv2.VideoCapture(video_path)
    track_history = defaultdict(lambda: [])

    # get the video properties
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # define the codec and create VideoWriter object
    output_path = "output_tracked_video.mp4"  # Output video file path
    out = cv2.VideoWriter(
        output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (frame_width, frame_height)
    )

    # loop through the video frames
    while cap.isOpened():
        success, frame = cap.read()

        if success:
            results = model.track(frame, persist=True)
            boxes = results[0].boxes.xywh.cpu()
            track_ids = (
                results[0].boxes.id.int().cpu().tolist()
                if results[0].boxes.id is not None
                else None
            )
            annotated_frame = results[0].plot()

            # plot the tracks
            if track_ids:
                for box, track_id in zip(boxes, track_ids):
                    x, y, w, h = box
                    track = track_history[track_id]
                    track.append((float(x), float(y)))  # x, y center point
                    if len(track) > 30:  # retain 30 tracks for 30 frames
                        track.pop(0)

                    # draw the tracking lines
                    points = np.array(track).astype(np.int32).reshape((-1, 1, 2))
                    cv2.polylines(
                        annotated_frame,
                        [points],
                        isClosed=False,
                        color=(230, 230, 230),
                        thickness=2,
                    )

            # write the annotated frame
            out.write(annotated_frame)

            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        else:
            break

    # release the video capture object and close the display window
    cap.release()
    out.release()
    cv2.destroyAllWindows()

    return output_path
```
In this script, we start by importing the necessary libraries and defining the track_video function. We load the YOLOv8 model, open the video file, and retrieve the video properties like the frames per second (FPS) and frame dimensions. We also set up a VideoWriter to save the output video.
We loop through each frame of the video, process it with YOLO to get tracking results, and annotate the frame with bounding boxes and tracking lines. We maintain a history of tracked points for each object to draw tracking lines. Finally, we write each annotated frame to the output video file.
What's next? We recommend PyImageSearch University.
86 total classes • 115+ hours of on-demand code walkthrough videos • Last updated: October 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86 courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
YOLOv8 is currently one of the most widely used models for object detection. Its core architecture is a step up from the popular YOLOv5 architecture. Its enhanced model, combined with the ease of use provided by the Ultralytics YOLO API, makes it a powerful tool for both researchers and practitioners.
In this project, we set up a YOLOv8 model for object detection and tracking on video. Object detection is a useful tool in any computer vision engineer’s arsenal.
This setup allows us to process a video, track objects using YOLO, and save the annotated video. Additionally, we can run this functionality through a Gradio interface for easy access and testing. By combining these scripts, we have a robust and user-friendly video tracking application.
Whether you’re working on autonomous vehicles, video surveillance, or any other application requiring real-time object detection and tracking, YOLOv8 is well-equipped to meet your needs.
Citation Information
A. R. Gosthipaty and R. Raha. “Object Tracking with YOLOv8 and Python,” PyImageSearch, P. Chugh, S. Huot, and K. Kidriavsteva, eds., 2024, https://pyimg.co/hqdf0
@incollection{ARG-RR_2024_Object-Tracking-YOLOv8-Python,
  author = {Aritra Roy Gosthipaty and Ritwik Raha},
  title = {Object Tracking with YOLOv8 and Python},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Susan Huot and Kseniia Kidriavsteva},
  year = {2024},
  url = {https://pyimg.co/hqdf0},
}
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Comment section
Hey, Adrian Rosebrock here, author and creator of PyImageSearch. While I love hearing from readers, a couple years ago I made the tough decision to no longer offer 1:1 help over blog post comments.
At the time I was receiving 200+ emails per day and another 100+ blog post comments. I simply did not have the time to moderate and respond to them all, and the sheer volume of requests was taking a toll on me.
Instead, my goal is to do the most good for the computer vision, deep learning, and OpenCV community at large by focusing my time on authoring high-quality blog posts, tutorials, and books/courses.
If you need help learning computer vision and deep learning, I suggest you refer to my full catalog of books and courses — they have helped tens of thousands of developers, students, and researchers just like yourself learn Computer Vision, Deep Learning, and OpenCV.
Click here to browse my full catalog.