Today, you will learn how to use OpenCV Selective Search for object detection.
Today’s tutorial is Part 2 in our 4-part series on deep learning and object detection:
- Part 1: Turning any deep learning image classifier into an object detector with Keras and TensorFlow
- Part 2: OpenCV Selective Search for Object Detection (today’s tutorial)
- Part 3: Region proposal for object detection with OpenCV, Keras, and TensorFlow (next week’s tutorial)
- Part 4: R-CNN object detection with Keras and TensorFlow (publishing in two weeks)
Selective Search, first introduced by Uijlings et al. in their 2012 paper, Selective Search for Object Recognition, is a critical piece of computer vision, deep learning, and object detection research.
In their work, Uijlings et al. demonstrated:
- How images can be over-segmented to automatically identify locations in an image that could contain an object
- That Selective Search is far more computationally efficient than exhaustively computing image pyramids and sliding windows (and without loss of accuracy)
- And that Selective Search can be swapped in for any object detection framework that utilizes image pyramids and sliding windows
Automatic region proposal algorithms such as Selective Search paved the way for Girshick et al.’s seminal R-CNN paper, which gave rise to highly accurate deep learning-based object detectors.
Furthermore, research with Selective Search and object detection has allowed researchers to create state-of-the-art Region Proposal Network (RPN) components that are even more accurate and more efficient than Selective Search (see Girshick et al.’s follow-up 2015 paper on Faster R-CNNs).
But before we can get into RPNs, we first need to understand how Selective Search works, including how we can leverage Selective Search for object detection with OpenCV.
To learn how to use OpenCV’s Selective Search for object detection, just keep reading.
OpenCV Selective Search for Object Detection
In the first part of this tutorial, we’ll discuss the concept of region proposals via Selective Search and how they can efficiently replace the traditional method of using image pyramids and sliding windows to detect objects in an image.
From there, we’ll review the Selective Search algorithm in detail, including how it over-segments an image via:
- Color similarity
- Texture similarity
- Size similarity
- Shape similarity
- A final meta-similarity, which is a linear combination of the above similarity measures
I’ll then show you how to implement Selective Search using OpenCV.
Region proposals versus sliding windows and image pyramids
In last week’s tutorial, you learned how to turn any image classifier into an object detector by applying image pyramids and sliding windows.
As a refresher, image pyramids create a multi-scale representation of an input image, allowing us to detect objects at multiple scales/sizes.
Sliding windows then operate on each layer of the image pyramid, sliding from left-to-right and top-to-bottom, thereby allowing us to localize where in an image a given object is.
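If it has been a while, here is a minimal sketch of that approach (the helper names `pyramid` and `sliding_window` are illustrative, not taken verbatim from last week's code):

```python
import cv2

def pyramid(image, scale=1.5, min_size=(64, 64)):
    # yield the original image, then successively smaller copies
    yield image
    while True:
        w = int(image.shape[1] / scale)
        h = int(image.shape[0] / scale)
        image = cv2.resize(image, (w, h))
        if image.shape[0] < min_size[1] or image.shape[1] < min_size[0]:
            break
        yield image

def sliding_window(image, step, ws):
    # slide a (width, height) window across the image from
    # left-to-right and top-to-bottom, yielding each window
    for y in range(0, image.shape[0] - ws[1] + 1, step):
        for x in range(0, image.shape[1] - ws[0] + 1, step):
            yield (x, y, image[y:y + ws[1], x:x + ws[0]])
```

Every window from every pyramid layer must then be run through a classifier, which is exactly where the cost of this approach comes from.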
There are a number of problems with the image pyramid and sliding window approach, but the two primary ones are:
- It’s painfully slow. Even with optimized loops and multiprocessing, looping over each image pyramid layer and inspecting every location in the image via sliding windows is computationally expensive.
- They are sensitive to their parameter choices. Different values of your image pyramid scale and sliding window size can lead to dramatically different results in terms of positive detection rate, false-positive detections, and missing detections altogether.
Given these reasons, computer vision researchers have looked into creating automatic region proposal generators that replace sliding windows and image pyramids.
The general idea is that a region proposal algorithm should inspect the image and attempt to find regions of an image that likely contain an object (think of region proposal as a cousin to saliency detection).
The region proposal algorithm should:
- Be faster and more efficient than sliding windows and image pyramids
- Accurately detect the regions of an image that could contain an object
- Pass these “candidate proposals” to a downstream classifier to actually label the regions, thus completing the object detection framework
The question is, what types of region proposal algorithms can we use for object detection?
What is Selective Search and how can Selective Search be used for object detection?
The Selective Search algorithm implemented in OpenCV was first introduced by Uijlings et al. in their 2012 paper, Selective Search for Object Recognition.
Selective Search works by over-segmenting an image using a superpixel algorithm. Instead of SLIC, Uijlings et al. use the Felzenszwalb method from Felzenszwalb and Huttenlocher’s 2004 paper, Efficient Graph-Based Image Segmentation.
An example of running the Felzenszwalb superpixel algorithm can be seen below:
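If you would like to reproduce that kind of over-segmentation yourself, here is a minimal sketch using scikit-image's Felzenszwalb implementation (an assumption for illustration; the tutorial's own code uses only OpenCV):

```python
import cv2
import numpy as np
from skimage.segmentation import felzenszwalb, mark_boundaries

# load the image and convert BGR -> RGB for scikit-image
image = cv2.imread("dog.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# over-segment: each output label corresponds to one initial region
segments = felzenszwalb(rgb, scale=100, sigma=0.8, min_size=50)
print("[INFO] {} initial regions".format(len(np.unique(segments))))

# overlay the region boundaries for visualization
vis = mark_boundaries(rgb, segments)
```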
From there, Selective Search seeks to merge together the superpixels to find regions of an image that could contain an object.
Selective Search merges superpixels in a hierarchical fashion based on five key similarity measures:
- Color similarity: Computing a 25-bin histogram for each channel of an image, concatenating them together, and obtaining a final descriptor that is 25×3=75-d. Color similarity of any two regions is measured by the histogram intersection distance.
- Texture similarity: For texture, Selective Search extracts Gaussian derivatives at 8 orientations per channel (assuming a 3-channel image). These orientations are used to compute a 10-bin histogram per orientation per channel, generating a final texture descriptor that is 8×10×3=240-d. To compute texture similarity between any two regions, histogram intersection is once again used.
- Size similarity: The size similarity metric that Selective Search uses prefers that smaller regions be grouped earlier rather than later. Anyone who has used Hierarchical Agglomerative Clustering (HAC) algorithms before knows that HACs are prone to clusters reaching a critical mass and then combining everything that they touch. By enforcing smaller regions to merge earlier, we can help prevent a large number of clusters from swallowing up all smaller regions.
- Shape similarity/compatibility: The idea behind shape similarity in Selective Search is that merged regions should be compatible with each other. Two regions are considered “compatible” if they “fit” into each other (thereby filling gaps in our region proposal generation). Furthermore, regions that do not touch should not be merged.
- A final meta-similarity measure: A final meta-similarity acts as a linear combination of the color similarity, texture similarity, size similarity, and shape similarity/compatibility (the color and size measures, along with the final combination, are sketched in code after this list).
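To make a couple of these measures concrete, here is a rough sketch of the color and size similarities and the final combination, written from the descriptions in the Uijlings et al. paper. It is illustrative only (it is not OpenCV's internal implementation), and the function names are my own:

```python
import numpy as np

def color_descriptor(region_pixels):
    # region_pixels: an Nx3 uint8 array of the pixels in one region;
    # build a 25-bin histogram per channel and concatenate into a
    # 75-d, L1-normalized descriptor
    hist = [np.histogram(region_pixels[:, c], bins=25, range=(0, 256))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype("float")
    return hist / hist.sum()

def s_color(h1, h2):
    # histogram intersection: larger values mean more similar colors
    return np.minimum(h1, h2).sum()

def s_size(size_i, size_j, size_image):
    # close to 1 when both regions are small relative to the image,
    # which encourages small regions to merge early
    return 1.0 - (size_i + size_j) / float(size_image)

# conceptually, the final meta-similarity is a linear combination:
#   s(ri, rj) = a1*s_color + a2*s_texture + a3*s_size + a4*s_fill
# where the paper switches each measure on or off (a_i in {0, 1})
```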
The results of Selective Search applying these hierarchical similarity measures can be seen in the following figure:
On the bottom layer of the pyramid, we can see the original over-segmentation/superpixel generation from the Felzenszwalb method.
In the middle layer, we can see regions being joined together, eventually forming the final set of proposals (top).
If you’re interested in learning more about the underlying theory of Selective Search, I would suggest referring to the following resources:
- Efficient Graph-Based Image Segmentation (Felzenszwalb and Huttenlocher, 2004)
- Selective Search for Object Recognition (Uijlings et al., 2012)
- Selective Search for Object Detection (C++/Python) (Chandel/Mallick, 2017)
Selective Search generates regions, not class labels
A common misconception I see with Selective Search is that readers mistakenly think that Selective Search replaces entire object detection frameworks such as HOG + Linear SVM, R-CNN, etc.
In fact, a couple of weeks ago, PyImageSearch reader Hayden emailed in with that exact same question:
Hi Adrian, I am using Selective Search to detect objects with OpenCV.
However, Selective Search is just returning bounding boxes — I can’t seem to figure out how to get labels associated with these bounding boxes.
So, here’s the deal:
- Selective Search does generate regions of an image that could contain an object.
- However, Selective Search does not have any knowledge of what is in that region (think of it as a cousin to saliency detection).
- Selective Search is meant to replace the computationally expensive, highly inefficient method of exhaustively using image pyramids and sliding windows to examine locations of an image for a potential object.
- By using Selective Search, we can more efficiently examine regions of an image that likely contain an object and then pass those regions on to an SVM, CNN, etc. for final classification.
If you are using Selective Search, just keep in mind that the Selective Search algorithm will not give you class label predictions — it is assumed that your downstream classifier will do that for you (the topic of next week’s blog post).
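To make that division of labor concrete, here is a hedged sketch of how proposals flow to a classifier. The `model` object (and its 224×224 input size) is a placeholder for whatever downstream classifier you use; nothing here comes from this tutorial's code:

```python
import cv2

# rects: the (x, y, w, h) proposals returned by Selective Search
for (x, y, w, h) in rects:
    # crop the proposal from the image and resize it to whatever
    # input dimensions the downstream classifier expects
    roi = image[y:y + h, x:x + w]
    roi = cv2.resize(roi, (224, 224))

    # `model` is a hypothetical placeholder for your SVM, CNN, etc.
    label, prob = model.predict(roi)
```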
But in the meantime, let’s learn how we can use OpenCV Selective Search in our own projects.
Project structure
Be sure to grab the .zip for this tutorial from the “Downloads” section. Once you’ve extracted the files, you may use the `tree` command to see what’s inside:

```
$ tree .
├── dog.jpg
└── selective_search.py

0 directories, 2 files
```

Our project is quite simple, consisting of a Python script (`selective_search.py`) and a testing image (`dog.jpg`).
In the next section, we’ll learn how to implement our Selective Search script with Python and OpenCV.
Implementing Selective Search with OpenCV and Python
We are now ready to implement Selective Search with OpenCV!
Open up a new file, name it `selective_search.py`, and insert the following code:
```python
# import the necessary packages
import argparse
import random
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-m", "--method", type=str, default="fast",
    choices=["fast", "quality"],
    help="selective search method")
args = vars(ap.parse_args())
```
We begin our dive into Selective Search with a few imports, the main one being OpenCV (`cv2`). The other imports are built into Python.

Our script handles two command line arguments:

- `--image`: The path to your input image (we’ll be testing with `dog.jpg` today).
- `--method`: The Selective Search algorithm to use. You have two choices — either `"fast"` or `"quality"`. In most cases, the fast method will be sufficient, so it is set as the default.
We’re now ready to load our input image and initialize our Selective Search algorithm:
```python
# load the input image
image = cv2.imread(args["image"])

# initialize OpenCV's selective search implementation and set the
# input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)

# check to see if we are using the *fast* but *less accurate* version
# of selective search
if args["method"] == "fast":
    print("[INFO] using *fast* selective search")
    ss.switchToSelectiveSearchFast()

# otherwise we are using the *slower* but *more accurate* version
else:
    print("[INFO] using *quality* selective search")
    ss.switchToSelectiveSearchQuality()
```
Line 17 loads our `--image` from disk.

From there, we initialize Selective Search and set our input `image` (Lines 21 and 22).

Initialization of Selective Search requires another step — choosing and setting the internal mode of operation. Lines 26-33 use the command line argument `--method` value to determine whether we should use either:

- The `"fast"` method: `switchToSelectiveSearchFast`
- The `"quality"` method: `switchToSelectiveSearchQuality`
Generally, the faster method will be suitable; however, depending on your application, you might want to sacrifice speed to achieve better quality results (more on that later).
Let’s go ahead and perform Selective Search with our image:
```python
# run selective search on the input image
start = time.time()
rects = ss.process()
end = time.time()

# show how long selective search took to run along with the total
# number of returned region proposals
print("[INFO] selective search took {:.4f} seconds".format(end - start))
print("[INFO] {} total region proposals".format(len(rects)))
```
To run Selective Search, we simply call the `process` method on our `ss` object (Line 37). We’ve set timestamps around this call, so we can get a feel for how fast the algorithm is; Line 42 reports the Selective Search benchmark to our terminal.

Subsequently, Line 43 tells us the number of region proposals the Selective Search operation found.
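One practical note (not part of the tutorial's script): each entry in `rects` is an `(x, y, w, h)` box, so it is easy to filter proposals before doing anything else with them, for example discarding very small boxes. The 1% area threshold below is just an assumption to illustrate the idea:

```python
# discard proposals smaller than 1% of the image area (an assumed
# threshold; tune it for your application)
min_area = 0.01 * image.shape[0] * image.shape[1]
rects = [(x, y, w, h) for (x, y, w, h) in rects if w * h >= min_area]
```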
Now, what fun would finding our region proposals be if we weren’t going to visualize the result? Zero fun. To wrap up, let’s draw the output on our image:
```python
# loop over the region proposals in chunks (so we can better
# visualize them)
for i in range(0, len(rects), 100):
    # clone the original image so we can draw on it
    output = image.copy()

    # loop over the current subset of region proposals
    for (x, y, w, h) in rects[i:i + 100]:
        # draw the region proposal bounding box on the image
        color = [random.randint(0, 255) for j in range(0, 3)]
        cv2.rectangle(output, (x, y), (x + w, y + h), color, 2)

    # show the output image
    cv2.imshow("Output", output)
    key = cv2.waitKey(0) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
```
To annotate our output, we simply:
- Loop over region proposals in chunks of 100 (Selective Search will generate a few hundred to a few thousand proposals; we “chunk” them so we can better visualize them) via the nested `for` loops established on Line 47 and Line 52
- Extract the bounding box coordinates surrounding each of our region proposals generated by Selective Search, and draw a colored rectangle for each (Lines 52-55)
- Show the result on our screen (Line 59)
- Allow the user to cycle through results (by pressing any key) until either all results are exhausted or the `q` (quit) key is pressed
In the next section, we’ll analyze results of both methods (fast and quality).
OpenCV Selective Search results
We are now ready to apply Selective Search with OpenCV to our own images.
Start by using the “Downloads” section of this blog post to download the source code and example images.
From there, open up a terminal, and execute the following command:
```
$ python selective_search.py --image dog.jpg
[INFO] using *fast* selective search
[INFO] selective search took 1.0828 seconds
[INFO] 1219 total region proposals
```
Here, you can see that OpenCV’s Selective Search “fast mode” took ~1 second to run and generated 1,219 bounding boxes — the visualization in Figure 4 shows us looping over each of the regions generated by Selective Search and visualizing them to our screen.
If you’re confused by this visualization, consider the end goal of Selective Search: to replace traditional computer vision object detection techniques such as sliding windows and image pyramids with a more efficient region proposal generation method.
Thus, Selective Search will not tell you what is in the ROI, but it tells you that the ROI is “interesting enough” to be passed on to a downstream classifier (e.g., SVM, CNN, etc.) for final classification.
Let’s apply Selective Search to the same image, but this time, use the `--method quality` mode:
```
$ python selective_search.py --image dog.jpg --method quality
[INFO] using *quality* selective search
[INFO] selective search took 3.7614 seconds
[INFO] 4712 total region proposals
```
The “quality” Selective Search method generated 286% more region proposals but also took 247% longer to run.
Whether you should use the “fast” or “quality” mode depends on your application.
In most cases, the “fast” Selective Search is sufficient, but you may choose to use the “quality” mode:
- When performing inference and wanting to ensure you generate higher-quality regions for your downstream classifier (of course, this means that real-time detection is not a concern)
- When using Selective Search to generate training data, thereby ensuring you generate more positive and negative regions for your classifier to learn from (see the IoU sketch below for one common way to label those regions)
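As a hedged preview of that training-data workflow (the helper and thresholds below are illustrative, not from this tutorial), proposals are typically labeled by their Intersection over Union (IoU) with a ground-truth box:

```python
def iou(boxA, boxB):
    # boxes are (x, y, w, h); convert to corner coordinates
    ax1, ay1, ax2, ay2 = boxA[0], boxA[1], boxA[0] + boxA[2], boxA[1] + boxA[3]
    bx1, by1, bx2, by2 = boxB[0], boxB[1], boxB[0] + boxB[2], boxB[1] + boxB[3]

    # compute the intersection rectangle's area (zero if no overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # IoU = intersection area / union area
    union = boxA[2] * boxA[3] + boxB[2] * boxB[3] - inter
    return inter / float(union)

# e.g., proposals with IoU >= 0.5 against a ground-truth box become
# positives, those with IoU < 0.3 become negatives (assumed thresholds,
# common in the R-CNN literature)
```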
Where can I learn more about OpenCV’s Selective Search for object detection?
In next week’s tutorial, you’ll learn how to:
- Use Selective Search to generate object detection proposal regions
- Take a pre-trained CNN and classify each of the regions (discarding any low confidence/background regions)
- Apply non-maxima suppression to return our final object detections
And in two weeks, we’ll use Selective Search to generate training data and then fine-tune a CNN to perform object detection via region proposal.
This has been a great series of tutorials so far, and you don’t want to miss the next two!
Summary
In this tutorial, you learned how to perform Selective Search to generate object detection proposal regions with OpenCV.
Selective Search works by over-segmenting an image and then merging those segments together based on five key similarity measures:
- Color similarity
- Texture similarity
- Size similarity
- Shape similarity
- And a final similarity measure, which is a linear combination of the above four similarity measures
It’s important to note that Selective Search itself does not perform object detection.
Instead, Selective Search returns proposal regions that could contain an object.
The idea here is that we replace our computationally expensive, highly inefficient sliding windows and image pyramids with a less expensive, more efficient Selective Search.
Next week, I’ll show you how to take the proposal regions generated by Selective Search and then run an image classifier on top of them, allowing you to create an ad hoc deep learning-based object detector!
Stay tuned for next week’s tutorial.
To download the source code to this post (and be notified when the next tutorial in this series publishes), simply enter your email address in the form below!