Last updated on July 8, 2021.
A couple of days ago, Cameron, a PyImageSearch reader, emailed in and asked about methods to find the distance from a camera to an object/marker in an image. He had spent some time researching, but hadn’t found an implementation.
I knew exactly how Cameron felt. Years ago I was working on a small project to analyze the movement of a baseball as it left the pitcher’s hand and headed for home plate.
Using motion analysis and trajectory-based tracking I was able to find/estimate the ball location in the frame of the video. And since a baseball has a known size, I was also able to estimate the distance to home plate.
It was an interesting project to work on, although the system was not as accurate as I wanted it to be — the “motion blur” of the ball moving so fast made it hard to obtain highly accurate estimates.
My project was definitely an “outlier” situation, but in general, determining the distance from a camera to a marker is a well-studied problem in the computer vision/image processing space. You can find techniques that are very straightforward and succinct, like the triangle similarity. And you can find methods that are complex (albeit more accurate) that use the intrinsic parameters of the camera model.
In this blog post I’ll show you how Cameron and I came up with a solution to compute the distance from our camera to a known object or marker.
Definitely give this post a read — you won’t want to miss it!
- Update July 2021: Added three new sections. The first section covers improving distance measurement with camera calibration. The second section discusses stereo vision and depth cameras to measure distance. And the final section briefly mentions how LiDAR can be used with camera data to provide highly accurate distance measurements.
Triangle Similarity for Object/Marker to Camera Distance
In order to determine the distance from our camera to a known object or marker, we are going to utilize triangle similarity.
The triangle similarity goes something like this: Let’s say we have a marker or object with a known width W. We then place this marker some distance D from our camera. We take a picture of our object using our camera and then measure the apparent width in pixels P. This allows us to derive the perceived focal length F of our camera:
F = (P x D) / W
For example, let’s say I place a standard 8.5 x 11in piece of paper (laid out horizontally, so W = 11) a distance D = 24 inches in front of my camera and take a photo. When I measure the width of the piece of paper in the image, I notice that the perceived width of the paper is P = 248 pixels.
My focal length F is then:
F = (248px x 24in) / 11in = 541.09
As I continue to move my camera both closer and farther away from the object/marker, I can apply the triangle similarity to determine the distance of the object to the camera:
D’ = (W x F) / P
Again, to make this more concrete, let’s say I move my camera 3 ft (or 36 inches) away from my marker and take a photo of the same piece of paper. Through automatic image processing I am able to determine that the perceived width of the piece of paper is now 170 pixels. Plugging this into the equation we now get:
D’ = (11in x 541.09) / 170 = 35in
Or roughly 36 inches, which is 3 feet.
Note: When I captured the photos for this example my tape measure had a bit of slack in it and thus the results are off by roughly 1 inch. Furthermore, I also captured the photos hastily and not 100% on top of the feet markers on the tape measure, which added to the 1 inch error. That all said, the triangle similarity still holds and you can use this method to compute the distance from an object or marker to your camera quite easily.
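To make this easy to verify, here are the same two calculations as a few lines of plain Python (just a sanity check of the arithmetic above, not part of the final script):

# triangle-similarity sanity check using the numbers from the example
KNOWN_WIDTH = 11.0      # marker width in inches
KNOWN_DISTANCE = 24.0   # calibration distance in inches

# F = (P x D) / W, with a perceived width of P = 248 pixels
focalLength = (248 * KNOWN_DISTANCE) / KNOWN_WIDTH

# D' = (W x F) / P, with a new perceived width of P = 170 pixels
inches = (KNOWN_WIDTH * focalLength) / 170
print(focalLength, inches)   # ~541.09, ~35.0in (roughly 3 feet)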
Make sense now?
Awesome. Let’s move into some code to see how finding the distance from your camera to an object or marker is done using Python, OpenCV, and image processing and computer vision techniques.
Finding the distance from your camera to object/marker using Python and OpenCV
Let’s go ahead and get this project started. Open up a new file, name it distance_to_camera.py, and we’ll get to work:
# import the necessary packages
from imutils import paths
import numpy as np
import imutils
import cv2

def find_marker(image):
    # convert the image to grayscale, blur it, and detect edges
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 35, 125)

    # find the contours in the edged image and keep the largest one;
    # we'll assume that this is our piece of paper in the image
    cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    c = max(cnts, key=cv2.contourArea)

    # compute the bounding box of the paper region and return it
    return cv2.minAreaRect(c)
The first thing we’ll do is import our necessary packages (Lines 2-5). We’ll use paths from imutils to load the available images in a directory. We’ll use NumPy for numerical processing and cv2 for our OpenCV bindings.
From there we define our find_marker function. This function accepts a single argument, image, and is used to find the object we want to compute the distance to.
In this case we are using a standard 8.5 x 11 inch piece of paper as our marker.
Our first task is now to find this piece of paper in the image.
To do this, we’ll convert the image to grayscale, blur it slightly to remove high frequency noise, and apply edge detection on Lines 9-11.
After applying these steps our image should look something like this:
As you can see, the edges of our marker (the piece of paper) have clearly been revealed. Now all we need to do is find the contour (i.e. outline) that represents the piece of paper.
We find our marker on Lines 15 and 16 by using the cv2.findContours function (taking care to handle different OpenCV versions) and then determining the contour with the largest area on Line 17.
We are making the assumption that the contour with the largest area is our piece of paper. This assumption works for this particular example, but in reality finding the marker in an image is highly application specific.
In our example, simple edge detection and finding the largest contour works well. We could also make this example more robust by applying contour approximation, discarding any contours that do not have 4 points (since a piece of paper is a rectangle and thus has 4 points), and then finding the largest 4-point contour.
Note: More on this methodology can be found in this post on building a kick-ass mobile document scanner.
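As a rough sketch of that more robust approach, here is a variant of find_marker that keeps only 4-point contours (the function name and the 0.02 approximation factor are illustrative choices, not tuned values; it reuses the imports from above):

def find_marker_rectangular(image):
    # same preprocessing as find_marker above
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 35, 125)

    # find contours, then keep only those whose polygonal
    # approximation has exactly 4 points (i.e., rectangles)
    cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    rects = []
    for c in cnts:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:
            rects.append(approx)

    # assume the largest 4-point contour is our piece of paper
    return cv2.minAreaRect(max(rects, key=cv2.contourArea)) if rects else None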
Another alternative to finding markers in images is to utilize color, such that the color of the marker is substantially different from the rest of the scene in the image. You could also apply methods like keypoint detection, local invariant descriptors, and keypoint matching to find markers; however, these approaches are outside the scope of this article and are, again, highly application specific.
Anyway, now that we have the contour that corresponds to our marker, we return the bounding box which contains the (x, y)-coordinates and width and height of the box (in pixels) to the calling function on Line 20.
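For reference, cv2.minAreaRect returns a rotated bounding box of the form ((center_x, center_y), (width, height), angle), which is why the perceived width in pixels is read as marker[1][0] in the script below:

# illustrative only: inspect what find_marker returns
marker = find_marker(cv2.imread("images/2ft.png"))
((cX, cY), (w, h), angle) = marker
print(w)   # the paper's perceived width in pixels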
Let’s also quickly define a function that computes the distance to an object using the triangle similarity detailed above:
def distance_to_camera(knownWidth, focalLength, perWidth):
    # compute and return the distance from the marker to the camera
    return (knownWidth * focalLength) / perWidth
This function takes a knownWidth of the marker, a computed focalLength, and the perceived width of an object in an image (measured in pixels), and applies the triangle similarity detailed above to compute the actual distance to the object.
To see how we utilize these functions, continue reading:
# initialize the known distance from the camera to the object, which
# in this case is 24 inches
KNOWN_DISTANCE = 24.0

# initialize the known object width, which in this case, the piece of
# paper is 11 inches wide
KNOWN_WIDTH = 11.0

# load the first image that contains an object that is KNOWN TO BE 2 feet
# from our camera, then find the paper marker in the image, and initialize
# the focal length
image = cv2.imread("images/2ft.png")
marker = find_marker(image)
focalLength = (marker[1][0] * KNOWN_DISTANCE) / KNOWN_WIDTH
The first step to finding the distance to an object or marker in an image is to calibrate and compute the focal length. To do this, we need to know:
- The distance of the camera from an object.
- The width (in units such as inches, meters, etc.) of this object. Note: The height could also be utilized, but this example simply uses the width.
Let’s also take a second and mention that what we are doing is not true camera calibration. True camera calibration involves the intrinsic parameters of the camera, which you can read more on here.
On Line 28 we initialize the KNOWN_DISTANCE from the camera to our object to be 24 inches. And on Line 32 we initialize the KNOWN_WIDTH of the object to be 11 inches (i.e., a standard 8.5 x 11 inch piece of paper laid out horizontally).
The next step is important: it’s our simple calibration step.
We load the first image off disk on Line 37 — we’ll be using this image as our calibration image.
Once the image is loaded, we find the piece of paper in the image on Line 38, and then compute our focalLength on Line 39 using the triangle similarity.
Now that we have “calibrated” our system and have the focalLength, we can compute the distance from our camera to our marker in subsequent images quite easily.
Let’s see how this is done:
# loop over the images
for imagePath in sorted(paths.list_images("images")):
    # load the image, find the marker in the image, then compute the
    # distance to the marker from the camera
    image = cv2.imread(imagePath)
    marker = find_marker(image)
    inches = distance_to_camera(KNOWN_WIDTH, focalLength, marker[1][0])

    # draw a bounding box around the image and display it
    box = cv2.cv.BoxPoints(marker) if imutils.is_cv2() else cv2.boxPoints(marker)
    box = np.int0(box)
    cv2.drawContours(image, [box], -1, (0, 255, 0), 2)
    cv2.putText(image, "%.2fft" % (inches / 12),
        (image.shape[1] - 200, image.shape[0] - 20),
        cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 255, 0), 3)
    cv2.imshow("image", image)
    cv2.waitKey(0)
We start looping over our image paths on Line 42.
Then, for each image in the list, we load the image off disk on Line 45, find the marker in the image on Line 46, and then compute the distance of the object to the camera on Line 47.
From there, we simply draw the bounding box around our marker and display the distance on Lines 50-57 (the boxPoints are calculated on Line 50, taking care to handle different OpenCV versions).
Results
To see our script in action, open up a terminal, navigate to your code directory, and execute the following command:
$ python distance_to_camera.py
If all goes well you should first see the results of 2ft.png, which is the image we use to “calibrate” our system and compute our initial focalLength:
From the above image we can see that our focal length is properly determined and the distance to the piece of paper is 2 feet, per the KNOWN_DISTANCE and KNOWN_WIDTH variables in the code.
Now that we have our focal length, we can compute the distance to our marker in subsequent images:
In the above example, our camera is now approximately 3 feet from the marker.
Let’s try moving back another foot:
Again, it’s important to note that when I captured the photos for this example I did so hastily and left too much slack in the tape measure. Furthermore, I also did not ensure my camera was 100% lined up on the foot markers, so again, there is roughly 1 inch error in these examples.
That all said, the triangle similarity approach detailed in this article will still work and allow you to find the distance from an object or marker in an image to your camera.
Improving distance measurement with camera calibration
In order to perform distance measurement, we first need to calibrate our system. In this post, we used a simple “pixels per metric” technique.
However, better accuracy can be obtained by performing a proper camera calibration by computing the extrinsic and intrinsic parameters:
- Extrinsic parameters are the rotation and translation matrices used to convert points from the world frame to the camera frame
- Intrinsic parameters are the internal camera parameters, such as the focal length, used to project points from the camera frame onto the image plane (i.e., into pixels)
The most common way is to perform a checkerboard camera calibration using OpenCV. Doing so will remove radial distortion and tangential distortion, both of which impact the output image, and therefore the output measurement of objects in the image.
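Here is a minimal sketch of that workflow, assuming a checkerboard with 9 x 6 inner corners and a directory of calibration images (both the board size and the file paths are placeholder assumptions, not values from this post):

# minimal checkerboard calibration sketch
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # assumed inner-corner count
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration/*.png"):   # hypothetical directory
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# recover the intrinsic matrix and distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# undistort an image before measuring anything in it
undistorted = cv2.undistort(cv2.imread("images/2ft.png"), K, dist)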
Here are some resources to help you get started with camera calibration:
- Understanding Lens Distortion
- Camera Calibration using OpenCV
- Camera Calibration (official OpenCV documentation)
Stereo vision and depth cameras for distance measurement
As humans, we take having two eyes for granted. Even if we lost an eye in an accident we could still see and perceive the world around us.
However, with only one eye we lose important information — depth.
Depth perception gives us the perceived distance from where we stand to the object in front of us. In order to perceive depth, we need two eyes.
Most cameras on the other hand only have one eye, which is why it’s so challenging to obtain depth information using a standard 2D camera (although you can use deep learning models to attempt to “learn depth” from a 2D image).
You can create a stereo vision system capable of computing depth information using two cameras, such as USB webcams (a minimal block-matching sketch follows these resources):
- Introduction to Epipolar Geometry and Stereo Vision
- Making A Low-Cost Stereo Camera Using OpenCV
- Depth perception using stereo camera (Python/C++)
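To give a feel for the approach, here is a minimal block-matching sketch with OpenCV; the filenames are placeholders and the image pair is assumed to already be rectified:

# minimal stereo block-matching sketch (assumes a rectified pair)
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # placeholder paths
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# disparity is inversely proportional to depth:
# depth = (focal_length_px * baseline) / disparity
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

Note that converting disparity to metric depth also requires the focal length (in pixels) and the baseline (the distance between the two cameras), both of which come from a stereo calibration.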
Or, you could use specific hardware that has depth cameras built in, such as OpenCV’s AI Kit (OAK-D).
LiDAR for depth information
For more accurate depth information you should consider using LiDAR. LiDAR uses light sensors to measure the distance between the sensor and any object(s) in front of it.
LiDAR is especially popular in self-driving cars where a camera is simply not enough.
We cover how to use LiDAR for self-driving car applications inside PyImageSearch University.
Summary
In this blog post we learned how to determine the distance from a known object in an image to our camera.
To accomplish this task we utilized the triangle similarity, which requires us to know two important parameters prior to applying our algorithm:
- The width (or height) in some distance measure, such as inches or meters, of the object we are using as a marker.
- The distance (in inches or meters) of the camera to the marker in step 1.
Computer vision and image processing algorithms can then be used to automatically determine the perceived width/height of the object in pixels, completing the triangle similarity and giving us our focal length.
Then, in subsequent images we simply need to find our marker/object and utilize the computed focal length to determine the distance to the object from the camera.