Last updated on July 4, 2021.
4:18am. Alarm blaring. Still dark outside. The bed is warm. And the floor will feel so cold on my bare feet.
But I got out of bed. I braved the morning, and I took the ice cold floor on my feet like a champ.
Because I’m excited.
Excited to share something very special with you today…
You see, over the past few weeks I’ve gotten some really great emails from fellow PyImageSearch readers. These emails were short, sweet, and to the point. They were simple “thank you’s” for posting actual, honest-to-goodness Python and OpenCV code that you could take and use to solve your own computer vision and image processing problems.
And upon reflection last night, I realized that I’m not doing a good enough job sharing the libraries, packages, and code that I have developed for myself for everyday use — so that’s exactly what I’m going to do today.
In this blog post I’m going to show you the functions in my transform.py module. I use these functions whenever I need to do a 4 point cv2.getPerspectiveTransform using OpenCV.
And I think you’ll find the code in here quite interesting … and you’ll even be able to utilize it in your own projects.
So read on, and check out my 4 point OpenCV cv2.getPerspectiveTransform example.
- Update July 2021: Added two new sections. The first covers how to automatically find the top-left, top-right, bottom-right, and bottom-left coordinates for a perspective transform. The second section discusses how to improve perspective transform results by taking into account the aspect ratio of the input ROI.
OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.
4 Point OpenCV getPerspectiveTransform Example
You may remember back to my posts on building a real-life Pokedex, specifically, my post on OpenCV and Perspective Warping.
In that post I mentioned how you could use a perspective transform to obtain a top-down, “birds eye view” of an image — provided that you could find reference points, of course.
This post will continue the discussion on the top-down, “birds eye view” of an image. But this time I’m going to share with you personal code that I use every single time I need to do a 4 point perspective transform.
So let’s not waste any more time. Open up a new file, name it transform.py, and let’s get started.
# import the necessary packages
import numpy as np
import cv2

def order_points(pts):
    # initialize a list of coordinates that will be ordered
    # such that the first entry in the list is the top-left,
    # the second entry is the top-right, the third is the
    # bottom-right, and the fourth is the bottom-left
    rect = np.zeros((4, 2), dtype = "float32")

    # the top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis = 1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # now, compute the difference between the points, the
    # top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis = 1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    # return the ordered coordinates
    return rect
We’ll start off by importing the packages we’ll need: NumPy for numerical processing and cv2 for our OpenCV bindings.
Next up, let’s define the order_points function on Line 5. This function takes a single argument, pts, which is a list of four points specifying the (x, y) coordinates of each point of the rectangle.
It is absolutely crucial that we have a consistent ordering of the points in the rectangle. The actual ordering itself can be arbitrary, as long as it is consistent throughout the implementation.
Personally, I like to specify my points in top-left, top-right, bottom-right, and bottom-left order.
We’ll start by allocating memory for the four ordered points on Line 10.
Then, we’ll find the top-left point, which will have the smallest x + y sum, and the bottom-right point, which will have the largest x + y sum. This is handled on Lines 14-16.
Of course, now we’ll have to find the top-right and bottom-left points. Here we’ll take the difference between the two coordinates of each point using the np.diff function on Line 21. Note that np.diff along axis 1 computes y - x (not x - y) for each (x, y) pair.
The coordinates associated with the smallest difference will be the top-right point, whereas the coordinates with the largest difference will be the bottom-left point (Lines 22 and 23).
Finally, we return our ordered coordinates to the calling function on Line 26.
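To see the sum and difference trick at work, here is a small sketch that runs the same steps on four deliberately shuffled points. The coordinates are borrowed from the first example command later in this post, and the variable names are only for illustration.

```python
import numpy as np

# Four corner points (x, y), deliberately shuffled for illustration
pts = np.array([(356, 117), (187, 443), (73, 239), (475, 265)],
               dtype="float32")

s = pts.sum(axis=1)        # x + y for each point
d = np.diff(pts, axis=1)   # y - x for each point, shape (4, 1)

ordered = np.zeros((4, 2), dtype="float32")
ordered[0] = pts[np.argmin(s)]  # top-left: smallest x + y
ordered[2] = pts[np.argmax(s)]  # bottom-right: largest x + y
ordered[1] = pts[np.argmin(d)]  # top-right: smallest y - x
ordered[3] = pts[np.argmax(d)]  # bottom-left: largest y - x

print(ordered)
```

Keep in mind that this heuristic relies on the sums and differences being distinct. For a quadrilateral rotated far enough that two corners tie (a diamond shape, for instance), the same point could be selected twice.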
I can’t stress enough how important it is to maintain a consistent ordering of points.
And you’ll see exactly why in this next function:
def four_point_transform(image, pts):
    # obtain a consistent order of the points and unpack them
    # individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # compute the width of the new image, which will be the
    # maximum distance between bottom-right and bottom-left
    # x-coordinates or the top-right and top-left x-coordinates
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # compute the height of the new image, which will be the
    # maximum distance between the top-right and bottom-right
    # y-coordinates or the top-left and bottom-left y-coordinates
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # now that we have the dimensions of the new image, construct
    # the set of destination points to obtain a "birds eye view",
    # (i.e. top-down view) of the image, again specifying points
    # in the top-left, top-right, bottom-right, and bottom-left
    # order
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype = "float32")

    # compute the perspective transform matrix and then apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    # return the warped image
    return warped
We start off by defining the four_point_transform function on Line 28, which requires two arguments: image and pts. The image variable is the image we want to apply the perspective transform to, and the pts list is the list of four points that contain the ROI of the image we want to transform.
We make a call to our order_points function on Line 31, which places our pts variable in a consistent order. We then unpack these coordinates on Line 32 for convenience.
Now we need to determine the dimensions of our new warped image.
We determine the width of the new image on Lines 37-39, where the width is the larger of two Euclidean distances: the distance between the bottom-right and bottom-left points, or the distance between the top-right and top-left points.
In a similar fashion, we determine the height of the new image on Lines 44-46, where the height is the larger of the distance between the top-right and bottom-right points or the distance between the top-left and bottom-left points.
Note: Big thanks to Tom Lowell who emailed in and made sure I fixed the width and height calculation!
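As a quick check of the width and height logic, the sketch below repeats the computation on the coordinates from the first example command in this post. It uses np.linalg.norm as a shorthand for the square-root expression in the listing; the snake_case variable names are my own for this illustration.

```python
import numpy as np

# Ordered corners: top-left, top-right, bottom-right, bottom-left
tl, tr, br, bl = np.array(
    [(73, 239), (356, 117), (475, 265), (187, 443)], dtype="float32")

# np.linalg.norm(a - b) equals sqrt((ax - bx)**2 + (ay - by)**2),
# i.e. the Euclidean length of the edge between the two corners
width_bottom = np.linalg.norm(br - bl)
width_top = np.linalg.norm(tr - tl)
height_right = np.linalg.norm(tr - br)
height_left = np.linalg.norm(tl - bl)

# Truncate to whole pixels and keep the longer edge in each direction
max_width = max(int(width_bottom), int(width_top))
max_height = max(int(height_right), int(height_left))

print(max_width, max_height)
```

Taking the longer of each pair of edges means the output image is large enough that neither side of the ROI has to be shrunk.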
So here’s the part where you really need to pay attention.
Remember how I said that we are trying to obtain a top-down, “birds eye view” of the ROI in the original image? And remember how I said that a consistent ordering of the four points representing the ROI is crucial?
On Lines 53-57 you can see why. Here, we define 4 points representing our “top-down” view of the image. The first entry in the list is (0, 0) indicating the top-left corner. The second entry is (maxWidth - 1, 0) which corresponds to the top-right corner. Then we have (maxWidth - 1, maxHeight - 1) which is the bottom-right corner. Finally, we have (0, maxHeight - 1) which is the bottom-left corner.
The takeaway here is that these points are defined in a consistent ordering representation — and will allow us to obtain the top-down view of the image.
To actually obtain the top-down, “birds eye view” of the image we’ll utilize the cv2.getPerspectiveTransform function on Line 60. This function requires two arguments, rect, which is the list of 4 ROI points in the original image, and dst, which is our list of transformed points. The cv2.getPerspectiveTransform function returns M, which is the actual transformation matrix.
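To make the role of the transformation matrix more concrete, here is a NumPy-only sketch that solves for a 3x3 matrix from four point pairs and confirms that it maps each source corner onto its destination corner. This is the textbook linear formulation, shown purely for intuition. I am not claiming it is how OpenCV computes the matrix internally, and solve_homography and apply_homography are hypothetical helper names.

```python
import numpy as np

def solve_homography(src, dst):
    # Each point pair contributes two linear equations in the eight
    # unknown entries of the matrix (the ninth entry is fixed to 1)
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype="float64"),
                        np.array(b, dtype="float64"))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    # Lift to homogeneous coordinates, transform, then divide by w
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Source corners in top-left, top-right, bottom-right, bottom-left order
src = np.array([(73, 239), (356, 117), (475, 265), (187, 443)],
               dtype="float64")
# Destination corners of a 338 x 233 top-down view
dst = np.array([(0, 0), (337, 0), (337, 232), (0, 232)], dtype="float64")

H = solve_homography(src, dst)
mapped = apply_homography(H, src)
print(np.round(mapped, 3))
```

If the two point lists were not in the same order, the matrix would still be solvable, but the corners would be sent to the wrong places and the output would be flipped or twisted. That is why the consistent ordering matters so much.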
We apply the transformation matrix on Line 61 using the cv2.warpPerspective function. We pass in the image, our transform matrix M, along with the width and height of our output image.
The output of cv2.warpPerspective is our warped image, which is our top-down view.
We return this top-down view on Line 64 to the calling function.
Now that we have code to perform the transformation, we need some code to drive it and actually apply it to images.
Open up a new file, call it transform_example.py, and let’s finish this up:
# import the necessary packages
from pyimagesearch.transform import four_point_transform
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", help = "path to the image file")
ap.add_argument("-c", "--coords",
    help = "comma separated list of source points")
args = vars(ap.parse_args())

# load the image and grab the source coordinates (i.e. the list of
# (x, y) points)
# NOTE: using the 'eval' function is bad form, but for this example
# let's just roll with it -- in future posts I'll show you how to
# automatically determine the coordinates without pre-supplying them
image = cv2.imread(args["image"])
pts = np.array(eval(args["coords"]), dtype = "float32")

# apply the four point transform to obtain a "birds eye view" of
# the image
warped = four_point_transform(image, pts)

# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
The first thing we’ll do is import our four_point_transform function on Line 2. I decided to put it in the pyimagesearch sub-module for organizational purposes.
We’ll then use NumPy for the array functionality, argparse for parsing command line arguments, and cv2 for OpenCV bindings.
We parse our command line arguments on Lines 8-12. We’ll use two switches, --image, which is the image that we want to apply the transform to, and --coords, which is the list of 4 points representing the region of the image we want to obtain a top-down, “birds eye view” of.
We then load the image on Line 19 and convert the points to a NumPy array on Line 20.
Now before you get all upset at me for using the eval function, please remember, this is just an example. I don’t condone performing a perspective transform this way.
And, as you’ll see in next week’s post, I’ll show you how to automatically determine the four points needed for the perspective transform — no manual work on your part!
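If you do want to keep passing coordinates on the command line, one possible alternative to eval is ast.literal_eval from the Python standard library, which only accepts literals such as numbers, tuples, and lists. The sketch below is not part of the original script, and parse_coords is a hypothetical helper name.

```python
import ast

def parse_coords(text):
    # literal_eval only evaluates Python literals, so expressions such
    # as function calls raise an error instead of being executed
    points = ast.literal_eval(text)
    if len(points) != 4 or any(len(p) != 2 for p in points):
        raise ValueError("expected four (x, y) pairs")
    return [(float(x), float(y)) for x, y in points]

coords = parse_coords("[(73, 239), (356, 117), (475, 265), (187, 443)]")
print(coords)
```

The resulting list could then be handed to np.array(coords, dtype = "float32") in place of the eval call.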
Next, we can apply our perspective transform on Line 24.
Finally, let’s display the original image and the warped, top-down view of the image on Lines 27-29.
Obtaining a Top-Down View of the Image
Alright, let’s see this code in action.
Open up a shell and execute the following command:
$ python transform_example.py --image images/example_01.png --coords "[(73, 239), (356, 117), (475, 265), (187, 443)]"
You should see a top-down view of the notecard, similar to below:
Let’s try another image:
$ python transform_example.py --image images/example_02.png --coords "[(101, 185), (393, 151), (479, 323), (187, 441)]"
And a third for good measure:
$ python transform_example.py --image images/example_03.png --coords "[(63, 242), (291, 110), (361, 252), (78, 386)]"
As you can see, we have successfully obtained a top-down, “birds eye view” of the notecard!
In some cases the notecard looks a little warped — this is because the angle the photo was taken at is quite severe. The closer we come to the 90-degree angle of “looking down” on the notecard, the better the results will be.
Automatically finding the corners for the transform
In order to obtain our top-down transform of our input image we had to manually supply/hardcode the input top-left, top-right, bottom-right, and bottom-left coordinates.
That raises the question:
Is there a way to automatically obtain these coordinates?
You bet there is. The following three tutorials show you how to do exactly that:
- Building a document scanner with OpenCV
- Bubble sheet multiple choice scanner and test grader using OMR, Python, and OpenCV
- OpenCV Sudoku Solver and OCR
Improving your top-down transform results by computing the aspect ratio
The aspect ratio of an image is defined as the ratio of the width to the height. When resizing an image or performing a perspective transform, it’s important to consider the aspect ratio of the image.
For example, if you’ve ever seen an image that looks “squished” or “crunched” it’s because the aspect ratio is off:
On the left, we have our original image. And on the right, we have two images that have been distorted by not preserving the aspect ratio. They have been resized by ignoring the ratio of the width to the height of the image.
To obtain better, more aesthetically pleasing perspective transforms, you should consider taking into account the aspect ratio of the input image/ROI. This thread on StackOverflow will show you how to do that.
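For the simpler resizing case, preserving the aspect ratio just means scaling the width and height by the same factor. Here is a minimal sketch of that arithmetic; resize_dims is a hypothetical helper, and it does not cover estimating the true aspect ratio of a perspective-distorted ROI, which is what the linked thread addresses.

```python
def resize_dims(width, height, new_width):
    # Scale the height by the same factor as the width so that the
    # width-to-height ratio stays (approximately) the same
    ratio = new_width / float(width)
    return new_width, int(round(height * ratio))

# A 640 x 480 image resized to a width of 320 pixels
print(resize_dims(640, 480, 320))
```

Picking the new height independently of the width is exactly what produces the “squished” look described above.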
In this blog post I provided an OpenCV cv2.getPerspectiveTransform example using Python.
I even shared code from my personal library on how to do it!
But the fun doesn’t stop here.
You know those iPhone and Android “scanner” apps that let you snap a photo of a document and then have it “scanned” into your phone?
That’s right — I’ll show you how to use the 4 point OpenCV getPerspectiveTransform example code to build one of those document scanner apps!
I’m definitely excited about it, I hope you are too.
Anyway, be sure to sign up for the PyImageSearch Newsletter to hear when the post goes live!