Building a document scanner with OpenCV can be accomplished in just three simple steps:

Step 1: Detect edges.
Step 2: Use the edges in the image to find the contour (outline) representing the piece of paper being scanned.
Step 3: Apply a perspective transform to obtain the top-down view of the document.

Really. That’s it.

Only three steps and you’re on your way to submitting your own document scanning app to the App Store.

Sound interesting?

Read on. And unlock the secrets to build a mobile scanner app of your own.

OpenCV and Python versions:
This example will run on Python 2.7/3+ and OpenCV 2.4/3+

How To Build a Kick-Ass Mobile Document Scanner in Just 5 Minutes

Last week I gave you a special treat — my very own transform.py module that I use in all my computer vision and image processing projects. You can read more about this module here.

Whenever you need to perform a 4 point perspective transform, you should be using this module.

And you guessed it, we’ll be using it to build our very own document scanner.

So let’s get down to business.

Open up your favorite Python IDE (I like Sublime Text 2), create a new file, name it scan.py, and let’s get started.

# import the necessary packages
from pyimagesearch.transform import four_point_transform
from skimage.filters import threshold_local
import numpy as np
import argparse
import cv2
import imutils

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required = True,
	help = "Path to the image to be scanned")
args = vars(ap.parse_args())

Lines 2-7 handle importing the necessary Python packages that we’ll need.

We’ll start by importing our four_point_transform function which I discussed last week.

We’ll also be using the imutils module, which contains convenience functions for resizing, rotating, and cropping images. You can read more about imutils in this post. To install imutils, simply run:

$ pip install --upgrade imutils

Next up, let’s import the threshold_local function from scikit-image. This function will help us obtain the “black and white” feel to our scanned image.

Note (15 January 2018): The threshold_adaptive function has been deprecated. This post has been updated to make use of threshold_local.

Lastly, we’ll use NumPy for numerical processing, argparse for parsing command line arguments, and cv2 for our OpenCV bindings.

Lines 10-13 handle parsing our command line arguments. We’ll need only a single switch, --image, which is the path to the image that contains the document we want to scan.
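
If you want to check the parser without opening a terminal, argparse also accepts an explicit list of arguments. Here is a minimal sketch; the path images/receipt.jpg is just a placeholder and nothing is read from disk:

```python
import argparse

# same parser as in scan.py
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="Path to the image to be scanned")

# passing a list to parse_args bypasses the real command line;
# the path below is a placeholder, not a file this snippet opens
args = vars(ap.parse_args(["--image", "images/receipt.jpg"]))
print(args["image"])
```

Leaving the switch out makes argparse print a usage message and exit, which is the behavior required = True asks for.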

Now that we have the path to our image, we can move on to Step 1: Edge Detection.

Step 1: Edge Detection

The first step to building our document scanner app using OpenCV is to perform edge detection. Let’s take a look:

# load the image and compute the ratio of the old height
# to the new height, clone it, and resize it
image = cv2.imread(args["image"])
ratio = image.shape[0] / 500.0
orig = image.copy()
image = imutils.resize(image, height = 500)

# convert the image to grayscale, blur it, and find edges
# in the image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 75, 200)

# show the original image and the edge detected image
print("STEP 1: Edge Detection")
cv2.imshow("Image", image)
cv2.imshow("Edged", edged)
cv2.waitKey(0)
cv2.destroyAllWindows()

First, we load our image off disk on Line 17.

In order to speed up image processing, as well as make our edge detection step more accurate, we resize our scanned image to have a height of 500 pixels on Lines 17-20.

We also take special care to keep track of the ratio of the original height of the image to the new height (Line 18) — this will allow us to perform the scan on the original image rather than the resized image.
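
To make the resize and the ratio concrete, here is a small sketch of the arithmetic. It assumes imutils.resize scales the width by the same factor as the height (which is how I understand it to behave); resized_dims is a hypothetical helper, not part of imutils, and the 1000 x 750 input size is made up:

```python
def resized_dims(orig_h, orig_w, height=500):
    # scale factor that takes the original height to the target height
    r = height / float(orig_h)
    # the width shrinks by the same factor, preserving the aspect ratio
    new_w = int(orig_w * r)
    # ratio as used in scan.py: original height over resized height
    ratio = orig_h / float(height)
    return (new_w, height), ratio

# a made-up 1000 x 750 (height x width) input
dims, ratio = resized_dims(1000, 750)
print(dims, ratio)
```

The ratio is the number we multiply by later to map points found on the small image back onto the full-size one.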

From there, we convert the image from BGR (the channel order OpenCV uses when loading images) to grayscale on Line 24, perform Gaussian blurring to remove high-frequency noise (aiding in contour detection in Step 2), and perform Canny edge detection on Line 26.

The output of Step 1 is then shown on Lines 30 and 31.

Take a look below at the example document:

Figure 1: The first step of building a document scanning app. On the left we have the original image and on the right we have the edges detected in the image.

On the left you can see my receipt from Whole Foods. Notice how the picture is captured at an angle. It is definitely not a 90-degree, top-down view of the page. Furthermore, there is also my desk in the image. Certainly this is not a “scan” by any means. We have our work cut out for us.

However, on the right you can see the image after performing edge detection. We can clearly see the outline of the receipt.

Not a bad start.

Let’s move on to Step 2.

Step 2: Finding Contours

Contour detection doesn’t have to be hard.

In fact, when building a document scanner, you actually have a serious advantage…

Take a second to consider what we’re actually building.

A document scanner simply scans in a piece of paper.

A piece of paper is assumed to be a rectangle.

And a rectangle has four edges.

Therefore, we can create a simple heuristic to help us build our document scanner.

The heuristic goes something like this: we’ll assume that the largest contour in the image with exactly four points is our piece of paper to be scanned.

This is also a reasonably safe assumption — the scanner app simply assumes that the document you want to scan is the main focus of our image. And it’s also safe to assume (or at least should be) that the piece of paper has four edges.

And that’s exactly what the code below does:

# find the contours in the edged image, keeping only the
# largest ones, and initialize the screen contour
cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:5]

# loop over the contours
for c in cnts:
	# approximate the contour
	peri = cv2.arcLength(c, True)
	approx = cv2.approxPolyDP(c, 0.02 * peri, True)

	# if our approximated contour has four points, then we
	# can assume that we have found our screen
	if len(approx) == 4:
		screenCnt = approx
		break

# show the contour (outline) of the piece of paper
print("STEP 2: Find contours of paper")
cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
cv2.imshow("Outline", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

We start off by finding the contours in our edged image on Line 37. We also handle the fact that OpenCV 2.4, OpenCV 3, and OpenCV 4 return contours differently on Line 38.

A neat performance hack that I like to do is actually sort the contours by area and keep only the largest ones (Line 39). This allows us to only examine the largest of the contours, discarding the rest.

We then start looping over the contours on Line 42 and approximate the number of points on Lines 44 and 45.

If the approximated contour has four points (Line 49), we assume that we have found the document in the image.

And again, this is a fairly safe assumption. The scanner app will assume that (1) the document to be scanned is the main focus of the image and (2) the document is rectangular, and thus will have four distinct edges.

From there, Lines 55 and 56 display the contours of the document we want to scan.

And now let’s take a look at our example image:

Figure 2: The second step of building a document scanning app is to utilize the edges in the image to find the contours of the piece of paper.

As you can see, we have successfully utilized the edge detected image to find the contour (outline) of the document, illustrated by the green rectangle surrounding my receipt.

Lastly, let’s move on to Step 3, which will be a snap using my four_point_transform function.

Step 3: Apply a Perspective Transform & Threshold

The last step in building a mobile document scanner is to take the four points representing the outline of the document and apply a perspective transform to obtain a top-down, “bird’s-eye view” of the image.

Let’s take a look:

# apply the four point transform to obtain a top-down
# view of the original image
warped = four_point_transform(orig, screenCnt.reshape(4, 2) * ratio)

# convert the warped image to grayscale, then threshold it
# to give it that 'black and white' paper effect
warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
T = threshold_local(warped, 11, offset = 10, method = "gaussian")
warped = (warped > T).astype("uint8") * 255

# show the original and scanned images
print("STEP 3: Apply perspective transform")
cv2.imshow("Original", imutils.resize(orig, height = 650))
cv2.imshow("Scanned", imutils.resize(warped, height = 650))
cv2.waitKey(0)

Line 62 performs the warping transformation. In fact, all the heavy lifting is handled by the four_point_transform function. Again, you can read more about this function in last week’s post.

We’ll pass two arguments into four_point_transform: the first is our original image we loaded off disk (not the resized one), and the second argument is the contour representing the document, multiplied by the resized ratio.

So, you may be wondering, why are we multiplying by the resized ratio?

We multiply by the resized ratio because we performed edge detection and found contours on the resized image of height=500 pixels.

However, we want to perform the scan on the original image, not the resized image, thus we multiply the contour points by the resized ratio.
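
Here is the scaling step on its own, as a small sketch. The contour coordinates and the 3000 pixel original height are made-up values; only the reshape and the multiplication mirror scan.py:

```python
import numpy as np

# a made-up contour in the layout approxPolyDP returns: shape (4, 1, 2),
# with coordinates measured on the 500-pixel-high resized image
screenCnt = np.array([[[40, 30]], [[330, 45]], [[350, 470]], [[25, 460]]],
    dtype="int32")

# hypothetical original height of 3000 pixels, so ratio is 6.0
ratio = 3000 / 500.0

# drop the singleton axis, then scale x and y back to original pixels
pts = screenCnt.reshape(4, 2) * ratio
print(pts)
```

Multiplying an integer array by a float gives a float array, so the corner coordinates are not rounded before the transform is computed.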

To obtain the black and white feel to the image, we then take the warped image, convert it to grayscale and apply adaptive thresholding on Lines 66-68.
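
To see what a local threshold buys you on an unevenly lit page, here is a sketch on synthetic data. The gradient background and the one-pixel dark stroke are made-up test values; the threshold_local call uses the same arguments as scan.py:

```python
import numpy as np
from skimage.filters import threshold_local

# synthetic "page": the background brightens from left to right,
# imitating uneven lighting
page = np.tile(np.linspace(120, 220, 200), (100, 1)).astype("uint8")

# a one-pixel-high dark stroke standing in for ink,
# 80 gray levels darker than the paper around it
page[40, :] = page[40, :] - 80

# per-pixel threshold: Gaussian-weighted local mean minus the offset
T = threshold_local(page, 11, offset = 10, method = "gaussian")
binary = (page > T).astype("uint8") * 255

print(binary[40, 5], binary[40, 195], binary[10, 100])
```

Because T is computed per neighborhood, the stroke comes out black on both the dim and the bright side of the page, while the paper around it stays white.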

Finally, we display our output on Lines 72-74.

Python + OpenCV document scanning results

And speaking of output, take a look at our example document by running the script:

$ python scan.py --image images/receipt.jpg

Figure 3: Applying step 3 of our document scanner, perspective transform. The original image is on the left and the scanned image on the right.

On the left we have the original image we loaded off disk. And on the right, we have the scanned image!

Notice how the perspective of the scanned image has changed — we have a top-down, 90-degree view of the image.

And thanks to our adaptive thresholding, we also have a nice, clean black and white feel to the document as well.

We have successfully built our document scanner!

All in less than 5 minutes and under 75 lines of code (most of which are comments anyway).

More Examples

The receipt example was all well and good.

But will this approach work for normal pieces of paper?

You bet!

I printed out page 22 of Practical Python and OpenCV, a book I wrote to give you a guaranteed quick-start guide to learning computer vision:

$ python scan.py --image images/page.jpg

Figure 4: Applying edge detection to scan a document using computer vision.

You can see the original image on the left and the edge detected image on the right.

Now, let’s find the contour of the page:

Figure 5: Using the edge-detected image to find the contour and outline of the page to be scanned.

No problem there!

Finally, we’ll apply the perspective transform and threshold the image:

Figure 6: On the left we have our original image. And on the right, we can see the scanned version. The scan is successful!

Another successful scan!

Where to Next?

Now that you have the code to build a mobile document scanner, maybe you want to build an app and submit to the App Store yourself!

In fact, I think you should.

It would be a great learning experience…

Another great “next step” would be to apply OCR to the documents in the image. Not only could you scan the document and generate a PDF, but you would be able to edit the text as well!

Summary

In this blog post I showed you how to build a mobile document scanner using OpenCV in 5 minutes and under 75 lines of Python code.

Document scanning can be broken down into three distinct and simple steps.

The first step is to apply edge detection.

The second step is to find the contours in the image that represent the document we want to scan.

And the final step is to apply a perspective transform to obtain a top-down, 90-degree view of the image, just as if we scanned the document.

Optionally, you can also apply thresholding to obtain a nice, clean black and white feel to the piece of paper.

So there you have it.

A mobile document scanner in 5 minutes.

Excuse me while I call James and collect my money…

Did You Like this Post?

Hey, did you enjoy this post on building a mobile document scanner?

If so, I think you’ll like my book, Practical Python and OpenCV.

Inside you’ll learn how to detect faces in images, recognize handwriting, and utilize keypoint detection and SIFT descriptors to build a system that recognizes book covers!

Sound interesting?

Just click here and pick up a copy.

And in a single weekend you’ll unlock the secrets the computer vision pros use…and become a pro yourself!

About the Author

Hi there, I’m Adrian Rosebrock, PhD. All too often I see developers, students, and researchers wasting their time, studying the wrong things, and generally struggling to get started with Computer Vision, Deep Learning, and OpenCV. I created this website to show you what I believe is the best possible way to get your start.

405 responses to: How to Build a Kick-Ass Mobile Document Scanner in Just 5 Minutes

Aaron Altscher

September 2, 2014 at 11:03 am

Very informative and detailed explanation. Everything this blogger publishes is gold!
- Adrian Rosebrock
  
  September 2, 2014 at 12:16 pm
  
  🙂 Thanks, Aaron!
  - Marx
    
    April 24, 2017 at 6:43 am
    
    Hello Adrian. Thanks for your awesome posts! I’m completely blind, and your content has greatly helped me develop a proof of concept prototype in Python for an AI-guided vision system for blind people like me.
    Right now, I’m trying to add a function that will instruct a blind user if his or her camera is properly capturing all four points of an object with the largest contours (hence assuming nearest); and
    I think I can build on the block of code below to print out statements if the object with largest contours does not have all four points, but:
    I’m thinking of ways to do this that will allow me to instruct the blind user whether to move the focus of his or her camera to the left, right, up, or down …
    – I just need approximations. I think this can be done by getting location coordinates of the missing point/s (out of the 4 supposed points) of the object with the largest contours, and using those values to calculate and print out comprehensible instructions for the end user?
    Do you have any suggestions on how this can be done? Thanks in advance! 🙂
    
    # if our approximated contour has four points, then we
    # can assume that we have found our screen
    if len(approx) == 4:
        screenCnt = approx
        break
    - Marx
      
      April 24, 2017 at 6:46 am
      
      P.S. This is for providing blind users with guided instructions on how to properly point their camera at a piece of document with text, in order to run the OCR functions of my software…
      - Adrian Rosebrock
        
        April 24, 2017 at 9:31 am
        
        Hi Marx — this sounds like a wonderful project, thank you for sharing. I think you are on the right track here. Find the largest contour region in the image. If the approximated contour region does not have 4 vertices, tell the end user.
        
        As for determining how the user should move their camera, the angles between the vertices should be (approximately) 90 degrees. Compute the angles between the contours and if the angle is not 90 degrees, you’ll know which corner is missing and then be able to give the user instructions on how to move the paper/camera.
      - Marx
        
        April 24, 2017 at 2:50 pm
        
        Hi Adrian. Thanks for your help! 🙂
        However, I’ve been searching and reading up on how to measure the angles of contour vertices for several hours now, but I just can’t find something that I can comprehend and use. 🙁
        I also read up on how to sort contours from left to right and top to bottom, thinking that I’d be able to identify the missing edge and just tell the user to move his or her camera towards that missing edge for now (until I find something that allows me to give the user an approximation on how much to move the camera towards that direction) …
        …
        Also, I’m thinking if it would be better to implement this for a live camera feed, than for a captured image?
        That’s mainly because before capturing the image for OCR processing, the user will need to know the right time to capture the image, preferably in real time …
        I’d greatly appreciate your suggestions regarding this matter. Thanks again! 🙂
      - Adrian Rosebrock
        
        April 28, 2017 at 9:59 am
        
        If your goal is to provide real-time feedback to the user, then yes, a live camera feed would be more appropriate. However, this makes the task a bit more challenging due to motion blur. I would instead suggest solving the problem for a single image before you start moving on to real-time video processing.
      - Skaag Argonius
        
        May 11, 2017 at 2:29 pm
        
        You can eliminate motion blur using a really good camera sensor, and manual control of white balance. This is critical since a lot of sensors come with a controller which by default will try to increase the brightness of the image by capturing multiple frames in succession and adding them together. This process is what creates motion blur so you need to simply disable automatic white balance in the controller, and you’ll get clean frames every time. However this also means that in some situations it will be too dark for the sensor to see anything. One way to solve this is to put a large amount of powerful infrared LED lights around or behind the sensor, and remove the infrared filter from the sensor so it becomes sensitive to infrared light. The sensor will not see colors, but for reading text from a page you don’t need colors. This way your sensor will see images even in total “darkness” without blinding the non-blind with a potentially strong white light. Reach out to me if you’re interested and I will send you information about such a sensor that we use in my company.
      - Adrian Rosebrock
        
        May 15, 2017 at 9:04 am
        
        Thanks for sharing Skaag!
  - pratap
    
    July 4, 2017 at 3:52 am
    
    Hi Adrian, can you help me extract text from scanned images? These images are very low quality.
    I am using the pytesseract module to find the text in a scanned image, but I am not able to. Please help me extract the text from the scanned image.
    - Adrian Rosebrock
      
      July 5, 2017 at 6:01 am
      
      Hey Pratap — I’ll be covering how to cleanup images before passing them into Tesseract in my next blog post. Be sure to stay tuned!
      - Bhargav
        
        March 22, 2018 at 10:32 pm
        
        Hi Adrian,
        
        I am wondering what the title of the follow-up post you mentioned above is. Please let me know if you remember it off the top of your head.
        
        Thanks!
  - Surya
    
    May 1, 2019 at 1:10 am
    
    How long does it usually take to finish the process on a minimum config machine?
    - Adrian Rosebrock
      
      May 1, 2019 at 11:21 am
      
      Less than a second.
  - Cuong Nguyen Hung
    
    May 28, 2019 at 10:42 am
    
    Hello Adrian Rosebrock,
    I am Vietnamese, and I am very interested in your blog.
    But I see that almost all of your posts work on Mac OS?
    I am using Windows, so your code can’t run on my computer.
    I am also a beginner (I don’t have programming knowledge).
    But I want to learn face recognition, text recognition, and object detection/counting only.
    Do you think I can?
    - Adrian Rosebrock
      
      May 30, 2019 at 9:11 am
      
      I recommend using macOS or Linux to run the examples on this blog. Please note that I do not officially support Windows on this blog.
    - Roberto
      
      June 3, 2019 at 4:20 pm
      
      Hello Cuong, I made these examples work on Windows, so you should be able to as well.
      I work on macOS now, but if you put in some effort you should be able to make it work on Windows.
      Since you are not a programmer, I would suggest starting there, with some Python tutorials on Windows.
    - uchiha tashi
      
      November 25, 2019 at 1:51 pm
      
      Yes, you can!
      I work only on Windows, and his programs run on Windows too. You just need to change a little bit of his code and that’s it.
      I have even worked on face recognition and much more using only Windows.
      If you need a personal guide, I will help you when I’m free! 🙂
Bob

September 2, 2014 at 4:43 pm

how hard would it be to pan parts of the document so it all fits into one panoramic view?
- Adrian Rosebrock
  
  September 3, 2014 at 7:25 am
  
  Awesome question, thanks for asking! Substantially harder, but certainly not impossible. Basically, your first step would be to perform “image stitching” to generate the larger panoramic picture. From there, you could essentially use the same techniques used in this post. Would you be interested in a blog post on image stitching?
  - Lúcio Corrêa
    
    November 6, 2014 at 9:58 am
    
    Would it be better to do stitching after perspective transform and threshold or before?
    - Adrian Rosebrock
      
      November 6, 2014 at 2:40 pm
      
      It depends on what your end goal is, but I would do stitching first, then perform the perspective transform.
Jose Luis

September 4, 2014 at 7:49 pm

Hello Adrian,

An example of image stitching would be great!!

Thanks for your awesome posts
Can

September 10, 2014 at 10:19 am

How can you export applications written in Python to the App Store?

To my knowledge you can’t. You must use C++.
- Adrian Rosebrock
  
  September 10, 2014 at 11:19 am
  
  Hi Can, you are correct. You must first convert it to iOS, but the algorithm is the same. All you have to do is port the code.
  - Alex
    
    December 8, 2014 at 7:21 pm
    
    Or run the python code on a server and upload the image from your phone.
    - Adrian Rosebrock
      
      December 8, 2014 at 7:41 pm
      
      Exactly. And for applications that don’t require real-time processing, I highly recommend doing this. You can update your algorithms on the fly and don’t have to worry about users updating their software.
      - Usama
        
        April 21, 2015 at 12:12 pm
        
        Hi Adrian,
        
        I want to learn how to run a server so that my application gets processed on my computer.
      - Adrian Rosebrock
        
        April 21, 2015 at 1:20 pm
        
        Hi Usama, I don’t have any posts on creating a web API/server for computer vision code yet, but it’s in the queue. I should have a blog post out on it within the next few weeks.
    - raphael
      
      February 16, 2016 at 2:37 pm
      
      Hi, how can you run a python code on a server? Where can I find a step by step on how to do this? thanks!
      - Adrian Rosebrock
        
        February 16, 2016 at 3:34 pm
        
        You can see this tutorial on converting an image processing pipeline to an API accessible via a server.
- bidder
  
  February 9, 2016 at 3:40 pm
  
  use kivy ( is a python framework to build mobile apps ) 🙂
Can

September 10, 2014 at 11:19 am

Also, why do you need scikit-image when OpenCV already has an adaptive threshold?
- Adrian Rosebrock
  
  September 10, 2014 at 3:13 pm
  
  The scikit-image adaptive threshold function is more powerful than the OpenCV one. It includes more than just Gaussian and mean, it includes support for custom filtering along with median (Although I only use Gaussian for this example). I also found it substantially easier to use than the OpenCV variant. In general, I just (personally) like the scikit-image version more.
  - curtis
    
    July 2, 2017 at 6:57 pm
    
    If somebody wants to use the opencv threshold I think this is an equivalent substitute:
    
    warped = cv2.adaptiveThreshold(warped, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 251, 11)
Atanas Minev

September 16, 2014 at 10:17 am

Cool (post-OCR) improvement would be to recognize receipts and communicate information with budgeting app 😉
- Adrian Rosebrock
  
  September 16, 2014 at 10:21 am
  
  Agreed! That would be a really fantastic improvement. And I think (most) budgeting apps provide an API to interface with the app.
Alex

October 1, 2014 at 12:08 pm

Hello guys,

Great Job, good instructions, perfect !!!
Do you know, if and how this could be work on Android devices ?
- Adrian Rosebrock
  
  October 1, 2014 at 12:57 pm
  
  Absolutely, you would just have to look into using the Java OpenCV bindings.
- Andrea
  
  August 18, 2015 at 7:58 am
  
  This example may be a good starting point: https://github.com/jhansireddy/AndroidScannerDemo
Toni

October 15, 2014 at 9:05 am

Very informative. Great!! Thanks.
Matthew Nichols

October 15, 2014 at 9:46 am

What applications are you using to code? I don’t recognise the icons.

Cheers,

Great post btw
- Adrian Rosebrock
  
  October 15, 2014 at 10:09 am
  
  I use Sublime Text 2 most of the time.
qubodup

October 15, 2014 at 11:42 am

I’m sorry, what constitutes the “mobile” part of a mobile doc scanner?

To me: it runs on a smartphone that runs Android, iOS, WP8 or Whatever the name of the BB OS is.
- Adrian Rosebrock
  
  October 16, 2014 at 7:06 am
  
  You are correct, the “mobile” part of a mobile document scanner is an app that runs on a smartphone device that utilizes the smartphones camera to capture documents and “scan” them.
Sasa

November 5, 2014 at 9:29 am

I would like to see some OCR on it or on just some simple text or numbers.
- Adrian Rosebrock
  
  November 5, 2014 at 10:21 am
  
  Hi Sasa, good point. OCR would be a really great extension. If you’re interested in recognizing text, especially handwritten digits, definitely take a look at my Practical Python and OpenCV book. Inside I cover recognizing handwritten digits. Definitely not the same as OCR, but if you’re interested in recognizing text, it’s a pretty cool place to start.
Gaduc

December 14, 2014 at 9:32 am

Yeah!
I played with that some time ago in order to scan books.
But I faced a harder problem: the pages of a book are not flat but warped.
I was able to isolate the curve the page made from the flatbed (*).
And then?
There are two transformations to achieve:
– first, to convert from a warped surface to a flat one;
– second, to convert from a perspective surface to a rectangular one (easy, as you did it).
How can we do the first conversion? A bilinear formula?

Another problem I used to face is how to progressively cancel the shade that appears as the distance between the page surface and the camera increases.

(*) There are different methods to achieve that:
– take a shot at 45°;
– use the shade as an approximate distance from the lens.

If you have an idea of how to compute that …

db
- Manuel
  
  October 16, 2016 at 2:42 pm
  
  I would like to know about this too.
Milo Hyson

December 26, 2014 at 2:00 pm

This is a great approach when dealing with small things like a typical receipt. But unless you’re going to take multiple pictures and stitch them together, the resolution will suffer as the item to be scanned gets larger and you have to pull the camera back to get it all into frame. This is where purpose-built document scanners really shine. They can capture a metre-long receipt at full resolution.
mohammad

February 10, 2015 at 12:54 am

hi
What IDE should I use for Python and OpenCV? Please share a download link.
- Adrian Rosebrock
  
  February 10, 2015 at 6:43 am
  
  Hi Mohammad. I use Sublime Text 2 and PyCharm. Definitely check them out!
  - mohammad
    
    February 11, 2015 at 4:15 am
    
    Thank
    You’re very cool
Richard

March 18, 2015 at 4:57 pm

Hi, great post!! I really like your website, and your tutorials are the best! Just one question… is it better to use a bilateral filter instead of a Gaussian one to smooth the image? If I recall correctly, the bilateral filter better preserves features like edges…
Great post, keep doing this great job! Thanks.
- Adrian Rosebrock
  
  March 18, 2015 at 6:00 pm
  
  Great observation! A bilateral filter could definitely be used in this situation. However, I chose to use Gaussian smoothing, simply because there were fewer parameters to tune and get correct. Either way, I think the result would be the same.
  - Richard
    
    March 23, 2015 at 2:34 pm
    
    Thanks!
Jasper

April 20, 2015 at 10:36 am

Hi, thanks for the post, really helpful! I noticed the contour detection approach isn’t working too well when part of the document to capture is offscreen. Any ideas on how to solve this? TIA
- Adrian Rosebrock
  
  April 20, 2015 at 10:44 am
  
  In general, you’ll need the entire document to be in your image, otherwise you won’t be able to perform a reliable perspective transform.
Joe Landau

June 5, 2015 at 1:43 am

This exercise requires scikit-image, which someone who just installed OpenCV and Python on a new Raspberry Pi 2 would not have. Installing scikit-image in turn seems to require scipy, which I am trying to install (slowly using pip install -U scipy) at this very minute. Perhaps a setup step would help.
- Adrian Rosebrock
  
  June 5, 2015 at 6:15 am
  
  Good point Joe. How are you liking your Pi 2?
  - Joe Landau
    
    June 5, 2015 at 1:48 pm
    
    So far the Pi 2 is doing well. The installation of scipy took between 1 and 2 hours (I didn’t time it) and then scikit-image took only minutes. Using the browser thru VNC displaying 1920 x 1080 is a bit slow, I’ll have to work with a smaller screen. I won’t know if the Pi 2 is adequate for my application until I get there–if the application works but is slow I will have to go to a faster system, maybe a Tegra.
    - Adrian Rosebrock
      
      June 5, 2015 at 2:09 pm
      
      If you’re doing most of your work via terminal, I would suggest using SSH forwarding instead of VNC: $ ssh -X pi@your_ip_address. You’ll be able to execute your Python scripts via command line and the OpenCV windows will still show up.
  - Joe Landau
    
    June 5, 2015 at 8:39 pm
    
    I withdraw the information about installing scikit-image. I didn’t realize that the first try had failed. In fact, it took over an hour.
Alexander

June 23, 2015 at 1:49 pm

Hi Adrian,

Thank you so much for the great article and for the rest of your series!

I stumbled with the task of how to correct the document scan of a sheet of paper that has been folded 2 or 4 times?
Could you please take a look at my question here: http://stackoverflow.com/questions/31008791/opencv-transform-shape-with-arbitrary-contour-into-rectangle.

Will appreciate if you could give some direction on how to achieve this.
- Adrian Rosebrock
  
  June 23, 2015 at 6:33 pm
  
  If the paper has been creased or folded, then you’ll want to modify Line 48: if len(approx) == 4: and find a range of values that work well for the contour approximation. From there, you’ll want to find the top-left, top-right, bottom-right, and bottom-left corners of the approximation region, and then finally apply the perspective transform.
  - Alexander
    
    June 24, 2015 at 4:05 pm
    
    Adrian, thank you for the answer!
    But I need to clarify: I’m able to find the corners of a folded\creased paper and perform the proper perspective transform using those four points. In other words, your whole perfect algorithm works fine. What I’m doing is trying to take a step further 🙂
    The step I’m asking about is how to “straighten” (recover rectangular shape of) the paper with OpenCV? I.e. stretch it so that its edges become touching surrounding rectangle.
J

July 15, 2015 at 1:03 am

I’m really not understanding where you put the path to the file that will be scanned. Can you give an example of proper usage of the code on lines 10-13?
- Adrian Rosebrock
  
  July 15, 2015 at 6:39 am
  
  If you have an image named page.jpg in an images directory, then your example command would look like this:
  
  $ python scan.py --image images/page.jpg
  
  Definitely download the code to the post and give it a try!
fariss

August 13, 2015 at 12:14 pm

hi,

your source code was very helpful,
what are the functions to do some image processing on the scanned image (contrast, brightness…)? I removed the part that gives the image the ‘black and white’ paper effect…
- Adrian Rosebrock
  
  August 14, 2015 at 7:23 am
  
  I would suggest taking a look at the official OpenCV docs to get you started.
Bhuvesh

August 25, 2015 at 5:22 am

Great work!!
Sir, can I use this on a Raspberry Pi B+ or 2?
If yes, then how?
Please guide me I’m working on some related project.
- Adrian Rosebrock
  
  August 25, 2015 at 6:35 am
  
  You could certainly use either, but I would suggest going with the Pi 2. From there, you can follow my OpenCV install guide for the Raspberry Pi. Once you have OpenCV installed on your system, you should be able to download and execute the code in this post.
Mohd Ali

October 8, 2015 at 1:48 am

I keep getting this error at line 37.

(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

ValueError: too many values to unpack

I’ve tried this script on 3-4 images but I keep getting the same error. I tried to debug, but didn’t succeed.
- Adrian Rosebrock
  
  October 8, 2015 at 5:55 am
  
  I’ve mentioned the solution to this problem many times on the PyImageSearch blog before. The reason you are getting this error is because you are using OpenCV 3 — this post was written for OpenCV 2.4, well before OpenCV 3 was released. Please see this post for a discussion on how the cv2.findContours return signature changed in OpenCV 3.
- ashish
  
  February 9, 2016 at 1:22 am
  
  try this….
  
  (cnts) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
  - Adrian Rosebrock
    
    February 9, 2016 at 3:58 pm
    
    This is incorrect. For OpenCV 3, it should be:
    
    (_, cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    - Jorvan Rodrigues Brito
      
      November 22, 2017 at 3:09 pm
      
      Many thanks Adrian, this information solved my problem !!
      - Adrian Rosebrock
        
        November 25, 2017 at 12:43 pm
        
        Awesome, I’m glad to hear it Jorvan 🙂
parviz

October 15, 2015 at 7:56 am

dude! you are so cool 😀 thanks buddy
- Adrian Rosebrock
  
  October 15, 2015 at 12:28 pm
  
  No problem Parviz 🙂
Hacklavya

October 15, 2015 at 6:23 pm

I was very happy to see this tutorial, but then I found that you didn’t explain how to install OpenCV 2.4.X with Python 2.7

So please tell me link to do that.

or more generally there is no date-wise posts in here, so that I see what is all on your website.

and how can I search a particular post on your website?
- Adrian Rosebrock
  
  October 16, 2015 at 6:17 am
  
  There is a search bar at the bottom-right corner of the sidebar on every page on the blog. As far as explaining how to install OpenCV 2.4 and Python 2.7, I cover that in this post and in my book.
Hacklavya

October 25, 2015 at 3:28 am

I tried a lot of help from the internet to install scikit-image,
but this line:
```
from skimage.filter import threshold_adaptive
```
gives the following error:
```
ImportError: No module named skimage.filter
```
Please tell us how to fix it.
- Adrian Rosebrock
  
  October 25, 2015 at 6:21 am
  
  Please see the official scikit-image install instructions. All you need to do is let pip install it for you:
  
  $ pip install -U scikit-image
  - Dirk Josefiak
    
    October 13, 2016 at 3:33 pm
    
    thanks a lot for all your work and this tip also
  - syaifulnizam bin amran
    
    December 9, 2017 at 2:27 am
    
    $ pip install -U scikit-image
    
    while installing, it stuck at
    
    Running setup.py bdist_wheel for scikit-image …
    - Adrian Rosebrock
      
      December 9, 2017 at 7:25 am
      
      It’s likely not stuck. It takes a long time to compile and install scikit-image. If you take a look at your processor usage you’ll see that the processor is very busy compiling and installing scikit-image. What system are you trying to install scikit-image on?
  - nidhi
    
    February 8, 2018 at 12:51 am
    
    this should be installed inside the (CV) or outside???
    - Adrian Rosebrock
      
      February 8, 2018 at 7:51 am
      
      If you are using Python virtual environments you should be installing inside the “cv” virtual environment.
  - Lady
    
    October 4, 2018 at 12:05 pm
    
    Thank you very much for the help. I had the same problem. Really, thank you for sharing your knowledge with us.
Cyrus Smith

November 4, 2015 at 11:48 am

Hello, Adrian!
I have downloaded your code and tried to launch it on my computer and I failed.
I use PyCharm 4.5, OpenCV300 and Python 2.7.
I think it’s not versions thing.
There is a line on main code (scan.py) :

from skimage.filter import threshold_adaptive

But there is no skimage folder in your project…
What should I do?
Thanks.
- Adrian Rosebrock
  
  November 4, 2015 at 1:20 pm
  
  The error can be resolved by installing scikit-image:
  
  $ pip install -U scikit-image
  
  However, keep in mind that this tutorial is for OpenCV 2.4.X, not OpenCV 3.0, so you’ll need to adjust Line 37 to be:
  
  (_, cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
  - Mike Mehr
    
    November 19, 2015 at 5:06 am
    
    Hi Adrian,
    
    I also found that I had to add parens to the print statement arguments on lines 32, 56, and 73 when running with Python 3.5 (they are optional in v2.7).
    
    There is also a warning that the skimage.filter submodule has had a name change to skimage.filters, so I added the ‘s’ on line 7 and now it runs without any errors or warnings. The warning says the old name is deprecated and will be removed in version 0.13. This occurs for both virtual environments/versions of Python.
    
    It seems like these changes might impact some of your other code on the site as well.
    
    Best regards,
    Mike
    - Adrian Rosebrock
      
      November 19, 2015 at 6:16 am
      
      Thanks for the update Mike. I’m still trying to figure out a decent solution to handle the OpenCV 2.4 vs. OpenCV 3 issue. As a blog post that will come out this Monday will show, many users are still using OpenCV 2.4 (although that number is growing and will continue to grow). But since this is a teaching blog, it’s not exactly easy to switch the code from one version to another if two equally sized groups of people are using different versions.
Teddy

November 21, 2015 at 11:02 pm

do you have code for building in OCR into the Scanner?
- Adrian Rosebrock
  
  November 22, 2015 at 7:15 am
  
  Not yet, but that’s something I would like to cover in a future blog post.
  - Gary King
    
    November 23, 2015 at 8:42 am
    
    Adrian,
    
    I am a novice Python developer, but was wondering if the scanning and OCR reading could have all been achieved via JavaScript/HTML5? I find it very hard trawling the JavaScript libraries to find something that will mostly do this. I don’t mind gluing things together, but would prefer it if most of the hard work were done via some commercial library if possible, and obviously don’t mind paying for such a product. Do you know of any libraries that might fit this requirement?
    - Adrian Rosebrock
      
      November 23, 2015 at 11:28 am
      
      I don’t use JavaScript often, but I’ve heard of Ocrad.js being used for OCR with JavaScript. I haven’t personally tried it out, but it might be worth looking into.
Mouiche

December 3, 2015 at 3:11 am

Dear Adrian and other members
Your job is highly interesting, I have a related project for 3 weeks.
– first, I have to take one scanned page that can be inclined in all directions and do the transformation to have what you did in your code in pdf.
– Secondly, snap two pages of a book, transform the edge’s curves into straight lines and finally have these pages in a rectangle in pdf.
Now, I am trying to use your code as a model but i don’t have openCV in my computer. which module or function can i use? I have “skimage” . Are there other links or documents that can help to solve this problem.
Thank you.
- Adrian Rosebrock
  
  December 3, 2015 at 6:22 am
  
  Having scikit-image is a good start, but I would really recommend getting OpenCV installed as well. I have tutorials detailing how to get OpenCV installed on your system here. I’m not sure if I understand the second part of your question, but if you’re trying to “scan” two pages of a book side-by-side, you can try finding the center spine of the book using edge detection. Once you have it, you can apply perspective transforms like the ones used in this post.
  - Mouiche
    
    December 3, 2015 at 8:38 am
    
    Thank you, I am trying now to install openCV as well.
    In my second task I am trying to scan two pages of a book not side-by-side. So I will be asked to manage the horizontal lines that will look like curves and the middle-line between these pages that will not be well seen after scanning, but i need to manage it in this job
Mouiche

December 10, 2015 at 5:19 am

Please,

can someone help me on how to proceed for the case of two scanned pages of a book. Two pages are scanned together and i need to do the transformation to have the plat (or flat) rectangular form of these pages.
- Adrian Rosebrock
  
  December 10, 2015 at 6:48 am
  
  As I suggested over email, try to detect the “spine” of the book using the Canny edge detector. If you can determine where the boundaries of the book and the top/bottom intersect, you can essentially split the book in two, determine your four points for the transformation, and apply the perspective warp.
Steve Dyson

December 14, 2015 at 6:14 pm

Does anyone have something similar built for JS? We are working on a document scanner and would love some help on where to get started.

OCR is not needed – we only need the cropping, alignment, and conversion to b&w.

Thank you!

Steve
- Adrian Rosebrock
  
  December 15, 2015 at 6:18 am
  
  Computer vision is slowly starting to expand to the browser (since computers are becoming faster), but currently there is no full port of OpenCV to JavaScript, making it hard to build applications like these.
  
  If you want to do something similar to this in JavaScript, I would suggest wrapping the document scanner code into an API, like this tutorial does. From there you can make requests to the API from JavaScript and display your results.
Sainandan

December 16, 2015 at 1:27 am

Hey Adrian,

This mini project is really handy in terms of usage and fundamental exposure to IP.

But how do you integrate this python code into a mobile android application ?
- Adrian Rosebrock
  
  December 16, 2015 at 6:32 am
  
  I would suggest wrapping your Python code using a web framework such as Django or Flask and then calling it like an API. You can find an example of doing this here.
singh

December 18, 2015 at 10:46 pm

Hi, scan.py hangs after Step 1. I left it running for more than an hour but it still didn’t finish. I am on Mac OS X El Capitan, using OpenCV 2.4.9 and Python 2.7. I have all the modules installed. I tried your transform_example.py and that works fine. Am I missing anything here? Do I need to hit a command or something?

Commenting out the following helped. Thanks. (sorry new to Python and OpenCV but loving it so far).
#cv2.waitKey(0)
#cv2.destroyAllWindows()
- Adrian Rosebrock
  
  December 19, 2015 at 7:45 am
  
  Interesting, I’m not sure why it would take so long to process the image during Step 1. The cv2.waitKey call wouldn't matter, provided that you clicked the window and pressed a button to advance the process. The cv2.waitKey method pauses all execution until a key is pressed.
singh

December 18, 2015 at 11:21 pm

How do i save the final image in a new file?
- Adrian Rosebrock
  
  December 19, 2015 at 7:47 am
  
  You can use the cv2.imwrite method:
  
  cv2.imwrite("output.png", warped)
  
  And if you’re just getting started learning Python + OpenCV, you should definitely take a look at Practical Python and OpenCV. This book can get you up to speed with Python + OpenCV very quickly.
Mukundan

January 6, 2016 at 1:26 am

Thanks for this informative article
deepak

January 23, 2016 at 5:55 pm

hi Adrian,
really nice tutorial there.I am currently trying to follow it to build an app of my own. I wanted to ask if its possible to generate a crude 3d wireframe model from a photo with probably the users help in correcting the edges. basically take 2-3 photos from a phone and then process it to create a simple 3d wireframe model.
- Adrian Rosebrock
  
  January 25, 2016 at 4:13 pm
  
  It’s certainly possible, but you’ll realistically need more than 2-3 photos. I would suggest reading up on 3D Reconstruction.
Dino

February 3, 2016 at 4:50 am

Hi Adrian,

In line 37:
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

Is it better to use cv2.RETR_EXTERNAL as the second parameter in findContours?
- Adrian Rosebrock
  
  February 4, 2016 at 9:20 am
  
  For this project, yes, I would suggest using cv2.RETR_EXTERNAL. I should have mentioned that in the original post 🙂 Thanks for noting this!
Linus

February 4, 2016 at 4:37 am

Hi, and thanks for a great tutorial.

To the point: I’m having trouble with the part where you sort the contours according to area, around line 38 in the sample code. The problem is that the contours that correspond to big blobs (such as the receipt border) does not at all end near the top of the list. Instead the largest contour is a contour around the WHOLE FOOD title.

I have checked that the contours are indeed sorted in descending order by the value returned by contourArea(), and sifted through all contours to verify that the contour enclosing the receipt indeed is present in the list. However, the area corresponding to that contour is uncannily small (about 12 pixels).

The issue, I believe, is that findContours() here finds the contours of the canny-edges rather than the enclosed objects. However, why this happens to me and none of you is beyond my comprehension. Maybe I have an unknown ascendency to Murphy.

Anyway, does anyone here have any idea what might be going on?

Also, I’m using opencv 3.1.0 and python 3.4
- Adrian Rosebrock
  
  February 4, 2016 at 9:10 am
  
  Just to clarify, are you using the same Whole Foods receipt as I am in the blog post? It sounds like there might be an issue segmenting the actual receipt from the image. The likely issue is that the border of the receipt outline is not “complete”. Can you take a second look at the edge map of the receipt and ensure that it is all one continuous rectangle?
  
  EDIT: Try changing cv2.RETR_LIST to cv2.RETR_EXTERNAL in cv2.findContours and see if that resolves the issue.
  - Linus
    
    February 5, 2016 at 11:06 am
    
    Yes I use the same image as in the post. Well almost. I used gimp to cut out the part containing the receipt.
    
    Anyhow, the problem was a discontinuity in the edge map, just as you suspected. And a solution was to decrease the size of the Gaussian filter ((3,3) worked for me). Maybe the reason that I get this problem, but not anybody else, is that I use an original image of significantly lower resolution. I suspect your mobile camera does better than 496×669?
    
    Thanks for quick response!
    - Adrian Rosebrock
      
      February 6, 2016 at 9:59 am
      
      Indeed, the cropping in Gimp must have caused some sort of issue. My camera is an iPhone, so the resolution is very high. I actually reduced the resolution of the original image for this example. In fact, I wouldn’t suggest processing images larger than 640 x 480 pixels if at all possible. It’s normal to resize images prior to processing them. The less data there is to process, the faster the algorithm will run. And most computer vision functions expect smaller image sizes for more accurate results (edge detection and finding contours, for example).
Ghassan

February 14, 2016 at 9:04 am

if we put image scanned from scanner it will show this error

AttributeError: ‘NoneType’ object has no attribute ‘shape’

how to pass it ?
- Adrian Rosebrock
  
  February 14, 2016 at 9:48 am
  
  Anytime you see an error related to an image being NoneType, it’s 99% of the time due to an image not being loaded from disk properly or read from a stream. Make sure the path you supply to cv2.imread is valid. I demonstrate how to execute the Python script via command line (and pass in the image path argument) in this post.
sagar

February 16, 2016 at 2:57 pm

Hi Adrian,

I run your code on Anaconda, Windows. It runs perfectly.
I want to build a mobile Android app in Android Studio. It has many functionalities, including document scanning. The user takes a picture of an object and the app is responsible for getting a result out of it, but this code needs some bindings with the Android code.
How can I do this? How do I integrate the two?
- Adrian Rosebrock
  
  February 16, 2016 at 3:33 pm
  
  I personally don’t use Windows or Visual Studio, nor do I do any coding for the Android environment. That said, you have two options:
  
  1. Convert the code from Python to Java + OpenCV (there are OpenCV bindings available for the Java programming language).
  2. Wrap the document scanner code as a computer vision API, then upload the image from the Android app to the API, process it, and then return the results.
  - Reza
    
    March 30, 2016 at 4:50 pm
    
    If the second approach (using HTML5/javascript) can be implemented, then the export to mobile phones as a native application (for instance apk for android) would be very easy using CORDOVA.
    - Adrian Rosebrock
      
      March 30, 2016 at 5:13 pm
      
      Which is the exact same approach I took when building both Chic Engine and ID My Pill 🙂
      
      I also demonstrate how to wrap a computer vision app as an API and access it via PhoneGap/Cordova inside the PyImageSearch Gurus course
Damien Mac Namara

March 14, 2016 at 9:35 pm

Could this be easily modified for video?
- Adrian Rosebrock
  
  March 15, 2016 at 4:35 pm
  
  Absolutely. You just need to wrap the code in a loop that accesses a video stream. This blog post should be a good starting point.
  - Damien Mac Namara
    
    March 22, 2016 at 7:02 pm
    
    Thanks Adrian for your quick response. I cant recommend you enough.
    - Adrian Rosebrock
      
      March 24, 2016 at 5:23 pm
      
      No problem, happy to help 🙂 And thanks for the kind words!
sunchy11

March 16, 2016 at 2:52 am

Hi, Adrian,

I tried a lot of different coupons besides your sample “Whole Foods” one.
I had difficulty finding the four points of the edge.
It looks like the “approx” is not 4 points for some of them.

So the error is ‘screenCnt’ is not defined.

Thank you !
- Adrian Rosebrock
  
  March 16, 2016 at 8:09 am
  
  If the approximated contour does not have 4 points, then you’ll want to play with the percentage parameter of cv2.approxPolyDP. Typical values are in the range of 1-5% of the perimeter. You can try adjusting this value to help with the contour approximation.
  - Daniel Bornman
    
    April 7, 2016 at 4:06 pm
    
    I’m having issues with getting a clean contour that represents a full piece of paper. My paper contour is represented by two separate tuples in the cnts array. One tuple is for the left and bottom edge, and a distance away is the tuple for the top and right edge. Adjusting parameters within the cnts array is too late to find an all-encompassing document contour.
    
    I tried changing the parameter in findContours() as suggested above from cv2.RETR_LIST to cv2.RETR_EXTERNAL but that did not fix the problem.
    
    I took a photo with my iphone of a 8×11 piece of paper with regular type against a plain dark background. I intentionally took it at a slight angle to test the transform function. It appears that the assumption of 4 clean points is failing.
    - Adrian Rosebrock
      
      April 8, 2016 at 12:55 pm
      
      If you’re not getting one contour that represents the entire piece of paper, then the issue is likely with the edge map generated by the Canny edge detector. Check the edged image and see if there are any discontinuities along the outline. If so, you’ll want to tune the parameters to cv2.Canny to avoid these problems or use a series of dilations to close the gaps in the outline.
      - Jim
        
        October 17, 2016 at 11:30 am
        
        I got every thing installed, all very smooth, but experiencing the same problem
        
        STEP 2: Find contours of paper
        Traceback (most recent call last):
        File “scan.py”, line 63, in
        cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
        NameError: name ‘screenCnt’ is not defined
        
        i tried the auto_canny and still have same error.
        
        Certain problem is with me, but not sure where. Thanks
      - Adrian Rosebrock
        
        October 17, 2016 at 2:01 pm
        
        If the screenCnt is None it’s because there are no contours in your image that contain 4 vertices. Take a look at your edge map and explore the contours. You might need to tune the value of the contour approximation.
  - Samvatsar Shastrimath
    
    February 28, 2019 at 5:52 am
    
    Thanks for the awesome and detailed explanation. I am facing the same issue that sunchy11 was facing. I am not able to find the rectangle/ four points. So, it throws the error: name ‘screenCnt’ is not defined. I tried changing the value of perimeter from 1-5%. Still no luck. Can u please let me know what may be the issue.
    - Adrian Rosebrock
      
      February 28, 2019 at 1:36 pm
      
      Are you using the same example images in this tutorial? Or your own image?
K van de Maan

April 9, 2016 at 2:40 pm

thank you
TD

April 13, 2016 at 12:57 am

Hi Adrian,

Great article. I am encountering the issue below when following your instruction. Please advise.
Note that I am using openCV (3.1), python (3.5.1), numpy (1.11.0) & scikit-image (0.12.3)

….site-packages/skimage/filters/thresholding.py”, line 72, in threshold_adaptive
““block_size“ {0} is even.”.format(block_size))
ValueError: The kwarg “block_size“ must be odd! Given “block_size“ 250 is even.

Few steps I revised in order to make it worked.
+ parenthesis for the print command: print (“STEP 1: Edge Detection”)

+ skimage.filter —> skimage.filters

+ comment out the following:
# cv2.waitKey(0)
# cv2.destroyAllWindows()
- Adrian Rosebrock
  
  April 13, 2016 at 6:55 pm
  
  Thanks for sharing TD. It looks like the function signature to threshold_adaptive changed in the latest release of scikit-image. I’ll need to take a closer look at this. I’ll post an update to the blog post when I have resolved the error.
  
  UPDATE: In previous versions of scikit-image (<= 0.11.X) an even block_size was allowed. However, in newer versions of scikit-image (>= 0.12.X), an odd block_size is required. I have updated the code in the blog post (along with the code download) to use the correct block_size so everything should once again work out of the box.
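If the block size is computed rather than hard-coded, a small guard keeps it odd. The sketch below is a minimal example; `make_odd` is a hypothetical helper name, not a scikit-image function.

```python
def make_odd(block_size):
    # bump an even value up by one so it satisfies the odd-size requirement
    return block_size if block_size % 2 == 1 else block_size + 1

print(make_odd(250))  # 251
print(make_odd(251))  # 251
```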
Mickey Friedman

April 20, 2016 at 12:15 pm

Adrian, this is wonderful!

Unfortunately my code doesn’t run past Step 1…it just stops before “(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:5]”

Do you know how I can fix this?
- Adrian Rosebrock
  
  April 20, 2016 at 5:59 pm
  
  Hey Mickey — please read the previous comments, specifically my reply to “Ashish”. It details a solution to what is (likely) causing your problem with cv2.findContours.
Chuck

May 3, 2016 at 10:06 am

Hey man,
you’re amazing!!!!

I’m a German student, and I’m working right now with opencv and Python.

I installed the opencv and python with another post of you.

Now, I wanted to scan a receipt, and searched in Internet. And what did I found? A post you wrote. 🙂
thanks a lot!
- Adrian Rosebrock
  
  May 3, 2016 at 5:44 pm
  
  Awesome! I’m glad I could be of help Chuck! 😀
RANVIR

May 25, 2016 at 12:48 am

It is working good with only for the given example but not working in any other image. so please make it dynamic so it can recognize edge of any image ie, in any color any light.
- Adrian Rosebrock
  
  May 25, 2016 at 3:26 pm
  
  It’s hard to ensure guaranteed edges in any color or lighting conditions, but you might want to try the automatic edge detector.
Sid

May 27, 2016 at 7:07 am

Hey Adrian,

My current set up is OpenCV 3.1.0 + RPi Model B + Python2.7.9

I’ve followed your tutorial on installing OpenCV 3 on the Pi. Did that include installing the scikit-learn module ?

I tried running the code and got an error : “No module named skimage.filters” on line 7.

What are the changes needed for the code to work on OpenCV 3? Thanks.
- Adrian Rosebrock
  
  May 27, 2016 at 1:27 pm
  You’ll need to install scikit-image on your system:
```
$ workon cv # to access your virtual environment
$ pip install -U scikit-image
```
  - Sid
    
    May 29, 2016 at 8:21 am
    
    Hey Adrian,
    
    I tried “import Scipy” and it works outside of the cv environment.
    
    When I’m in the cv environment it gives the “No module named Scipy” error!!
    
    Is there a way to shift the Scipy folder to the correct path?
    - Adrian Rosebrock
      
      May 29, 2016 at 1:51 pm
      
      You need to install SciPy into your virtual environment:
      
      $ workon cv
      $ pip install scipy
gigi

June 2, 2016 at 8:14 am

Hey Adrian

Thx for your Kick-Ass-Sample-Code.

One question though:
Line 67: warped = warped.astype(“uint8”) * 255

I don’t really get what’s going on here (and why).
- Adrian Rosebrock
  
  June 3, 2016 at 3:09 pm
  
  After coming out of the threshold_adaptive function, we need to ensure that the image is an 8-bit unsigned integer data type, which is what OpenCV expects.
  - Nirvar
    
    November 11, 2018 at 5:04 pm
    
    I understand that astype casts the complete array to uint8 and uint8 is for a 8 bit unsigned integer. But why are we multiplying it with 255?
    - Adrian Rosebrock
      
      November 13, 2018 at 4:51 pm
      
      Because (warped > T) returns an array of booleans which when translated into integers is either “0” or “1”.
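The conversion described in this thread can be seen on a tiny array. In the sketch below the 2x2 array and the threshold value `T = 100` are hypothetical stand-ins; in the post the threshold comes from scikit-image.

```python
import numpy as np

# tiny stand-in for the warped grayscale image and a threshold value
warped = np.array([[10, 200], [130, 90]], dtype="uint8")
T = 100  # hypothetical threshold; in the post it comes from scikit-image

mask = warped > T                      # array of booleans
binary = mask.astype("uint8") * 255    # False -> 0, True -> 255

print(mask.dtype, binary.dtype)
print(binary)
```

Without the multiplication the image would hold only 0 and 1, which displays as almost entirely black in an 8-bit window.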
friedman

June 2, 2016 at 5:04 pm

Hi Adrian!

I was wondering if there was a way to adjust the document scanner’s sensitivity to edge detection? I want to detect faint edges on white surfaces. Is there anything i can do about contouring or thresholding?
- Adrian Rosebrock
  
  June 3, 2016 at 3:03 pm
  
  You can control the edge detection process via the two parameters to the cv2.Canny function. You can read more about these parameters here. But in general, you’re going to have a real hard time detecting faint edges against a white surface. Edge detection requires that there be contrast between the background and foreground.
uma

June 3, 2016 at 1:29 am

hi….
can i get certificate after completion of this course?
- Adrian Rosebrock
  
  June 3, 2016 at 3:00 pm
  
  I only offer a Certificate of Completion inside the PyImageSearch Gurus course. A Certificate of Completion is not provided for the free OpenCV/Image Search Engine courses.
abggcv

June 9, 2016 at 12:07 am

This code did not work on the image of a graph I have:
https://www.dropbox.com/s/buexnnip3z4x4sl/2011-01-17.jpg?dl=0

I ran your code and it did not give me the edges as expected. This code is not generic enough to be used to scan any kind of document. It would be nice if a generic code or approach could be suggested because that is what the professional scanning apps do.

User should not have to play with parameters like size, gaussian blur
- Adrian Rosebrock
  
  June 9, 2016 at 5:16 pm
  
  While there are such things as “generic edge detectors”, they normally require a bit of machine learning to use. In fact, much of computer vision and machine learning is tuning parameters and learning how to tune them properly. Anyway, you might want to give the auto_canny function a try for parameter free edge detection.
Ramon Gajardo

June 16, 2016 at 1:40 pm

Please, any idea why this line of the document scanner code
warped = threshold_adaptive(warped, 255, offset=11)
takes so much time, around 30 sec.?

Thank you
- Adrian Rosebrock
  
  June 18, 2016 at 8:20 am
  
  If it’s taking a lot time to process your image, then your image is likely too large. Resize your image and make it smaller. The smaller your image is, the less data there is to process, and thus the faster your program will run.
Jon

June 20, 2016 at 2:57 pm

Small question, in this line:

cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)

What do the brackets around screenCnt do specifically? Also, is there another tutorial where the functions, algorithms and arguments are explained, or do we need to look at the OpenCV documentation?
- Adrian Rosebrock
  
  June 20, 2016 at 5:21 pm
  
  The brackets simply wrap the contours array as a list, that’s all. I would suggest either referring to the OpenCV documentation or going through Practical Python and OpenCV for a detailed explanation of cv2.findContours.
Em

June 21, 2016 at 4:24 pm

Hi Adrian,
Beautiful work.
Is there a way to use houghtransform (or some other command) to close open contours? Can you use houghtransform over canny? What would that look like?

Sometimes 3 out of four edges of a document come out clearly in pictures, but the fourth is only half detected. It’s so close to working perfectly, but I don’t know what to do!

-em
- Adrian Rosebrock
  
  June 23, 2016 at 1:24 pm
  
  If you have open contours, I would suggest using morphological operations to close the gap. A dilation or a closing operation should work well in that case.
John

June 22, 2016 at 2:49 pm

Hi Adrian,

I love your blog!

I am having trouble with the image resolution. I want the resolution of the output image to be similar to that of the image I am inputting.

Also, for some input images I get “NameError: name ‘screenCnt’ is not defined”. Does that mean the program did not detect 4 edges?

Thank you,

John
- Adrian Rosebrock
  
  June 23, 2016 at 1:15 pm
  
  Yep, that’s correct! I would insert a print statement into your code in the for loop where you loop over the contours to confirm this.
  
  However, a better way to solve this problem would be to keep track of the ratio of the width of original to the resized image. Perform your edge detection and contour approximation on the resized image. Then, multiply your bounding box region by the ratio — this will give you the coordinates in terms of the original image.
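
  The ratio bookkeeping can be sketched with NumPy alone. The image sizes and corner coordinates below are hypothetical values picked for the example:

```python
import numpy as np

# hypothetical sizes: a 3000-pixel-tall photo resized to a height of 500
orig_h = 3000
resized_h = 500
ratio = orig_h / float(resized_h)

# a 4-point contour found on the resized image, in the (4, 1, 2) layout
# that cv2.approxPolyDP produces
screenCnt = np.array([[[50, 40]], [[400, 45]], [[410, 480]], [[45, 470]]])

# scale the corners back into original-image coordinates
pts_orig = screenCnt.reshape(4, 2) * ratio
print(pts_orig)
```

  The scaled points can then be handed to four_point_transform together with the full-resolution image, so the warped output keeps the original level of detail.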
Kalpesh kp

August 3, 2016 at 10:38 am

Hello Adrian Rosebrock,

Your blog is superb, and I would like to build something like your demo. I want to develop an iOS app that detects objects in a video and counts the total number of objects found.

So I plan to split the video into individual frames, then select a frame and try to identify the objects in it. I have not found much help with OpenCV so far, but your demo looks like it can help me.

So can you please tell me how I can use your Python code in my Objective-C code?

Thanks in Advance.
- Adrian Rosebrock
  
  August 4, 2016 at 10:14 am
  
  Your project sounds neat! However, I only provide Python code on this blog post — not Objective-C.
Cristian

August 9, 2016 at 4:55 pm

Hi, Adrian!

Can this tutorial still be implemented in an app for android current versions? If yes, how can I get in touch with somebody that does it?

Thanks
- Adrian Rosebrock
  
  August 10, 2016 at 9:27 am
  
  Yes, you can use this algorithm inside of an Android app, but you would need to port it to Java + OpenCV first. As for finding an Android developer, I would suggest using Upwork or Freelancer to find a developer suitable to your needs and budget.
Matthew Montebello

August 24, 2016 at 9:41 am

Had to change Line 37 to:

_, cnts, _= cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

with Python3.5.2 and Opencv3
- Adrian Rosebrock
  
  August 24, 2016 at 12:12 pm
  
  Indeed, this blog post was written well before OpenCV 3 was released. You can read more about the changes to cv2.findContours between OpenCV 2.4 and OpenCV 3 in this post.
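
  One way to avoid editing the unpacking line for each OpenCV version is to pick the contour list based on the length of the returned tuple. The sketch below uses stand-in tuples so it runs without OpenCV; the helper name grab_contours is chosen here for illustration:

```python
def grab_contours(result):
    # OpenCV 2.4 and 4.x return (contours, hierarchy);
    # OpenCV 3.x returns (image, contours, hierarchy)
    if len(result) == 2:
        return result[0]
    if len(result) == 3:
        return result[1]
    raise ValueError("unexpected return value from cv2.findContours")

# stand-in tuples mimicking the two return layouts
fake_two_values = (["contour_a", "contour_b"], "hierarchy")
fake_three_values = ("image", ["contour_a", "contour_b"], "hierarchy")

print(grab_contours(fake_two_values))
print(grab_contours(fake_three_values))
```

  With OpenCV installed, the call would look like cnts = grab_contours(cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)).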
John

September 2, 2016 at 12:25 pm

How would I apply OCR to the processed image?
- Adrian Rosebrock
  
  September 5, 2016 at 8:10 am
  
  There are many ways to apply OCR to the thresholded image, but to start, you could try Tesseract, the open source solution. I also really like the Google Vision API for OCR as well.
Tahir

October 3, 2016 at 2:17 am

Good explanation. Can you tell me where I can use this feature? I mean to say, where would I use this document scanner?
- Adrian Rosebrock
  
  October 3, 2016 at 7:11 am
  
  Hey Tahir — can you elaborate more on what you mean by “where to use this document scanner”? I’m not sure I understand what you mean.
Nicky Fandino

October 3, 2016 at 3:15 am

Hi, Adrian. It’s been a while since you created this blog. I have a question for you. I want to build this kind of paper scanner myself. Do you think it’s possible to make the scanner detect some sort of QR code at each corner of the paper, and then use the QR codes as the corners of the digitized paper instead of using edge detection like yours? I could use some help. Thanks in advance, btw.
- Adrian Rosebrock
  
  October 3, 2016 at 7:11 am
  
  Absolutely — as long as you can detect the four corners and recognize them properly, that’s all that matters. I don’t have any QR code detection blog posts, but you might want to take a look at the ZBar library. Once you detect the markers, order them as I do in this blog post, and then apply the transformation. I would suggest starting with an easier marker than QR codes just to understand the general process.
  - Nicky Fandino
    
    October 3, 2016 at 10:15 pm
    
    Thanks for replying. I’m quite new to image processing, so maybe I need to ask a few questions. I know how to detect a certain shape or a square, but I have never tried to detect 4 squares. What’s the easiest way to do this? Can you send me a link to a post on your blog that explains this, or to some other blog maybe? Also, how can I order them and then apply the transformation? Is that what line 41 is?
    - Adrian Rosebrock
      
      October 4, 2016 at 6:54 am
      
      If you’re trying to understand how to order coordinates, start here.
      
      From there, read this post on applying a perspective transform.
      
      As for detecting squares, the simplest method is to use contour approximation and examine the number of vertices as this blog post does. I also have an entire blog post dedicated to finding shapes in images.
      
      I hope that helps!
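
      For the ordering step, a common heuristic is to use the sum and difference of each point’s coordinates. The sketch below assumes four marker centers from a roughly upright document; the coordinates are made up for the example, and the heuristic can mislabel corners on strongly rotated quadrilaterals:

```python
import numpy as np

def order_points(pts):
    # output order: top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype="float32")

    # top-left has the smallest x + y, bottom-right the largest
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # top-right has the smallest y - x, bottom-left the largest
    d = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(d)]
    rect[3] = pts[np.argmax(d)]
    return rect

# hypothetical marker centers, deliberately listed out of order
markers = np.array([[410, 480], [50, 40], [45, 470], [400, 45]])
ordered = order_points(markers)
print(ordered)
```

      Once the four points are in a consistent order, they can be passed to the perspective transform.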
      - Nicky Fandino
        
        October 4, 2016 at 9:18 pm
        
        Thanks, this really helps !
Oleg Kersh

October 10, 2016 at 3:54 pm

Hi Adrian,
Thank you for the very cool article. I am actually trying to port your code to Android (using OpenCV 3.1 and the Android bindings) but I have got stuck at the step of applying the Canny filter.
Although I am using the very same parameters as you are (and also downscale images to 500 rows) the edge detector does not seem to detect horizontal edges of the paper even though there is good contrast and the background is not busy.
It is strange, because vertical and angled edges are picked up nicely.

I have even gone as far as lowering the thresholds to 10 and 20, and while it produces tons of false edges (as expected) it does not produce more than a handful of dots from the horizontal or near-horizontal edges.

I suppose I am missing something trivial. I have even tried the OpenCV Android sample app and its Canny does pick up edges nicely.

Your help would be really appreciated.
- Adrian Rosebrock
  
  October 11, 2016 at 12:56 pm
  
  That is quite strange, although I must admit that I do not have any experience working with the Java + OpenCV bindings outside of testing them out by writing a program to load and display an image to my screen. This does seem like a Java specific issue so I would suggest posting on the OpenCV forums.
Hariyama

October 14, 2016 at 8:46 pm

Hi, I am a beginner on Python.

I have a question on Line 38:
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:5].

How does the description “[:5]” work?
I understand “sorted” function, but I cannot understand it.

Sorry for bothering you.
- Adrian Rosebrock
  
  October 15, 2016 at 9:53 am
  
  The [:5] is just an array slice. It simply takes the first 5 elements in the list and discards the rest. You can learn more about array slicing here.
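
  A small example of the same pattern on a plain list. The numbers stand in for contour areas and are made up for the illustration:

```python
# plain numbers play the role of contour areas here
areas = [12.0, 950.5, 3.2, 480.0, 77.1, 1500.0, 0.5]

# sort from largest to smallest, then keep only the first five entries
largest_five = sorted(areas, reverse=True)[:5]
print(largest_five)

# slicing never raises if the list is shorter than the requested length
short = sorted([3.0, 1.0], reverse=True)[:5]
print(short)
```

  In the scanner code, key = cv2.contourArea makes sorted compare contours by their area instead of comparing the values directly.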
fabio

October 18, 2016 at 9:40 am

Hi

Thanks a lot for the very informative post. Could you elaborate a bit on why you resize the image before the edge detection and why exactly to a height of 500 pixels? Because I tried your technique on a couple of images with and without resizing to 500 pixels. It worked perfectly for the resized images but for the original ones (they were bigger) the edge and contour detection failed horribly. I would probably just need to tune the parameters a bit differently?

Thanks!
- Adrian Rosebrock
  
  October 20, 2016 at 8:54 am
  
  In computer vision and image processing we rarely process images larger than 600 pixels along the maximum dimension. While high resolution images are appealing to the human eye, they simply contain too much detail for image processing algorithms. You can actually think of resizing an image as a form of “noise removal”. We discard much of the detail so we can focus our attention on a smaller version of the image that still contains the same contents, but with less “noise”, leading to much better results.
Harry

October 27, 2016 at 2:20 pm

Getting error : No module named pyimagesearch.transform.

Please help.
- Adrian Rosebrock
  
  November 1, 2016 at 9:24 am
  
  You need to download the source code to this blog post using the “Downloads” section of this tutorial. You are likely forgetting to create the pyimagesearch directory and put a __init__.py file inside of it.
jagdish

November 7, 2016 at 7:45 am

hey there
which all libraries are you using ?
which version
thanks man
- Adrian Rosebrock
  
  November 7, 2016 at 2:44 pm
  
  This blog post assumes you are using OpenCV 2.4 and Python 2.7. The code can be easily adapted to work with OpenCV 3 and Python 3 as well.
sushma

November 23, 2016 at 11:26 pm

Hi, Can i get same concept ( Mobile Document Scanner) on Android…
- Adrian Rosebrock
  
  November 24, 2016 at 9:36 am
  
  I only cover Python + OpenCV on this blog. You would need to convert it to Java + OpenCV for Android.
Michael

December 1, 2016 at 6:18 am

Hi,
Thanks for the awesome and comprehensive tutorial!
I actually want to apply this to some photos of receipts, but unfortunately not all the corners are inside the image (there is even a photo where not even one of the corners is in the image).

Is there a way to use the four_point_transform in this case? If yes, how do I do it, and if not, is there any good way to deskew the image?

Thanks!
- Adrian Rosebrock
  
  December 1, 2016 at 7:20 am
  
  If you lack the corners you can apply the perspective transform to the entire image, although I really don’t recommend this. Otherwise, you can try to deskew the image. I’ve been meaning to do a tutorial on this, but the gist is that you threshold the image to reveal the text, then compute a rotated bounding box around a text region. The rotated bounding box will give you the angle that you can correct for. Again, I’ll try to do a tutorial on this in the future.
Neil

December 8, 2016 at 7:05 pm

Hi Adrian

I’m struggling to understand how exactly the number of vertices are approximated in lines 43 and 44. Would you mind explaining this?

Many thanks
- Adrian Rosebrock
  
  December 10, 2016 at 7:19 am
  
  First, we compute the perimeter of the contour. We then take the perimeter and multiply it by a small percentage to obtain epsilon. The exact value of the percentage may take some fiddling based on your dataset, but typically values between 0.01-0.05 are common. Epsilon controls the actual approximation according to the Ramer-Douglas-Peucker algorithm. The larger epsilon is, the fewer points are included in the actual approximation.
Umair

December 21, 2016 at 1:36 pm

Hi,
I am working on a similar project and your tutorials are of great help.

My goal was to detect the total price mentioned in a receipt. How can we achieve that goal so that i can easily detect the price? Any input will be much appreciated.
- Adrian Rosebrock
  
  December 23, 2016 at 10:58 am
  
  It sounds like you are trying to apply Optical Character Recognition (OCR) which is the process of recognizing text in images. This is a very extensive (and challenging field). To start, I would suggest trying to localize where in the image the total price would be (likely towards the bottom of the receipt). From there you can apply an OCR engine like Tesseract or the Google Vision API.
Arturo

December 30, 2016 at 5:25 pm

Thanks Adrian!! The practical uses for Computer Vision techniques are amazing. I like them. Do you have a post or any suggestion on how to load the python code on android mobile cell phones? regards!
- Adrian Rosebrock
  
  December 31, 2016 at 1:17 pm
  
  Unfortunately, I don’t know of a way to use OpenCV + Python on mobile devices. I would instead suggest creating an API endpoint that can process images and have the mobile app call the API.
Eric

January 4, 2017 at 9:53 pm

Hi, it’s a great tutorial. May I ask, instead of using scikit-image adaptive thresholding, is it possible to use the adaptive thresholding in cv2, such as

cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,11,2)

I ask because there is a problem importing the scikit-image Python package into AWS Lambda. If it is possible, it would be great if you could provide the cv2 function parameters that best suit this mobile scanner application, as I am new to OpenCV and scikit-image. Any suggestions would be appreciated. Thanks in advance.
- Adrian Rosebrock
  
  January 7, 2017 at 9:42 am
  
  You can certainly use OpenCV’s adaptive thresholding algorithm, that’s no problem at all.
Jonathan

January 9, 2017 at 10:48 am

Hi there, thanks for the great piece of work. I am having trouble reliably finding the contour. Would there be a way to have cv default to the image boundaries, or maybe a bounding rectangle, if the contours could not be approximated? Thanks again so much and happy new year
- Adrian Rosebrock
  
  January 10, 2017 at 1:11 pm
  
  If you are having trouble finding the contour of the document then I would suggest playing with the edge detection parameters. If the script cannot find this contour then there isn’t really a way to “default” any coordinates unless you were working in a fixed, controlled document where you know exactly where the user was placing the document to be scanned.
- Alexander Chebykin
  
  January 15, 2017 at 9:35 am
  
  You could just check if value in screenCnt is None, and in case it is, default to the whole image. Also I found that blurring the edged image before approximation makes approximation better.
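
  A sketch of that fallback, assuming screenCnt is initialized to None before the contour loop (otherwise the name is undefined when no 4-point contour is found). The helper name document_corners and the image shape are hypothetical:

```python
import numpy as np

def document_corners(screenCnt, image_shape):
    # fall back to the four image corners when no 4-point contour was found
    if screenCnt is None:
        h, w = image_shape[:2]
        return np.array(
            [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
            dtype="float32")
    return screenCnt.reshape(4, 2).astype("float32")

# no contour found on a 500x375 image: use the whole frame
corners = document_corners(None, (500, 375, 3))
print(corners)

# a detected contour is passed through unchanged (apart from its shape)
found = np.array([[[50, 40]], [[300, 45]], [[310, 480]], [[45, 470]]])
print(document_corners(found, (500, 375, 3)))
```

  With the whole-image fallback the perspective transform becomes a no-op, so the script still produces a thresholded output instead of crashing.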
Alexander Chebykin

January 15, 2017 at 9:21 am

Hi. Thanks for the article – it was really helpful.
I went further by adding OCR, and optimizing the code for that matter.
My code and thoughts on the subject can be found here: https://awesomelemon.github.io/2017/01/15/Document-recognition-with-Python-OpenCV-and-Tesseract.html
I hope it will be useful for people who want to make the next step.
- Adrian Rosebrock
  
  January 15, 2017 at 11:59 am
  
  Thanks for sharing Alexander!
Murthy

January 30, 2017 at 11:44 pm

Hi Adrian,
Thanks for this. This was a good beginning to learn OpenCV.
I struggled to install opencv on Mac but then was successful in doing so on a linux box.

I used Python 2.7 and opencv 3.1.0

Line 37 : (cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

in this example gave me error (ValueError: too many values to unpack)

I changed left hand side of line 37 to:

(_, cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

and it worked!
Sergi

February 8, 2017 at 2:59 pm

Hi Adrian, I followed your tutorial on installing openCV on RPi3, so there it is working into a virtual environment.
Now I find that I can use scipy, skimage and (I also tried) sklearn only outside the virtual environment; inside the virtual environment those packages are not found. I have run “pip uninstall” on those packages many times and installed them again from the virtual environment, but nothing changes.
Maybe you have some tips since I am completely lost.
- Adrian Rosebrock
  
  February 10, 2017 at 2:08 pm
  Hey Sergi — it sounds like you have installed SciPy, scikit-image, etc. into the global site-packages directory for your system. You mentioned installing them into your virtual environment, but I don’t think the commands worked:
```
$ workon cv # important! access your virtual environment first
$ pip install numpy
$ pip install scipy
...
```
qkzk

March 22, 2017 at 3:03 pm

Hello! Thank you for the wonderful work.
I followed your raspberry pi installation guide and managed to install CV 3 on my rpi3 (raspbian jessie with pixel). I went here and downloaded your code.

I’m not able to install scikit-image package in the virtual environment.
I did :

workon cv
pip install -U scikit-image (with sudo too)

first try : failed, a lot of garbage from python but only “memory error”.
second try : No eggs found in /tmp/easy_install-V0e5rR/Cython-0.25.2/egg-dist-tmp-dt9B8k (setup script problem?)

But it keeps failing. I managed to install scikit-image outside the virtual environment with apt-get but, obviously, it doesn’t work in cv…

pip is working in the cv : workon cv, pip install scipy
python
>>> import scipy
doesn’t return an error. It just took 3 hours 🙂

—
I’ve read crazy solutions like apt-get download the packages and install them manually but I’d like to find a cleaner workaround.

Any idea ?
- Adrian Rosebrock
  
  March 23, 2017 at 9:31 am
  
  If you are getting a memory issue, try using the --no-cache-dir flag when installing via pip:
  
  $ pip install scikit-image --no-cache-dir
  
  This should alleviate the issue.
Vineet

March 24, 2017 at 8:18 am

Hello Adrian, can you tell me how to get the transform module? Also, how to set up the module so that I can use it for other programs as well?
- Adrian Rosebrock
  
  March 25, 2017 at 9:25 am
  
  Use the “Downloads” section of this blog post to download the example code (including the “pyimagesearch” and “transform” mini-modules for the example).
Guy

March 25, 2017 at 11:54 am

Hi!
I don’t know if you’re still active
but anyway
just wanted to thank you
me and my friend are doing a project
and your code really saves us
without you we were lost

so – making long story short –
THANK YOU VERY VERY MUCH
YOU’RE OUR SAVIOR!!!
- Adrian Rosebrock
  
  March 28, 2017 at 1:11 pm
  
  Thank you Guy, I’m happy the tutorial helped you out!
Karl Dreyer

March 31, 2017 at 9:53 am

Hello, are there any guides or examples how to use OpenCV in a Xamarin Android environment? I’m working on an Android app and need to find a good alternative from Scanbot and other expensive solutions.
- Adrian Rosebrock
  
  March 31, 2017 at 1:46 pm
  
  Hey Karl — I don’t have any experience with OpenCV + Android environments so I’m unfortunately not the right person to ask regarding this question.
Suraj

April 9, 2017 at 6:09 pm

Hello Adrian, I just came across this post and it’s very helpful. Do you have anything on OCR?
Or is it possible to modify this code for this purpose?
- Adrian Rosebrock
  
  April 12, 2017 at 1:21 pm
  
  I don’t have any posts on OCR right now, but I will be covering OCR in the future.
Vikram

April 14, 2017 at 2:47 pm

Can you suggest some tutorial or any source to make an android or IOS app for mobile scanner and integrate with the mobile camera and convert the clicked image into PDF or jpeg
Michiel

April 18, 2017 at 5:45 am

Hi Adrian,

Many thanks for uploading your tutorials! I have a problem with installing the pyimagesearch module though. I am using the jupyter notebook, usually with installing packages I use the following code:

!pip install [PACKAGE NAME]

However, this time this gives the following error: Could not find a version that satisfies the requirement pyimagesearch (from versions: )
No matching distribution found for pyimagesearch

Do you have any idea what might cause the problem?
- Adrian Rosebrock
  
  April 19, 2017 at 12:49 pm
  
  There is no “pyimagesearch” module on PyPI. You need to use the “Downloads” section of this tutorial to download the code + example images, then place the pyimagesearch directory in the same path as your Jupyter Notebook.
Ian

May 5, 2017 at 3:14 am

Just wondering … where is the best place to find Android/iOS developer?
vasanth

May 7, 2017 at 4:38 am

error: argument -i/--image is required when I run the program
- Adrian Rosebrock
  
  May 8, 2017 at 12:26 pm
  
  You need to read up on command line arguments.
  - vasanth
    
    May 8, 2017 at 1:08 pm
    
    i’m using windows
- nisnab udas
  
  July 5, 2017 at 12:25 pm
  
  can u please tell me how to do it, with possible changes/solutions?
vasanth

May 8, 2017 at 2:41 pm

Sir, how do I save the scanned image?
- Adrian Rosebrock
  
  May 11, 2017 at 9:02 am
  
  I would suggest you use cv2.imwrite to write the image to disk. If you’re just getting started learning the basics of OpenCV and Python, I would absolutely suggest you work through Practical Python and OpenCV. This book will help you learn the fundamentals of OpenCV and image processing.
Onkar Pandit

May 10, 2017 at 1:08 pm

Hey, great tutorial.Very informative and easy to understand.
However I am unable to figure out how to use it in android studio to build the app.
Please Help
- Adrian Rosebrock
  
  May 11, 2017 at 8:45 am
  
  Hi Onkar — this blog uses primarily OpenCV and Python. It does not cover Java. I would suggest you read up on Java + OpenCV bindings and convert the code. Alternatively you can build a REST application where the Java app sends the image to be processed on a Python server and the results returned.
Francois

May 30, 2017 at 5:41 pm

The imutils module is not included in the code download for Basic Image Manipulations, so I can’t progress past the first part of this tutorial.
- Adrian Rosebrock
  
  May 31, 2017 at 1:07 pm
  
  Each blog post is independent from the others. You should either use the “Downloads” section of the blog post to download the code or install imutils via pip:
  
  $ pip install imutils
shravankumar

June 12, 2017 at 12:25 pm

Hey chief,

I have successfully tried the code on my machine with the provided images, but when I tried it with images downloaded from the web, it was unable to find contours around the sheet. What could be done to optimize the code for any image?

Any help would be greatly appreciated

Thank you
- Adrian Rosebrock
  
  June 12, 2017 at 1:03 pm
  
  If there is not enough contrast between the edges of the paper and the background then the paper may not be detected. Furthermore, if paper is noisy/folded/etc. the contour approximation won’t be able to locate the paper region (since it might have more than four vertices). In that case, you would want to train a custom object detector. I’ve also seen work done on using machine learning specifically to find edge-regions (even low contrast ones) in documents. I can’t seem to find the paper though. If I do, I’ll be sure to link you to it.
Adrian Rosebrock

June 27, 2017 at 6:18 am

This code will find the largest rectangular region that has 4 corners. It accomplishes this by sorting the contours by area. If you’re detecting a rectangular region inside the document, then you’ll want to double-check your contour approximation. It’s likely that the contour approximation parameters need to be tweaked since there are likely more than 4 corners being detected on the outer document.
Juan

June 29, 2017 at 12:49 pm

Hello Adrian,

I just signed in for your ten days course, great first class. I am running the code in Python 3.5.2, scikit_image 0.13 and CV 3.2.0 in Ubuntu 16.04

When I run the code, scikit complains with the following warning:

/home/juan/.virtualenvs/cv/local/lib/python3.5/site-packages/skimage/filters/thresholding.py:222: skimage_deprecation: Function `threshold_adaptive` is deprecated and will be removed in version 0.15. Use `threshold_local` instead.

def threshold_adaptive(image, block_size, method='gaussian', offset=0,
/home/juan/.virtualenvs/cv/local/lib/python3.5/site-packages/skimage/filters/thresholding.py:224: UserWarning: The return value of `threshold_local` is a threshold image, while `threshold_adaptive` returned the *thresholded* image.
warn('The return value of `threshold_local` is a threshold image, while

(error cuts there, but the idea is clear)

So I tried to replace threshold_adaptive with threshold_local, but I get a blurry image instead of the black & white one. I can’t figure out why.

Also, if I remove line 66 (just to see what happens) the image I get is “white & black” (inverted, vs. black & white). I also don’t get exactly why. Is it due to the casting when you do the astype?

Finally, all my images are rotated 90 degrees in the screen, also not sure why the differences.

Thanks a lot!!
- Adrian Rosebrock
  
  June 30, 2017 at 8:10 am
  
  Hi Juan — it’s important to understand that this is just a warning, not an error message. It’s simply alerting you that the threshold_adaptive function will be renamed to threshold_local in a future release of scikit-image. The functions should be the same (only with a different name), so I’m not sure why they would give different results. I will look into these and see if any parameters have changed.
  - John Goodman
    
    November 28, 2017 at 12:51 pm
    Hi Adrian and all,
    
    Just wanted to chime in on this.
    
    I’m having the same issue as Juan and it seems that threshold_local doesn’t quite work the same as threshold_adaptive? The work around for threshold_local can be found here:
    http://scikit-image.org/docs/dev/auto_examples/xx_applications/plot_thresholding.html
    
    Here’s how I implemented the doc scanner based on the above scikit doc:
    
    warped = four_point_transform(orig, screenCnt.reshape(4,2)*ratio)
    warped_gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    block_size = 35
    warped = threshold_local(warped_gray, block_size, offset=10)
    warped_adaptive = (warped_gray > warped).astype("uint8")*255
    print("STEP 3: Apply perspective transform")
    cv2.imshow("Original", imutils.resize(orig, height=650))
    cv2.imshow("Scanned", imutils.resize(warped_adaptive, height=650))
    cv2.waitKey(0)
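
    The comparison step in the workaround above is what turns the threshold image into a black and white result. The NumPy-only sketch below illustrates the idea with a hypothetical stand-in, local_mean_threshold, which is not the scikit-image function; it only mimics the “per-pixel threshold” behavior described in the warning:

```python
import numpy as np

def local_mean_threshold(gray, block_size, offset):
    # stand-in for a local threshold function: the mean of each pixel's
    # block_size x block_size neighborhood, minus offset
    pad = block_size // 2
    padded = np.pad(gray.astype("float64"), pad, mode="reflect")
    total = np.zeros(gray.shape, dtype="float64")
    for dy in range(block_size):
        for dx in range(block_size):
            total += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return total / (block_size * block_size) - offset

# synthetic page: light paper with one thin dark stroke
gray = np.full((40, 40), 200, dtype="uint8")
gray[19:22, 5:35] = 30

# T is a smooth image of thresholds; displaying it directly looks blurry
T = local_mean_threshold(gray, 11, 10)

# comparing the input against T yields the actual black and white image
binary = (gray > T).astype("uint8") * 255
```

    Displaying T directly would account for a blurry-looking output, and dropping the final cast or flipping the comparison would account for an inverted one.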
    - Adrian Rosebrock
      
      November 28, 2017 at 1:55 pm
      
      Thank you for sharing, John! It looks like scikit-image has deprecated the function. I’ll get this blog post updated.
      - Rajeev Narayanan
        
        January 7, 2018 at 7:33 am
        
        Hi Adrian,
        
        I used the code from John Goodman’s post above. It doesn’t work as before. Your code was working perfectly earlier. Now it doesn’t work the same way. Could you update the example so it works like before using threshold_local?
        
        Thanks.
        
        regards,
        Rajeev
      - Adrian Rosebrock
        
        January 8, 2018 at 2:49 pm
        
        Hi Rajeev — certainly, I will get the post updated by the end of January.
      - Adrian Rosebrock
        
        January 15, 2018 at 12:11 pm
        
        I just wanted to follow up here and say that the code + associated downloads have been updated to handle Python 2.7/3, OpenCV 2.4/3, and scikit-image.
nisnab udas

July 2, 2017 at 7:24 pm

usage: scan.py [-h] -i IMAGE
scan.py: error: argument -i/--image is required

i encounter this error.

What happens in line 12? Where should I give the image path? How is it given? Please provide an example.
- Adrian Rosebrock
  
  July 5, 2017 at 6:18 am
  
  Please read up on command line arguments.
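
  As a short illustration, the parser from scan.py expects the image path after the --image switch. The sketch below passes the arguments as a list to simulate the command line; the path images/receipt.jpg is a hypothetical example:

```python
import argparse

# same parser setup as at the top of scan.py
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="Path to the image to be scanned")

# passing a list simulates running:
#   python scan.py --image images/receipt.jpg
args = vars(ap.parse_args(["--image", "images/receipt.jpg"]))
print(args["image"])
```

  In the real script, ap.parse_args() is called with no list and reads the arguments from the terminal, so running python scan.py without --image produces the “argument -i/--image is required” error. The same command works in a Windows command prompt.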
Pablo

July 6, 2017 at 7:26 am

Hello Adrian,

I am replicating a similar algorithm in OpenCV4Android. But I’m running into a problem: the output I get from the Canny edge detector is not even close to the one you get in terms of edge detection quality. It rarely gets the 4 side edges of the documents.

I have resized the image to an even smaller one and tweaked the parameters of both the Gaussian filter and the edge detection. I am even using THRESH_OTSU as a parameter for the Canny edge detection. But no success.

How would you approach this?

Thank you very much
- Adrian Rosebrock
  
  July 7, 2017 at 9:58 am
  
  Hi Pablo — I don’t have any experience using the OpenCV + Java bindings, so it’s hard to provide any substantial guidance here. I would likely speak with the Java bindings developers and explain to them how you are getting different results based on the programming language.
felix jaramillo

July 9, 2017 at 5:46 pm

Sorry, but how do I install the pyimagesearch package? Please :/
- Adrian Rosebrock
  
  July 11, 2017 at 6:36 am
  
  You simply use the “Downloads” section of this blog post to download the source code and example images. This download includes the “pyimagesearch” module covered in this post.
Saumya

July 13, 2017 at 1:54 am

What if my rectangular paper has many tiny rectangles on it?
How will it find the largest rectangle among them?
- Adrian Rosebrock
  
  July 14, 2017 at 7:31 am
  
  Lines 37-50 will find the largest rectangular object in the image.
krishna

July 14, 2017 at 3:01 am

hey Adrian,
In cases where the image of the receipt or paper is not exactly rectangular,
or if its outer edge is not exactly a closed contour,
how should I process the image?
- Adrian Rosebrock
  
  July 14, 2017 at 7:19 am
  
  If the outer edge of the document is not rectangular I would suggest being more aggressive with your contour approximation. You will need to reduce the number of points to four in order to apply the perspective transform.
shrey malhotra

July 24, 2017 at 2:31 pm

Whenever I try to run the code I get:
Traceback (most recent call last):
File "scan.py", line 40, in <module>
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
ValueError: too many values to unpack (expected 2)
What should I do?
- Adrian Rosebrock
  
  July 24, 2017 at 3:26 pm
  
  Please read the other comments before posting. Your question has been addressed multiple times in previous comments. See my reply to “Mohd Ali” for more details.
Nick

August 22, 2017 at 4:22 pm

I was trying to draw the contour on the original image (not scaled down), but I must be doing something wrong. How would one go about doing this? A couple attempts of mine…
cv2.drawContours(orig, [np.multiply(screenCnt,ratio)], -1, (0, 255, 0), 2)
cv2.drawContours(orig, [screenCnt * ratio], -1, (0, 255, 0), 2)
Both of these produce an error saying (-215) npoints > 0 in function drawContours
Zubaer

August 23, 2017 at 3:48 pm

Awesome post. How would you proceed on transforming the perspective of whole image?

Let’s say –

1) we detected the receipt with a set of four co-ordinates.
2) next step I assume to provide with four co-ordinates corresponding to this receipt in the perspective corrected output image
3) next step is to acquire the relationship (homography matrix) between these two sets of four co-ordinates
4) next to calculate the resultant image size
5) next to warp the image.

Is this how I should do it? Let me know if I am missing anything.
- Adrian Rosebrock
  
  August 24, 2017 at 3:34 pm
  
  Your question actually reminds me of this StackOverflow question on computing a homography matrix to transform an entire image. I would start there.
Carlos

August 24, 2017 at 7:06 pm

Hi Adrian! Today I was porting your example to C++ and you can find it in this link

https://github.com/devtodev/TicketScanner/tree/master/src

Could you shine a light on this and tell me why the result image of my program doesn’t look as good as the output of your program?

Thank you!
- Adrian Rosebrock
  
  August 25, 2017 at 12:45 pm
  
  Hi Carlos — thanks for sharing the C++ implementation, I’m sure many other PyImageSearch readers will benefit from this.
  
  As for your result, I think your adaptive thresholding may be incorrect (disclaimer: I didn’t look at the code, just the output).
  - Carlos
    
    August 25, 2017 at 2:15 pm
    
    You are welcome! Your website is a great source of inspiration and learning I am happy to contribute at least a little bit.
Sophia G

September 28, 2017 at 5:26 pm

I just have a question: how could you actually grab the text, store it in a variable in Swift, and print it out? In other words, how could you recognize the words in a picture that was taken and pass them directly into code?
- Adrian Rosebrock
  
  October 2, 2017 at 10:28 am
  
  I don’t cover Swift programming here on PyImageSearch, but the process you are referring to is called Optical Character Recognition.
Dars

September 29, 2017 at 9:45 am

Hello, do you have a bubble sheet scanner that can read the answers on the sheet, like a, b, c, d, e? And how could it be implemented on Android? Sorry for my English.
MichaelCu

October 2, 2017 at 4:12 am

Hi Adrian,

I stumbled on your blog post years later. It’s been very educational and informative for me.
Since I didn’t see anyone posting a Flask version of this, I wanted to share a quick and dirty way to use your function with Flask.

Basically make a post request with a file in the body and get back the scanned image as a response.

I hope others find it helpful.

https://github.com/zenners/flask-document-scanner

Cheers
- Adrian Rosebrock
  
  October 2, 2017 at 9:24 am
  
  Awesome, thanks so much for sharing MichaelCu!
Erik

October 9, 2017 at 8:49 am

Hi Adrian,

Great and informative post. I do have a question though from my experiments with the code. It seems very fragile if there is some occlusion of an edge (say we capture just the document, and only one side has some background). It also seems fragile if the canny edge detector gets most of the outline of the document but finds a break in one of the edges (say I’m holding the paper in my hand).

In these cases, it seems the
if len(approx) == 4:
	screenCnt = approx
	break
doesn’t work.

Do you have any advice for handling these cases?
- Adrian Rosebrock
  
  October 9, 2017 at 12:12 pm
  
  You are correct, you need a good segmentation in order to perform the contour approximation test. You might want to consider training your own custom object detector that specifically detects rectangle/document-like objects.
Shashank

October 12, 2017 at 8:45 am

Hey,
You mentioned an Android app. Is it possible to create an Android app using Python? Do you have any suggestions?

Thank you,
- Adrian Rosebrock
  
  October 13, 2017 at 8:42 am
  
  No, not easily. The problem is getting OpenCV + Python to interface together on the mobile app. I would suggest re-creating the app using the Java + OpenCV bindings for Android.
Ronak Shah

October 16, 2017 at 6:17 am

Hi Adrian,

I have a couple of questions about the parameters used in the OpenCV code. How do you choose the optimal parameters for the Canny function, and how do they affect the quality of the edge detection?

How do you choose the kernel size for the Gaussian filter?

How do you choose the optimal epsilon for approxPolyDP function?

Your help will be much appreciated!
Thanks
- Adrian Rosebrock
  
  October 16, 2017 at 12:17 pm
  
  The short version is that you need to experimentally tune the values for both the Canny edge detector and approxPolyDP. The values I use here tend to work well if there is sufficient contrast between the background and foreground.
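As a starting point for that tuning, one widely used heuristic derives the two Canny thresholds from the median pixel intensity. The sketch below is an assumption for illustration, not the fixed values used in this post; canny_thresholds and the default sigma of 0.33 are hypothetical choices.

```python
import numpy as np

def canny_thresholds(gray, sigma=0.33):
    """Return (lower, upper) thresholds centered on the median intensity."""
    median = float(np.median(gray))
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return lower, upper

# Made-up grayscale image with a uniform intensity of 100
gray = np.full((4, 4), 100, dtype="uint8")
lower, upper = canny_thresholds(gray)

# The thresholds could then be passed to the edge detector:
# edged = cv2.Canny(gray, lower, upper)
```

A larger sigma widens the band between the two thresholds. The result is only a first guess and will usually still need manual adjustment for low-contrast images.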
muni

October 16, 2017 at 4:02 pm

Can you please tell me how to get the accuracy rate for a given image in this process?

Thanks
Ankita

October 19, 2017 at 11:40 pm

Hello sir,

I am Ankita, pursuing my undergraduate degree in computer science. I have been following all your projects and feel they are great. I would be glad if you could help me with a mini project on fingerprint matching using Python and OpenCV.

Thank you,
Alex

November 5, 2017 at 6:17 pm

Hi Adrian,

This was the first of the free lessons given to me from your course.

I unfortunately was not able to jump right in, as I do not see any mention of how to install the pyimagesearch package in my Windows Python 2.7 (with OpenCV). There is no mention of it on your website from what I can find. Can you direct me to how I would install it through the console? Thanks!

Alex
- Adrian Rosebrock
  
  November 6, 2017 at 10:31 am
  
  Hi Alex, please use the “Downloads” section of this tutorial to grab the source code + example images. This will enable you to grab the “pyimagesearch” module.
Julian

December 5, 2017 at 5:39 pm

Hi Adrian,

Thanks for the awesome article – very informative and well explained. I managed to build an in-browser working version in opencv.js in just a little bit more than 5 minutes 😉

Cheers,
Julian.
- Adrian Rosebrock
  
  December 8, 2017 at 5:13 pm
  
  Congratulations Julian, nice job!
- Mo
  
  May 17, 2018 at 12:36 pm
  
  Hi Julian do you mind sharing how you were able to achieve the in browser version.
Wasabi Lee

December 19, 2017 at 9:04 am

Hello,

I’m currently working on a project for a contest and your blog really helped me speed up the process. So far everything went flawlessly, but at the moment I’m encountering a problem with skimage. I’m trying to make an image readable for Tesseract, but I am struggling to get the same results as you did using the adaptive threshold function. I even tried adding multiple filters before that, but nothing seemed to work.

Would really appreciate the help!
- Adrian Rosebrock
  
  December 19, 2017 at 4:12 pm
  
  Hey Wasabi, thanks for the comment. I’m not sure what the exact issue is in this case. Which version of scikit-image are you using?
Albe

December 22, 2017 at 10:00 am

Hello

Thanks for this article. I’m new to Python and I’m having trouble with cv2.waitKey(0). I have all the modules installed but it freezes.

I would really appreciate the help!
Albe
- Adrian Rosebrock
  
  December 26, 2017 at 4:42 pm
  
  Hey Albe, make sure you click the active window opened by OpenCV and then press a key on your keyboard. This will advance the script.
Tript

December 23, 2017 at 3:35 am

Hi,
when I am running this code my image after thresholding isn’t as clear as yours. Could you please explain why this is happening?
- Adrian Rosebrock
  
  December 26, 2017 at 4:33 pm
  
  Are you using the same example images as the ones in this post? Or are you using different images?
Gabriel Gonzalez

January 20, 2018 at 7:57 pm

Hello Adrian,

I want to thank you for posting this tutorial.
I will be using your code to scan pictures of music sheets in order to make a Nao robot play simple melodies on piano. This will be done by generating a MIDI file from the scanned sheet.

Can you tell me how to properly credit you? I was thinking of

“Credits to Adrian Rosebrock on pyimagesearch.com”

Is that OK?

Thanks,
Gabriel
- Adrian Rosebrock
  
  January 22, 2018 at 6:27 pm
  
  Hi Gabriel — that is a perfectly fine credit, thank you. If you publish your code on GitHub or your own personal site I would appreciate a link back to PyImageSearch from the readme file/webpage.
hamid

February 5, 2018 at 9:41 am

Hi Adrian
There are some cases where the intended contour isn’t closed. Is there any way to handle such cases using OpenCV’s functions, or should I write my own algorithm?
- Adrian Rosebrock
  
  February 6, 2018 at 10:14 am
  
  Try using morphological operations such as a dilation or closing operation to close the gap between the contours.
  - hami
    
    February 8, 2018 at 7:12 am
    
    Another problem is that the paper contour is connected to other contours in the background. Here is the link to the Canny result, if you want to see it: ‘https://ibb.co/js2PFc’. I know the background is not suitable for the purpose, but document scanner apps (like CamScanner) are able to detect the paper even in such cases. I think your assumption is not completely safe, at least for a published app.
    To solve this problem I used a Hough line transform to detect lines, but then I don’t know how to extract the final four points. Do you have any idea?
    - Adrian Rosebrock
      
      February 8, 2018 at 7:44 am
      
      This solution is certainly not recommended to be used in a production level app. It’s meant to be a tutorial to help you get started and learn the fundamentals of how these apps work. Production-level apps would use a bit of machine learning, at least an object detector and maybe even semantic segmentation to help detect the document.
Oliver

February 7, 2018 at 9:43 am

Hi Adrian, Everytime I run scan.py I am given this message

usage: scan.py [-h] -i IMAGE
scan.py: error: the following arguments are required: -i/--image

can you please help?
- Adrian Rosebrock
  
  February 8, 2018 at 7:57 am
  
  Please take a look at the comments before posting as I have addressed this issue a few times. See my reply to “vasanth” on May 7, 2017 to get started.
Dinesh Kumar

March 22, 2018 at 2:58 am

Hi Adrian, I have one question about step 2. Is this code only meant for rectangular objects?
Can I use the same logic for t-shirt-shaped images?
- Adrian Rosebrock
  
  March 22, 2018 at 9:36 am
  
  Correct, this method is meant to be used for rectangular shaped objects. You could technically use something similar for t-shirt detection but that would require you to obtain a very nice, clean segmentation of the t-shirt. You should instead consider training your own custom object detector or perform some sort of pixel-wise segmentation using deep learning, such as Mask R-CNN or UNet.
mohit

March 28, 2018 at 3:20 am

I’m getting this when I run this code:

usage: scan.py [-h] -i IMAGE
scan.py: error: argument -i/--image is required

Please help
- Adrian Rosebrock
  
  March 30, 2018 at 7:23 am
  
  You can solve this error by reading up on command line arguments.
Leks

April 9, 2018 at 1:29 pm

Good tutorial. I am trying to implement it on Android. My goal is to scan a national ID card. I’m able to draw the contour, but sometimes it also draws around information inside my card. Do you have any suggestions for avoiding this?
- Adrian Rosebrock
  
  April 10, 2018 at 12:03 pm
  
  I would suggest tuning your contour approximation values. It sounds like the code is finding areas inside your card that have four vertices but the card itself does not have four vertices (at least according to the approximated contour).
  - Leks
    
    April 11, 2018 at 5:11 pm
    
    Hi Adrian,
    Thanks for your feedback. In the end, I solved it by looping through all the detected contours, computing each bounding rectangle, and taking the largest rectangle (by width and height).
    Have you written any articles describing how to automatically capture an image from the camera once certain thresholds are met (distance between camera and object, lighting conditions)?
    Any suggestions will be appreciated.
    Regards….
    - Adrian Rosebrock
      
      April 13, 2018 at 6:57 am
      
      I would start with this blog post. You can modify it to capture images/frames when the distance reaches some threshold.
lucifer

May 4, 2018 at 10:17 pm

Hi Adrian,
I’m getting this when I run this code:
STEP 1: Edge Detection
Traceback (most recent call last):
…
cv2.drawContours(image, [screenCnt], -1, (0, 255, 0), 2)
NameError: name 'screenCnt' is not defined
STEP 2: Find contour of paper

Process finished with exit code 1
Can you help me? Thank you.
- Adrian Rosebrock
  
  May 9, 2018 at 10:28 am
  
  I get the impression that you may have copied and pasted the code rather than using the “Downloads” section of this post to download the code. Make sure you use the “Downloads” section to download the code. This will help avoid copy and paste errors, which I believe is what happened here.
Mo

May 17, 2018 at 12:37 pm

Hi Adrian,

This is an awesome resource, Wondering if you have any suggestions on how to get your code to work for a web app
- Adrian Rosebrock
  
  May 22, 2018 at 6:52 am
  
  I’m not sure how to answer this question as the term “web app” is pretty loose. What exactly is the goal of your web app?
jatin

May 23, 2018 at 1:33 am

Hi Adrian, fantastic way to explain things and show how it is done.

How do I install the pyimagesearch module for four_point_transform? It shows “no module named pyimagesearch”.

Or do we need to write the full code of the four_point_transform function in order to use it?
- Adrian Rosebrock
  
  May 23, 2018 at 7:08 am
  
  Hey Jatin, make sure you use the “Downloads” section of this blog post to download the source code. It will include the functions/modules that you need.
Bill Harvey

May 30, 2018 at 8:26 am

Adrian;

I posted a comment on an error however I thought I had resolved it by amending the imports as follows:
Instead of importing as in your example:
from skimage.filters import threshold_local

I did the following:
from skimage import filters

I thought this would get around the “ImportError: cannot import name threshold_local”

I also amended the script as per John Goodman’s post (isn’t he an actor 🙂 )

I thought this worked, as I could then get all the way past step 2. However, as soon as it reaches threshold_local it fails with “NameError: name threshold_local is not defined”.

Any ideas?
- Adrian Rosebrock
  
  May 31, 2018 at 5:05 am
  
  Hey Bill, can you check which scikit-image version you are using? I think you may be using an older version. Please check and let me know.
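Independently of the installed version, the NameError above follows from how Python imports work: `from skimage import filters` binds only the name `filters`, so the function has to be reached through that module. The sketch below illustrates this under the assumption that scikit-image may or may not be installed; resolve_threshold_function is a hypothetical helper name.

```python
def resolve_threshold_function():
    """Return skimage.filters.threshold_local, or None if it is unavailable."""
    try:
        from skimage import filters
    except ImportError:
        # scikit-image is not installed in this environment
        return None
    # Only the name `filters` is bound by the import above, so the function
    # must be looked up as an attribute of the module. Older releases may
    # not provide it, in which case None is returned.
    return getattr(filters, "threshold_local", None)

threshold_local = resolve_threshold_function()
if threshold_local is None:
    print("threshold_local is not available; check the scikit-image version")
```

In a script that uses `from skimage import filters`, the equivalent call would be written as filters.threshold_local(...) rather than threshold_local(...).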
Adrian Rosebrock

May 31, 2018 at 5:06 am

Hey Bill, please see my reply to your other comment. I believe you are using an older version of scikit-image. Please double-check your scikit-image version and let me know which one you are using.
Jesudas

June 13, 2018 at 11:51 am

thanks for the article, with explanations and code. Loved creating my first program with openCV and python.

Was amazing to see the results, using the scanner on different kinds of receipts and documents, with different margins, backgrounds.

thanks !
- Adrian Rosebrock
  
  June 15, 2018 at 12:38 pm
  
  Thank you for the kind words, Jesudas. I’m happy you enjoyed the tutorial 🙂
Chintan Gandhi

July 1, 2018 at 5:30 am

Hi Adrian. Thanks for your crisp, clear and well-explained blog posts.

As in the “Where to next?” section, I tried incorporating the OCR using pytesseract with reference to your article: https://pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/.

However, I couldn’t get the pytesseract library’s “image_to_string” function to work on the output of this article’s code: the scanned image. Could you please suggest as to what might be wrong with my approach?
- Adrian Rosebrock
  
  July 3, 2018 at 8:35 am
  
  Could you be more specific in what you mean by getting the function to work? Is the function returning an error? Is the OCR output not what you expect?
  - Chintan Gandhi
    
    July 4, 2018 at 1:11 pm
    
    There is no OCR output. The “image_to_string” function does not convert the image into text, so no output is seen.
    - Adrian Rosebrock
      
      July 5, 2018 at 6:26 am
      
      In that case the Tesseract library likely cannot OCR any of the text and hence returns an empty string. You linked to my previous post on using Tesseract + Python together so to confirm the issue, you should run the code on the example images in the post. If it doesn’t work then there is a problem with your Tesseract install. If it does work then you know Tesseract simply cannot OCR your input image.
Kent

July 2, 2018 at 8:23 pm

Adrian, Thank you for this really good tutorial. It needed some exploration with collecting the transform function from the previous blog entry and tweaking the code/reviewing the comments to ascertain just what each line is doing, but taking apart examples is how you learn best.

Could you please update line 29 of “Edge Detection” from a Python 2 print statement (without parentheses) to match the other Python 3 print statements?
- Adrian Rosebrock
  
  July 3, 2018 at 7:18 am
  
  Done! I must have missed that when I updated the post. I have confirmed the source code download is correct though. Thanks Kent!
Michael Adjeisah

July 17, 2018 at 10:36 am

Hi Adrian,
Can you help me with accessing the Kinect camera with OpenCV and Python?
Thanks.
- Adrian Rosebrock
  
  July 20, 2018 at 6:54 am
  
  Sorry, it’s been a long, long time since I’ve used OpenCV, Python, and the Kinect camera. Any suggestions I would have here would be outdated. I may consider doing a tutorial on it in the future but I’m not sure if/when that may be.
Syifa Hersista

August 13, 2018 at 3:51 am

Hi Adrian!
Thank you for your post! It’s very helpful!

I am currently trying your code and testing it with some objects.
I just wonder: what if the background is lighter than the object, for example on a white table?
Somehow it cannot calculate the right “screenCnt”.
I realized that Canny edge detection cannot detect the edge of the object in this case.
Then I tried other preprocessing besides Gaussian blur and grayscale (like dilation and thresholding), but it only detects one side of the edge (depending on the light). Can you suggest how to detect a good edge on a lighter background?
Thank you!
sivaani nagarajan

August 15, 2018 at 7:32 am

How can I create an app using this code?
- Adrian Rosebrock
  
  August 15, 2018 at 8:13 am
  
  There are two options:
  
  1. Use the Python code sitting behind a REST API. Your mobile app would upload the original image and the API can return the scanned image.
  2. You can convert the Python code to the native language of your mobile device.
Ashish

August 17, 2018 at 9:18 am

Awesome blog, helped me a lot in getting started with Opencv.

Thank you,
Adrian Rosebrock
- Adrian Rosebrock
  
  August 17, 2018 at 11:08 am
  
  Thanks Ashish, it is my pleasure to help!
rick5

August 21, 2018 at 10:13 pm

Awesome Adrian Rosebrock, you are so generous.
There is a bug: if it cannot detect a contour with four corners, the variable ‘screenCnt’ is never defined or assigned a value.
This causes a crash.
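One way to avoid that crash is to make the "not found" case explicit instead of relying on the variable being assigned inside the loop. This is a minimal sketch with made-up data; find_document_contour is a hypothetical helper, and in the real script the candidates would be the approximations produced by cv2.approxPolyDP.

```python
def find_document_contour(approximations):
    """Return the first approximation with exactly four points, or None."""
    for approx in approximations:
        if len(approx) == 4:
            return approx
    return None

# Made-up approximations: a triangle followed by a quadrilateral
candidates = [
    [(0, 0), (5, 0), (0, 5)],
    [(0, 0), (9, 0), (9, 9), (0, 9)],
]

screenCnt = find_document_contour(candidates)
if screenCnt is None:
    # Report the problem instead of failing later with a NameError
    print("No four-point contour found; try another image or tune the edge detection.")
```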
zhanglu

September 10, 2018 at 11:36 pm

Hi Adrian!
Thank you for your post! It’s very helpful!

When the background and target gray values are similar, how can we find as much of the edge as possible by adjusting Canny’s parameters?
- Adrian Rosebrock
  
  September 11, 2018 at 8:06 am
  
  This method assumes there is sufficient contrast between your background and edge of the document itself. You might want to try histogram equalization, edge detection, and non-maxima suppression. A Hough lines transform may also help.
Nilesh

September 18, 2018 at 12:03 pm

hey 🙂
can you help me to understand argparse
I am not getting that part correctly.
- Adrian Rosebrock
  
  October 8, 2018 at 1:38 pm
  
  No problem. Read this tutorial on argparse and you’ll be up to speed in no time 🙂
Lady

October 4, 2018 at 1:05 pm

Hello, I have a problem: when I run the program it says: No module named ‘scipy’
What can I do?
- Adrian Rosebrock
  
  October 8, 2018 at 10:11 am
  
  Make sure you install SciPy on your system via pip:
  
  $ pip install scipy
Const

October 13, 2018 at 3:49 pm

Hey Adrian, thanks for great example!

I’ve been following this as day 3/17 and opted not to download the example code/library/image but to actually type everything from scratch following your guidelines and code examples (for learning’s sake, but I digress).

The issues I’ve found so far are really minor, but I wanted to bring them to your attention:

1) The original image of the receipt, used as-is from the Downloads section, is rotated (it might be an issue with macOS / Debian, but it registers as landscape at my end). I had to rotate it 90 degrees to be upright. Since I initially hardcoded the rotation angle, it then broke the page example, which is properly portrait (incorrectly finding the biggest contour). I eventually had to add:

…
# as second required argument (could be handled as defaulted to 0 if omitted ofc)
ap.add_argument("-r", "--rotate", required = True, help = "Rotation angle of the image to be scanned")
…
# right after initial imread of image
image = imutils.rotate(image, angle = int(args["rotate"]))

in order to process both original unmodified images (receipt and page) properly (-r 0 for page, -r 90 for receipt).

2) The contrast of the gray-scale receipt image was fairly average, so the block size passed to threshold_local had to be increased from 11 to 81 to achieve results that are a bit weaker than, but closer to, your dark-contrasted images:

…
T = threshold_local(warped, 81, offset = 10, method = "gaussian")
…

Overall it was an excellent read, and thanks for the great tutorials!
- Adrian Rosebrock
  
  October 16, 2018 at 8:46 am
  
  Thank you for sharing, Const! Could you also share your Python, OpenCV, OS version just in case others run into a similar problem?
Honey

October 14, 2018 at 1:39 am

Can somebody please explain this line to me
warped = (warped > T).astype("uint8") * 255
and how to write it in Node.js, as I am building this in Node.js?
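The line in question can be unpacked with a tiny example. The values below are made up; in the tutorial, warped is the grayscale document and T holds the per-pixel thresholds returned by threshold_local.

```python
import numpy as np

# Made-up 2x2 grayscale image and per-pixel thresholds
warped = np.array([[12, 200],
                   [90, 140]], dtype="uint8")
T = np.array([[50, 180],
              [100, 100]], dtype="float64")

# Step 1: compare every pixel with its own threshold (boolean array)
mask = warped > T

# Step 2: turn True/False into 1/0, then scale so 1 becomes white (255)
binary = mask.astype("uint8") * 255
# binary is [[0, 255], [0, 255]]
```

The same per-pixel rule (255 if the pixel is brighter than its threshold, otherwise 0) can be written as a plain loop over the pixels in any other language.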
shafique

October 26, 2018 at 7:58 am

Hi Adrian,
This is a very good blog post for understanding the basics of detecting the four corners and then scanning the document based on them. However, it can only detect the corners when they are fully visible in the image. If the document fills the whole frame, is a little bit smaller, or is being held by someone, the corners are not detected and the image cannot be scanned. Could you please help me with scanning these types of images?
Ripon das

October 28, 2018 at 2:13 pm

Please spare a minute.
I am a newbie in image processing. I want to process images with a Raspberry Pi, which I have just set up. Please tell me which software packages I need to install for OpenCV + Python to get started with image processing on the Raspberry Pi.

Thank you in advance.
- Adrian Rosebrock
  
  October 29, 2018 at 1:20 pm
  
  I provide a number of OpenCV install tutorials here. Give them a look, they will help you install OpenCV on your Pi.
James

October 31, 2018 at 6:31 am

You almost made it sound like it’s possible for someone without a computer or tech background to actually nail this sort of stuff! Awesome.
- Adrian Rosebrock
  
  November 2, 2018 at 7:37 am
  
  Thanks James! 🙂
RAMJAN RAEEN

November 4, 2018 at 3:55 pm

HI! Adrian
First of all, I want to say thanks for awesome tutorials.

I’m getting an error because the pyimagesearch package is not installed.

Please share the version of pyimagesearch I should install.
- Adrian Rosebrock
  
  November 6, 2018 at 1:23 pm
  
  You can download the “pyimagesearch” package by using the “Downloads” section of this tutorial (it is not distributed on PyPI).
AraS

November 23, 2018 at 5:32 am

Hi Adrian, first, thanks for your tutorials. They are super didactic.

I have a question: could you help me do the same as in this tutorial, but using the Hough Transform instead of the four-point detection?

It is for a project at the university; I’m running out of time and I’m stuck.

Thank you again for your help!
- Adrian Rosebrock
  
  November 25, 2018 at 9:14 am
  
  Hey there — I don’t have any tutorials on using the Hough Transform method but if this project is for a University I really think you should research it, study it, and put in the hard work yourself. If you are going to be teaching others you need to educate yourself.
Gilles T

November 26, 2018 at 4:44 am

Hi Adrian,

Thank you for your wonderful job and script!!

I spent a lot of time on Google trying to find something like this…..

I have one question for my own project: do you think it’s possible to determine the contours using some corner image, QR code, or anything else?

I mean a function like that : https://blog.my-oxford.com/fr/scan/

Many thanks!

Gilles
- Adrian Rosebrock
  
  November 26, 2018 at 2:27 pm
  
  I actually discuss how to perform QR code recognition with OpenCV in this tutorial.
  - Gilles T
    
    November 29, 2018 at 6:32 am
    
    Thanks Adrian,
    
    Your answer helps with another question of mine: how to decode EAN128 on capture 🙂
    
    But my first question is about the edges. In this tutorial you decided to capture the document by detecting its corners.
    
    Do you think it’s possible to define a “custom corner” detection, such as a small marker placed in the document itself?
    
    Thanks, and thanks a lot!
    
    Gilles
    - Adrian Rosebrock
      
      November 30, 2018 at 8:57 am
      
      I would suggest taking a look at keypoint detectors, specifically Harris and GFTT which are designed for corner detection. Along with keypoint matching they can be used to detect specific regions of an object.
Onkar

December 20, 2018 at 2:06 am

How can we save the scanned image (step 3) in the directory from which we read the original image?
- Adrian Rosebrock
  
  December 20, 2018 at 5:11 am
  
  You can use the cv2.imwrite function to save any image you wish to disk.
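To write the result next to the input image, the output path can be derived from the input path before calling cv2.imwrite. This is a minimal sketch; scanned_output_path, the "_scanned" suffix, and the example path are hypothetical.

```python
import os

def scanned_output_path(input_path, suffix="_scanned"):
    """Build an output path in the same directory as the input image."""
    directory, filename = os.path.split(input_path)
    stem, ext = os.path.splitext(filename)
    return os.path.join(directory, stem + suffix + ext)

# Hypothetical input path, as it would be passed via --image
input_path = os.path.join("images", "receipt.jpg")
output_path = scanned_output_path(input_path)

# The scanned image could then be saved with:
# cv2.imwrite(output_path, warped)
```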
Vijay Chhuttani

December 20, 2018 at 7:22 am

First of all, thank you for this amazing website. I am starting to learn OpenCV and this site has guided me a lot where to start. I have started learning Python as I am a node guy with some java background.

My question here is, what additional step do you think will be required when doing an ID Scan. We are a background check company and perform OCR on government IDs and validate the data for KYC purpose. But now we are trying to improve our performance and scale. By improving the document orientation and cropping and resizing the ID from the whole image, we can provide clean inputs to our OCR engine.

Other than Edge detection and Transform what other steps do you think will help us to solve this problem. One issue I see is user holding the ID card in hand while taking the photo, causing edge detection issues.
- Adrian Rosebrock
  
  December 27, 2018 at 11:12 am
  
  Thanks Vijay, I’m so happy to hear you are enjoying the PyImageSearch blog 🙂
  
  You may be able to get around many of the scanning issues by using a more powerful OCR engine. Have you tried this tutorial?
Hancer

January 2, 2019 at 12:03 pm

This is really great and very brief
- Adrian Rosebrock
  
  January 5, 2019 at 9:00 am
  
  Thanks Hancer!
Aamir sohail

January 31, 2019 at 5:02 am

Sir, I want to scan pictures from a live cam
- Adrian Rosebrock
  
  February 1, 2019 at 6:52 am
  
  You’ll want to combine the code from this post with this one on accessing your webcam.
Ahmed Ahmed

February 13, 2019 at 10:08 am

It doesn’t work for me. I got the error:

scan.py: error: argument -i/--image is required

Should my code look like this?

ap.add_argument("-i", "--image", required = True,
	help = "C:\\Users\\Ahmed\\Desktop\\image.jpg")
- Adrian Rosebrock
  
  February 14, 2019 at 12:54 pm
  
  No, that is incorrect. The code doesn’t have to be updated at all. Make sure you read this tutorial on argument parsing before you continue.
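To illustrate the point: the help string only describes the argument, while the actual image path is supplied when the script is launched. The sketch below passes an explicit list to parse_args to simulate the command line; the path images/receipt.jpg is a hypothetical example.

```python
import argparse

def build_parser():
    """Create the same parser as scan.py: one required --image argument."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="Path to the image to be scanned")
    return ap

# Equivalent to running: python scan.py --image images/receipt.jpg
args = vars(build_parser().parse_args(["--image", "images/receipt.jpg"]))
print(args["image"])
```

When parse_args() is called without a list, as in the tutorial, argparse reads the values from the command line, which is why running the script with no arguments produces the "required" error.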
Binh

February 25, 2019 at 5:41 pm

Hi Adrian,

There is an error when running args:

error: the following arguments are required: -i/--image
An exception has occurred, use %tb to see the full traceback.

Please help to advise how to fix it.

Thanks,
- Adrian Rosebrock
  
  February 27, 2019 at 5:48 am
  
  You need to supply the command line arguments to the script.
Samvatsar Shastrimath

February 28, 2019 at 5:19 am

Wow.. Superb explanation. Really helpful. Thank you so much!!
- Adrian Rosebrock
  
  February 28, 2019 at 1:37 pm
  
  You are welcome, I’m glad you found it helpful!
michael

March 1, 2019 at 6:35 am

Very good information about documents.
Sometimes, when there are lighting issues, the contour method does not always work.
My advice is to use watershed segmentation to find the shape of the document and then find the contours.
To do this, use the middle of the image as one marker and the edges of the image as the other. The segmentation will then give two parts: the document and the background.
https://docs.opencv.org/3.3.1/d3/db4/tutorial_py_watershed.html
Mike

March 2, 2019 at 7:51 pm

Thank you very much for this. I’m adapting it to find multiple documents in a single (scanned) image. I’m finding that each document is actually producing two nearly identical contours, the second slightly smaller than the first.

Any idea what might be causing this? I can just throw out every second contour, but that seems inefficient.

Thanks,
Mike
- Adrian Rosebrock
  
  March 5, 2019 at 8:57 am
  
  I would suggest taking a look at your edge map to see if there is some sort of “double contour” going on. Alternatively, you can check to see if the contour is enclosed within another, and if so, discard it.
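The "discard enclosed contours" idea can be sketched with bounding boxes. This is a simplification under stated assumptions: each contour is represented by its (x, y, w, h) box, as returned by cv2.boundingRect, and the helper names and sample boxes are hypothetical.

```python
def is_inside(inner, outer):
    """Return True if box `inner` lies entirely within box `outer`."""
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return (ix >= ox and iy >= oy and
            ix + iw <= ox + ow and iy + ih <= oy + oh)

def drop_enclosed(boxes):
    """Keep only boxes that are not contained in a different box."""
    kept = []
    for i, box in enumerate(boxes):
        enclosed = any(i != j and box != other and is_inside(box, other)
                       for j, other in enumerate(boxes))
        if not enclosed:
            kept.append(box)
    return kept

# Made-up boxes: a document, its slightly smaller "double", and another document
boxes = [(10, 10, 200, 300), (12, 12, 196, 296), (400, 50, 100, 100)]
outer_boxes = drop_enclosed(boxes)
```

Requesting only outer contours from cv2.findContours with the cv2.RETR_EXTERNAL flag may be another option, depending on how the edge map looks.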
Shahudullah

March 7, 2019 at 7:19 am

Hello Adrian,
I have one question. Is it necessary to import imutils? Can I use OpenCV functions instead of imutils for the same operation?
- Adrian Rosebrock
  
  March 7, 2019 at 4:16 pm
  
  For resizing? Yes, you certainly can. The difference is that the “imutils” function automatically preserves the aspect ratio for you while “cv2.resize” will not.
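If cv2.resize is used directly, the aspect ratio has to be preserved by hand. The sketch below shows the arithmetic only; resized_dimensions and the sample sizes are hypothetical.

```python
def resized_dimensions(width, height, target_height):
    """Return (new_width, new_height) that keep the original aspect ratio."""
    ratio = target_height / float(height)
    return int(width * ratio), target_height

# Made-up 1200x1600 portrait image resized to a height of 500 pixels
new_size = resized_dimensions(1200, 1600, 500)

# cv2.resize expects the size as (width, height):
# resized = cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
```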
anirudh

March 10, 2019 at 12:04 pm

How do I save the scanned image to disk?
- Adrian Rosebrock
  
  March 13, 2019 at 3:42 pm
  
  You can use the “cv2.imwrite” function to write an image to disk.
Raja

March 27, 2019 at 8:43 am

How can I split an image based on the dark center line in a book?
Raj

March 29, 2019 at 1:25 am

Hello Adrian, thanks for your post. It detects the edges of the page perfectly.
Your code works for a single page, but I need to split the image by detecting the edges and the shadow in the center of the book. Can you help me with this?
neha jain

April 23, 2019 at 4:12 am

Is there a way to identify whether an image is scanned or original ?
- Adrian Rosebrock
  
  April 25, 2019 at 8:50 am
  
  Sorry, I’m not sure what you mean? Could you clarify?
Shreyas

May 21, 2019 at 8:33 am

Awesome resource Adrian, thanks a lot.
I had a question: how would I import your transform module in Google Colab?
MarcoS

June 13, 2019 at 8:17 am

Excellent article. I also suggest dewarping the pages.

https://mzucker.github.io/2016/08/15/page-dewarping.html
- Adrian Rosebrock
  
  June 13, 2019 at 9:30 am
  
  Nice, thanks for sharing!
Joe

June 16, 2019 at 2:38 am

I have a question: how did you choose the right argument values, e.g. when you apply local thresholding?
- Adrian Rosebrock
  
  June 19, 2019 at 2:11 pm
  
  Typically you manually tune it by trial and error.
Owen

June 20, 2019 at 9:37 pm

Hey thanks a lot for this guide Adrian! Is there any licensing on this code/could I build on top of it for my own ideas?
- Adrian Rosebrock
  
  June 26, 2019 at 1:50 pm
  
  Yes, but read this first.
Valerio

July 16, 2019 at 5:18 pm

Hello Adrian,
I was wondering why you use the skimage adaptive thresholding method threshold_local as opposed to the OpenCV one, cv2.adaptiveThreshold. Did you find it in any way better at doing the job?
By the way, awesome site and great OpenCV tutorials. I’m an undergraduate studying Robotics and your tutorials have helped a ton in strengthening my skills.
- Adrian Rosebrock
  
  July 25, 2019 at 9:52 am
  
  No reason other than I like the scikit-image adaptive thresholding method. I find it easier to use and more Pythonic.
Shlok

July 25, 2019 at 12:34 am

Hello Adrian! Thanks a lot for the above algorithm, it’s amazing to work with such brilliant code. While I was fiddling around with it on my system, I found that the algorithm doesn’t work very well when white paper is used on white backgrounds. I have tried every method I could think of (adjusting the thresh or blur values, resizing the image), but none of them worked. I would be grateful if you could guide me through this….
Thanks a lot!
William

October 16, 2019 at 11:39 pm

Wow. This might be the oldest comment left of them all, but as a professional in the industry of scanning documents… This script is great! Props Adrian! I’m curious about what the cost might be to use an OCR engine like Tesseract.
- Adrian Rosebrock
  
  October 17, 2019 at 7:45 am
  
  Thanks William. Although I’m not sure what you mean by “cost”. Tesseract is a free and open source library.
Mr.Li

December 2, 2019 at 5:05 am

I would like to ask you a question. Suppose the text picture has no border and has local distortion, skew, and partial shadow. I know the text skew can be handled, but I have no idea how to handle the local distortion. Do you have any good ideas? Thank you.
Charles Brabec

January 1, 2020 at 3:45 pm

Adrian, Thanks for the wonderful tutorial. I made a few modifications to my copy and I’d like to offer them as additional next-steps suggestions.

– add params to the program to pass in the expected size of the scanned document
– modify the four_point_transform() function to ensure the output image has the same aspect ratio as the scanned document
– the threshold_local() function is called with a block_size of 11, which seems to be too low for higher resolution pictures. Use the size of the warped image to calculate the dpi and use that to adjust the block_size
– add an output parameter to save the final result as a new image suitable for printing

Now I can use my phone camera and this program as a quite usable photocopier. Great fun!
- Adrian Rosebrock
  
  January 2, 2020 at 8:48 am
  
  Great job, Charles!
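Charles's third suggestion (deriving block_size from the resolution of the warped image) could be sketched roughly as follows. This is not his code: the function names, the linear scaling rule, and the base_dpi reference value are all hypothetical choices made for illustration. The one firm constraint is that the block size passed to threshold_local should be odd.

```python
def estimate_dpi(warped_shape, paper_size_in):
    # warped_shape is (height, width) in pixels, as in a NumPy image shape;
    # paper_size_in is the expected (width, height) of the page in inches
    height_px, width_px = warped_shape[:2]
    width_in, height_in = paper_size_in
    # average the horizontal and vertical estimates
    return (width_px / width_in + height_px / height_in) / 2.0

def block_size_for_dpi(dpi, base_block=11, base_dpi=75):
    # hypothetical rule: scale the block size linearly with resolution,
    # assuming base_block works well at base_dpi (an assumed reference)
    size = max(int(round(base_block * dpi / base_dpi)), 3)
    # the neighborhood size should be odd, so bump even values up by one
    if size % 2 == 0:
        size += 1
    return size

# example: a 1700 x 2200 pixel warp of a US Letter page (8.5 x 11 inches)
dpi = estimate_dpi((2200, 1700), (8.5, 11.0))
block_size = block_size_for_dpi(dpi)
print(dpi, block_size)
```

The resulting block_size could then be passed to threshold_local in place of the fixed value of 11.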
Bilal

February 3, 2020 at 7:55 am

Hey Adrian, I hope you’re doing fine. I successfully implemented your code and it works well on your example image, but when I use my own images the results are not good. In particular, the contours are not detected accurately. What is the alternative to the “ratio” that you used on lines 17-20?
Thanks in Advance.

Best Regards: Bilal
Tong Duc Khai

March 2, 2020 at 10:23 pm

Hi Adrian,
Thanks for the post. I followed the guide, downloaded your source code, and tried executing it, but I’ve encountered a problem. It got stuck on STEP 1. I waited for a long time but nothing happened. I use Python 3.7 and OpenCV 4.2 on Kubuntu 19.10. Please help me. Thank you so much!
- Adrian Rosebrock
  
  March 4, 2020 at 1:31 pm
  
  Click on the window opened by OpenCV and press any key on your keyboard. The “cv2.waitKey” function call pauses execution of the script until you press a key on the keyboard.
Mayur Satav

April 16, 2020 at 5:44 am

Hi Adrian,

Thank you for your wonderful tutorials! I really appreciate your work in computer vision. Whenever I visit your website I learn something useful that I can apply in a number of projects. Right now I am working on my engineering project, “Data extraction from pdf invoices using computer vision”. After searching through lots of research papers and resources, I have still failed to extract the text into separate fields such as vendor name, invoice number, item name, and item quantity. Could you please tell me how I can achieve this?

Thank You!
- Adrian Rosebrock
  
  April 16, 2020 at 7:47 am
  
  Hi Mayur, what you’re referring to is a type of Optical Character Recognition (OCR) problem. I have plans to cover more advanced forms of OCR soon, so stay tuned to the PyImageSearch blog!

Trackbacks

Find distance from camera to object using Python and OpenCV says:

January 19, 2015 at 10:00 am

[…] More on this methodology can be found in this post on building a kick-ass mobile document […]
Zero-parameter, automatic Canny edge detection with Python and OpenCV - PyImageSearch says:

April 6, 2015 at 10:01 am

[…] we’ve used the Canny edge detector a fair amount of times. We’ve used it to build a kick-ass mobile document scanner and we’ve used it to find a Game Boy screen in a photo, just to name a couple […]
Sorting Contours using Python and OpenCV - PyImageSearch says:

April 20, 2015 at 10:00 am

[…] We used contours to build a kick-ass mobile document scanner. […]
Target acquired: Finding targets in drone and quadcopter video streams using Python and OpenCV - PyImageSearch says:

May 4, 2015 at 10:01 am

[…] way to detect square and rectangular objects in an image. We’ve used it in building a kick-ass mobile document scanner. We’ve used it to find the Game Boy screen in an image. And we’ve even used it on a […]
Accessing the Raspberry Pi Camera with OpenCV and Python - PyImageSearch says:

May 29, 2015 at 5:29 pm

[…] the dominant colors in an image was (and still is) hugely popular. One of my personal favorites, building a kick-ass mobile document scanner has been the most popular PyImageSearch article for months. And the first (big) tutorial I ever […]
Ordering coordinates clockwise with Python and OpenCV - PyImageSearch says:

March 21, 2016 at 10:00 am

[…] little over a year ago, I wrote one of my favorite tutorials on the PyImageSearch blog: How to build a kick-ass mobile document scanner in just 5 minutes. Even though this tutorial is over a year old, it’s still one of the most popular blog posts on […]
Bubble sheet multiple choice scanner and test grader using OMR, Python and OpenCV - PyImageSearch says:

October 3, 2016 at 10:01 am

[…] special is that we are going to combine the techniques from many previous blog posts, including building a document scanner, contour sorting, and perspective transforms. Using the knowledge gained from these previous […]

Comment section

Hey, Adrian Rosebrock here, author and creator of PyImageSearch. While I love hearing from readers, a couple years ago I made the tough decision to no longer offer 1:1 help over blog post comments.

At the time I was receiving 200+ emails per day and another 100+ blog post comments. I simply did not have the time to moderate and respond to them all, and the sheer volume of requests was taking a toll on me.

Instead, my goal is to do the most good for the computer vision, deep learning, and OpenCV community at large by focusing my time on authoring high-quality blog posts, tutorials, and books/courses.

If you need help learning computer vision and deep learning, I suggest you refer to my full catalog of books and courses — they have helped tens of thousands of developers, students, and researchers just like yourself learn Computer Vision, Deep Learning, and OpenCV.

