The first time I ever used the Tesseract optical character recognition (OCR) engine was in my college undergraduate years.
A dataset comprising diverse textual images is necessary for an OCR project. It enables the OCR system to learn different text formats, styles, and orientations, increasing the system’s versatility and effectiveness.
Roboflow has free tools for each stage of the computer vision pipeline that will streamline your workflows and supercharge your productivity.
Sign up or log in to your Roboflow account to access state-of-the-art dataset libraries and revolutionize your computer vision pipeline.
You can start by choosing your own datasets or using PyImageSearch’s assorted library of useful datasets.
Bring data in any of 40+ formats to Roboflow, train using any state-of-the-art model architecture, deploy across multiple platforms (API, NVIDIA, browser, iOS, etc.), and connect to applications or third-party tools.
I was taking my first course on computer vision. Our professor wanted us to research a challenging computer vision topic for our final project, extend existing research, and then write a formal paper on our work. I had trouble deciding on a project, so I went to see the professor, a Navy researcher who often worked on medical applications of computer vision and machine learning. He advised me to work on automatic prescription pill identification, the process of automatically recognizing prescription pills in an image. I considered the problem for a few moments and then replied:
Couldn’t you just OCR the imprints on the pill to recognize it?
To learn how to conduct OCR on your first project, just keep reading.
Your First OCR Project with Tesseract and Python
I still remember the look on my professor’s face.
He smiled, a small smirk appearing on the left corner of his mouth. Knowing the problems I was going to encounter, he replied with “If only it were that simple. But you’ll find out soon enough.”
I then went home and immediately started playing with the Tesseract library, reading the manual/documentation, and attempting to OCR some example images via the command line. But I found myself struggling. Some images were being OCR’d correctly, while others were returning complete nonsense.
Why was OCR so hard? And why was I struggling so much?
I stayed up late into the night, continuing to test Tesseract with various images, but for the life of me, I couldn’t discern the pattern between the images Tesseract could correctly OCR and the ones it would fail on. What black magic was going on here?!
Unfortunately, this is the same feeling I see many computer vision practitioners having when first starting to learn OCR — perhaps you have even felt it yourself:
- You install Tesseract on your machine
- You follow a few basic examples on a tutorial you found via a Google search
- The examples return the correct results
- … but when you apply the same OCR technique to your images, you get incorrect results back
Sound familiar?
The problem is that these tutorials don’t teach OCR systematically. They’ll show you the how, but they won’t show you the why — that critical piece of information that allows you to discern patterns in OCR problems, allowing you to solve them correctly.
In this tutorial, you’ll be building your very first OCR project. It will serve as the “bare bones” Python script you need to perform OCR. In future posts, we’ll build on what you learn here.
Starting with OCR demands a varied dataset to understand the complexities of text in images. With a well-curated dataset library, you can expose your models to a wide range of text examples, enhancing their performance.
With a few images, you can train a working computer vision model in an afternoon. For example, bring data into Roboflow from anywhere via API, label images with the cloud-hosted image annotation tool, kick off hosted model training with one click, and deploy the model via a hosted API endpoint. This process can be executed in a code-centric way, in the cloud-based UI, or any mix of the two.
Over 250,000 developers and machine learning engineers from companies such as Cardinal Health, Walmart, USG, Rivian, Intel, and Medtronic build computer vision pipelines with Roboflow. Get started today, no credit card required.
By the end of this tutorial, you’ll be confident in your ability to apply OCR to your projects.
Let’s get started.
Learning Objectives
In this tutorial, you will:
- Gain hands-on experience using Tesseract to OCR an image
- Learn how to import the pytesseract package into your Python scripts
- Use OpenCV to load an input image from disk
- Pass the image into the Tesseract OCR engine via the pytesseract library
- Display the OCR’d text results on our terminal
Configuring your development environment
To follow this guide, you need to have the OpenCV library installed on your system.
Luckily, OpenCV is pip-installable:
$ pip install opencv-contrib-python
If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide — it will have you up and running in a matter of minutes.
Having problems configuring your development environment?
All that said, are you:
- Short on time?
- Learning on your employer’s administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
- Ready to run the code right now on your Windows, macOS, or Linux system?
Then join PyImageSearch University today!
Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.
And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
Getting Started with Tesseract
In the first part of this tutorial, we’ll review our directory structure for this project. From there, we’ll implement a simple Python script that will:
- Load an input image from disk via OpenCV
- OCR the image via Tesseract and pytesseract
- Display the OCR’d text on our screen
We’ll wrap up the tutorial with a discussion of the OCR’d text results.
Project Structure
|-- pyimagesearch_address.png
|-- steve_jobs.png
|-- whole_foods.png
|-- first_ocr.py
Our first project is very straightforward in the way it is organized. Inside the tutorial’s code directory, you’ll find three example PNG images for OCR testing and a single Python script named first_ocr.py.
Let’s dive right into our Python script in the next section.
Basic OCR with Tesseract
Let’s get started with your very first Tesseract OCR project! Open a new file, name it first_ocr.py, and insert the following code:
# import the necessary packages
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
args = vars(ap.parse_args())
The first Python import you’ll notice in this script is pytesseract (Python Tesseract), a Python binding that ties in directly with the Tesseract OCR application running on your system. The power of pytesseract is our ability to interface with Tesseract rather than relying on ugly os.cmd calls, as we needed to do before pytesseract ever existed. Thanks to its power and ease of use, we’ll use pytesseract in this and future tutorials!
Our script requires a single command line argument using Python’s argparse interface. By providing the --image argument and an image file path value directly in your terminal when you execute this example script, Python will dynamically load an image of your choosing. I’ve provided three example images in the project directory for this tutorial that you can use. I also highly encourage you to try using Tesseract via this Python example script to OCR your images!
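If the argparse pattern is new to you, here is a hypothetical, standalone snippet (the filename is just a stand-in, not one of the tutorial’s images being loaded) showing how the --image flag becomes a dictionary entry we can look up by name:

```python
# Hypothetical illustration, separate from first_ocr.py: argparse turns a
# --image flag into an attribute, and vars() converts that into a dictionary.
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")

# parse_args() normally reads sys.argv; passing an explicit list here
# makes the behavior easy to demonstrate without a terminal
args = vars(ap.parse_args(["--image", "steve_jobs.png"]))
print(args["image"])  # steve_jobs.png
```

Inside first_ocr.py, the same lookup, args["image"], is what hands the file path you typed in the terminal to OpenCV.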
Now that we’ve handled our imports and lone command line argument, let’s get to the fun part — OCR with Python:
# load the input image and convert it from BGR to RGB channel
# ordering
image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# use Tesseract to OCR the image
text = pytesseract.image_to_string(image)
print(text)
Here, we load our input --image from disk and swap the color channel ordering. Tesseract expects RGB-format images; however, OpenCV loads images in BGR order. This isn’t a problem because we can fix it using OpenCV’s cv2.cvtColor call; just be especially careful to know when to use RGB (Red, Green, Blue) vs. BGR (Blue, Green, Red).
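As a quick sanity check of the BGR/RGB distinction (my own illustration, not part of the tutorial’s script), remember that OpenCV images are just NumPy arrays, and the cv2.cvtColor(image, cv2.COLOR_BGR2RGB) call amounts to reversing the channel axis:

```python
# My own illustration of BGR vs. RGB ordering, using a tiny synthetic array
import numpy as np

# a 1x1 "image" holding pure red as OpenCV stores it: B=0, G=0, R=255
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# reversing the last axis converts BGR to RGB, which is equivalent to
# cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) for a 3-channel image
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [255, 0, 0]
```

If you forget the conversion, Tesseract still runs, but anything that depends on color information will see red and blue swapped.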
Remark 1. I’d also like to point out that many times when you see Tesseract examples online, they will use PIL or pillow to load an image. Those packages load images in RGB format, so a conversion step is not required.
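For comparison, here is a hypothetical sketch of that Pillow route (an in-memory image stands in for Image.open so the snippet is self-contained; it is not code from this tutorial):

```python
# Hypothetical sketch of the Pillow route mentioned above: Pillow hands you
# an RGB image directly, so no cv2.cvtColor call is needed before OCR.
from PIL import Image

# stand-in for Image.open("steve_jobs.png") so the example runs on its own
img = Image.new("RGB", (10, 10), color=(255, 0, 0))
print(img.mode)  # RGB

# text = pytesseract.image_to_string(img)  # would work with no conversion step
```

We’ll stick with OpenCV in this tutorial, since we’ll want its image-processing functions in later posts; the cost is simply remembering the one-line BGR-to-RGB conversion.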
Finally, the call to pytesseract.image_to_string performs OCR on our input RGB image and returns the results as a string stored in the text variable. Given that text is now a string, we can pass it to Python’s built-in print function and see the result in our terminal. Future examples will explain how to annotate an input image with the text itself (i.e., overlay the text result on a copy of the input --image using OpenCV, and display it on your screen).
We’re done!
Wait, for real?
Oh yeah, if you didn’t notice, OCR with pytesseract is as easy as a single function call, provided you’ve loaded the image in the proper RGB order. So now, let’s check the results and see if they meet our expectations.
Tesseract OCR Results
Let’s put our newly implemented Tesseract OCR script to the test. Open your terminal, and execute the following command:
$ python first_ocr.py --image pyimagesearch_address.png
PyImageSearch
PO Box 17598 #17900
Baltimore, MD 21297
In Figure 2, you can see our input image, which contains the address for PyImageSearch on a gray, slightly textured background. As the command and terminal output indicate, both Tesseract and pytesseract correctly OCR’d the text.
Let’s try another image, this one of Steve Jobs’ old business card:
$ python first_ocr.py --image steve_jobs.png
Steven P. Jobs
Chairman of the Board
Apple Computer, Inc.
20525 Mariani Avenue, MS: 3K
Cupertino, California 95014
408 973-2121 or 996-1010.
Steve Jobs’ business card in Figure 3 is correctly OCR’d even though the input image poses several difficulties common to OCR’ing scanned documents, including:
- Yellowing of the paper due to age
- Noise on the image, including speckling
- Text that is starting to fade
Despite all these challenges, Tesseract was able to correctly OCR the business card. But that begs the question: is OCR this simple? Do we just open a Python shell, import the pytesseract package, and then call image_to_string on an input image? Unfortunately, OCR isn’t that simple (if it were, this tutorial would be unnecessary). As an example, let’s apply our same first_ocr.py script to a more challenging photo of a Whole Foods receipt:
$ python first_ocr.py --image whole_foods.png
aie WESTPORT CT 06880
yHOLE FOODS MARKE
399 post RD WEST
~ ; 903) 227-6858
BACON LS NP 365
pacon LS N
The Whole Foods grocery store receipt in Figure 4 was not OCR’d correctly using Tesseract. You can see that Tesseract has spit out a bunch of garbled nonsense; OCR isn’t always perfect.
What's next? We recommend PyImageSearch University.
86 total classes • 115+ hours of on-demand code walkthrough videos • Last updated: October 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86 courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial, you created your very first OCR project using the Tesseract OCR engine, the pytesseract package (used to interact with the Tesseract OCR engine), and the OpenCV library (used to load an input image from disk).
We then applied our basic OCR script to three example images. Our basic OCR script worked for the first two but struggled tremendously for the final one. So what gives? Why was Tesseract able to OCR the first two examples perfectly but then utterly fail on the third image? The secret lies in the image pre-processing steps, along with the underlying Tesseract modes and options.
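To give you a small taste of what such pre-processing can look like (a hypothetical sketch using made-up pixel values, not the pipeline we’ll develop in later tutorials), grayscale conversion followed by simple thresholding can be expressed directly on the image array:

```python
# Hypothetical pre-processing sketch with made-up pixel values; real
# receipts need more care (deskewing, adaptive thresholding, etc.)
import numpy as np

# a made-up 2x2 RGB patch: bright pixels on one diagonal, dark on the other
rgb = np.array([[[200, 200, 200], [30, 30, 30]],
                [[25, 25, 25], [210, 210, 210]]], dtype=np.uint8)

# grayscale via the standard luminance weights (the same formula
# cv2.cvtColor uses for RGB-to-gray conversion)
gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# binarize with a fixed threshold: pixels become pure white or black,
# a much cleaner input for Tesseract than raw photo pixels
binary = np.where(gray > 127, 255, 0).astype(np.uint8)
print(binary.tolist())  # [[255, 0], [0, 255]]
```

Cleaning an image up this way before calling image_to_string is often the difference between the Whole Foods result above and a usable transcription, and it is exactly what we’ll dig into next.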
Congrats on completing today’s tutorial, well done!
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!
Comment section
Hey, Adrian Rosebrock here, author and creator of PyImageSearch. While I love hearing from readers, a couple years ago I made the tough decision to no longer offer 1:1 help over blog post comments.
At the time I was receiving 200+ emails per day and another 100+ blog post comments. I simply did not have the time to moderate and respond to them all, and the sheer volume of requests was taking a toll on me.
Instead, my goal is to do the most good for the computer vision, deep learning, and OpenCV community at large by focusing my time on authoring high-quality blog posts, tutorials, and books/courses.
If you need help learning computer vision and deep learning, I suggest you refer to my full catalog of books and courses — they have helped tens of thousands of developers, students, and researchers just like yourself learn Computer Vision, Deep Learning, and OpenCV.
Click here to browse my full catalog.