Text Detection and OCR with Microsoft Cognitive Services

This lesson is part of a 3-part series on Text Detection and OCR:

Text Detection and OCR with Amazon Rekognition API
Text Detection and OCR with Microsoft Cognitive Services (today’s tutorial)
Text Detection and OCR with Google Cloud Vision API

In this tutorial, you will:

Learn how to obtain your MCS API keys
Create a configuration file to store your subscription key and API endpoint URL
Implement a Python script to make calls to the MCS OCR API
Review the results of applying the MCS OCR API to sample images

To learn about text detection and OCR, just keep reading.

In our previous tutorial, you learned how to use the Amazon Rekognition API to OCR images. The hardest part of using the Amazon Rekognition API was obtaining your API keys. However, once you had your API keys, it was smooth sailing.

This tutorial focuses on a different cloud-based API called Microsoft Cognitive Services (MCS), part of Microsoft Azure. Like the Amazon Rekognition API, MCS is capable of high OCR accuracy, but unfortunately, the implementation is slightly more complex (as are both Microsoft’s login and admin dashboard).

We prefer the Amazon Web Services (AWS) Rekognition API over MCS, both for the admin dashboard and the API itself. However, if you are already invested in the MCS/Azure ecosystem, you should consider staying there. The MCS API isn’t that hard to use (it’s just not as straightforward as the Amazon Rekognition API).

Microsoft Cognitive Services for OCR

We’ll start this tutorial with a review of how you can obtain your MCS API keys. You will need these API keys to make requests to the MCS API to OCR images.

Once we have our API keys, we’ll review our project directory structure and then implement a Python configuration file to store our subscription key and OCR API endpoint URL.

With our configuration file implemented, we’ll move on to creating a second Python script, this one acting as a driver script that:

Imports our configuration file
Loads an input image from disk
Packages the image into an API call
Makes a request to the MCS OCR API
Retrieves the results
Annotates our output image
Displays the OCR results to our screen and terminal

Let’s dive in!

Obtaining Your Microsoft Cognitive Services Keys

Before proceeding to the rest of the sections, be sure to obtain the API keys by following the instructions shown here.

Configuring Your Development Environment

To follow this guide, you need to have the OpenCV and Azure Computer Vision libraries installed on your system.

Luckily, both are pip-installable:

$ pip install opencv-contrib-python
$ pip install azure-cognitiveservices-vision-computervision

If you need help configuring your development environment for OpenCV, we highly recommend that you read our pip install OpenCV guide — it will have you up and running in a matter of minutes.

Project Structure

We first need to review our project directory structure.

Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.

The directory structure for our MCS OCR API is similar to the structure of the Amazon Rekognition API project in the previous tutorial:

|-- config
|   |-- __init__.py
|   |-- microsoft_cognitive_services.py
|-- images
|   |-- aircraft.png
|   |-- challenging.png
|   |-- park.png
|   |-- street_signs.png
|-- microsoft_ocr.py

Inside the config directory, we have our microsoft_cognitive_services.py file, which stores our subscription key and endpoint URL (i.e., the URL of the API we’re submitting our images to).

The microsoft_ocr.py script will take our subscription key and endpoint URL, connect to the API, submit the images in our images directory for OCR, and display the results on our screen.

Creating Our Configuration File

Ensure you have followed Obtaining Your Microsoft Cognitive Services Keys to obtain your subscription keys to the MCS API. From there, open the microsoft_cognitive_services.py file and update your SUBSCRIPTION_KEY:

# define our Microsoft Cognitive Services subscription key
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"

# define the MCS endpoint
ENDPOINT_URL = "YOUR_ENDPOINT_URL"

You should replace the string "YOUR_SUBSCRIPTION_KEY" with your subscription key obtained from Obtaining Your Microsoft Cognitive Services Keys.

Additionally, ensure you double-check your ENDPOINT_URL. At the time of this writing, the endpoint URL points to the most recent version of the MCS API; however, as Microsoft releases new API versions, this endpoint URL may change, so it’s worth double-checking.
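If you would rather not keep a real key in a source file, one option is to read both values from environment variables and fail early when they are still placeholders. The following is a minimal sketch of that idea, not part of the original project: the function name load_mcs_settings and the variable names MCS_SUBSCRIPTION_KEY and MCS_ENDPOINT_URL are hypothetical.

```python
import os

# placeholder strings as they appear in the configuration file
PLACEHOLDERS = {"YOUR_SUBSCRIPTION_KEY", "YOUR_ENDPOINT_URL"}


def load_mcs_settings(env=None):
    """Return (subscription_key, endpoint_url) read from the environment.

    The environment variable names used here are hypothetical; choose
    whatever names suit your own setup.
    """
    env = os.environ if env is None else env
    key = env.get("MCS_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_KEY")
    endpoint = env.get("MCS_ENDPOINT_URL", "YOUR_ENDPOINT_URL")

    # fail early if either value is empty or still a placeholder
    for name, value in (("subscription key", key), ("endpoint URL", endpoint)):
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"MCS {name} has not been set")

    return key, endpoint
```

Under this assumption, the configuration file could assign SUBSCRIPTION_KEY and ENDPOINT_URL from the returned tuple, so a forgotten key is reported before any request is made.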

Implementing the Microsoft Cognitive Services OCR Script

Now, let’s learn how to submit images for text detection and OCR to the MCS API.

Open the microsoft_ocr.py script in the project directory structure and insert the following code:

# import the necessary packages
from config import microsoft_cognitive_services as config
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import argparse
import time
import sys
import cv2

Note on Line 2 that we import our microsoft_cognitive_services configuration to supply our subscription key and endpoint URL. Then, we’ll use the azure and msrest Python packages to send requests to the API.

Next, let’s define draw_ocr_results, a helper function used to annotate our output images with the OCR’d text:

def draw_ocr_results(image, text, pts, color=(0, 255, 0)):
	# unpack the points list
	topLeft = pts[0]
	topRight = pts[1]
	bottomRight = pts[2]
	bottomLeft = pts[3]

	# draw the bounding box of the detected text
	cv2.line(image, topLeft, topRight, color, 2)
	cv2.line(image, topRight, bottomRight, color, 2)
	cv2.line(image, bottomRight, bottomLeft, color, 2)
	cv2.line(image, bottomLeft, topLeft, color, 2)

	# draw the text itself
	cv2.putText(image, text, (topLeft[0], topLeft[1] - 10),
		cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)

	# return the output image
	return image

Our draw_ocr_results function has four parameters:

image: The input image that we will draw on.
text: The OCR’d text.
pts: The top-left, top-right, bottom-right, and bottom-left (x, y)-coordinates of the text ROI.
color: The BGR color we’re using to draw on the image.

Lines 13-16 unpack our bounding box coordinates. From there, Lines 19-22 draw the bounding box surrounding the text in the image. We then draw the OCR’d text itself on Lines 25 and 26.

We wrap up this function by returning the output image to the calling function.
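One edge case worth noting: the label is drawn 10 pixels above the top-left corner, so for text near the top edge of the image the y-coordinate can go negative and the label ends up off-image. Below is a small sketch of one way to clamp the label position. The helper name label_origin is hypothetical, and the min_y default of 20 pixels is only a rough guess at the text height for a 0.8 font scale, not a measured value.

```python
def label_origin(top_left, offset=10, min_y=20):
    """Return an (x, y) origin for the label that stays inside the image.

    min_y is an assumed approximate text height in pixels; tune it for
    the font and scale you actually draw with.
    """
    x, y = top_left

    # place the label `offset` pixels above the box, but never so high
    # that the text baseline falls above the visible area
    return (max(x, 0), max(y - offset, min_y))
```

The returned tuple could be passed to cv2.putText in place of (topLeft[0], topLeft[1] - 10).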

We can now parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image that we'll submit to Microsoft OCR")
args = vars(ap.parse_args())

# load the input image from disk, both in a byte array and OpenCV
# format
imageData = open(args["image"], "rb").read()
image = cv2.imread(args["image"])

We only need a single argument here, --image, which is the path to the input image on disk. We read this image from disk twice: once as a binary byte array (so we can submit it to the MCS API) and again in OpenCV/NumPy format (so we can draw on/annotate it).

Let’s now construct a request to the MCS API:

# initialize the client with endpoint URL and subscription key
client = ComputerVisionClient(config.ENDPOINT_URL,
	CognitiveServicesCredentials(config.SUBSCRIPTION_KEY))

# call the API with the image and get the raw data, grab the operation
# location from the response, and grab the operation ID from the
# operation location
response = client.read_in_stream(imageData, raw=True)
operationLocation = response.headers["Operation-Location"]
operationID = operationLocation.split("/")[-1]

Lines 43 and 44 initialize the Azure Computer Vision client. Note that we are supplying our ENDPOINT_URL and SUBSCRIPTION_KEY here — now is a good time to go back to microsoft_cognitive_services.py and ensure you have correctly inserted your subscription key (otherwise, the request to the MCS API will fail).

We then submit the image for OCR to the MCS API on Lines 49-51.
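The operation ID is simply the last path segment of the Operation-Location URL. Splitting on "/" works as long as the URL has no trailing slash or query string; the sketch below is a slightly more defensive variant using only the standard library. The helper name operation_id_from_location and the example URL shape are illustrative assumptions.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Extract the final path segment of an Operation-Location URL."""
    # keep only the path so a query string cannot leak into the ID
    path = urlparse(operation_location).path

    # ignore a trailing slash, then take the last segment
    operation_id = path.rstrip("/").split("/")[-1]
    if not operation_id:
        raise ValueError("no operation ID found in Operation-Location")
    return operation_id
```

For example, a (hypothetical) location ending in .../analyzeResults/1234-abcd would yield "1234-abcd".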

We now have to wait and poll for results from the MCS API:

# continue to poll the Cognitive Services API for a response until
# we get a response
while True:
	# get the result
	results = client.get_read_result(operationID)

	# check if the status is not "not started" or "running", if so,
	# stop the polling operation
	if results.status.lower() not in ["notstarted", "running"]:
		break
	
	# sleep for a bit before we make another request to the API
	time.sleep(10)

# check to see if the request succeeded
if results.status == OperationStatusCodes.succeeded:
	print("[INFO] Microsoft Cognitive Services API request succeeded...")

# if the request failed, show an error message and exit
else:
	print("[INFO] Microsoft Cognitive Services API request failed")
	print("[INFO] Attempting to gracefully exit")
	sys.exit(1)

I’ll be honest — polling for results is not my favorite way to work with an API. It requires more code, it’s a bit more tedious, and it can be potentially error-prone if the programmer isn’t careful to break out of the loop properly.

Of course, there are pros to this approach, including maintaining a connection, submitting larger chunks of data, and having results returned in batches rather than all at once.

Regardless, this is how Microsoft has implemented its API, so we must play by their rules.

Line 55 starts a while loop that continuously checks for responses from the MCS API (Line 57).

If we do not find the status in the ["notstarted", "running"] list, we can safely break from the loop and process our results (Lines 61 and 62).

If the above condition is not met, we sleep for a small amount of time and then poll again (Line 65).

Line 68 checks if the request has succeeded. If it has, we can safely continue to process our results. Otherwise, we have no OCR results to show (since the image could not be processed), so we exit gracefully from our script (Lines 72-75).
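Since the main risk with polling is a loop that never exits, it can help to put an upper bound on how long we wait. The following is a minimal sketch of a polling helper with a timeout. The name poll_until_done and its parameters are hypothetical, and fetch is assumed to be any callable returning an object with a string-like status attribute (as the read result in the script above has).

```python
import time

# statuses that mean "keep waiting", matching the list used in the script
PENDING = {"notstarted", "running"}


def poll_until_done(fetch, interval=1.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until its status is no longer pending or time runs out."""
    deadline = clock() + timeout

    while True:
        # ask for the current state of the operation
        result = fetch()
        if result.status.lower() not in PENDING:
            return result

        # give up rather than loop forever
        if clock() >= deadline:
            raise TimeoutError("operation still pending after timeout")

        # wait a bit before polling again
        sleep(interval)
```

With the names from the script, usage might look like results = poll_until_done(lambda: client.get_read_result(operationID)). The sleep and clock parameters exist only so the helper can be exercised without real delays.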

Provided our OCR request succeeded, let’s now process the results:

# make a copy of the input image for final output
final = image.copy()

# loop over the results
for result in results.analyze_result.read_results:
	# loop over the lines
	for line in result.lines:
		# extract the OCR'd line from Microsoft's API and unpack the
		# bounding box coordinates
		text = line.text
		box = list(map(int, line.bounding_box))
		(tlX, tlY, trX, trY, brX, brY, blX, blY) = box
		pts = ((tlX, tlY), (trX, trY), (brX, brY), (blX, blY))

		# draw the output OCR line-by-line
		output = image.copy()
		output = draw_ocr_results(output, text, pts)
		final = draw_ocr_results(final, text, pts)

		# show the output OCR'd line
		print(text)
		cv2.imshow("Output", output)

# show the final output image
cv2.imshow("Final Output", final)
cv2.waitKey(0)

Line 78 initializes our final output image, on which all of the OCR’d text will be drawn.

We start looping through the read results returned by the API (one per page of the input) on Line 81. On Line 83, we loop through all lines of OCR’d text in the current result. We extract the OCR’d text and bounding box coordinates for each line, then construct a tuple of the top-left, top-right, bottom-right, and bottom-left corners, respectively (Lines 86-89).

We then draw the OCR’d text line-by-line on the output and final image (Lines 92-94). We display the current line of text on our screen and terminal (Lines 97 and 98) — the final output image, with all OCR’d text drawn on it, is displayed on Lines 101 and 102.
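The bounding box arrives as a flat sequence of eight numbers (x and y for each of the four corners), which the script unpacks by hand. As an illustration of the same conversion in a reusable form, here is a small sketch; the helper name box_to_points is hypothetical.

```python
def box_to_points(bounding_box):
    """Convert a flat [x1, y1, ..., x4, y4] box into four (x, y) tuples."""
    # truncate each coordinate to an integer pixel position
    coords = [int(v) for v in bounding_box]
    if len(coords) != 8:
        raise ValueError("expected 8 values (4 corner points)")

    # pair every even-indexed value (x) with the following value (y)
    return tuple(zip(coords[0::2], coords[1::2]))
```

The result has the same shape as pts in the script: top-left, top-right, bottom-right, bottom-left.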

Microsoft Cognitive Services OCR Results

Let’s now put the MCS OCR API to work for us. Open a terminal and execute the following command:

$ python microsoft_ocr.py --image images/aircraft.png
[INFO] making request to Microsoft Cognitive Services API...
WARNING!
LOW FLYING AND DEPARTING AIRCRAFT
BLAST CAN CAUSE PHYSICAL INJURY

Figure 2 shows the output of applying the MCS OCR API to our aircraft warning sign. If you recall, this is the same image we used in a previous tutorial when applying the Amazon Rekognition API. Therefore, I included the same image in this tutorial to demonstrate that the MCS OCR API can correctly OCR this image.

**Figure 2:** Image of a plane flying over a warning sign. The OCR results are displayed on the sign in green.

Let’s try a different image, this one containing several challenging pieces of text:

$ python microsoft_ocr.py --image images/challenging.png
[INFO] making request to Microsoft Cognitive Services API...

LITTER
EMERGENCY
First
Eastern National
Bus Times
STOP

Figure 3 shows the results of applying the MCS OCR API to our input image — and as we can see, MCS does a great job OCR’ing the image.

**Figure 3:** *Left:* Sample text from the First Eastern National bus timetable. The sample text is a tough image to OCR due to the low image quality and glossy print. Still, Microsoft’s OCR API can correctly OCR it! *Middle:* The Microsoft OCR API can correctly OCR the *“Emergency Stop”* text. *Right:* A trash can with the text “*Litter*.” We’re able to OCR the text, but the text at the bottom of the trash can is unreadable, even to the human eye.

On the left, we have a sample image from the First Eastern National bus timetable (i.e., the schedule of when a bus will arrive). The document is printed with a glossy finish (likely to prevent water damage), and the gloss causes a significant reflection in the image, particularly in the “Bus Times” text. Still, the MCS OCR API can correctly OCR the image.

In the middle, the “Emergency Stop” text is highly pixelated and low-quality, but that doesn’t faze the MCS OCR API! It’s able to correctly OCR the image.

Finally, the right shows a trash can with the text “Litter.” The text is tiny, and due to the low-quality image, it is challenging to read without squinting a bit. That said, the MCS OCR API can still OCR the text (although the text at the bottom of the trash can is illegible — neither human nor API could read that text).

The next sample image contains a national park sign shown in Figure 4:

**Figure 4:** OCR’ing a park sign using the Microsoft Cognitive Services OCR API. Notice that the API can give us *rotated* text bounding boxes along with the OCR’d text itself.

The MCS OCR API can OCR each sign line-by-line (Figure 4). We can also compute rotated text bounding boxes/polygons for each line.

$ python microsoft_ocr.py --image images/park.png
[INFO] making request to Microsoft Cognitive Services API...

PLEASE TAKE
NOTHING BUT
PICTURES
LEAVE NOTHING
BUT FOOT PRINTS
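Because each line comes back as four corner points rather than an axis-aligned rectangle, we can estimate how much a line of text is rotated from its top edge alone. This is a rough sketch under that assumption; the helper name line_angle_degrees is hypothetical and it is not part of the original script.

```python
import math


def line_angle_degrees(top_left, top_right):
    """Angle of the top edge of a text box, in degrees.

    Image coordinates have y growing downward, so a positive angle means
    the text slopes down toward the right on screen.
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))
```

With the pts tuple from the script, this could be called as line_angle_degrees(pts[0], pts[1]).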

The final example we have contains traffic signs:

$ python microsoft_ocr.py --image images/street_signs.png
[INFO] making request to Microsoft Cognitive Services API...

Old Town Rd
STOP
ALL WAY

Figure 5 shows that we can correctly OCR each piece of text on both the stop sign and street name sign.

**Figure 5:** The Microsoft Cognitive Services OCR API can detect the text on traffic signs.

Summary

In this tutorial, you learned about the Microsoft Cognitive Services (MCS) OCR API. Despite being slightly harder to implement and use than the Amazon Rekognition API, the Microsoft Cognitive Services OCR API demonstrated that it’s quite robust and able to OCR text in many situations, including low-quality images.

When working with low-quality images, the MCS API shined. Typically, I recommend you programmatically detect and discard low-quality images (as we did in a previous tutorial). However, if you find yourself in a situation where you have to work with low-quality images, it may be worth your while to use the Microsoft Azure Cognitive Services OCR API.

Citation Information

Rosebrock, A. “Text Detection and OCR with Microsoft Cognitive Services,” PyImageSearch, D. Chakraborty, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, R. Raha, and A. Thanki, eds., 2022, https://pyimg.co/0r4mt

@incollection{Rosebrock_2022_OCR_MCS,
  author = {Adrian Rosebrock},
  title = {Text Detection and {OCR} with {M}icrosoft Cognitive Services},
  booktitle = {PyImageSearch},
  editor = {Devjyoti Chakraborty and Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha and Abhishek Thanki},
  year = {2022},
  note = {https://pyimg.co/0r4mt},
}
