In this tutorial, you will learn how to use multiprocessing with OpenCV and Python to perform feature extraction. You’ll learn how to use multiprocessing with OpenCV to parallelize feature extraction across the system bus, including all processors and cores on your computer.
Today’s tutorial is inspired by PyImageSearch reader, Abigail.
Abigail writes:
Hey Adrian, I just read your tutorial on image hashing with OpenCV and really enjoyed it.
I’m trying to apply image hashing to my research project at the university.
They have provided me with a dataset of ~7.5 million images. I used your code to perform image hashing but it’s taking a long time to process the entire dataset.
Is there anything I can do to speed up the process?
Abigail asks a great question.
The image hashing post she is referring to is single threaded, meaning that only one core of the processor is being utilized — if we switch to using multiple threads/processes we can dramatically speed up the hashing process.
But how do we actually utilize multiprocessing with OpenCV and Python?
I’ll show you in the rest of this tutorial.
To learn how to use multiprocessing with OpenCV and Python, just keep reading.
Multiprocessing with OpenCV and Python
In the first part of this tutorial, we’ll discuss single-threaded vs. multi-threaded applications, including why we may choose to use multiprocessing with OpenCV to speed up the processing of a given dataset.
I’ll also discuss why immediately jumping to Big Data algorithms, tools, and paradigms (such as Hadoop and MapReduce) is the wrong decision — instead, you should parallelize across the system bus first.
From there we’ll implement our Python and OpenCV multiprocessing functions to facilitate processing a large dataset quickly and easily.
Finally, we’ll put all the pieces together and compare how long it takes to process our dataset:
- With only a single core of a processor
- And distributing the load across all cores of the processor
Let’s get started!
Why use multiprocessing for processing a dataset of images?
The vast majority of projects and applications you have implemented are (very likely) single-threaded.
When you launch your Python project, the python
binary launches a Python interpreter (i.e., the “Python process”).
How the actual Python process itself is assigned to a CPU core is dependent on how the operating system handles (1) process scheduling and (2) assigning system vs. user threads.
There are entire books dedicated to multiprocessing, operating systems, and how processes are scheduled, assigned, removed, deleted, etc. via the OS; however, for the sake of simplicity, let’s assume:
- We launch our Python script.
- The operating system assigns the Python program to a single core of the processor.
- The OS then allows the Python script to run on the processor core until completion.
That’s all fine and good — but we are only utilizing a small amount of our true processing power.
To see how we’re underutilizing our processor, consider the following image:

This figure is meant to visualize the 3 GHz Intel Xeon W on my iMac Pro — note how the processor has a total of 20 cores.
Now, let’s assume we launch our Python script. The operating system will assign the process to a single one of those cores:

The Python script will then run to completion.
But do you see the problem here?
We are only using 5% of our true processing power!
Thus, to speed up our Python script we can utilize multiprocessing. Under the hood, Python’s multiprocessing
package spins up a new python
process for each core of the processor. Each python
process is independent and separate from the others (i.e., there are no shared variables, memory, etc.).
We then assign a portion of the dataset processing workload to each individual python
process:

Notice how each process is assigned a small chunk of the dataset.
Each process independently chews on the subset of the dataset assigned to it until the entire dataset has been processed.
Now, instead of using just a single core of our processor, we are using all cores!
Note: Keep in mind that this example is a bit of a simplification. The OS will manage process assignment as there are more processes than just your Python script running on your system. Some cores may be responsible for more than one Python process, other cores for no Python processes at all, and the remaining cores for OS/system routines.
Why not use Hadoop, MapReduce, and other Big Data tools?
Your first thought when trying to parallelize processing of a large dataset would be to apply Big Data tools, algorithms, and paradigms such as Hadoop and MapReduce — but this would be a BIG mistake.
The golden rule when working with large datasets is to:
- Parallelize across your system bus first.
- And if performance/throughput is not sufficient, then, and only then, start parallelizing across multiple machines (including Hadoop, MapReduce, etc.).
The single biggest multiprocessing mistake I see computer scientists make is to immediately jump into Big Data tools.
Don’t do that.
Instead, spread the dataset processing across your system bus first.
If you’re not getting the throughput speed you want on your system bus only then should you consider parallelizing across multiple machines and bringing in Big Data tools.
If you find yourself in need of Hadoop/MapReduce, enroll in the PyImageSearch Gurus course to learn about high-throughput Python + OpenCV image processing using Hadoop’s Streaming API!
Our example dataset

The dataset we’ll be using for our multiprocessing and OpenCV example is CALTECH-101, the same dataset we use when building an image hashing search engine.
The dataset consists of 9,144 images.
We’ll be using multiprocessing to spread out the image hashing extraction across all cores of our processor.
You may download the CALTECH-101 dataset from their official webpage or you can use the following wget
command:
$ wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
$ tar xvzf 101_ObjectCategories.tar.gz
Project structure
Let’s inspect our project structure:
$ tree --dirsfirst --filelimit 10
.
├── pyimagesearch
│   ├── __init__.py
│   └── parallel_hashing.py
├── 101_ObjectCategories [9,144 images]
├── temp_output
└── extract.py
Inside the pyimagesearch
module is our parallel_hashing.py
helper script. This script contains our hashing function, chunking function, and our process_images
workhorse.
The 101_ObjectCategories/
directory contains 101 subdirectories of images from CALTECH-101 (downloaded via the previous section).
A number of intermediate files will be temporarily stored in the temp_output/
folder.
The heart of our multiprocessing lies in extract.py
. This script includes our pre-multiprocessing overhead, parallelization across the system bus, and post-multiprocessing overhead.
Our multiprocessing helper functions
Before we can utilize multiprocessing with OpenCV to speed up our dataset processing, let’s first implement our set of helper utilities used to facilitate multiprocessing.
Open up the parallel_hashing.py
file in your directory structure and insert the following code:
# import the necessary packages
import numpy as np
import pickle
import cv2

def dhash(image, hashSize=8):
	# convert the image to grayscale
	gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

	# resize the input image, adding a single column (width) so we
	# can compute the horizontal gradient
	resized = cv2.resize(gray, (hashSize + 1, hashSize))

	# compute the (relative) horizontal gradient between adjacent
	# column pixels
	diff = resized[:, 1:] > resized[:, :-1]

	# convert the difference image to a hash
	return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])
We begin by importing NumPy, OpenCV, and pickle
(Lines 2-5).
From there, we define our difference hashing function, dhash
. There are a number of image hashing algorithms, but one of the most popular ones is called the difference hash, which includes four steps:
- Step #1: Convert the input image to grayscale (Line 8).
- Step #2: Resize the image to fixed dimensions, (N + 1) x N, ignoring aspect ratio. Typically we set N=8 or N=16. We use N + 1 columns so that we can compute the difference (hence “difference hash”) between adjacent column pixels in the image (Line 12).
- Step #3: Compute the difference. If we set N=8 then we have 9 pixels per row and 8 pixels per column. We can then compute the difference between adjacent column pixels, yielding 8 differences. 8 rows of 8 differences (i.e., 8×8) results in 64 values (Line 16).
- Step #4: Finally, we can build the hash. In practice all we actually need to perform is a “greater than” operation comparing the columns, yielding binary values. These 64 binary values are compacted into an integer, forming our final hash (Line 19).
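To make these four steps concrete, here is a minimal usage sketch of dhash once parallel_hashing.py is saved. The image path is just an illustrative placeholder for any image in the dataset:

# quick sanity check of the dhash function (the image path below is an
# illustrative placeholder for any dataset image)
from pyimagesearch.parallel_hashing import dhash
import cv2

image = cv2.imread("101_ObjectCategories/accordion/image_0001.jpg")
print(dhash(image, hashSize=8))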
Typically, image hashing algorithms are used to find near-duplicate images in a large dataset.
For a full review of difference hashing be sure to review the following two blog posts:
- Building an Image Hashing Search Engine with VP-Trees and OpenCV
- Image hashing with OpenCV and Python
Next, let’s look at the convert_hash
function:
def convert_hash(h):
	# convert the hash to NumPy's 64-bit float and then back to
	# Python's built in int
	return int(np.array(h, dtype="float64"))
When I first wrote the code for the image hashing search engine tutorial, I found that the VP-Tree implementation internally converts points to a NumPy 64-bit float. That would be okay; however, hashes need to be integers and if we convert them to 64-bit floats, they become an unhashable data type. To overcome the limitation of the VP-Tree implementation, I came up with the convert_hash
hack:
- We accept an input hash, h.
- That hash is then converted to a NumPy 64-bit float.
- And that NumPy float is then converted back to Python’s built-in integer data type.
This hack ensures that hashes are represented consistently throughout the hashing, indexing, and searching process.
In order to leverage multiprocessing, we first need to chunk our dataset into N equally sized chunks (one chunk per core of the processor).
Let’s define our chunk
generator now:
def chunk(l, n):
	# loop over the list in n-sized chunks
	for i in range(0, len(l), n):
		# yield the current n-sized chunk to the calling function
		yield l[i: i + n]
The chunk
generator accepts two parameters:
- l: The list of elements (in this case, file paths).
- n: The maximum size of each chunk to generate (i.e., the number of elements per chunk).
Inside the function, we loop over list l
and yield
N-sized chunks to the calling function.
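For intuition, here is a quick sketch of what the chunk generator yields (the toy list is hypothetical):

from pyimagesearch.parallel_hashing import chunk

# split a toy list of 10 "image paths" into chunks of at most 4 elements
toyPaths = ["img_{}.jpg".format(i) for i in range(10)]

for c in chunk(toyPaths, 4):
	print(c)
# ['img_0.jpg', 'img_1.jpg', 'img_2.jpg', 'img_3.jpg']
# ['img_4.jpg', 'img_5.jpg', 'img_6.jpg', 'img_7.jpg']
# ['img_8.jpg', 'img_9.jpg']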
We’re finally to the workhorse of our multiprocessing implementation — the process_images
function:
def process_images(payload):
	# display the process ID for debugging and initialize the hashes
	# dictionary
	print("[INFO] starting process {}".format(payload["id"]))
	hashes = {}

	# loop over the image paths
	for imagePath in payload["input_paths"]:
		# load the input image, compute the hash, and convert it
		image = cv2.imread(imagePath)
		h = dhash(image)
		h = convert_hash(h)

		# update the hashes dictionary
		l = hashes.get(h, [])
		l.append(imagePath)
		hashes[h] = l

	# serialize the hashes dictionary to disk using the supplied
	# output path
	print("[INFO] process {} serializing hashes".format(payload["id"]))
	f = open(payload["output_path"], "wb")
	f.write(pickle.dumps(hashes))
	f.close()
Inside the separate extract.py
script, we’ll use Python’s multiprocessing
library to launch a dedicated Python process, assign it to a specific core of the processor, and then run the process_images
function on that specific core.
The process_images
function works like this:
- It accepts a payload as an input (Line 32). The payload is assumed to be a Python dictionary, but it can actually be any datatype provided that we can pickle and unpickle it.
- It initializes the hashes dictionary (Line 36).
- It loops over the input image paths in the payload (Line 39). In the loop, we load each image, extract the hash, and update the hashes dictionary (Lines 41-48).
- Finally, we write the hashes to disk as a .pickle file (Lines 53-55).
For the purposes of this blog post we are utilizing multiprocessing to facilitate faster image hashing of an input dataset; however, you should use this function as a template for your own dataset processing.
You could easily swap in keypoint detection/local invariant feature extraction, color channel statistics, Local Binary Patterns, etc. From there, you may take this function and modify it for your own needs.
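As one hypothetical example of such a swap, the sketch below replaces hashing with per-channel color statistics via cv2.meanStdDev. The function name and dictionary keys are illustrative and are not part of the downloadable code:

# a hypothetical drop-in replacement for process_images that computes
# per-channel color statistics instead of difference hashes
import pickle
import cv2

def process_color_stats(payload):
	# initialize a dictionary mapping each image path to its statistics
	stats = {}

	# loop over the image paths assigned to this process
	for imagePath in payload["input_paths"]:
		image = cv2.imread(imagePath)

		# skip any image OpenCV fails to load
		if image is None:
			continue

		# compute the per-channel means and standard deviations
		(means, stds) = cv2.meanStdDev(image)
		stats[imagePath] = (means.flatten().tolist(), stds.flatten().tolist())

	# serialize this process' results to its intermediate output file
	f = open(payload["output_path"], "wb")
	f.write(pickle.dumps(stats))
	f.close()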
Implementing the OpenCV and multiprocessing script
Now that our utility methods are implemented, let’s create the multiprocessing driver script.
This script will be responsible for:
- Grabbing all image paths in our input dataset.
- Splitting the image paths into N equally sized chunks (where N is the total number of processes we wish to utilize).
- Using multiprocessing, Pool, and map to call the process_images function on each core of the processor.
- Grabbing the results from each independent process and combining them.
If you need to review Python’s multiprocessing module, be sure to refer to the docs.
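If Pool and map are new to you, the following minimal, self-contained sketch (unrelated to image hashing) shows the basic pattern we will follow:

from multiprocessing import Pool, cpu_count

def square(x):
	return x * x

if __name__ == "__main__":
	# spin up one worker process per CPU core and map the work onto them
	with Pool(processes=cpu_count()) as pool:
		print(pool.map(square, range(10)))
	# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]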
Let’s see how we can implement our OpenCV and multiprocessing script. Open up the extract.py
file and insert the following code:
# import the necessary packages
from pyimagesearch.parallel_hashing import process_images
from pyimagesearch.parallel_hashing import chunk
from multiprocessing import Pool
from multiprocessing import cpu_count
from imutils import paths
import numpy as np
import argparse
import pickle
import os
Lines 2-10 import our packages, modules, and functions:
- From our custom parallel_hashing file, we import both our process_images and chunk functions.
- To accommodate parallel processing we’ll use Python’s multiprocessing module. Specifically, we import Pool (to construct a processing pool) and cpu_count (to get a count of the number of available CPUs/cores if the --procs command line argument is not supplied).
All of our multiprocessing setup code must be in the main thread of execution:
# check to see if this is the main thread of execution
if __name__ == "__main__":
	# construct the argument parser and parse the arguments
	ap = argparse.ArgumentParser()
	ap.add_argument("-i", "--images", required=True, type=str,
		help="path to input directory of images")
	ap.add_argument("-o", "--output", required=True, type=str,
		help="path to output directory to store intermediate files")
	ap.add_argument("-a", "--hashes", required=True, type=str,
		help="path to output hashes dictionary")
	ap.add_argument("-p", "--procs", type=int, default=-1,
		help="# of processes to spin up")
	args = vars(ap.parse_args())
Line 13 ensures we are inside the main thread of execution. This helps prevent multiprocessing bugs, especially on Windows operating systems.
Lines 15-24 parse four command line arguments:
- --images: The path to the input images directory.
- --output: The path to the output directory to store intermediate files.
- --hashes: The path to the output hashes dictionary in .pickle format.
- --procs: The number of processes to launch for multiprocessing.
With our command line arguments parsed and ready to go, now we’ll (1) determine the number of concurrent processes to launch, and (2) prepare our image paths (a bit of pre-multiprocessing overhead):
	# determine the number of concurrent processes to launch when
	# distributing the load across the system, then create the list
	# of process IDs
	procs = args["procs"] if args["procs"] > 0 else cpu_count()
	procIDs = list(range(0, procs))

	# grab the paths to the input images, then determine the number
	# of images each process will handle
	print("[INFO] grabbing image paths...")
	allImagePaths = sorted(list(paths.list_images(args["images"])))
	numImagesPerProc = len(allImagePaths) / float(procs)
	numImagesPerProc = int(np.ceil(numImagesPerProc))

	# chunk the image paths into N (approximately) equal sets, one
	# set of image paths for each individual process
	chunkedPaths = list(chunk(allImagePaths, numImagesPerProc))
Line 29 determines the total number of concurrent processes we’ll be launching, while Line 30 assigns each process an ID number. By default, we’ll utilize all CPUs/cores on our system.
Line 35 grabs paths to the input images in our dataset.
Lines 36 and 37 determine the total number of images per process by dividing the number of image paths by the number of processes and taking the ceiling to ensure we use an integer value from here forward.
Line 41 utilizes our chunk
function to create a list of N equally-sized lists of image paths. We will be mapping each of these chunks to an independent process.
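To make the arithmetic concrete, here is a quick back-of-the-envelope sketch of the chunk sizes for CALTECH-101 with 20 processes:

import numpy as np

numImages = 9144                                          # images in CALTECH-101
procs = 20                                                # processes/cores
numImagesPerProc = int(np.ceil(numImages / float(procs)))
print(numImagesPerProc)                                   # 458 images per chunk
print(int(np.ceil(numImages / float(numImagesPerProc))))  # 20 chunks total
# the final chunk is slightly smaller (9144 - 19 * 458 = 442 images)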
Let’s prepare our payloads
to assign to each process (our final pre-multiprocessing overhead):
	# initialize the list of payloads
	payloads = []

	# loop over the set chunked image paths
	for (i, imagePaths) in enumerate(chunkedPaths):
		# construct the path to the output intermediary file for the
		# current process
		outputPath = os.path.sep.join([args["output"],
			"proc_{}.pickle".format(i)])

		# construct a dictionary of data for the payload, then add it
		# to the payloads list
		data = {
			"id": i,
			"input_paths": imagePaths,
			"output_path": outputPath
		}
		payloads.append(data)
Line 44 initializes the payloads
list. Each payload will consist of data
containing:
- An ID
- A list of input paths
- An output path to an intermediate file
Line 47 begins a loop over our chunked image paths. Inside the loop, we specify the intermediary output file path (which will store the respective image hashes for that specific chunk of image paths) while naming it carefully with the process ID in the filename (Lines 50 and 51).
To finish the loop, we append
our data
— a dictionary consisting of the (1) ID, i
, (2) input imagePaths
, and (3) outputPath
(Lines 55-60).
This next block is where we distribute processing of the dataset across our system bus:
	# construct and launch the processing pool
	print("[INFO] launching pool using {} processes...".format(procs))
	pool = Pool(processes=procs)
	pool.map(process_images, payloads)

	# close the pool and wait for all processes to finish
	print("[INFO] waiting for processes to finish...")
	pool.close()
	pool.join()
	print("[INFO] multiprocessing complete")
The Pool
class creates the Python processes/interpreters on each respective core of the processor (Line 64).
Calling map
takes the payloads
list and then calls process_images
on each core, distributing the payloads
to each core (Line 65).
We’ll then close the pool
from accepting new jobs and wait for the multiprocessing to complete (Lines 69 and 70).
The final step (post-multiprocessing overhead) is to take our intermediate hashes and construct the final combined hashes.
	# initialize our *combined* hashes dictionary (i.e., will combine
	# the results of each pickled/serialized dictionary into a
	# *single* dictionary)
	print("[INFO] combining hashes...")
	hashes = {}

	# loop over all pickle files in the output directory
	for p in paths.list_files(args["output"], validExts=(".pickle"),):
		# load the contents of the dictionary
		data = pickle.loads(open(p, "rb").read())

		# loop over the hashes and image paths in the dictionary
		for (tempH, tempPaths) in data.items():
			# grab all image paths with the current hash, add in the
			# image paths for the current pickle file, and then
			# update our hashes dictionary
			imagePaths = hashes.get(tempH, [])
			imagePaths.extend(tempPaths)
			hashes[tempH] = imagePaths

	# serialize the hashes dictionary to disk
	print("[INFO] serializing hashes...")
	f = open(args["hashes"], "wb")
	f.write(pickle.dumps(hashes))
	f.close()
Line 77 initializes the hashes dictionary to hold our combined hashes which we will populate from each of the intermediary files.
Lines 80-91 populate the combined hashes dictionary. To do so, we loop over all intermediate .pickle
files (i.e., one .pickle
file for each individual process). Inside the loop, we (1) read the hashes and associated imagePaths
from the data, and (2) update the hashes
dictionary.
Finally, Lines 94-97 serialize the hashes
to disk. We could use the serialized hashes to construct a VP-Tree and search for near-duplicate images in a separate script at this point.
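As a sketch of what a separate follow-up script might do with the combined hashes (hypothetical, not included in the downloads), you could flag any hash that maps to more than one image path as a near-duplicate group:

# load the combined hashes dictionary written by extract.py
import pickle

hashes = pickle.loads(open("hashes.pickle", "rb").read())

# any hash associated with more than one image path is a candidate
# group of near-duplicate images
for (h, imagePaths) in hashes.items():
	if len(imagePaths) > 1:
		print("hash {} -> {} images".format(h, len(imagePaths)))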
Note: You could update the code to delete the temporary .pickle
files from your system; however, I left that as an implementation decision to you, the reader.
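If you do want to clean up, a minimal sketch might look like this (assuming the same temp_output/ directory used above):

# remove each per-process .pickle file once the combined hashes exist
import os
from imutils import paths

for p in paths.list_files("temp_output", validExts=(".pickle",)):
	os.remove(p)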
OpenCV and multiprocessing results
Let’s put our OpenCV and multiprocessing methods to the test. Make sure you’ve:
- Used the “Downloads” section of this tutorial to download the source code.
- Downloaded the CALTECH-101 dataset using the instructions in the “Our example dataset” section above.
To start, let’s test how long it takes to process our dataset of 9,144 images using only a single core:
$ time python extract.py --images 101_ObjectCategories --output temp_output \
	--hashes hashes.pickle --procs 1
[INFO] grabbing image paths...
[INFO] launching pool using 1 processes...
[INFO] starting process 0
[INFO] process 0 serializing hashes
[INFO] waiting for processes to finish...
[INFO] multiprocessing complete
[INFO] combining hashes...
[INFO] serializing hashes...

real	0m9.576s
user	0m7.857s
sys	0m1.489s
Utilizing only a single process (single core of our processor) required 9.576 seconds to process the entire image dataset.
Now, let’s try using all 20 processes (which could be mapped to all 20 cores of my processor):
$ time python extract.py --images ~/Desktop/101_ObjectCategories \
	--output temp_output --hashes hashes.pickle
[INFO] grabbing image paths...
[INFO] launching pool using 20 processes...
[INFO] starting process 0
[INFO] starting process 1
[INFO] starting process 2
[INFO] starting process 3
[INFO] starting process 4
[INFO] starting process 5
[INFO] starting process 6
[INFO] starting process 7
[INFO] starting process 8
[INFO] starting process 9
[INFO] starting process 10
[INFO] starting process 11
[INFO] starting process 12
[INFO] starting process 13
[INFO] starting process 14
[INFO] starting process 15
[INFO] starting process 16
[INFO] starting process 17
[INFO] starting process 18
[INFO] starting process 19
[INFO] process 3 serializing hashes
[INFO] process 4 serializing hashes
[INFO] process 6 serializing hashes
[INFO] process 8 serializing hashes
[INFO] process 5 serializing hashes
[INFO] process 19 serializing hashes
[INFO] process 11 serializing hashes
[INFO] process 10 serializing hashes
[INFO] process 16 serializing hashes
[INFO] process 14 serializing hashes
[INFO] process 15 serializing hashes
[INFO] process 18 serializing hashes
[INFO] process 7 serializing hashes
[INFO] process 17 serializing hashes
[INFO] process 12 serializing hashes
[INFO] process 9 serializing hashes
[INFO] process 13 serializing hashes
[INFO] process 2 serializing hashes
[INFO] process 1 serializing hashes
[INFO] process 0 serializing hashes
[INFO] waiting for processes to finish...
[INFO] multiprocessing complete
[INFO] combining hashes...
[INFO] serializing hashes...

real	0m1.508s
user	0m12.785s
sys	0m1.361s
By distributing the image hashing load across all 20 cores of my processor I was able to reduce the time it took to process the dataset from 9.576 seconds down to 1.508 seconds — that’s a speedup of over 535% (more than 6x faster)!
But wait, if we used 20 cores, shouldn’t the total processing time be approximately 9.576 / 20 = 0.4788 seconds?
Well, not quite, for a few reasons:
- First, we’re performing a lot of I/O operations. Each cv2.imread call results in I/O overhead. The hashing algorithm itself is also very simple. If our algorithm were truly CPU bound, versus I/O bound, the speedup factor would be even better.
- Secondly, multiprocessing is not a “free” operation. There are overhead function calls, both at the Python level and operating system level, that prevent us from seeing a true 20x speedup.
Can all computer vision and OpenCV algorithms be made parallel with multiprocessing?
The short answer is no, not all algorithms can be made parallel and distributed to all cores of a processor — some algorithms are simply single threaded in nature.
Furthermore, you cannot use the multiprocessing
library to speed up compiled OpenCV routines like cv2.GaussianBlur
, cv2.Canny
, or any of the deep neural network routines in the cv2.dnn
package.
Those routines, as well as all other cv2.*
functions are pre-compiled C/C++ functions — Python’s multiprocessing
library will have no impact on them whatsoever.
Instead, if you are interested in how to speed up those functions, be sure to look into OpenCL, Threading Building Blocks (TBB), NEON, and VFPv3.
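As a starting point, you can check what your current OpenCV build already supports; a quick sketch:

import cv2

# report whether OpenCV's optimized code paths are enabled and how many
# internal threads OpenCV will use for its own parallelized routines
print(cv2.useOptimized())
print(cv2.getNumThreads())

# inspect the build information for entries such as TBB, NEON, or OpenCL
print(cv2.getBuildInformation())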
Additionally, if you are working with the Raspberry Pi you should read this tutorial on how to optimize your OpenCV install.
I’m also including additional OpenCV optimizations inside my book, Raspberry Pi for Computer Vision.
What's next? We recommend PyImageSearch University.
86+ total classes • 115+ hours of on-demand code walkthrough videos • Last updated: February 2025
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial you learned how to utilize multiprocessing with OpenCV and Python.
Specifically, we learned how to use Python’s built-in multiprocessing
library along with the Pool
and map
methods to parallelize and distribute processing across all processors and all cores of the processors.
The end result is a massive 535% speedup (over 6x faster) in the time it took to process our dataset of images.
We examined multiprocessing with OpenCV through indexing a dataset of images for building an image hashing search engine; however, you can modify/implement your own process_images
function to include your own functionality.
My personal suggestion would be to use the process_images
function as a template when building your own multiprocessing and OpenCV applications.
I hope you enjoyed this tutorial!
If you would like to see more multiprocessing and OpenCV optimization tutorials in the future please leave a comment below and let me know.
To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!

Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Hey, actually I came across this library called Numba by Anaconda. It converts various Python functions to machine language and compiles them for parallelization; it basically works very well on NumPy array computational tasks. My current goal is to use OpenCV drawing functions to draw awesome-looking labelled bounding boxes, but it takes time. Any method to transfer the payload from the CPU to the GPU would be very helpful.
Thanks for sharing, Nitin! I haven’t written a tutorial on Numba yet but I may do so in the future 🙂
Hello Adrian, great post. I have a question about this multiprocessing, but maybe it will be better at multithreads.
I’m using a GPU for inference with a YOLOv3 network. Looking at the nvidia-smi command during the process I see it only occupies 2GB of the 11GB I have. I tried with 6 threads and the YOLOv3 inference sometimes returns garbage. Is this normal? Am I trying the wrong way? Is there a better way to accomplish this? I need to process videos as quickly as possible.
I am using pytorch by the way. Maybe is the framework.
Kind regards,
I’m not sure I fully understand the question. Is your goal to have a single YOLO model in memory and then have it detect objects for incoming images? If so, I would recommend this approach.
Hi Adrian,
This is a nice, simple approach to chunking large sets of work for parallel processing, and in many cases it will work very well. Would it be worth considering a discussion of very large worklists which may require fault tolerance: restarting from where it left off if it crashes, or resuming after a requested pause if the server is needed for other purposes for a while, etc.?
We also need to understand that each individual piece of work will not take the same amount of time to process (size of image, processor busy being used by the OS, waiting for I/O, etc.). Such a situation could leave 1 process running on its own for a long time when the other 19 processes have finished.
best regards
Hebbs
Sure, fault tolerance is extremely important — but that’s often very job/application specific. That’s why I left this code as a template. You can extend it and modify it as you see fit 🙂 I also mentioned in the post how I’m dramatically simplifying how an OS handles process scheduling. There are entire books dedicated to those topics.
Interesting. I’m curious as to what a plot of total time vs number of processes would look like, i.e. where the point of diminishing returns is, where adding more processes fails to gain significant performance advantage.
In my AI code that I’ve derived from your fine tutorials, I’ve found multiprocessing was inferior to threads in every case except for RTSP stream decoding. The MP inter-process communication (IPC) appears to be a major burden.
Not totally unexpected, as the camera threads or processes need to push frames to the AI (NCS2, TPU, etc.) threads or processes. RTSP decoding processes worked better because of a “trick” I used to avoid the Python IPC: I used a localhost MQTT broker to pass image buffers from the RTSP processes to the AI program.
Could you perhaps expound a bit on the Python Pool Object/functions? Maybe a topic for Office Hours?
I’m an accomplished multi-threaded programmer, but relatively novice with Python, and every environment I’ve used has variations of “thread pools” that I’ve never used as they seemed to just be an extra layer of complexity for my usage cases.
It seems to me that thread pool systems are for situations like a webserver where threads are launched, run, and rather quickly exit; the pool can automate the housekeeping and set a limit on the maximum number of active threads to avoid going way past the point of diminishing returns.
All my projects have launched threads that run for the entire program duration so thread pools add nothing.
This example seems to be the same situation as all threads in the pool are launched and you wait until they all complete and end the program.
It wouldn’t be too challenging to create such a plot. The simplest way would be to write a Python script that executes the driver script, uses time to monitor how long the script takes to run, increases the --procs argument, and then repeats until all of the desired process counts have been measured. The output could then be plotted (a quick sketch follows this reply).
Secondly, keep in mind this simple rule:
1. Threads are for I/O bound tasks
2. Processes are for CPU bound tasks
RTSP is by nature I/O. I would use threads for that, not processes. I’ll try to expand on that during office hours.
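For reference, here is a minimal sketch of such a benchmarking loop, assuming the extract.py interface from this post (the process counts and plotting details are illustrative):

# time extract.py for several --procs values and plot the results
import subprocess
import time
import matplotlib.pyplot as plt

procCounts = [1, 2, 4, 8, 16, 20]
timings = []

for p in procCounts:
	start = time.time()
	subprocess.run(["python", "extract.py",
		"--images", "101_ObjectCategories",
		"--output", "temp_output",
		"--hashes", "hashes.pickle",
		"--procs", str(p)], check=True)
	timings.append(time.time() - start)

plt.plot(procCounts, timings, marker="o")
plt.xlabel("# of processes")
plt.ylabel("total wall-clock time (seconds)")
plt.savefig("timings.png")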
I get updates from time to time in my mailbox. It’s really great what you are doing.
I want to suggest highlighting for everyone that multicore processing has a principal limitation which you definitely understand, and everybody understands that multi-processor processing beats multicore: you have one memory bus per processor (are there any examples with more than 2?). So if your workload is memory intensive, your cores will sit idle because the bus restricts the performance.
I’ve run into problems (memory leaks – crashes) using Python’s multiprocessing library and OpenCV. I googled “python multiprocessing opencv issues” and discovered I wasn’t alone. I also discovered the multiprocessing library doesn’t play well with ZeroMQ, Socket-IO or PubNub.
My solution was to use threading instead. My limited understanding is that OpenCV can take some advantage of multiple processors under the hood.
Any thoughts / insights on my experience?
Cheers
It’s hard to say what may be causing memory-related issues without intimate understanding of your code or your project. I will say that threads aren’t necessarily the best solution here as we typically use threads for I/O and processes for CPU/computation. If you’re just using I/O then threads are perfectly fine but you might want to look into the multiprocessing issue a bit more.
Following up on my previous comment regarding issues with OpenCV and Python’s multiprocessing module.
I was trying to use OpenCV’s grab() and retrieve() in place of read() to reduce the processor load when intermittently capturing still images from live video. There is very little documentation on using grab() retrieve() and it appears to be the source of my troubles. I went back to the more computationally expensive read() and all is good now with OpenCV and multiprocessing.
Just a shout out to Adrian’s latest book “Raspberry Pi for Computer Vision”. Reading it gave me additional insight in to my problem and I was able to quickly resolve it. Much appreciated!
Congrats on resolving the issue!
And thanks so much for the shoutout, I appreciate it 🙂
I used this multiprocessing approach to send multiple video files to MTCNN in parallel, but as soon as it enters the network the code seems to hang; it is not able to come out of the function it’s calling:
results = detector.detect_face(img)
but when I’m sending the data in a serial manner and using only one processor it seems to work fine
You’re probably not batching your images correctly. You should consider batching them up using something like Redis and then doing a batch prediction. This tutorial will help you.
Hey Adrian, thanks for the reply, I will surely try that.
I still have some doubts regarding this. The parallel processing worked with the OpenCV Haar cascade when I processed about 48 videos at once, but it didn’t work with MTCNN. Is that still because of the batching problem?
I also had a closer look at the MTCNN code and it also uses multiprocessing. Does that interfere with my multiprocessing?
What do you mean by “it didn’t work” — did you receive an error message of some sort? If you need additional help I would recommend purchasing one of my books/courses. From there we can move the conversation to email and have a more detailed conversation about it.
hi Adrian ,
Thanks for this article. It has helped me partially with a detection model that has multiple camera inputs.
Can the multiprocessing module be used on machines with i7 processors together with GPUs? Will Python’s multiprocessing module consider GPUs as well when forking processes?
No, the multiprocessing module will not put specific processes on your GPUs. You need to explicitly interface with your GPU and do any prediction/inference there.
Thanks Adrian for the response, Can you please refer any article which handles parallelism on GPUs? Thanks in advance.
I don’t have any tutorials that cover “standard CV” algorithms on the GPU. Most of my GPU-based tutorials are for deep learning such as Keras and multiple GPUs. Otherwise, you might be interested in Deep Learning for Computer Vision with Python.
Good morning Adrian, let me say that I absolutely love your posts.
I checked those solutions you mention to speed up pre-compiled operations in C/C++:
“Instead, if you are interested in how to speedup those functions, be sure to look into OpenCL, Threading Building Blocks (TBB), NEON, and VFPv3.”
For what I can tell, those tools are to improve the efficiency of those functions before compiling but they are not python related. Is that so?
My main objective is to process some pics at a minimum of 60 Hz using the Raspberry Pi 4 with two USB web cameras (they take pictures at 120 fps). But I think I have reached a limit at about 45 Hz (and it’s not particularly stable).
Does it make sense to go deeper into the C/C++ functions? Would they not be hard to optimize for an outsider? Are there Python tools to get closer to the acquisition rate (120 fps) with this hardware?
Thank you very much in advance!
You are correct, those improve efficiency when you actually compile OpenCV. It’s only Python related in the sense that Python is calling those compiled routines and therefore your script runs faster.
I would suggest you work through Raspberry Pi for Computer Vision. That book is dedicated to getting CV applications to run in real-time on the RPi. There are a number of different optimization chapters in the book enabling you to get every last bit of performance out of your algorithms.
I checked out your books some time ago but I got the impression that they were more oriented toward machine learning and object detection. I am currently working on 3 different applications using OpenCV:
1. Measuring piping vibrations ~10 Hz in one or two vibration planes (@60 fps, 1 mm resolution)
2. Measuring 3D trajectories and orientations of mechanical in an video (@4 pics per minute – 1 cm resolution)
3. Measuring relative displacement of a rail with structured light at high speed (@250 Hz – 0.1 mm – resolution)
Do you think I could use the knowledge of your books for this stuff very different to each other?
I offer four different books and courses.
1. Deep Learning for Computer Vision with Python focuses on deep learning.
2. Practical Python and OpenCV is a gentle intro to the world of computer vision and image processing through the OpenCV library.
3. The PyImageSearch Gurus course is similar to a college survey course on computer vision but much more hands-on
4. Raspberry Pi for Computer Vision focuses on embedded CV and DL
Given that, I would suggest you go with both #1 and #4.
Thank you very much for your advice. I will give them a close look.
Best regards,
Eduardo Briales
Hi Adrian,
The question I am asking probably may not be relative to current page. But I couldn’t come across any proper info regarding it.
Recently I am seeing a lot of object detection or image recognition using Tensorflow. What is the difference between tensorflow and OpenCV?
I know TensorFlow is mainly used for machine learning and such, but do both packages share some common applications?
OpenCV facilitates real-time image processing whereas TensorFlow is strictly for neural networks and deep learning. OpenCV can utilize a trained TensorFlow NN but it cannot train the network for you. I would suggest you read Deep Learning for Computer Vision with Python and Practical Python and OpenCV to get up to speed.
Hi Adrian,
Thanks for another great post! It helped me understand how to use multiprocessing with OpenCV. My question is: at the end of this approach we have separate pickle files of hashes. What is the best way to combine them all into a single pickle file?
Regards
Take a look at the code in the post — Lines 76-97 show you how to combine the individual pickle files into a single large pickle file.
hi Adrian, thanks for the article.
I’m trying to figure out whether it is possible to use this multiprocessing script with scripts like the OpenCV object detection ones.
I would be happy to get your input
What object detection model are you trying to use? Deep learning-based or CPU-based?