In my last post, I mentioned that tiny, one pixel shifts in images can kill the performance of your Restricted Boltzmann Machine + Classifier pipeline when utilizing raw pixels as feature vectors.

Today I am going to continue that discussion.

And more importantly, I’m going to provide some Python and scikit-learn code that you can use to apply Restricted Boltzmann Machines to your own image classification problems.

OpenCV and Python versions:
This example will run on Python 2.7 and OpenCV 2.4.X/OpenCV 3.0+.

But before we jump into the code, let’s take a minute to talk about the MNIST dataset.

The MNIST Dataset

Figure 1: MNIST digit recognition sample

The MNIST dataset is one of the most well studied datasets in the computer vision and machine learning literature. In many cases, it’s a benchmark, a standard against which machine learning algorithms are ranked.

The goal of this dataset is to correctly classify the handwritten digits 0-9. We are not going to utilize the entire dataset (which consists of 60,000 training images and 10,000 testing images). Instead, we are going to utilize a small sample (3,000 for training, 2,000 for testing). The data points are approximately uniformly distributed per digit, so no substantial class label imbalance exists.

Each feature vector is 784-dim, corresponding to the 28 x 28 grayscale pixel intensities of the image. These grayscale pixel intensities are unsigned integers, falling into the range [0, 255].

All digits are placed on a black background, with the foreground being white and shades of gray.

Given these raw pixel intensities, we are going to first train a Restricted Boltzmann Machine on our training data to learn an unsupervised feature representation of the digits.

Then, we are going to take these “learned” features and train a Logistic Regression classifier on top of them.

To evaluate our pipeline, we’ll take the testing data and run it through our classifier and report the accuracy.

However, I mentioned in my previous post that simple one pixel translations of the testing set images can lead to accuracy dropping, even though these translations are so small they are barely (if at all) noticeable to the human eye.

To test this claim, we’ll generate a testing set four times larger than the original by translating each image one pixel up, down, left, and right.

Finally, we’ll pass this “nudged” dataset through our pipeline and report our results.

Sound good?

Let’s examine some code.

Applying a RBM to the MNIST Dataset Using Python

The first thing we’ll do is create a file, rbm.py, and start importing the packages we need:

# import the necessary packages
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline
import numpy as np
import argparse
import time
import cv2

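We’ll start by importing the train_test_split function from the cross_validation sub-package of scikit-learn. The train_test_split function will make it dead simple for us to create our training and testing splits of the MNIST dataset. (If you are on scikit-learn 0.18 or newer, note that both train_test_split and GridSearchCV moved to the model_selection sub-package; the older cross_validation and grid_search modules were removed in 0.20.)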

Next, we’ll import the classification_report function from the metrics sub-package, which we’ll use to produce a nicely formatted accuracy report on (1) the overall system and (2) the accuracy of each individual class label.

On Line 4 we’ll import the classifier we’ll be using throughout this example — a LogisticRegression classifier.

I mentioned that we’ll be using a Restricted Boltzmann Machine to learn an unsupervised representation of our raw pixel values. This will be handled by the BernoulliRBM class in the neural_network sub-package of scikit-learn.

The BernoulliRBM implementation (as the name suggests) consists of binary visible units and binary hidden nodes. According to the scikit-learn documentation, the time complexity of the implementation is O(d²), assuming d ~ n_features ~ n_components.

In order to find optimal values of the coefficient C for Logistic Regression, along with the optimal learning rate, number of iterations, and number of components for our RBM, we’ll need to perform a cross-validated grid search over the hyper-parameter space. The GridSearchCV class (which we import on Line 6) will take care of this search for us.

Next, we’ll need the Pipeline class, imported on Line 7. This class allows us to define a series of steps using the fit and transform methods of scikit-learn estimators.

Our classification pipeline will consist of first training a BernoulliRBM to learn an unsupervised representation of the feature space, followed by training a LogisticRegression classifier on the learned features.

Finally, we import NumPy for numerical processing, argparse to parse command line arguments, time to track the amount of time it takes for a given model to train, and cv2 for our OpenCV bindings.

But before we get too far, we first need to set up some functions to load and manipulate our MNIST dataset:

def load_digits(datasetPath):
	# build the dataset and then split it into data
	# and labels
	X = np.genfromtxt(datasetPath, delimiter = ",", dtype = "uint8")
	y = X[:, 0]
	X = X[:, 1:]

	# return a tuple of the data and targets
	return (X, y)

The load_digits function, as the name suggests, loads our MNIST digit dataset off disk. The function takes a single parameter, datasetPath, which is the path to where the dataset CSV file resides.

We load the CSV file off disk using the np.genfromtxt function, grab the class labels (which are the first column of the CSV file) on Line 17, followed by the actual raw pixel feature vectors on Line 18. These feature vectors are 784-dim, corresponding to the 28 x 28 flattened representation of the grayscale digit image.

Finally, we return a tuple of our feature vector matrix and class labels on Line 21.
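To make the expected file layout concrete, here is a minimal sketch that runs the same np.genfromtxt call on a tiny in-memory CSV instead of a file on disk. The two rows and their four "pixel" columns are made up purely for illustration; a real row would hold a label followed by 784 pixel values.

```python
import io
import numpy as np

# two hypothetical rows: the label comes first, followed by four
# made-up pixel values (a real MNIST row has 784 pixel values)
csv = io.StringIO(u"5,0,128,255,0\n0,0,0,64,255\n")

# same call as in load_digits, reading from the in-memory buffer
data = np.genfromtxt(csv, delimiter=",", dtype="uint8")
y = data[:, 0]     # first column holds the class labels
X = data[:, 1:]    # remaining columns hold the pixel intensities

print(y)           # one label per row
print(X.shape)     # (number of rows, number of pixel columns)
```

One thing to keep in mind: with a single-row file, np.genfromtxt returns a 1-D array, so the column slicing above assumes at least two rows.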

Next up, we need a function to apply some pre-processing to our data.

The BernoulliRBM assumes that the columns of our feature vectors fall within the range [0, 1]. However, the MNIST dataset is represented as unsigned 8-bit integers, falling within the range [0, 255].

To scale the columns into the range [0, 1], all we need to do is define a scale function:

def scale(X, eps = 0.001):
	# scale the data points s.t the columns of the feature space
	# (i.e the predictors) are within the range [0, 1]
	return (X - np.min(X, axis = 0)) / (np.max(X, axis = 0) + eps)

The scale function takes two parameters, our data matrix X and an epsilon value used to prevent division by zero errors.

This function is fairly self-explanatory. For each of the 784 columns in the matrix, we subtract the minimum of the column from each value and then divide by the maximum of the column (plus epsilon). Since our pixel intensities are non-negative, this ensures that the values of each column fall into the range [0, 1].
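A quick worked example helps show what the scaling actually does. The sketch below applies the same formula to a hypothetical 3 x 3 matrix (the values are made up). Note that the formula divides by the column maximum rather than by the column range, so a column whose minimum is not zero tops out below 1, and the epsilon keeps an all-zero column from triggering a division by zero.

```python
import numpy as np

def scale(X, eps=0.001):
    # same formula as the scale function in this post
    return (X - np.min(X, axis=0)) / (np.max(X, axis=0) + eps)

# hypothetical data: three samples (rows), three pixel columns
X = np.array([[  0.0, 10.0, 0.0],
              [128.0, 20.0, 0.0],
              [255.0, 30.0, 0.0]])

scaled = scale(X)
print(scaled)
```

The first column ends up spanning 0 to just under 1, the second column only reaches roughly 0.67 because its minimum is 10, and the third (all-zero) column stays at 0.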

Now we need one last function: a method to generate a “nudged” dataset four times larger than the original, translating each image one pixel up, down, left, and right.

To handle this nudging of the dataset, we’ll create the nudge function:

def nudge(X, y):
	# initialize the translations to shift the image one pixel
	# up, down, left, and right, then initialize the new data
	# matrix and targets
	translations = [(0, -1), (0, 1), (-1, 0), (1, 0)]
	data = []
	target = []

	# loop over each of the digits
	for (image, label) in zip(X, y):
		# reshape the image from a feature vector of 784 raw
		# pixel intensities to a 28x28 'image'
		image = image.reshape(28, 28)

		# loop over the translations
		for (tX, tY) in translations:
			# translate the image
			M = np.float32([[1, 0, tX], [0, 1, tY]])
			trans = cv2.warpAffine(image, M, (28, 28))

			# update the list of data and target
			data.append(trans.flatten())
			target.append(label)

	# return a tuple of the data matrix and targets
	return (np.array(data), np.array(target))

The nudge function takes two parameters: our data matrix X and our class labels y.

We start by initializing a list of our (x, y) translations, followed by our new data matrix and target labels on Lines 32-34.

Then, we start looping over each of the images and class labels on Line 37.

As I mentioned, each image is represented as a 784-dim feature vector, corresponding to the 28 x 28 digit image.

However, to utilize the cv2.warpAffine function to translate our image, we first need to reshape the 784-dim feature vector into a two dimensional array of shape (28, 28). This is handled on Line 40.

Next up, we start looping over our translations on Line 43.

We construct our actual translation matrix M on Line 45 and then apply the translation by calling the cv2.warpAffine function on Line 46.

We are then able to update our new, nudged data matrix on Line 48 by flattening the 28 x 28 image back into a 784-dim feature vector.

Our class label target list is then updated on Line 50.

Finally, we return a tuple of the new data matrix and class labels on Line 53.
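If you want to see what a one pixel translation does without installing OpenCV, here is a small NumPy-only sketch. The shift helper is a hypothetical stand-in for the cv2.warpAffine call, assuming OpenCV's default behavior of filling the vacated pixels with zeros; the 3 x 3 "image" is made up to keep the output readable.

```python
import numpy as np

def shift(image, tX, tY):
    # hypothetical NumPy stand-in for the warpAffine translation:
    # move the content tX pixels right and tY pixels down, filling
    # the pixels that are vacated with 0 (black)
    (h, w) = image.shape
    out = np.zeros_like(image)
    out[max(0, tY):h - max(0, -tY), max(0, tX):w - max(0, -tX)] = \
        image[max(0, -tY):h - max(0, tY), max(0, -tX):w - max(0, tX)]
    return out

# made-up 3 x 3 "image" with a single bright pixel in the center
image = np.zeros((3, 3), dtype="uint8")
image[1, 1] = 255

up = shift(image, 0, -1)     # bright pixel moves from row 1 to row 0
right = shift(image, 1, 0)   # bright pixel moves from column 1 to column 2
print(up)
print(right)
```

The (tX, tY) pairs follow the same convention as the translations list in nudge: a negative tY moves the digit up and a negative tX moves it left.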

These three helper functions, while quite simple in nature, are critical to setting up our experiment.

Now we can finally start putting the pieces together:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required = True,
	help = "path to the dataset file")
ap.add_argument("-t", "--test", required = True, type = float,
	help = "size of test split")
ap.add_argument("-s", "--search", type = int, default = 0,
	help = "whether or not a grid search should be performed")
args = vars(ap.parse_args())

Lines 56-63 handle parsing our command line arguments. Our rbm.py script accepts three arguments: --dataset (required), which is the path to where our MNIST .csv file resides on disk, --test (required), the percentage of data to use for our testing split (the rest used for training), and --search (optional, defaulting to 0), an integer used to determine if a grid search should be performed to tune hyper-parameters.

A value of 1 for --search indicates that a grid search should be performed; a value of 0 indicates that the grid search has already been performed and the model parameters for both the BernoulliRBM and LogisticRegression models have already been manually set.
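To see how these switches turn into the args dictionary, the sketch below builds the same parser and feeds it an explicit argument list instead of reading from the real command line. The data/digits.csv path is just the example path used in this post.

```python
import argparse

# same three switches as rbm.py
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to the dataset file")
ap.add_argument("-t", "--test", required=True, type=float,
    help="size of test split")
ap.add_argument("-s", "--search", type=int, default=0,
    help="whether or not a grid search should be performed")

# simulate: python rbm.py --dataset data/digits.csv --test 0.4
args = vars(ap.parse_args(["--dataset", "data/digits.csv", "--test", "0.4"]))
print(args)
```

Since --search was not supplied, it falls back to its default of 0, which is why the evaluation branch of the script runs unless you explicitly pass --search 1.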

# load the digits dataset, convert the data points from integers
# to floats, and then scale the data s.t. the predictors (columns)
# are within the range [0, 1] -- this is a requirement of the
# Bernoulli RBM
(X, y) = load_digits(args["dataset"])
X = X.astype("float32")
X = scale(X)

# construct the training/testing split
(trainX, testX, trainY, testY) = train_test_split(X, y,
	test_size = args["test"], random_state = 42)

Now that our command line arguments have been parsed, we can load our dataset off disk on Line 69. We then convert it to the floating point data type on Line 70 and scale the feature vector columns to fall into the range [0, 1] using our scale function on Line 71.

In order to evaluate our system we need two sets of data: a training set and a testing set. Our pipeline will be trained using the training data, and then evaluated using the testing set to ensure our accuracy reports are not biased.

To generate our training and testing splits, we’ll call the train_test_split function on Line 74. This function automatically generates our splits for us.
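If you are curious what a random split boils down to, here is a rough NumPy sketch. It is not scikit-learn's implementation: simple_split is a hypothetical helper, the data is a zero-filled placeholder, and the 5,000 row count is an assumption chosen so that a 0.4 test size reproduces the 3,000/2,000 split used in this post.

```python
import numpy as np

def simple_split(X, y, test_size, seed=42):
    # hypothetical helper: shuffle the row indexes, then carve off
    # the first test_size fraction as the testing set
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    (test_idx, train_idx) = (idx[:n_test], idx[n_test:])
    return (X[train_idx], X[test_idx], y[train_idx], y[test_idx])

# placeholder data standing in for the scaled MNIST sample
X = np.zeros((5000, 784), dtype="float32")
y = np.arange(5000) % 10

(trainX, testX, trainY, testY) = simple_split(X, y, test_size=0.4)
print(trainX.shape)
print(testX.shape)
```

Fixing the seed plays the same role as random_state = 42 in the real call: the same rows land in the same split every time the script runs.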

# check to see if a grid search should be done
if args["search"] == 1:
	# perform a grid search on the 'C' parameter of Logistic
	# Regression
	print "SEARCHING LOGISTIC REGRESSION"
	params = {"C": [1.0, 10.0, 100.0]}
	start = time.time()
	gs = GridSearchCV(LogisticRegression(), params, n_jobs = -1, verbose = 1)
	gs.fit(trainX, trainY)

	# print diagnostic information to the user and grab the
	# best model
	print "done in %0.3fs" % (time.time() - start)
	print "best score: %0.3f" % (gs.best_score_)
	print "LOGISTIC REGRESSION PARAMETERS"
	bestParams = gs.best_estimator_.get_params()

	# loop over the parameters and print each of them out
	# so they can be manually set
	for p in sorted(params.keys()):
		print "\t %s: %f" % (p, bestParams[p])

A check is made on Line 78 to see if a grid search should be performed to tune the hyper-parameters of our pipeline.

If a grid search is to be performed, we first search over the coefficient C of the Logistic Regression classifier on Lines 81-85. We’ll be evaluating our approach using just a Logistic Regression classifier on the raw pixel data AND a Restricted Boltzmann Machine + Logistic Regression classifier, hence we need to independently search the coefficient C space.

Lines 89-97 then print out the optimal parameter values for our standard Logistic Regression classifier.

Now we can move on to our pipeline: a BernoulliRBM and a LogisticRegression classifier used together.

	# initialize the RBM + Logistic Regression pipeline
	rbm = BernoulliRBM()
	logistic = LogisticRegression()
	classifier = Pipeline([("rbm", rbm), ("logistic", logistic)])

	# perform a grid search on the learning rate, number of
	# iterations, and number of components on the RBM and
	# C for Logistic Regression
	print "SEARCHING RBM + LOGISTIC REGRESSION"
	params = {
		"rbm__learning_rate": [0.1, 0.01, 0.001],
		"rbm__n_iter": [20, 40, 80],
		"rbm__n_components": [50, 100, 200],
		"logistic__C": [1.0, 10.0, 100.0]}

	# perform a grid search over the parameter
	start = time.time()
	gs = GridSearchCV(classifier, params, n_jobs = -1, verbose = 1)
	gs.fit(trainX, trainY)

	# print diagnostic information to the user and grab the
	# best model
	print "\ndone in %0.3fs" % (time.time() - start)
	print "best score: %0.3f" % (gs.best_score_)
	print "RBM + LOGISTIC REGRESSION PARAMETERS"
	bestParams = gs.best_estimator_.get_params()

	# loop over the parameters and print each of them out
	# so they can be manually set
	for p in sorted(params.keys()):
		print "\t %s: %f" % (p, bestParams[p])

	# show a reminder message
	print "\nIMPORTANT"
	print "Now that your parameters have been searched, manually set"
	print "them and re-run this script with --search 0"

We define our pipeline on Lines 100-102, consisting of our Restricted Boltzmann Machine and a Logistic Regression classifier.

However, now we have more parameters to search over than just the coefficient C of the Logistic Regression classifier. Now we also have to search over the number of iterations, number of components (i.e. the size of the resulting feature space), and the learning rate of the RBM. We define this search space on Lines 108-112.
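The keys in that search space follow the pipeline's naming convention: the step name, two underscores, then the parameter name. Here is a minimal sketch of the convention on its own, assuming a scikit-learn install that provides BernoulliRBM, LogisticRegression, and Pipeline at the import paths used in this post; no data or training is involved.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# same two named steps as the pipeline in this post
classifier = Pipeline([("rbm", BernoulliRBM()),
    ("logistic", LogisticRegression())])

# "<step name>__<parameter>" routes a value to a single step
classifier.set_params(rbm__n_components=200, logistic__C=1.0)

print(classifier.named_steps["rbm"].n_components)
print(classifier.named_steps["logistic"].C)
```

This is the same mechanism the grid search relies on when it tries each combination of values.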

We start the grid search on Lines 115-117.

The optimal parameters for the pipeline are then displayed on Lines 121-129.

To determine the optimal values for our pipeline, execute the following command:

$ python rbm.py --dataset data/digits.csv --test 0.4 --search 1

You might want to make a cup of coffee or go for a nice long walk while the grid space is searched. For each of our parameter selections, a model has to be trained and cross-validated. It’s definitely not a fast operation. But it’s the price you pay for optimal parameters, which are crucial when utilizing a Restricted Boltzmann Machine.
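To get a feel for why the search takes so long, we can simply count the combinations in the pipeline's search space. The sketch below uses only the standard library; the n_folds value is an assumption for illustration, since the actual number of cross-validation folds depends on your scikit-learn version and the cv argument you pass to GridSearchCV.

```python
from itertools import product

# the search space defined for the RBM + Logistic Regression pipeline
params = {
    "rbm__learning_rate": [0.1, 0.01, 0.001],
    "rbm__n_iter": [20, 40, 80],
    "rbm__n_components": [50, 100, 200],
    "logistic__C": [1.0, 10.0, 100.0]}

# every combination of one value per parameter
combos = list(product(*params.values()))

# assumption: number of cross-validation folds used by the search
n_folds = 3

print(len(combos))             # 81 parameter combinations
print(len(combos) * n_folds)   # 243 pipeline fits under this assumption
```

Every one of those fits trains an RBM for up to 80 iterations, which is where the long walk comes from.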

After a long walk, you should see that the following optimal values have been selected:

rbm__learning_rate: 0.01
rbm__n_iter: 40
rbm__n_components: 200
logistic__C: 1.0

Awesome. Our hyper-parameters have been tuned.

Let’s set these parameters and evaluate our classification pipeline:

# otherwise, use the manually specified parameters
else:
	# evaluate using Logistic Regression and only the raw pixel
	# features (these parameters were cross-validated)
	logistic = LogisticRegression(C = 1.0)
	logistic.fit(trainX, trainY)
	print "LOGISTIC REGRESSION ON ORIGINAL DATASET"
	print classification_report(testY, logistic.predict(testX))

	# initialize the RBM + Logistic Regression classifier with
	# the cross-validated parameters
	rbm = BernoulliRBM(n_components = 200, n_iter = 40,
		learning_rate = 0.01,  verbose = True)
	logistic = LogisticRegression(C = 1.0)

	# train the classifier and show an evaluation report
	classifier = Pipeline([("rbm", rbm), ("logistic", logistic)])
	classifier.fit(trainX, trainY)
	print "RBM + LOGISTIC REGRESSION ON ORIGINAL DATASET"
	print classification_report(testY, classifier.predict(testX))

	# nudge the dataset and then re-evaluate
	print "RBM + LOGISTIC REGRESSION ON NUDGED DATASET"
	(testX, testY) = nudge(testX, testY)
	print classification_report(testY, classifier.predict(testX))

To obtain a baseline accuracy, we’ll train a standard Logistic Regression classifier on the raw pixel feature vectors (no unsupervised learning) on Lines 140 and 141. The accuracy of the baseline is then printed out on Line 143 using the classification_report function.

We then construct our BernoulliRBM + LogisticRegression classifier pipeline and evaluate it on our testing data on Lines 147-155.

But what happens when we nudge our testing set by translating each image one pixel up, down, left, and right?

To find out, we nudge our dataset on Line 162 and then re-evaluate it on Line 163.

To evaluate our system, issue the following command:

$ python rbm.py --dataset data/digits.csv --test 0.4

After a few minutes, we should have some results to look at.

Results

The first set of results is our Logistic Regression classifier trained strictly on the raw pixel feature vectors:

LOGISTIC REGRESSION ON ORIGINAL DATASET
             precision    recall  f1-score   support

          0       0.94      0.96      0.95       196
          1       0.94      0.97      0.95       245
          2       0.89      0.90      0.90       197
          3       0.88      0.84      0.86       202
          4       0.90      0.93      0.91       193
          5       0.85      0.75      0.80       183
          6       0.91      0.93      0.92       194
          7       0.90      0.90      0.90       212
          8       0.85      0.83      0.84       186
          9       0.81      0.84      0.83       192

avg / total       0.89      0.89      0.89      2000

Using this approach, we were able to achieve 89% accuracy. Not bad for using just the pixel intensities as our feature vectors.
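A quick note on where that number comes from: the avg / total row of the report is weighted by the support of each class, and the support-weighted average of the per-class recall works out to the same value as plain accuracy. Here is a minimal sketch of that arithmetic using made-up labels and predictions (not the MNIST results above).

```python
import numpy as np

# made-up ground-truth labels and predictions for three classes
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])

# plain accuracy: the fraction of predictions that are correct
accuracy = np.mean(y_true == y_pred)

# per-class recall and support, then the support-weighted average
classes = np.unique(y_true)
support = np.array([np.sum(y_true == c) for c in classes])
recall = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
weighted_recall = np.sum(recall * support) / float(np.sum(support))

print(accuracy)
print(weighted_recall)
```

Both values agree because weighting each class's recall by its support just counts the correct predictions over the whole testing set.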

But look what happens when we train our Restricted Boltzmann Machine + Logistic Regression pipeline:

RBM + LOGISTIC REGRESSION ON ORIGINAL DATASET
             precision    recall  f1-score   support

          0       0.95      0.98      0.97       196
          1       0.97      0.96      0.97       245
          2       0.92      0.95      0.94       197
          3       0.93      0.91      0.92       202
          4       0.92      0.95      0.94       193
          5       0.95      0.86      0.90       183
          6       0.95      0.95      0.95       194
          7       0.93      0.91      0.92       212
          8       0.91      0.90      0.91       186
          9       0.86      0.90      0.88       192

avg / total       0.93      0.93      0.93      2000

Our accuracy is able to increase from 89% to 93%! That’s definitely a significant jump!

But now the problem starts…

What happens when we nudge the dataset, translating each image one pixel up, down, left, and right?

I mean, these shifts are so small they would be barely (if at all) recognizable to the human eye.

Surely that can’t be a problem, can it?

Well, it turns out, it is:

RBM + LOGISTIC REGRESSION ON NUDGED DATASET
             precision    recall  f1-score   support

          0       0.94      0.93      0.94       784
          1       0.96      0.89      0.93       980
          2       0.87      0.91      0.89       788
          3       0.85      0.85      0.85       808
          4       0.88      0.92      0.90       772
          5       0.86      0.80      0.83       732
          6       0.90      0.91      0.90       776
          7       0.86      0.90      0.88       848
          8       0.80      0.85      0.82       744
          9       0.84      0.79      0.81       768

avg / total       0.88      0.88      0.88      8000

After nudging our dataset, the RBM + Logistic Regression pipeline drops down to 88% accuracy: 5 percentage points below the original testing set and 1 percentage point below the baseline Logistic Regression classifier.

So now you can see the issue of using raw pixel intensities as feature vectors. Even tiny shifts in the image can cause accuracy to drop.

But don’t worry, there are ways to fix this issue.

How Do We Fix the Translation Problem?

There are two ways that researchers in neural networks, deep nets, and convolutional networks address the shifting and translation problem.

The first way is to generate extra data at training time.

In this post, we nudged our dataset after training to see its effect on classification accuracy. However, we can also nudge our dataset before training in an attempt to make our model more robust.
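As a rough sketch of what nudging before training could look like, the snippet below stacks the original training vectors with their four one pixel translations. This is an illustration, not the code used for the results in this post: nudge_batch is a hypothetical NumPy-only variant of the nudge function (it pads each image with zeros and crops, rather than calling cv2.warpAffine), and trainX and trainY are random placeholders.

```python
import numpy as np

def nudge_batch(X, size=28):
    # hypothetical NumPy-only variant of nudge: pad every image with
    # a one pixel border of zeros, then crop a shifted window
    images = X.reshape(-1, size, size)
    padded = np.pad(images, ((0, 0), (1, 1), (1, 1)), mode="constant")
    shifted = []

    # same (tX, tY) translations as nudge: up, down, left, right
    for (tX, tY) in [(0, -1), (0, 1), (-1, 0), (1, 0)]:
        crop = padded[:, 1 - tY:1 - tY + size, 1 - tX:1 - tX + size]
        shifted.append(crop.reshape(len(X), -1))

    return np.vstack(shifted)

# placeholder training data: 6 random "images" with made-up labels
rng = np.random.RandomState(0)
trainX = rng.rand(6, 784)
trainY = np.array([0, 1, 2, 3, 4, 5])

# original samples followed by the four translated copies
augX = np.vstack([trainX, nudge_batch(trainX)])
augY = np.tile(trainY, 5)
print(augX.shape)
print(augY.shape)
```

The augmented arrays would then be passed to the fit method of the pipeline in place of the original training data. Keep in mind that this makes the training set five times larger, so training time grows accordingly.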

The second method is to randomly select regions from our training images, rather than using them in their entirety.

For example, instead of using the entire 28 x 28 image, we could randomly sample a 24 x 24 region from the image. Done enough times, over enough training images, we can mitigate the translation issue.
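Here is a minimal sketch of the random sampling idea. The random_crop helper is hypothetical and the input is a random placeholder image rather than a real digit.

```python
import numpy as np

def random_crop(image, size=24, rng=np.random):
    # hypothetical helper: pick a random top-left corner such that
    # a size x size window still fits inside the image
    (h, w) = image.shape
    top = rng.randint(0, h - size + 1)
    left = rng.randint(0, w - size + 1)
    return image[top:top + size, left:left + size]

rng = np.random.RandomState(0)
image = rng.rand(28, 28)    # placeholder 28 x 28 "digit"

crop = random_crop(image, rng=rng)
print(crop.shape)             # (24, 24)
print(crop.flatten().shape)   # (576,)
```

Note that the flattened crops are 576-dim rather than 784-dim, so the model would have to be trained on crops and the testing images would need to be reduced to the same size (for example, by taking the center 24 x 24 region).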

Summary

In this blog post we’ve demonstrated that even small, one pixel translations in images that are nearly indistinguishable to the human eye are able to hurt the performance of our classification pipeline.

The reason we see this drop in accuracy is that we are utilizing raw pixel intensities as feature vectors.

Furthermore, translations are not the only deformation that can cause loss in accuracy when utilizing raw pixel intensities as features. Rotations, transformations, and even noise when capturing the image can have a negative impact on model performance.

In order to handle these situations we can (1) generate additional training data in an attempt to make our model more robust and/or (2) sample randomly from the image rather than using it in its entirety.

In practice, neural nets and deep learning approaches are substantially more robust than this experiment. By stacking layers and learning a set of convolutional kernels, deep nets are able to handle many of these issues.

Still, this experiment is important if you are just starting out and using raw pixel intensities as your feature vectors.

Either be prepared to spend a lot of time pre-processing your data, or be sure you know how to utilize classification models to handle situations where your data may not be as “clean” and nicely pre-processed as you want it.

About the Author

Hi there, I’m Adrian Rosebrock, PhD. All too often I see developers, students, and researchers wasting their time, studying the wrong things, and generally struggling to get started with Computer Vision, Deep Learning, and OpenCV. I created this website to show you what I believe is the best possible way to get your start.

21 responses to: Applying deep learning and a RBM to MNIST using Python

Joel

June 24, 2014 at 9:06 pm

Super interesting. Though if you’re arguing that it’s the nature of the RBM that’s causing this loss in accuracy (rather than testing on data that was not trained upon), it would be great to see how plain logistic performed against the nudged dataset.
- Adrian Rosebrock
  
  June 25, 2014 at 6:25 am
  
  It’s definitely not the nature of the RBM. Anytime you take a supervised learning model and perturb the testing set, you will see a loss in accuracy, and this is especially true when working with raw pixel intensities.
  
  The bigger point that I am trying to make is that extremely special care needs to be taken when using raw pixel intensities as feature vectors. Even tiny one pixel translations or rotations can really hurt performance, hence the incredible care that needs to be placed in pre-processing.
  
  This is why Convolutional Neural Nets are the best performing classification models for image datasets (for the time being, at least). They are able to abstract the raw pixel intensities and learn a set of convolution kernels that can (somewhat) account for the translations and rotations.
vin

December 12, 2015 at 8:08 pm

hi adrian!

why did you use RBM for dimensionality reduction, rather than another technique like PCA? thanks!
- Adrian Rosebrock
  
  December 13, 2015 at 7:36 am
  
  Simply put: reconstruction and probability. An RBM is a generative, stochastic network that learns a probability distribution over a set of inputs. PCA uses an eigenvalue decomposition to find the most informative components. These components can be used to “reconstruct” an input. RBMs learn a distribution over these inputs which can be used to reconstruct the inputs; however, the “intermediate” components that RBMs learn are more suitable as “features” and inputs to classifiers.
  
  As for when or where you should use RBMs vs. PCA, that’s quite problem specific and you should “spot check” your algorithms to see which gives better performance.
Ying Yi Wu

March 13, 2016 at 8:40 am

After the images in train.txt have been trained, and the images in val.txt have been tested.
How to get the “precision” and “recall” from the val.txt file (which has been tested by Caffe model)?
- Adrian Rosebrock
  
  March 13, 2016 at 10:13 am
  
  This blog post doesn’t cover how to use Caffe, but if you already have a model trained using Caffe and want to test it against other data points, I recommend using the classification_report function inside of scikit-learn.
Vimal

April 13, 2016 at 9:38 pm

Adrian,

I am having trouble understanding the data type that is provided by mnist. there are enough examples on the web on how to use mnist dataset and python.

but i want to train my own dataset and mix my data with mnist. I also want to train to recognize characters.

usually i go about creating a 28×28 image for training. but i don’t quite seem to understand the data types part. is there a tool to convert the mnist to gifs or jpgs?
- Adrian Rosebrock
  
  April 14, 2016 at 4:49 pm
  
  Are you referring to the binary MNIST dataset? Or the MNIST dataset provided by scikit-learn? I would recommend using the scikit-learn representation. It’s simply a NumPy array where each row is a flattened 28 x 28 image, thus each row has 784 entries. If you would like to train your own classifier, then you need to “flatten” your images in the same manner, where each row is a single image. I cover how to do this in more detail inside Practical Python and OpenCV.
Zheo long er

November 28, 2016 at 9:31 pm

Hi Adrian!
What’s the real meaning/relationship between hidden and visible units, when using RBM model?
- Adrian Rosebrock
  
  November 29, 2016 at 7:58 am
  
  I’m not sure what you are saying by “real meaning”? Can you please elaborate?
  - Zheo long er
    
    November 30, 2016 at 2:23 am
    
    HI！Adrian:
    First, thanks for your reply. When we use PCA to reduce dimensionality, we get k new features, and those features are linear combinations of the original features. So my question is: when we use an RBM to reduce dimensionality we also get some new features, but I don’t know what the relationship is between the hidden units and the visible units. That is to say, how can we explain the relationship between the new features and the original features when we use an RBM to reduce dimensionality? Thanks.
    - Adrian Rosebrock
      
      December 1, 2016 at 7:40 am
      
      Think of the purpose of an RBM as performing a reconstruction of the original data using fewer inputs. PCA performs dimensionality reduction, and we can use the resulting principal components to reconstruct the original inputs. The same is true for an RBM. This tutorial does an excellent job explaining the relationship between input and hidden units.
Soham Jani

December 1, 2016 at 1:53 pm

Thank you for such an informative tutorial.
Is there a way to have an RBM network with 2 or more hidden layers using sklearn?
I’m using a leaf data set, which does not have the pixel information, but instead has features obtained from the images, like texture shape, etc. I’m surprised that the combination provides almost a 96% accuracy. Does this seem strange ?
- Adrian Rosebrock
  
  December 5, 2016 at 1:48 pm
  
  An RBM is only intended to have visible and hidden nodes. You can stack multiple RBMs on top of each other to obtain a Deep Belief Network (DBN). Is that what you mean? I would suggest reading more about DBNs and classification before continuing.
Gavin Hartnett

January 27, 2017 at 1:58 pm

Thanks for the great article! Very helpful. I have one confusion however.

You say:
“The BernoulliRBM implementation (as the name suggests), consists of binary visible units and binary hidden nodes.”

But then you later say:
“The BernoulliRBM assumes that the columns of our feature vectors fall within the range [0, 1]. However, the MNIST dataset is represented as unsigned 8-bit integers, falling within the range [0, 255]. To scale the columns into the range [0, 1], all we need to do is define a scale …”

Why are you allowed to take the MNIST visible units to be real valued in [0,1] when the RBM model assumes binary values? Thanks!
- Adrian Rosebrock
  
  January 28, 2017 at 6:51 am
  
  Hey Gavin, you are correct. BernoulliRBMs are intended for binary units. However, keep in mind that the MNIST images are (essentially) binary images. The foreground is represented as “white” (255) while the background is black (0). Dividing by 255 yields values of 0 and 1. Thus, they can be fed into the RBM.
Gavin Hartnett

January 28, 2017 at 8:59 am

Hi Adrian,

Thanks for the reply. Sorry, but I am still a bit confused because your scaling doesn’t give strictly binary values (right?).

Starting with the 0-255 valued discrete MNIST data, one option would be to process the data to be binary, perhaps by keeping the 0’s as is, and letting any pixel not 0 be 1 (for on). Another option would be to make up a rule, like any pixel with intensity > 100 is set to 1 and any <= 100 is set to 0. Then the Bernoulli RBM would be appropriate for the processed data because the values would be strictly binary.

Obviously doing any of these options is less than ideal because you lose some information, so in some sense letting the data be real valued in the interval [0,1] might be preferable. But then, strictly speaking, the Bernoulli RBM model isn't appropriate as it assumes binary values.

So would I be correct in interpreting this implementation of the Bernoulli RBM as not being strictly correct (because the underlying mathematical formulae of the model rely on the broken assumption of strictly binary values), but nonetheless the implementation performs well and is probably a better option than one of the scenarios I outlined above, where information is lost? I understand that machine learning is a mix of mathematics and engineering, and at the end of the day a given algorithm is being used to perform some task, and the important thing is whether it does that task well, not whether it is being 100% consistently implemented. Is this the crux of the matter here? By the way, I don't intend this to be a criticism, but I've seen a few people do this rescaling and I'm trying to make sure I understand it!
- Adrian Rosebrock
  
  January 29, 2017 at 2:51 pm
  
  I think you might have been confused by my original comment. If we assume the MNIST digits are already thresholded, then we have two pixel values: 255 (white, the foreground) and 0 (black, the background). If you divide all pixel values by 255, then they are all in the range [0, 1]. In fact, if the MNIST images are thresholded, the only possible values are 0 and 1.
  
  You are also correct in saying that machine learning is a mix of mathematics and engineering. We often relax some of the strict theoretical concepts if they work well in practice.
  
  A better option for MNIST classification would be to use a Convolutional Neural Network (CNN). I cover LeNet for digit recognition here. More information on deep learning can be found inside the PyImageSearch Gurus course and in my upcoming deep learning book.
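The preprocessing options discussed in this thread can be compared side by side. The following is a minimal NumPy sketch, not code from the post: the helper names `scale_to_unit` and `binarize` and the tiny `patch` array are made up for illustration, and the threshold of 100 is simply the example value mentioned in the comment above.

```python
import numpy as np

def scale_to_unit(pixels):
    """Map uint8 intensities in [0, 255] to floats in [0, 1]."""
    return pixels.astype("float32") / 255.0

def binarize(pixels, threshold=100):
    """Set pixels strictly above `threshold` to 1 and all others to 0."""
    return (pixels > threshold).astype("float32")

# Hypothetical 2x3 patch of grayscale intensities (not real MNIST data).
patch = np.array([[0, 50, 100],
                  [101, 200, 255]], dtype="uint8")

scaled = scale_to_unit(patch)           # real values in [0, 1], gray levels kept
binary = binarize(patch)                # strictly binary, intensity > 100 is "on"
nonzero = binarize(patch, threshold=0)  # strictly binary, any nonzero pixel is "on"
```

Only the two `binarize` variants produce strictly binary inputs. The scikit-learn user guide describes `BernoulliRBM` inputs as either binary values or values between 0 and 1 that encode the probability of a feature being turned on, which is one way to read the scaled intensities.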
Gavin Hartnett

January 29, 2017 at 5:49 pm

Ah, I see. Good, then everything makes sense now, thanks very much!
Naz

March 26, 2017 at 4:59 am

Hi. Where can I actually download the appropriately formatted MNIST .csv file?
- Adrian Rosebrock
  
  March 28, 2017 at 1:08 pm
  
  The MNIST .csv file is included in the “Downloads” section of this tutorial.
