Disclaimer: This post is a bit cynical in tone. In all honesty, I support deep learning research, I support the findings, and I believe that by researching deep learning we can only further improve our classification approaches and develop better methods. We all know that research is iterative. And sometimes we even explore methods decades old, applying only a slightly different twist, yielding significantly different results — and thus a new research area is born. That’s the way machine learning research works, as it should.
The following rant is actually more of an indictment of how we treat current “hot” machine learning algorithms — like “silver bullets” and the magic pill to cure our classification ailments. But these algorithms are not silver bullets, they are not magic pills, and they are not tools in a toolbox — they are methodologies backed by rational thought processes with assumptions regarding the datasets they are applied to. By spending a little bit more time thinking about the actual problem rather than blindly throwing a bunch of algorithms at the wall and seeing what sticks, I believe that we can only further the research.
I feel like every time I get on /r/machinelearning, HN, or DataTau, there’s something being said about deep learning — and more times than not, it just feels like hype.
And I’m not being negative because I think the research is a dead end. Far from it. It’s a fantastic research area and there is still far more left to explore.
I’m just sick of the hype.
Really, stop treating deep learning like Restricted Boltzmann Machines and Convolutional Neural Networks will solve all of your image classification woes.
Yes. They are powerful.
And yes, they are capable of tremendous classification accuracy…provided that they are applied to the right type of problem.
But also realize that deep learning is a hot topic in machine learning right now. And to a certain extent, there is a “bandwagon” trend that happens in the machine learning community — and it didn’t start with deep nets either.
Don’t believe me? Read on.
Why am I talking about deep learning on a computer vision blog?
Because let’s face it. Unless you are doing some very strict forms of image processing, you can’t have computer vision without some sort of machine learning.
From clustering, to forming a bag-of-words model, to soft codeword assignment, to learning distance metrics, to dimensionality reduction, to classification and regression (e.g. pose estimation using regression forests, which made the Xbox 360 Kinect possible), computer vision utilizes machine learning in an incredible number of tasks.
That all said, if you are working with computer vision, you’ll also likely be utilizing some sort of machine learning.
In terms of deep nets, computer vision and machine learning become even more entwined — look no further than convolutional neural networks, where we try to learn a set of kernels.
With the rise and fall of machine learning, the tide will thus affect computer vision as well.
And with the tides, also come the trends…
Perpetual Perceptron Troubles
Let me draw your attention to Rosenblatt’s Perceptron algorithm (1958). Following his publication, Perceptron-based techniques were all the rage.
But then, Minsky and Papert’s 1969 publication effectively stagnated neural network research for almost a decade by demonstrating that the Perceptron could not solve the exclusive-or (XOR) problem. Furthermore, the authors argued that we did not have the computational resources required to build and maintain large neural nets.
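The XOR limitation is easy to verify for yourself. Below is a minimal sketch using scikit-learn’s modern Perceptron implementation (my choice for illustration, not Minsky and Papert’s formulation):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# the four XOR input/output pairs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# a single-layer perceptron learns a linear decision boundary, and no
# line can separate {(0,1), (1,0)} from {(0,0), (1,1)}
clf = Perceptron(max_iter=1000, tol=None, random_state=42)
clf.fit(X, y)

# the best any linear boundary can do is 3 of the 4 points
print(clf.score(X, y))  # at most 0.75
```

A network with a single hidden layer, trained via backpropagation, handles XOR just fine — which is exactly why the field eventually recovered.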
This single paper alone almost killed neural network research.
Bummer.
Luckily, the backpropagation algorithm and the research by Rumelhart (1986) and Werbos (1974) were able to bring back the neural net from what could have been an untimely demise.
Arguably, without the contribution of these researchers, deep learning may have very well never existed.
Support Vector Machines
Next up on the bandwagon: SVMs.
In the mid-1990s, Cortes and Vapnik published their seminal Support-Vector Networks paper.
And you might as well have thought machine learning was solved; it even prompted Dr. Lipo Wang to say:
SVMs have been developed in the reverse order to the development of neural networks (NNs). SVMs evolved from the sound theory to the implementation and experiments, while the NNs followed more heuristic path, from applications and extensive experimentation to the theory.
That’s a pretty strong statement, especially in today’s context of deep learning.
And while I’m taking this quote (slightly) out of context, the real reason I am using it is to demonstrate that there was a time when machine learning researchers thought that SVMs had effectively “solved” classification for what it was.
SVMs were the future. Nothing could beat them…including neural networks.
Ironic, isn’t it? Because now all we can talk about is stacking Restricted Boltzmann Machines and training massive Convolutional Neural Nets.
But let’s keep this bandwagon going.
Trees. Trees. Trees.
Then, following the SVM craze, we had ensemble based methods.
Building on the work of Amit and Geman (1997), Ho (1998), and Dietterich (2000), the late Leo Breiman contributed his Random Forests paper to the machine learning community in 2001.
We hopped on the bandwagon again, loaded up a bunch of trees, threw in our shovels, and headed off to the closest nursery to set up camp.
And honestly, I’m no different — I drank the Random Forest Kool-Aid, so to speak. My entire dissertation involved utilizing Random Forests and weak feature representations to outperform heavily engineered state-of-the-art approaches that were fixated on single datasets.
And to this day I still find myself slightly biased towards ensemble and forest based methods.
Is this bias a bad thing?
I don’t think so. I think it’s natural, and even human to a degree, to be biased towards something you have painstakingly studied for a significant chunk of your life.
The real question is: can you do it without the hype?
Now we are in the present day. And there’s another “hot” learning model.
Deep learning, deeply flawed?
But it turns out, maybe we can do better than ensemble-based methods.
Maybe we can learn hierarchical feature representations using deep learning.
Sounds awesome, right?
But now we’re on yet another bandwagon. Let’s just stack a bunch of RBMs and see what happens!
I’ll tell you what happens. You leave your model to train, cross-validate, and grid search parameters for over a week (and maybe longer, depending on how large your net is and the computational resources at your disposal) just to have your accuracy increase by a tenth of a percent on ImageNet.
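For the curious, the workflow I’m complaining about looks something like the sketch below. I’m using scikit-learn and a small SVM grid purely for illustration — the parameter values are placeholders, and for a deep net the grid would cover learning rates, layer sizes, and so on, which is where the week of compute goes:

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# a small demo dataset, standing in for your real feature vectors
digits = datasets.load_digits()
X, y = digits.data, digits.target

# illustrative parameter grid -- on a large deep net, a grid like this
# is exactly what eats up days or weeks of compute
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.001, 0.01]}

# exhaustively cross-validate every parameter combination
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```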
Okay, so I’m being very cynical right now. I’ll admit to that.
But here’s the problem: we need to stop treating machine learning algorithms like they are a silver bullet.
The fact is, there is no silver bullet when it comes to machine learning.
Instead, what we have is an amazing, incredible set of algorithms with both theoretical assumptions and empirical evidence, demonstrating they are capable of solving a certain subset of classification problems.
The goal here is to be able to identify the algorithms that perform well in certain domains, not claim that one method is the end-all to machine learning, marking classification as “case closed”.
That all said, I’m honestly not trying to bash deep learning. These deep nets are incredibly powerful, as the scientific community has shown. And I wholeheartedly support their research and findings.
Intriguing properties of neural networks
However, the latest paper from Google, Intriguing properties of neural networks, suggests there is a gaping hole lurking in every deep neural net.
In their paper, the authors are able to construct “adversarial images” — that is, images whose pixel values have been perturbed in such a way that the result is (effectively) identical to the human eye, but leads to a mis-classification by the deep net.
These adversarial images were constructed in a fairly involved manner — the authors purposely adjusted pixel values in an image to maximize the network’s prediction error, yielding an “adversarial image” that, when used as input to the net, is nearly always misclassified, even by different neural nets trained on different subsets of the data.
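The core trick is easier to see on a linear model than on a deep net. The toy sketch below (my own illustration, not the paper’s actual procedure) nudges each “pixel” in the direction that increases the model’s loss, flipping the prediction of a tiny hand-set logistic classifier:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a toy "trained model": fixed weights and bias for a 4-pixel image
w = np.array([0.8, -0.5, 0.3, -0.9])
b = 0.1

# an input the model labels as class 1
x = np.array([1.0, 0.2, 0.9, 0.1])
assert sigmoid(w @ x + b) > 0.5

# for a linear model, the loss for true label y=1 grows fastest in the
# direction -sign(w), so a small step that way is "adversarial"
eps = 0.5  # exaggerated here; in the paper the changes are imperceptible
x_adv = x - eps * np.sign(w)

# the perturbed input now falls on the other side of the boundary
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```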
And if these small changes in images (that are again, for all intents and purposes, completely undetectable to the human eye) can lead to performance completely falling off a cliff, what does that imply for real-world datasets?
Because let’s face it, real-world datasets are not clean like MNIST. They are messy. They often contain noise. And they are far from perfect — this is especially true when we migrate our algorithms from academia to industry.
So, in practice, what does it mean?
It means that methods learning from raw pixel based features still have a long way to go.
Deep learning is here to stay. And honestly, I think it’s a good thing.
There is some incredible research going on right now, and I personally get excited over Convolutional Neural Nets — I think for the next five years Convolutional Neural Nets will continue to dominate certain image classification challenges, such as ImageNet.
I also hope the deep learning field stays active (I believe it will), because no matter what, our research and insights gained from studying deep nets will only help us create an even better approach years from now.
But in the meantime, maybe we can drop the buzz down just a little?
The Takeaway:
There is no single machine learning model that is the “silver bullet” to solve all your problems.
In fact, it’s best if we don’t treat machine learning models as tools in our toolbox at all — I believe that is where most of our problems come from.
Instead, we need to spend a lot more time thinking about the actual problem we are trying to solve instead of throwing a bunch of algorithms at the problem and seeing what sticks.
Because when we sit down and think about a problem — when we take the time to understand what our feature space “is” and what it “implies” in the real world — then we are acting like machine learning scientists. Otherwise, we are just a bunch of machine learning engineers, blindly performing black-box learning and operating a set of R, MATLAB, and Python libraries.
The takeaway is this: machine learning isn’t a tool. It’s a methodology with a rational thought process that is entirely dependent on the problem we are trying to solve. We shouldn’t blindly apply algorithms and see what sticks. We need to sit down, explore the feature space (both empirically and in terms of real-world implications), and then consider our best mode of action.
Sit down, take a deep breath. And invest the time to think it through.
And most importantly, avoid the hype.
Coming Up:
In my next post, I’ll show you how only a single pixel shift in an image can kill your Restricted Boltzmann Machine performance.
I don’t think hyping up anything is a bad thing. I think hype is necessary in figuring out just how well deep learning or other methods can perform in general. Hype makes people try deep learning on things most people wouldn’t normally think are a deep learning application. This allows a method to be pushed to its limits, which was done with many previous methods, like the random forests you mentioned.
This reminds me of how in fighting games when a new technique is discovered nearly everyone tries to use it in every situation possible to see how good it is. Eventually (sometimes never) people figure out how to counter that technique and it ends up being decent but you end up learning a lot after everything is said and done.
I think hype is a bad thing for two reasons. First, we generally don’t publish papers with negative results that would bring out the bad parts of an algorithm. Secondly, industry mostly follows academia, so if academics think deep learning is the answer to all problems, industry will focus on it, leading to discontent.
I agree with you when saying that not being able to publish papers with negative results leads to a lot of shortcomings. As far as industry following academia, I don’t entirely agree with that statement, but it definitely has merit.
How about other classifiers, such as SVMs? Will the ‘adversarial images’ also kill SVMs? Many classification pipelines in recent years focus on combining deep learning features with SVMs. Is it possible to avoid the ‘intriguing properties’ of neural nets?
It honestly depends on how you generate the adversarial images — it’s not clear if the adversarial images generated from a deep net would also be mis-classified by an SVM. My intuition tells me they would, especially on a large dataset such as ImageNet, where Convolutional Neural Nets dramatically outperform SVMs.
However, you could take the same approach and generate adversarial images specifically for SVMs. You would just tweak the adversarial image to maximize the loss of the SVM rather than the net.
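For a linear SVM, that approach reduces to stepping the image against the weight vector, since the hinge loss for a true label of 1 grows as the decision value w·x + b shrinks. A toy sketch of my own (not from the paper), with the perturbation size exaggerated so the flip is obvious:

```python
import numpy as np
from sklearn.svm import LinearSVC

# toy two-class data: the class is decided by the sign of the first feature
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] > 0).astype(int)

svm = LinearSVC(C=1.0).fit(X, y)
w = svm.coef_[0]

# a sample the SVM correctly places in class 1
x = np.array([1.5, 0.3])
assert svm.predict([x])[0] == 1

# step against the margin: for true label y=1, the hinge loss
# max(0, 1 - (w.x + b)) grows as we move along -sign(w)
eps = 2.0  # exaggerated for clarity
x_adv = x - eps * np.sign(w)

print(svm.predict([x])[0], svm.predict([x_adv])[0])
```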
You might consider reading about Perceptra – yet another deep learning pattern classification algorithm, with a few new twists/ideas that you may not have come across. No stochastics/probabilities, and really simple binary nodes. Certainly not the silver bullet. 🙂
http://www.adaptroninc.com/html/adaptron_inc_-_perceptra.html
Cheers
Brett Martensen
That is not really something new, and certainly no reason for such a frustrated rant 🙂
Research proceeds in waves. Now it’s deep learning, and when the dust begins to settle, we will see people working on combining the lessons learned from deep learning with new stuff. Personally, I think it will be structured methods combined with DL.
My prediction is that at some point we will get deep higher-level reasoning, and we will move away from pure feed-forward networks.
But all this takes time, lots of time.
Christian
Nice…….
I’m currently working on food recognition… it’s quite hard… foods mostly share the same colors and many look similar.
@Christian Wolf
Now it’s deep learning. The source of the rant, I think, is that deep learning is in vogue for what – the third time since the perceptron? But (@Adrian) part of the reason it’s become popular again is dropout (pdf: http://arxiv.org/pdf/1207.0580.pdf), which I think is a significant advance for ensemble methods in general.
YES! YES! YES! I do research in compbio and I am so sick of this hype too. It is nauseating, and what is worse is that people use these techniques and really really *DO NOT UNDERSTAND* them. If I have to listen to another talk from a researcher using these techniques who doesn’t really understand probability theory or statistics (probably never having taken a formal class, or even done independent, thorough study of either), I will lose my mind. Trust me, if more people actually had very strong, rigorous training in probability theory and/or theoretical statistics, they would realize why these techniques are NOT A BIG DEAL. They do not really offer deep technical insight or breakthroughs in most cases. (Sorry to get on my own soapbox and rant, but I did.)
What is your opinion on using Hidden Markov Models? Also, can you share your opinion on hybrid classification models? I have read several papers using hybrid classification models, like optimization models (GA) combined with learning models (ANN). I am just a learner in ML. But all of a sudden I started to see people talking about deep learning everywhere. In particular, Andrew Ng, the famous ML prophet, predicted it will be used everywhere. Not sure of his claims. Anyway, please contribute more articles like this. They could be useful for self-learners like me. Thank you.
My opinion on any classification model, whether HMM, deep learning, hybrid classification, etc., can be answered by asking the question “Does the problem fit the technique?” When it comes to datasets like ImageNet, convolutional neural nets obtain the highest accuracy. They obviously fit the problem. But would I try to use a CNN to solve a simple XOR problem? Nope. The problem doesn’t fit the technique — there are many other, simpler solutions. The issue is that when we have a big, powerful tool, we want to treat it like a hammer and everything else like a nail. That’s just not the case. As long as you ask yourself that question, you should be in good shape.
How about an article on forests and vision intertwined??
I can certainly do that, thank you for the suggestion!
Hi Adrian
With your recent experiments with deep learning, will you be re-visiting this particular post? I ask because the deep learning bandwagon, or hype, shows no sign of slowing down, and if anything it is getting stronger.
Google appears to be the leading proponent of deep learning and there is a “if it is good enough for Google, it is good enough for the rest of the world” following.
It has become obvious in the past 12 months that deep learning has failed in its initial focus on unsupervised learning but has triumphed with supervised learning. The success has come with the availability of Google-sized labeled datasets and heroic computations on large clusters of GPUs. Essentially, the representations of objects by features generated from deep learning are superior to traditional methods, i.e. humans with domain knowledge.
New companies are appearing that address vertical sectors by essentially hoovering up vast amounts of labeled data, running deep learning on EC2 GPUs, and then presenting the classifications (patterns).
Anyway, be good to get your perspective again.
Hi Dinesh, thanks for the comment. As far as revisiting this post, my opinion still stands. There is no silver bullet to machine learning. And more importantly, I see more and more people applying deep learning to problems where simple classifier methods would work just as well. People seem to be armed with the hammer of deep learning, and now every problem looks like a nail.
That all said, I have done a few tutorial posts on this blog regarding deep learning, including setting up an Amazon EC2 instance to utilize the GPU so you might want to check those out. I’ll certainly be doing more tutorials about deep learning in the future, I think it’s a great topic to cover.
What if we do not use raw pixels at all and instead only use some higher abstraction of the image data? Would adversarial images still break a deep neural net?
Thanks for the post and your blog. It is brilliant!
Regards
CleverCrow
One of the main benefits of deep learning is the ability to learn abstract representations of the data without using hand-tuned features such as SIFT and HOG. Even if we used these hand-tuned features, there would always be methods to “confuse” machine learning models, whether or not they fall into the “deep learning” class.
Disclaimer: I’m a layman on Machine Learning although with a degree in computer science.
After reading the article and the comments I must say I totally understand the ‘silver bullet’ issue; however, I must ask what is it that everyone is trying to achieve?
From a layman’s perspective, machine learning is about making the machine learn so that we don’t need to build a specific solution to a specific problem; looking at the problem in order to understand which algorithm/framework to use is a bit ‘defeating the purpose’. We as humans don’t (consciously) swap parts of the brain to use the ones that fit a problem better.
On the other hand, and regarding the ‘adversarial images’, I’d say that should be put in perspective because there are ‘adversarial images’ to the human eye as well, as all of us know well.
So again, what is it that people are trying to achieve? Modelling how the human brain goes about to solve a problem? Achieve better results than the human brain? Equivalent results than the human brain?
Actually, I’d guess that the human brain although using neurons as the building block, uses techniques similar to trees and all the other classes of machine learning algorithms/frameworks discussed.
Just 2c and first post, so be gentle 😉
The objective here is to achieve the best efficiency and accuracy in pattern recognition/classification. In narrow domains the correct method can achieve human-like accuracy. In broad or multiple domains, the noise wins.
To your point, one could use/over-use NNs, or SVMs, or HMMs as a general tool for a broad range of problems; however, the quality of the results will be poor, or generally worse than specific methods. Hence the right method for the right problem. There is no general classifier that can do it all with sufficient quality — no silver bullet.
I agree with you very much Daniel 🙂
I agree with the conclusion: machine learning is not just a black box, but a methodology. However, this methodology continues to evolve as new techniques are discovered and invented.
The bigger question is: will practitioners of other techniques (e.g. probabilistic networks, ensemble modeling, kernel methods, etc.) evolve, or will they continue to stick to their area of expertise?
Hi Adrian,
Thanks for your awesome tutorial. I am new to deep learning and I got interested after going over your tutorial. I have a question which you can help me out with.
I know there are a number of pretrained networks available which are trained on large amounts of data, therefore giving very good results. However, I want to combine the 2D image with 3D features (surface normals) from the point cloud, but I cannot find a network that can handle this. Perhaps I missed something here. Do you know of a network that can do this? Or perhaps you may suggest an alternative?
Thanks
Hey Rish — Great question, thanks for asking. However, 3D is not my area of expertise, so I don’t keep up with that sub-field as well as I do others. The only paper I have run across that is related to what you want is this one. I would suggest giving it a read and following its references, as well as looking at who has cited it recently.
Hi Adrian,
Thanks for your reply. Actually, I did go briefly over the paper, but I guess I will go over it in detail and try to implement it. I also have one more question from going over the paper: in deep learning, as in traditional learning approaches, do we have to collect a set of negative samples, or are positive samples alone good enough? In my case I am interested in transfer learning, so I just want to clarify my doubts.
Thanks.
Adrian
You’ll be wanting to hang out on Prof Jianxiong Xiao’s site. http://vision.princeton.edu/people/xj/
He da man on deep 3D.
See, in particular, their Shapenets:
http://3dshapenets.cs.princeton.edu
Thanks for sharing! I’ll be sure to take a look.
From the No Free Lunch Theorem (which is at first shocking and then, on reflection, obvious), there is no best classifier.
What’s good about deep learning is that it might be a unifying paradigm for feature learning/processing — layered representations. You could put most “traditional” vision algorithms into a layer, give it gradient “hooks”, and tune it up in a back-propagated error optimization framework.
My pet peeve is that this isn’t “AI”, it’s filter and classification. To get AI, you need a framework that does something like
1) transduction of sensory inputs to “mate” with a causal model
2) the causal model (simulation) that represents the problem environment complete with goals
3) translation of the causal results into outputs.
My pet peeve is that this isn’t “AI”
+1
I once had a paper reviewer describe my submission on genetic algorithms and information filtering as “another toy GA application.”
That burned 🙂 but the reviewer was right. I foresee most of these “promising” developments exposed as dead-ends once they leave their training set. There is a difference between a statistical model (hiding as a GA?) and actual “learning”.
Ouch, that must have hurt to have your paper rejected like that! But it’s great that you could see the positive in the end. It might have taken me a bit longer to arrive there 😉
shouldn’t it be “Get ON the deep learning bandwagon…” ?
It’s a play on words based on the phrase “get off the bandwagon”. This post explains why deep learning is still just a tool that works in specific situations. It’s not a silver bullet.
Hi Adrian,
I am working in the field of medical imaging with expert subjects involved, and an inherent issue with such work is that the dataset is small compared to what CNNs require. I am doing something similar to image region classification, where I have to learn features such as SIFT, HOG, etc. based on classes. In this case, how does one decide whether a CNN will be the best choice, and how can one defend any other algorithm that works with a small dataset?
For very small datasets, it’s very unlikely that CNNs will be able to “learn” anything. That said, it’s common to apply “spot tests”, or more simply small experiments using a variety of feature extraction and machine learning algorithms. This allows you to use the empirical results (i.e., accuracy) to determine which route you should take.
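As a concrete (and hypothetical) example, a spot test can be as simple as cross-validating a handful of candidate models on the same feature matrix and comparing the scores. The digits data below is just a stand-in for your own extracted features, and the model list is illustrative, not a recommendation:

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# stand-in for your (small) medical imaging feature matrix
X, y = datasets.load_digits(return_X_y=True)

# a handful of reasonable candidates -- the point is the comparison,
# not this particular list of models
models = {
    "logistic": LogisticRegression(max_iter=5000),
    "svm-rbf": SVC(kernel="rbf", gamma=0.001),
    "forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# report mean cross-validated accuracy for each candidate
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```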
Hi Lisa, I am also working in the medical imaging field. At the MICCAI conference this year, I believe there were some talks on transfer learning. Basically, the person (sorry, I can’t recall their name atm) used the filters from a CNN trained on natural images, applied them to their dataset, and found that they worked much better than their handcrafted features.
This might be a relevant paper by a Dutch group: Transfer Learning Improves Supervised Image Segmentation Across Imaging Protocols (van Opbroek, Annegreet; Ikram, M. Arfan; Vernooij, Meike W.; de Bruijne, Marleen).
How to build each layer and each node is crucial for any learning methodology.
What each node represents is a crucial problem.
Defining a math formula for each node is a radical error.
Believing the number of layers will go up to hundreds is a funny error.
Programming in neural networks is extremely inefficient.
Neural networks will be dead soon.
Convnets are a massive-memory algorithm with fancy convolutional feature extraction. They generalize feature extraction and memorize classification. They are really good a lot of the time, and not so good sometimes.
Adrian, I found this googling around, and it’s interesting that it’s 2016 and everyone is crazy about convolutional neural networks (CNNs).
However, I second your thought. The recent ImageNet winner had 160+ layers, learned 150 million parameters, and took weeks to train. CNNs might be state of the art, but definitely not the right answer.
I think “convolution”, in its raw sense, is something that makes sense and will be around. Part of my research is finding lighter alternatives that are smaller and learn faster. I think we will see a lot more of these in the next few years.
Yeah, ResNet is super powerful. It’s very deep and it obtains the highest accuracy on the ImageNet benchmark dataset. But it took 8 GPUs and 3 weeks to train. I think we’ve really started to hit the point of “diminishing returns” in terms of the gains we get from going deeper.
Hi, Adrian
Thanks for your wonderful blog, especially your interactions with the readers.
I wonder how the depth of deep learning networks, such as the 160+ layers above, was determined. Are there any underlying methods to make it work? This is closely related to whether it can be considered “AI” or just an extension of optimisation.
Currently, many CNN researchers believe that the deeper the network, the more powerful it is and the more it can learn (provided you have enough training data). Thus, the general intuition is that if you have a lot of data, you can train a deeper network. I personally think this is true only to an extent — I think we’re reaching the point of diminishing returns strictly by increasing layer depth. That said, determining the number of layers, the number of filters per layer, etc. is done via hyperparameter tuning, the process of exploring different parameters of the network and seeing which works best. I’ll be doing a blog post on hyperparameter tuning next week.
Hey Adrian
Stumbled upon your blog while googling, congrats on such a successful blog 🙂
I am a computer vision engineer, but I work in industry, where actual solutions for real problems are demanded, and I have a couple of questions regarding this topic.
Are Deep Nets being used on real-life problems?
Are they viable to use in industry, even if they require huge amounts of data and weeks of training time?
Can they perform in embedded systems with low CPU/GPU power?
Thanks 🙂
Deep Neural Networks are absolutely being used in real-life problems. In fact, Google has built an entire API around them, called the Google Vision API. Whether or not this is profitable in the long-run remains to be seen.
But in general, you can use larger, more powerful machines to train your network and then deploy the network on smaller GPU devices (provided the GPU has enough memory to hold the network, of course).