Table of Contents
- Introduction to Natural Language Processing (NLP)
- Natural Language Processing
- Preface
- The Bare Beginnings of Natural Language Processing
- Natural Language Processing Finds Its Footing
- Rise of NLP
- Summary
Introduction to Natural Language Processing (NLP)
Many times, the brilliance of the human brain is taken for granted. Every day, we process a myriad of information, all pertaining to different domains.
We see things with our eyes. We classify the objects we see into different groups. We apply mathematical formulae in our jobs. Even our mode of communication requires processing of information by the brain.
All of these tasks are done in less than a second. The final goal of artificial intelligence has long been to recreate the brain. But for now, we are held back by several constraints like computational power and data.
It is extremely difficult to build machines capable of doing multiple tasks simultaneously. Hence, we categorize problems and split them mainly into Computer Vision and Natural Language Processing.
We have become adept at making models deal with image data. Images have underlying patterns visible to the eye, and at their core, images are matrices. Such numerical patterns are now identifiable, especially thanks to the progress made with convolutional neural networks.
But what happens when we move into the Natural Language Processing (NLP) domain? How do you make a computer understand the logic behind language, semantics, grammar, etc.?
This tutorial marks the beginning of our NLP series. In this tutorial, you will learn about the evolution of NLP through the ages.
This lesson is the 1st in a 4-part series on NLP 101:
- Introduction to Natural Language Processing (NLP) (today’s tutorial)
- Introduction to the Bag-of-Words (BoW) Model
- Word2Vec: A Study of Embeddings in NLP
- Comparison Between Bag-of-Words and Word2Vec
To learn about the history of Natural Language Processing, just keep reading.
Natural Language Processing
Since images are matrices at their core, convolutional filters can easily help us detect features of images. The same can’t be said for language. The most you can do with CV techniques is teach a model to identify letters from images.
Even that leads to training with 26 labels, and in general, it is a very bad approach since we do not capture the essence of language at all. So what do we do here? How do we solve the mystery of language (Figure 1)?
Let’s start with a big spoiler: we are currently in the age of language models like GPT-3 (Generative Pre-trained Transformer 3) and BERT (Bidirectional Encoder Representations from Transformers). These models are more than capable of holding conversations with us, following perfect grammar and semantics.
But where did it all start?
Let’s take a brief look at NLP through history.
Preface
Humans created language as a medium of communication to share information more efficiently. We were intelligent enough to create complex paradigms on which to base language. Language has gone through extensive changes throughout history, but the essence of sharing information through it has remained intact.
When we hear the word apple, the image of a fresh red oval fruit pops up in our heads. We can instantly associate the word with the image we have in our heads. What we see, what we touch, and what we feel. Our complex nervous systems react to these stimuli, and our brain helps us categorize these feelings into set words.
But here we are dealing with a computer, which only understands what a 0 or a 1 is. Our rules and paradigms do not apply to a computer. So how do we explain something as complex as language to a computer?
Before we do that, it is important to understand that our own understanding of language wasn’t as sharp as it is today. Language, as a science, is what the subject of linguistics encompasses. Consequently, natural language processing becomes a subset of linguistics itself.
So let’s take a little detour to how linguistics itself developed into what it is today.
The Bare Beginnings of Natural Language Processing
Linguistics, in itself, is the scientific study of human language. That means it takes an approach of a thorough, methodical, objective, and accurate examination of all aspects of language. Needless to say, a lot of foundations in NLP have direct links to Linguistics.
Now you may ask what that has to do with our journey today. The answer lies in the story of a man who is now considered the father of 20th-century linguistics, Ferdinand de Saussure (Figure 2).
During the first decade of the 20th century, de Saussure taught a course at the University of Geneva, which utilized an approach of describing languages as systems.
The renowned Russian linguist Vladimir Plungyan later stated:
The essence of the ‘Saussurean revolution’ in linguistics was that language was prescribed to be viewed not as a chaotic totality of facts but as an edifice in which all elements are bound to one another (source).
To de Saussure, an acoustic sound in language represents a notion that changes according to the context.
His posthumously published “Cours de linguistique générale” put his structuralist approach to language at the center stage of linguistics.
The structuralist approach of viewing language as a system is prevalent in modern-day NLP techniques. According to de Saussure and his students, the answer lies in viewing language as a system whose elements can be correlated with one another, allowing contexts to be identified through causation.
Our next stop is the 1950s, when Alan Turing published his famous article “Computing Machinery and Intelligence,” which proposed what is now known as the Turing Test. This test determines the ability of a computer program to impersonate a human being in a real-time conversation with an independent human judge (Figure 3).
Although there are several limitations to this test, several checks inspired by this test are still used. Most notably, the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) pops up every now and then when browsing the internet.
The Turing Test was popularly known as “The Imitation Game” since the test aimed to see if a machine could imitate a human being. The original article, “Computing Machinery and Intelligence,” asks, “Can machines think?” The big question that arose here was whether imitation equaled the ability to think for yourself.
In 1957, Noam Chomsky’s “Syntactic Structures” took a rule-based approach but still managed to revolutionize the NLP world. However, the era presented its own problems, especially with computational complexity.
There were a few inventions after this, but the staggering problem brought in by computational complexity seemed to halt any significant progress.
So what happened after researchers slowly had access to sufficient computing power?
Natural Language Processing Finds Its Footing
Once the dependency on complex hardcoded rules eased, stellar results were achieved using early machine learning algorithms like decision trees. However, the breakthrough occurred due to something entirely different.
The rise of statistical computing in the 1980s also found its way into the NLP domain. The foundation of these models lies in the ability to assign weighted values to input features. Consequently, instead of relying on complex hand-crafted paradigms, the model’s decisions are dictated by the input data itself.
One of the easiest examples of statistics-based NLP is n-grams, where the concept of a Markov model (the present state depends only on the directly previous state) is used. Here, the idea is to interpret a word in the context of its neighboring words.
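To make the Markov assumption concrete, here is a minimal bigram sketch in pure Python. The toy corpus and function names are our own illustration, not part of any particular library: the probability of the next word is estimated only from counts of the word directly before it.

```python
from collections import Counter, defaultdict

# Toy corpus: each sentence is a list of tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# Count bigrams: P(next | current) depends only on the
# current word -- the Markov assumption.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for current, nxt in zip(sentence, sentence[1:]):
        bigram_counts[current][nxt] += 1

def next_word_prob(current, nxt):
    """Maximum-likelihood estimate of P(nxt | current)."""
    total = sum(bigram_counts[current].values())
    return bigram_counts[current][nxt] / total if total else 0.0

print(next_word_prob("the", "cat"))  # "cat" follows 1 of 4 "the"s -> 0.25
print(next_word_prob("sat", "on"))   # "on" always follows "sat" -> 1.0
```

Real n-gram language models add smoothing for unseen word pairs, but the counting idea is exactly this.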
One of the most successful concepts that pushed the NLP world forward was Recurrent Neural Networks (RNNs) (Figure 4).
The idea behind RNNs was ingenious yet devastatingly simple. We have one recurrent unit through which the input x1 is passed. The recurrent unit gives us an output y1 and a hidden state h1, which carries the information from x1.
The input to an RNN is a sequence of tokens representing a sequence of words. The recurrent step is repeated for every input token, so the information from previous states is always preserved. Of course, RNNs weren’t perfect and were eventually replaced with stronger algorithms (e.g., LSTMs and GRUs).
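The recurrence described above can be sketched in a few lines of NumPy. This is an illustrative vanilla RNN step with randomly initialized (untrained) weights, not a production implementation: the new hidden state mixes the current input with the previous hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

# Randomly initialized weights (in practice these are learned).
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b).
    The hidden state carries information from all previous inputs."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a sequence of 5 token vectors, carrying the hidden state forward.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = rnn_step(x_t, h)

print(h.shape)  # (3,)
```

Notice that the same weights are reused at every step; only the hidden state changes as it accumulates context.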
These concepts used the same overall idea behind RNNs but introduced some additional efficiency mechanisms. LSTM (long short-term memory) cells came with three pathways or gates: the input, output, and forget gates. LSTMs looked to solve the long-term dependency problem by allowing the model to correlate inputs with elements much earlier in the sequence.
However, LSTMs brought in the problem of complexity. Gated Recurrent Units (GRUs) addressed that by reducing the number of gates and toning down the complexity of LSTMs.
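The three gates can be made concrete with a single LSTM step in NumPy. This is a didactic sketch with our own variable names and untrained random weights; each gate is a sigmoid that decides how much information flows along its pathway.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold per-gate parameters keyed by
    'f' (forget), 'i' (input), 'o' (output), and 'c' (candidate)."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])        # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])        # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ x + U["c"] @ h_prev + b["c"])  # candidate memory
    c = f * c_prev + i * c_tilde  # keep old memory vs. write new memory
    h = o * np.tanh(c)            # exposed hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
W = {k: rng.normal(size=(d_h, d_in)) for k in "fioc"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "fioc"}
b = {k: np.zeros(d_h) for k in "fioc"}

h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

A GRU follows the same pattern with two gates instead of three and no separate cell state, which is exactly the complexity reduction mentioned above.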
Let’s take a moment to appreciate the fact these algorithms came out in the late 1990s and early 2000s, when computational power was still an issue. So let’s look at the feats we have achieved with ample computational power.
Rise of NLP
Before we move further, let’s walk through an interpretation of how computers can understand language. A computer can create a matrix where the columns refer to contexts in which the words in the rows are assessed (Table 1).
|       | Alive | Wealth | Gender |
|-------|-------|--------|--------|
| Man   | 1     | -1     | -1     |
| Queen | 1     | 1      | 1      |
| Box   | -1    | 0      | 0      |

Table 1: Embeddings.
Since we cannot physically spoon-feed the meaning of words to the computer, why not create a sphere of finite contexts in which we will express the word?
Here, the word Man has the value 1 under the column Alive, -1 under the column Wealth, and -1 under Gender. Similarly, we have the word Queen with values of 1, 1, and 1 under Alive, Wealth, and Gender.

Notice how the Gender and Wealth columns have polar values for these two words. What if that’s how we can explain to the computer that Man is poor while Queen is rich, or Man is male while Queen is female?
So we try to “represent” each word in a finite N-dimensional space. The model understands each word based on its weight in each N dimension. First seen in 2003, this representation learning approach has been widely used in the NLP world since the 2010s.
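Once words live in an N-dimensional space, similarity between them becomes a geometric question. As a toy illustration using the vectors from Table 1 (the dictionary and function names are our own), cosine similarity measures the angle between two word vectors:

```python
import numpy as np

# Toy embeddings from Table 1: dimensions are (Alive, Wealth, Gender).
embeddings = {
    "man":   np.array([1.0, -1.0, -1.0]),
    "queen": np.array([1.0,  1.0,  1.0]),
    "box":   np.array([-1.0, 0.0,  0.0]),
}

def cosine_similarity(u, v):
    """Angle-based similarity: 1 = same direction, -1 = opposite."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["man"], embeddings["queen"]))
print(cosine_similarity(embeddings["man"], embeddings["box"]))
```

Real embedding spaces use hundreds of learned dimensions rather than three hand-picked contexts, but the comparison mechanism is the same.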
In 2013, the word2vec series of papers was published. It used the concept of representation learning (embeddings) by expressing words in an N-dimensional space and, by definition, as vectors existing in that space (Figure 5).
With a good input corpus and proper training, words that appear in similar contexts end up close together when their vectors are visualized, as seen in Figure 5. In other words, a word’s meaning comes to depend on its neighboring words, shaped by the quality of the data and how frequently the word appears in similar contexts.
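The “neighboring words” idea is baked directly into how word2vec builds its training data. A minimal sketch of skip-gram pair generation (our own helper, not library code) shows how each word is paired with the words in a window around it:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs, as in word2vec's
    skip-gram model: each word predicts its window neighbors."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sentence, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

Words that keep appearing in the same contexts produce similar training pairs, which is why their learned vectors drift together.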
This concept blew open the NLP world yet again, and to this day, embeddings play a huge role in all subsequent research that has come up. The notable spiritual follower of Word2Vec was the FastText series of papers, which introduced the notion of subwords to empower models even more.
The concept of Attention, which makes a model focus on the relevance of each input word in relation to each output word, culminated in 2017 with the Transformer architecture. The bewildering concept of Transformers is based on a variant of Attention known as self-attention.
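Self-attention itself boils down to a few matrix operations. Here is a single-head, scaled dot-product sketch in NumPy with untrained random projections (an illustration of the mechanism, not the full multi-head Transformer layer):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token attends to
    every token, weighted by query-key similarity."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # context-mixed vectors

rng = np.random.default_rng(2)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))             # 4 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Each output row is a weighted blend of all the value vectors, so every token's representation reflects the whole sequence at once, with no recurrence required.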
Transformers have produced models strong enough to pass the Turing Test with ease. That itself is a testament to how far we have come in the journey of teaching computers to understand language.
Recently, the GPT-3 model created a huge buzz when task-trained GPT-3 models popped up on the web. These models could flawlessly hold a conversation with any human, which also became a subject of amusement since fine-tuning them for different tasks created extremely interesting results.
Let’s see how good a grasp transformers have on language (Figure 6).
With a few starting tokens provided, GPT-Neo 1.3B, EleutherAI’s GPT-3 replication model, gives us a small paragraph as output, with the utmost respect for syntactic and semantic rules.
At one point, NLP was deemed too expensive, and its research nearly stopped. We lacked computational power and access to data. Now we have models that can keep up a conversation with us, leaving us without a suspicion that we are talking to non-humans.
However, if you wonder what the 1.3B in GPT-Neo’s name stands for, it is the number of parameters inside the model. This speaks volumes about how much computational complexity today’s state-of-the-art (SOTA) language models possess.
Summary
A brief walk-through of NLP’s history showed that research had started a long time back. Researchers used the foundation laid by our understanding of human language in linguistics and had the right idea of how to take NLP forward.
However, the limitations of technology became the biggest roadblock, and at one point, research in this domain almost stalled. But technology only moves one way: forward. Developments in technology provided NLP researchers with adequate computing power and broadened many horizons.
We are at a stage where language models have helped create virtual assistants that can converse with us, help us with our tasks, etc. Imagine the world has reached a point where a blind person can ask a virtual assistant to describe an image, and it can do that flawlessly.
But this kind of progress comes at the expense of severe computing power requirements and, above all, access to enormous amounts of data. Language does not lend itself to the augmentation techniques we use on images. Hence, a subsequent line of research focuses on reducing these mammoth requirements.
Even so, the growth of NLP through the years is laudable. The concepts are ingenious yet intuitive. The next blogs in this series will cover modern NLP concepts in more detail.
Citation Information
Chakraborty, D. “Introduction to Natural Language Processing (NLP),” PyImageSearch, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, R. Raha, and A. Thanki, eds., 2022, https://pyimg.co/60xld
@incollection{Chakraborty_2022_NLP,
  author = {Devjyoti Chakraborty},
  title = {Introduction to Natural Language Processing {(NLP)}},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha and Abhishek Thanki},
  year = {2022},
  note = {https://pyimg.co/60xld},
}