3D Gaussian Splatting vs NeRF: The End Game of 3D Reconstruction?
In this tutorial, you will learn about 3D Gaussian Splatting.
This lesson is the last of a 3-part series on 3D Reconstruction:
- Photogrammetry Explained: From Multi-View Stereo to Structure from Motion
- NeRFs Explained: Goodbye Photogrammetry?
- 3D Gaussian Splatting vs NeRF: The End Game of 3D Reconstruction? (this tutorial)
To learn more about 3D Gaussian Splatting, just keep reading.
There is a fable we all heard as children called “The Boy Who Cried Wolf.” Have you heard it before?
There was once a boy who looked after some sheep, and was bored. One day, he played a trick on the villagers, and shouted: “Wolf! Wolf!”
The villagers came up the hill to save the sheep, but when they got up there, there was no wolf. The boy laughed at them. “You must not tell lies!” said the villagers, going back down to the village.
Soon, the boy was bored again. He shouted, “Wolf! Wolf!”, and once again, the villagers rushed up the hill to save the sheep from the wolf. But all they found was the boy, laughing at them again, so the villagers went back to their village angry.
And everything changed at the end of the day when a wolf really came into the field to eat the sheep. The boy saw it, and started to shout “Wolf! Wolf!” But this time, no one came to help, and the wolf ate all the sheep.
This fable is interesting because it holds true in many aspects of our lives, and recently, I found it to be true in 3D Reconstruction… Over the past few years, we have kept hearing about new “breakthrough” techniques for 3D Reconstruction. Photogrammetry. Structure From Motion. NeRFs. Yet, every time we yelled, “This is the one!”, a new technique took over within the next few years.
Today’s technique is Gaussian Splatting, and in this article, we answer: “Is this for real this time?”
In the 1st blog of this series, you were introduced to Photogrammetry, which performs 3D Reconstruction via heavy geometry. And in the 2nd blog of this series, you were introduced to NeRFs, which perform 3D Reconstruction via Neural Networks that project points into the 3D space.
So, what’s the big idea behind Gaussian Splatting? Essentially, it is to go backward. Rather than projecting a point to the 3D world and then predicting its color, we project 3D Gaussians to the 2D world and then see if the color matches the true picture.
The process works in 3 steps:
- Step 1: Starting from a point cloud (obtained using photogrammetry)
- Step 2: For each point, initiate a 3D Gaussian
- Step 3: Refine the Gaussians, add colors, etc…
Want to see it? Okay, here it is:
Now, the actual process isn’t so easy to grasp. It’s explained in more detail here:
See how we added 3 blocks? That’s A, B, and C.
So, where do we begin?
Let’s start with Block A, which is the initialization. Block B will be called rasterization, and Block C optimization.
Block A — Initialization: SfM to Gaussian Splatting
Right off the bat, notice we don’t begin from scratch. We start in the 3D world. We begin using a point cloud that we obtained from Structure from Motion (photogrammetry — Step 1). And how is that done? Yes, using COLMAP! Essentially, you send 30+ input images to an SfM algorithm, and it returns a point cloud.
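To make the COLMAP step a bit more concrete, here is a minimal sketch of the three command-line calls a sparse (SfM) reconstruction typically chains together: feature extraction, matching, and mapping. The folder layout (`images/`, `database.db`, `sparse/`) and the helper name `colmap_sfm_commands` are assumptions made for illustration, and the snippet only builds the commands without running them.

```python
from pathlib import Path


def colmap_sfm_commands(dataset_dir):
    """Build the COLMAP CLI calls for a sparse reconstruction (nothing is executed here)."""
    root = Path(dataset_dir)
    database = str(root / "database.db")  # assumed location of the COLMAP database
    images = str(root / "images")         # assumed folder holding the input photos
    sparse = str(root / "sparse")         # assumed output folder for the point cloud
    return [
        # 1. Detect keypoints and descriptors in every image
        ["colmap", "feature_extractor", "--database_path", database, "--image_path", images],
        # 2. Match features between all image pairs
        ["colmap", "exhaustive_matcher", "--database_path", database],
        # 3. Estimate camera poses and triangulate the sparse point cloud
        ["colmap", "mapper", "--database_path", database, "--image_path", images,
         "--output_path", sparse],
    ]


commands = colmap_sfm_commands("my_scene")
for cmd in commands:
    print(" ".join(cmd))
```

If COLMAP is installed and the `sparse` folder exists, each command could be executed with `subprocess.run(cmd, check=True)`. The result is the point cloud (plus camera poses) that Block A starts from.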
We covered the exact technique (photogrammetry) in the 1st blog of this series, but as a foundation for this article, let’s talk briefly about classic 3D reconstruction. In 3D Reconstruction, we want to turn multiple images into a point cloud.
Traditional 3D reconstruction methods, such as photogrammetry and Structure-from-Motion (SfM), rely on establishing correspondences between images and triangulating 3D points to create a point cloud. However, these methods often struggle with complex scenes, large-scale environments, and real-world scenarios with motion blur, occlusions, and varying lighting conditions. With that, the resulting 3D models often lack detailed textures, colors, and other essential information. In short, it’s a basic reconstruction.
The next step for researchers was to use deep learning approaches such as NeRFs and 3D Gaussian Splatting, which have shown promising results in novel view synthesis, computer graphics, high-resolution image generation, and real-time rendering.
Example? Okay, I used photogrammetry to reconstruct my iMac, iPad, and keyboard:
See? It’s not perfect, but it gives us a nice structure to begin with! The point cloud is obtained from Structure From Motion (photogrammetry), and we then initialize a 3D Gaussian at the center of every point.
Wait, what is a Gaussian again? The bell curve with mean and covariance we use in Kalman Filters? Yes, exactly, this curve, but in 3D:
Now, let’s explain:
- A 1D Gaussian is defined by a mean μ and a variance σ² (σ being the standard deviation); basically, position and uncertainty
- A 2D Gaussian is the same, but the position this time is expressed in 2 dimensions (x and y)
- A 3D Gaussian is again the same, but the position is in x, y, z
In our case, the 3D Gaussian is defined by its mean (xyz position), opacity, covariance, and color.
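If the bell-curve analogy feels abstract, the sketch below evaluates a 3D Gaussian at a few points. It assumes the unnormalized form exp(-0.5 (x - μ)ᵀ Σ⁻¹ (x - μ)), where μ is the mean and Σ the covariance matrix. The function name `gaussian_3d` and the example values are made up for illustration.

```python
import numpy as np


def gaussian_3d(x, mean, cov):
    """Unnormalized 3D Gaussian: exp(-0.5 * (x - mean)^T cov^-1 (x - mean))."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ v = d instead of inverting the covariance explicitly
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))


mean = np.array([0.0, 0.0, 0.0])
cov = np.diag([1.0, 4.0, 0.25])  # stretched along y, squashed along z

at_center = gaussian_3d([0.0, 0.0, 0.0], mean, cov)  # peak value at the mean
along_y = gaussian_3d([0.0, 2.0, 0.0], mean, cov)    # 2 units along the wide axis
along_z = gaussian_3d([0.0, 0.0, 2.0], mean, cov)    # 2 units along the narrow axis
print(at_center, along_y, along_z)
```

The value drops much faster along z than along y, which is exactly what the covariance encodes: the shape and orientation of the blob, while the mean gives its position.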
An example of what it looks like:
This initialization step allows us to begin from an existing structure. We know the Gaussians are going to be well positioned because they come from a 3D Reconstruction algorithm. Then, we fit 3D Gaussians that will take the look and color of our objects. Let’s continue…
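Here is a rough sketch of what this initialization could look like with NumPy. It assumes each Gaussian starts as an isotropic blob whose size is the mean distance to its 3 nearest neighbors, with a small starting opacity. The function name `init_gaussians`, the dictionary layout, and the `initial_opacity` value are illustrative assumptions, not the reference implementation.

```python
import numpy as np


def init_gaussians(points, colors, k=3, initial_opacity=0.1):
    """Create one isotropic 3D Gaussian per SfM point (needs at least 2 points)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise distances: fine for a small demo, a real point cloud would need a KD-tree
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a point to itself
    k = min(k, n - 1)
    scale = np.sort(dist, axis=1)[:, :k].mean(axis=1)  # mean distance to k nearest points
    covariance = scale[:, None, None] ** 2 * np.eye(3)  # (n, 3, 3) isotropic covariances
    return {
        "mean": points.copy(),                              # xyz position
        "covariance": covariance,                           # shape of the blob
        "opacity": np.full(n, initial_opacity),             # assumed starting opacity
        "color": np.asarray(colors, dtype=float).copy(),    # RGB taken from the point cloud
    }


points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
gaussians = init_gaussians(points, colors)
print(gaussians["covariance"].shape)
```

Sparse regions get large Gaussians and dense regions get small ones, so the initial blobs roughly tile the surface found by SfM.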
Block B — Rasterization
Okay, this term can feel scary, but the idea is to project the 3D Gaussians into 2D images. Remember when we said, right after the Boy Who Cried Wolf story, that Gaussian Splatting works in reverse?
Well, it is, especially when compared to NeRFs, because rather than starting from 2D images to retrieve 3D points, we begin from 3D points to retrieve 2D images.
What’s the point of going 2D? Don’t we want 3D? We do, but we use a clever technique here to compare the color and shape of the rasterized Gaussians with the 2D images. If they look the same, we’re good! If they don’t, then we need to change our 3D Gaussian.
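For the curious, here is a minimal sketch of how a single 3D Gaussian could be "splatted" onto the image plane. It assumes a simple pinhole camera and the usual local linear approximation of the projection: the 2D covariance is J Σ Jᵀ, where J is the Jacobian of the projection at the Gaussian's center. The function name `splat_to_2d` and the camera values are illustrative assumptions.

```python
import numpy as np


def splat_to_2d(mean_world, cov_world, R, t, focal, center):
    """Project a 3D Gaussian (mean, covariance) into a 2D Gaussian in pixel space."""
    # World -> camera coordinates (R and t are the assumed camera rotation and translation)
    x, y, z = R @ mean_world + t
    # Pinhole projection of the mean
    mean_2d = np.array([focal * x / z + center[0], focal * y / z + center[1]])
    # Jacobian of the projection, evaluated at the Gaussian's center
    J = np.array([
        [focal / z, 0.0, -focal * x / z ** 2],
        [0.0, focal / z, -focal * y / z ** 2],
    ])
    cov_cam = R @ cov_world @ R.T   # rotate the covariance into the camera frame
    cov_2d = J @ cov_cam @ J.T      # 2x2 footprint of the splat in pixels
    return mean_2d, cov_2d


mean_2d, cov_2d = splat_to_2d(
    mean_world=np.array([0.0, 0.0, 5.0]),  # 5 units in front of the camera
    cov_world=0.04 * np.eye(3),            # a small round blob (std 0.2)
    R=np.eye(3),
    t=np.zeros(3),
    focal=500.0,
    center=(320.0, 240.0),
)
print(mean_2d, cov_2d)
```

In this example the blob lands at the image center with a standard deviation of 20 pixels. A full rasterizer would then blend many such splats, sorted by depth, to form the final image.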
Don’t get it? Okay, here’s a simplified schematic:
Keep in mind the important point: we’re not optimizing the 2D Gaussians, but the 3D ones. The original 2D images serve as training data for optimizing the 3D Gaussians. This means we’ll use the 2D images only as an intermediate comparison / ground truth / label / loss function.
Get it?
- We create 3D Gaussians supposed to represent our 3D scene, but they’re random and ugly.
- We splat them into 2D images and compare these splats with our images. Obviously, they don’t look the same…
- We optimize the 3D Gaussians so that their splat projections look a bit more like the 2D images.
- And we repeat!
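The loop above can be sketched with a deliberately tiny example: one 2D splat whose color is learned by gradient descent so that the rendered image matches a target picture. This is a toy under stated assumptions: only the color is optimized, the gradient is written by hand, and the loss is a plain squared error (the original 3D Gaussian Splatting paper combines an L1 term with a D-SSIM term and also optimizes position, covariance, and opacity).

```python
import numpy as np

height, width = 32, 32
ys, xs = np.mgrid[0:height, 0:width]
# Fixed footprint of one splat centered in the image (std of 5 pixels)
footprint = np.exp(-((xs - 16) ** 2 + (ys - 16) ** 2) / (2 * 5.0 ** 2))


def render(color):
    """Render the splat: every pixel gets the color weighted by the footprint."""
    return footprint[..., None] * color  # (H, W, 3)


target_color = np.array([0.9, 0.3, 0.1])
target_image = render(target_color)  # stands in for a real photo

color = np.array([0.5, 0.5, 0.5])  # initial guess: gray
learning_rate = 1.0
losses = []
for step in range(300):
    residual = render(color) - target_image                      # compare splat with image
    losses.append(float(np.mean(np.sum(residual ** 2, axis=-1))))
    # Hand-derived gradient of the loss with respect to the color
    grad = 2.0 * np.mean(residual * footprint[..., None], axis=(0, 1))
    color = color - learning_rate * grad                          # update, then repeat

print(color, losses[0], losses[-1])
```

The learned color converges toward the target color and the loss shrinks at every step. In practice, an automatic differentiation framework computes these gradients for all Gaussian parameters at once.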
So you should get the gist now. Let’s move on to Block C: optimization!
Block C — Optimization
The final stage is about optimizing the Gaussians. For this, there are two key ideas: Gaussian removal and densification.
Fine-tuning the 3D Gaussians in this way can enhance both the quality and the efficiency of the rendering.
So, what is Gaussian removal? As you guessed, if a Gaussian is too transparent, we remove it. If a Gaussian is very large, we remove it.
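As a small sketch, the removal rule can be written as a boolean mask over all Gaussians. The function name `prune_mask` and the two thresholds (`min_opacity`, `max_scale`) are hypothetical values chosen for the example.

```python
import numpy as np


def prune_mask(opacity, scale, min_opacity=0.005, max_scale=1.0):
    """Return True for every Gaussian we keep (thresholds are assumed values)."""
    opacity = np.asarray(opacity, dtype=float)
    scale = np.asarray(scale, dtype=float)          # (n, 3) extent along each axis
    visible_enough = opacity >= min_opacity         # drop nearly transparent Gaussians
    small_enough = scale.max(axis=1) <= max_scale   # drop very large Gaussians
    return visible_enough & small_enough


opacity = np.array([0.8, 0.001, 0.5])
scale = np.array([
    [0.1, 0.1, 0.1],   # fine
    [0.1, 0.1, 0.1],   # fine in size, but almost transparent
    [5.0, 0.2, 0.2],   # opaque enough, but huge along one axis
])
keep = prune_mask(opacity, scale)
print(keep)
```

Only the first Gaussian survives here. Applying `keep` as an index to every parameter array removes the pruned Gaussians in one shot.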
The next part is densification, and it’s decoupled into 2 subparts:
- Under Reconstruction: If a Gaussian is too small to cover a real shape, we’ll clone it until the copies fill the space.
- Over Reconstruction: If a Gaussian is too big to represent a real shape, we’ll split it into smaller Gaussians, dividing their scale by 1.6, until they fit the shape.
It’s like “commandments,” but for Gaussians… An example of “under” reconstruction:
And this is how it’s done!
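A compact sketch of the clone and split rules follows. It assumes we already have a flag telling us which Gaussians need more detail (how that flag is computed is left out), and it uses axis-aligned scales without rotation to stay short. The function name `densify` and the `size_threshold` value are hypothetical; the 1.6 factor is the one quoted above.

```python
import numpy as np


def densify(means, scales, needs_more_detail, size_threshold=0.5, split_factor=1.6, seed=0):
    """Clone small flagged Gaussians, split large flagged ones into two smaller ones."""
    rng = np.random.default_rng(seed)
    new_means, new_scales = [], []
    for mean, scale, flagged in zip(means, scales, needs_more_detail):
        if not flagged:
            # Nothing to do: keep the Gaussian as it is
            new_means.append(mean)
            new_scales.append(scale)
        elif scale.max() <= size_threshold:
            # Under reconstruction: keep the Gaussian and add an identical copy
            new_means += [mean, mean.copy()]
            new_scales += [scale, scale.copy()]
        else:
            # Over reconstruction: replace it with two smaller Gaussians,
            # positioned by sampling inside the original one
            for _ in range(2):
                new_means.append(mean + rng.normal(size=3) * scale)
                new_scales.append(scale / split_factor)
    return np.array(new_means), np.array(new_scales)


means = np.zeros((3, 3))
scales = np.array([[0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [2.0, 2.0, 2.0]])
flags = [False, True, True]  # assumed output of some "needs more detail" test
new_means, new_scales = densify(means, scales, flags)
print(new_means.shape, new_scales)
```

Three Gaussians become five: one untouched, one cloned into two, and one split into two with a scale of 2.0 / 1.6 = 1.25. The later optimization steps then move the copies apart so they cover the shape.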
Now, what’s the actual output?
We will show you:
Isn’t it cool?
Okay, let’s immediately see an example of my favorite topic: self-driving cars.
Example: Gaussian Splatting in Self-Driving Cars
Meet Street Gaussians, an algorithm from 2024 that leverages Gaussian Splatting to do 3D reconstruction of scenes with moving objects. This is much harder than when you’re reconstructing a building, and while some techniques use NeRFs to do this, this one uses Gaussian Splatting.
Gaussian Splatting can manage the processing of large scenes, including the development of multi-GPU parallel solutions for training.
This scene is entirely reconstructed with 3D Gaussian Splatting. See how “real” it looks? There’s also a comparison with a NeRF technique named EmerNeRF. Notice how much more accurate Gaussian Splatting is.
Now, if we pause on the model for one second, you’ll see that when modeling “moving” objects, one needs to reconstruct an understanding of foreground vs background scenes… Look:
This is actually pretty cool because from there, we can do this:
There are quite a few other applications within self-driving cars and outside of them. For now, let’s summarize what we learned and conclude.
Summary
- In 3D Reconstruction, a new process named Gaussian Splatting is currently making a lot of noise with its accuracy and reliability.
- The Gaussian Splatting process works in 3 steps: initialize 3D Gaussians at the location of the SfM points; rasterize them into 2D images to compare with the real pictures; and optimize the Gaussians (position, covariance, opacity, and color) until the two match.
- While NeRFs work by projecting 2D into 3D, Gaussian Splatting is the inverse process: rather than mapping a 2D point to 3D, we start from 3D points and then rasterize them to 2D.
- Gaussian Splatting can be used in self-driving cars and is currently seen as one of the best techniques for 3D Reconstruction.
3D Gaussian Splatting vs NeRF: Which Is Better?
Is the most recent technique necessarily better than the others? It could be, and in fact, Gaussian Splatting (released in 2023) IS seen as better than NeRFs (released in 2020).
Why? Because it’s using an explicit representation. It begins from a 3D geometry and predicts the color at these positions. On the other hand, NeRFs cast rays and sample points along them, with the scene stored implicitly in the network weights, which makes the representation harder to inspect and edit.
3D Gaussian Splatting combines the best of both photogrammetry and NeRFs by using a differentiable process (optimized with gradient descent, as neural nets are) while having the explicit 3D Gaussians (that can be edited, optimized, improved, etc…). Rendering runs in real time, yet, for now, the optimization (training) itself still does not.
So, is it the end game?
Who knows? A few years from now, someone will find a problem with the use of Gaussians and do something better.
For now, we know one thing for sure:
If you enjoyed this blog post, please read the following related 3D Computer Vision posts:
- A 5-Step Guide to Build a Pseudo-LiDAR Using Stereo Vision: https://www.thinkautonomous.ai/blog/pseudo-lidar-stereo-vision-for-self-driving-cars/
- Computer Vision: From Stereo Vision to 3D Reconstruction: https://www.thinkautonomous.ai/blog/3d-computer-vision/
And if you want to get further in Computer Vision? There is an entire Computer Vision Roadmap compiled for PyImageSearch Engineers who wish to learn Computer Vision. Inside, you’ll learn the core skills to go from beginner to intermediate to advanced in Computer Vision. We’ll tackle 3D Computer Vision, Video Computer Vision, Intermediate Deep Learning, and much more…
With this, there’s a BONUS VIDEO just on Gaussian Splatting, going further than this blog post by adding a bit more detail.
Interested?
You can sign up here: https://www.thinkautonomous.ai/py-cv
Citation Information
Cohen, J. “3D Gaussian Splatting vs NeRF: The End Game of 3D Reconstruction?” PyImageSearch, P. Chugh, S. Huot, R. Raha, and P. Thakur, eds., 2024, https://pyimg.co/oauke
@incollection{Cohen_2024_3d-gaussian-splatting, author = {Jeremy Cohen}, title = {{3D Gaussian Splatting vs NeRF: The End Game of 3D Reconstruction?}}, booktitle = {PyImageSearch}, editor = {Puneet Chugh and Susan Huot and Ritwik Raha and Piyush Thakur}, year = {2024}, url = {https://pyimg.co/oauke}, }