**Table of Contents**

**Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 2**- Configuring Your Development Environment
- Having Problems Configuring Your Development Environment?
- Project Structure
- Introduction to NeRF
- Input Data Pipeline
- NeRF Multi-Layer Perceptron
- Volume Rendering
- Photometric Loss
- Breather
- Enhancing NeRF
- Credits
**Summary**

The uniqueness of NeRF is proved by the number of doors it opens up in the field of computer graphics and deep learning. These range from medical imaging, 3D scene reconstruction, animation industry, relighting a scene to depth estimation.

In our previous week’s tutorial, we familiarize ourselves with the prerequisites of NeRF. We have also explored the dataset that will be used. Now, it is best to remind ourselves of the initial problem statement.

What if there was a way to capture the entire 3D scene just from a sparse set of 2D pictures?

In this tutorial, we will focus on the algorithm that NeRF takes to capture the 3D scene from the sparse set of images.

This lesson is part 2 of a 3-part series on Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras:

*Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 1*(last week’s tutorial)*Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 2*(this week’s tutorial)*Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 3*(next week’s tutorial)

**To learn about Neural Radiance Fields or NeRF, ***just keep reading.*

#### Looking for the source code to this post?

Jump Right To The Downloads Section**Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 2**

In this tutorial, we dive straight into the concepts of NeRF. We have divided this tutorial into the following sections:

**Introduction to NeRF:**overview of NeRF**Input Data Pipeline:**the`tf.data`

input data pipeline**Utility and images:**building the`tf.data`

pipeline for images**Generate rays:**building the`tf.data`

pipeline for rays**Sample points:**sampling points from the rays

**NeRF Multi-Layer Perceptron:**the NeRF Multi-Layer Perceptron (MLP) architecture**Volume Rendering:**understanding the volume rendering process**Photometric Loss:**understanding the loss used in NeRF**Enhancing NeRF:**techniques to enhance NeRF**Positional encoding:**understanding positional encoding**Hierarchical sampling:**understanding hierarchical sampling

By the end of this tutorial, we will be able to understand the concepts proposed in NeRF.

**Configuring Your Development Environment**

To follow this guide, you need to have the TensorFlow library installed on your system.

Luckily, TensorFlow is pip-installable:

$ pip install tensorflow

**Having Problems Configuring Your Development Environment?**

All that said, are you:

- Short on time?
- Learning on your employer’s administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
**Ready to run the code***right now***on your Windows, macOS, or Linux system?**

Then join PyImageSearch University today!

**Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are ***pre-configured*** to run on Google Colab’s ecosystem right in your web browser!** No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

**Project Structure**

We first need to review our project directory structure.

Start by accessing the ** “Downloads”** section of this tutorial to retrieve the source code and example images.

Let’s take a look at the directory structure:

$ tree --dirsfirst . ├── dataset │ ├── test │ │ ├── r_0_depth_0000.png │ │ ├── r_0_normal_0000.png │ │ ├── r_0.png │ │ ├── .. │ │ └── .. │ ├── train │ │ ├── r_0.png │ │ ├── r_10.png │ │ ├── .. │ │ └── .. │ ├── val │ │ ├── r_0.png │ │ ├── r_10.png │ │ ├── .. │ │ └── .. │ ├── transforms_test.json │ ├── transforms_train.json │ └── transforms_val.json ├── pyimagesearch │ ├── config.py │ ├── data.py │ ├── encoder.py │ ├── __init__.py │ ├── nerf.py │ ├── nerf_trainer.py │ ├── train_monitor.py │ └── utils.py ├── inference.py └── train.py

The parent directory has two python scripts and two folders.

- The
`dataset`

folder contains three subfolders:`train`

,`test`

, and`val`

for the train, test, and validation images. - The
`pyimagesearch`

folder contains all of the python scripts we will be using for training. - Finally, we have the two driver scripts:
`train.py`

and`inference.py`

. We will be looking at training and inference in next week’s tutorial.

*Note: **In the interest of time, we have divided the implementation of NeRF into two parts. This blog introduces the concepts, while next week’s blogs will cover the train and inference scripts.*

**Introduction to NeRF**

Let’s talk about the premise of the paper. You have images of a particular scene from a few specific viewpoints. Now you want to generate an image of the scene from an entirely new view. This problem falls under novel image synthesis, as shown in **Figure 2**.

The immediate solution to novel view synthesis that comes to our mind is to use a Generative Adversarial Network (GAN) on the training dataset. With GANs, we are constraining ourselves to the 2D space of images.

Mildenhall et al. (2020), on the other hand, ask a simple question.

Why not capture the entire 3D scenery from the images itself?

Let’s take a moment and try to absorb this idea.

We are now looking at a transformed problem statement. From *novel view synthesis*, we have transited to *3D scene capture from a sparse set of 2D images*.

This new problem statement will also serve as a solution to the novel view synthesis problem. How difficult is it to generate a novel view if we have the 3D scenery at our hands?

Note that, NeRF is not the first to tackle this problem. Its predecessors have used various methods, including Convolutional Neural Networks (CNN) and gradient-based mesh optimization. However, according to the paper, these methods could not scale to better resolution due to higher space and time complexity. NeRF aims at optimizing an underlying **continuous volumetric scene function**.

Do not worry if you don’t get all of these terms at first glance. The rest of the blog is dedicated to breaking each of these topics down in the finest details and explaining them one by one.

We begin with a sparse set of images and their corresponding camera metadata (orientation and position). Next, we want to achieve a 3D representation of the entire scene, as shown in **Figure 3**.

The steps for NeRF can be visualized in the following figures:

**Generate Rays:**In this step, we march rays through each pixel of the image. The rays (Ray A and Ray B) are the red lines (**Figure 4**) that intersect the image and traverse through the 3D box (scene).

**Sample points:**In this step we sample points on the rays as shown in**Figure 5**. We must note that these points are located on the rays, making them 3D points inside the box.

Each point has a unique position and a direction component linked as shown (**Figure 6**). The direction of each point is the same as the direction of the ray.

**Deep Learning:**We pass these points into an MLP (**Figure 7**) and predict the color and density corresponding to that point.

**Volume Rendering:**Let’s consider a single ray (Ray A here) and send all the sample points to the MLP to get the corresponding color and density, as shown in**Figure 8**. After we have the color and density of each point, we can apply classical volume rendering (defined in a later section) to predict the color of the image pixel (pixel P here) through which the ray passes.

**Photometric Loss:**The difference between the predicted color of the pixel (shown in**Figure 9)**and the actual color of the pixel makes the photometric loss. This eventually allows us to perform backpropagation on the MLP and minimize the loss.

**Input Data Pipeline**

At this point, we have a bird’s eye view of NeRF. However, before describing the algorithm further, we need first to define an **input data pipeline**.

We know from the previous week’s tutorial that our dataset contains *images* and the corresponding *camera orientations*. So now, we need to build a data pipeline that produces *images* and the corresponding *rays*.

In this section, we will build this data pipeline step by step using the `tf.data`

API. `tf.data`

ensures an efficient way to build and use the dataset. If you want a primer on `tf.data`

, you can refer to this tutorial.

The entire data pipeline is written in the `pyimagesearch/data.py`

file. So, let’s open the file and start digging!

**Utility and Images**

# import the necessary packages from tensorflow.io import read_file from tensorflow.image import decode_jpeg from tensorflow.image import convert_image_dtype from tensorflow.image import resize from tensorflow import reshape import tensorflow as tf import json

We begin with importing the necessary packages on **Lines 2-8**

`tensorflow`

to build the data pipeline`json`

for reading and working with json data

def read_json(jsonPath): # open the json file with open(jsonPath, "r") as fp: # read the json data data = json.load(fp) # return the data return data

On **Lines 10-17**, we define the `read_json`

function. This function takes the path to the `json`

file (`jsonPath`

) and returns the parsed `data`

.

We open the `json`

file with the `open`

function on **Line 12**. Then, with the file pointer in hand, we read the contents and parse it with the `json.load`

function on **Line 14**. Finally, **Line 17** returns the parsed json data.

def get_image_c2w(jsonData, datasetPath): # define a list to store the image paths imagePaths = [] # define a list to store the camera2world matrices c2ws = [] # iterate over each frame of the data for frame in jsonData["frames"]: # grab the image file name imagePath = frame["file_path"] imagePath = imagePath.replace(".", datasetPath) imagePaths.append(f"{imagePath}.png") # grab the camera2world matrix c2ws.append(frame["transform_matrix"]) # return the image file names and the camera2world matrices return (imagePaths, c2ws)

On **Lines 19-37**, we define the `get_image_c2w`

function. This function takes the parsed json data (`jsonData`

) and the path to the dataset (`datasetPath`

) and returns the path to the images (`imagePaths`

) and its corresponding camera-to-world (`c2ws`

) matrices.

On **Lines 21-24,** we define two empty lists: `imagePaths`

and `c2ws`

. On **Lines 27-34**, we iterate over the parsed json data and add the image paths and camera-to-world matrices to the empty lists. After iterating over the entire data, we return both lists (**Line 37**).

Working with `tf.data.Dataset`

instances, we will need a way to transform our dataset while feeding it to the model. To efficiently do this, we use the `map`

functionality. The `map`

function takes in the `tf.data.Dataset`

instance and a function that is applied to each element of the dataset.

The later part of the `pyimagesearch/data.py`

defines functions used with the `map`

function to transform the dataset.

class GetImages(): def __init__(self, imageWidth, imageHeight): # define the image width and height self.imageWidth = imageWidth self.imageHeight = imageHeight def __call__(self, imagePath): # read the image file image = read_file(imagePath) # decode the image string image = decode_jpeg(image, 3) # convert the image dtype from uint8 to float32 image = convert_image_dtype(image, dtype=tf.float32) # resize the image to the height and width in config image = resize(image, (self.imageWidth, self.imageHeight)) image = reshape(image, (self.imageWidth, self.imageHeight, 3)) # return the image return image

Before moving ahead, let’s discuss why we chose to build a **class with a **`__call__`

** method** instead of building a **function** that could be applied with the `map`

function.

The problem is that the function passed to the `map`

function cannot accept anything other than the element of the dataset. This is an imposed constraint which we need to bypass.

To overcome this problem, we have created a class that can hold some properties (here `imageWidth`

and `imageHeight`

) used during the function call.

On **Lines 39-60**, we build the `GetImages`

class with a custom `__call__`

and `__init__`

function.

`__init__`

: we will be using this function to initialize the parameters `imageWidth`

and `imageHeight`

(**Lines 40-43**)

`__call__`

: this method makes the object callable. We will be using this function to read the images from the `imagePaths`

(**Line 47**). Next, it is now decoded in a usable jpeg format (**Line 50**). We then convert the image from `uint8`

to `float32`

and reshape it (**Lines 53-57**).

**Generate Rays**

A ray in computer graphics can be parameterized as

where

- is the ray
- is the origin of the ray
- is the unit vector for the direction of the ray
- is the parameter (e.g., time)

To build the ray equation, we need the origin and the direction. In the context of NeRF, we generate rays by taking the origin of the ray as the pixel position of the image plane and the direction as the straight line joining the pixel and the camera aperture. This is illustrated in** Figure 10**.

We can easily devise the pixel positions of the 2D image with respect to the camera coordinate frame using the following equations.

It is easy to locate the origin of the pixel points but a little challenging to get the direction of the rays. From the previous section, we have

The camera-to-world matrix from the dataset is the that we need.

To define the direction vector, we do not need the entire camera-to-world matrix; instead, we use the upper matrix that defines the camera’s orientation.

With the rotation matrix, we can get the unit direction vector by the following equation.

The difficult calculations are now over. For the easy part, the rays’ origin will be the translation vector of the camera-to-world matrix.

Let’s see how we can translate this to code. We will be continuing with the `pyimagesearch/data.py`

file.

class GetRays: def __init__(self, focalLength, imageWidth, imageHeight, near, far, nC): # define the focal length, image width, and image height self.focalLength = focalLength self.imageWidth = imageWidth self.imageHeight = imageHeight # define the near and far bounding values self.near = near self.far = far # define the number of samples for coarse model self.nC = nC

On **Lines 62-75**,** **we create the class `GetRays`

with a custom `__call__`

and `__init__`

function.

`__init__`

: we initialize the `focalLength`

, `imageWidth`

, and `imageHeight`

on **Lines 66-68** and also the `near`

and `far`

bounds of the camera viewing field (**Lines 71 and 72**). We will need this to construct the rays to be marched into the scene, as shown In **Figure 8**.

def __call__(self, camera2world): # create a meshgrid of image dimensions (x, y) = tf.meshgrid( tf.range(self.imageWidth, dtype=tf.float32), tf.range(self.imageHeight, dtype=tf.float32), indexing="xy", ) # define the camera coordinates xCamera = (x - self.imageWidth * 0.5) / self.focalLength yCamera = (y - self.imageHeight * 0.5) / self.focalLength # define the camera vector xCyCzC = tf.stack([xCamera, -yCamera, -tf.ones_like(x)], axis=-1) # slice the camera2world matrix to obtain the rotation and # translation matrix rotation = camera2world[:3, :3] translation = camera2world[:3, -1]

`__call__`

: we input the `camera2world`

matrix to this method which in turn returns

`rayO`

: the origin points`rayD`

: the set of direction points`tVals`

: the sampled points

On **Lines 79-83**, we create a meshgrid of the image dimension. This is the same as the Image Plane shown in **Figure 10.**

Next, we obtain the camera coordinates (**Lines 86 and 87**) using the equation derived from our previous blog.

We define a homogeneous representation (**Lines 90 and 91**) of the camera vector `xCyCzC`

by stacking the camera coordinates.

On** Lines 95 and 96**, we extract the rotation matrix and the translation vector from the camera-to-world matrix.

# expand the camera coordinates to xCyCzC = xCyCzC[..., None, :] # get the world coordinates xWyWzW = xCyCzC * rotation # calculate the direction vector of the ray rayD = tf.reduce_sum(xWyWzW, axis=-1) rayD = rayD / tf.norm(rayD, axis=-1, keepdims=True) # calculate the origin vector of the ray rayO = tf.broadcast_to(translation, tf.shape(rayD)) # get the sample points from the ray tVals = tf.linspace(self.near, self.far, self.nC) noiseShape = list(rayO.shape[:-1]) + [self.nC] noise = (tf.random.uniform(shape=noiseShape) * (self.far - self.near) / self.nC) tVals = tVals + noise # return ray origin, direction, and the sample points return (rayO, rayD, tVals)

We then transform the camera coordinates to world coordinates using the rotation matrix (**Lines 99-102**).

Next, we calculate the direction `rayD`

(**Lines 105 and 106**) and the origin vector `rayO`

(**Line 109**).

On **Lines 112-116**, we sample points from the ray.

*Note:**We will learn about sampling points on a ray in the following section.*

Finally we return `rayO`

, `rayD`

, and `tVals`

on **Line 119**.

**Sample Points**

After the generation of rays, we need to draw sample 3D points from the rays. To do this, we suggest two ways.

**Sample points at regular intervals**: The name of the method is self-explanatory. Here, we sample points on the ray at regular intervals, as shown in**Figure 11.**

The sampling equation is as follows:

where and are the farthest and nearest points on the ray, respectively. We divide the entire ray into equidistant parts, and the divisions serve as the sample points.

**Sample points randomly:**In this method, we add randomness into the process of sampling points. The idea here is that if the sample points come from random positions of the ray, the model will be exposed to new data. This will regularize it to produce better results. The strategy is shown in**Figure 12**.

This is demonstrated by the equation below:

where refers to uniform sampling. Here, we take a random point from the space between two adjacent points.

**NeRF Multi-Layer Perceptron**

Each sample point is of 5 dimensions. The spatial location of the point is a 3D vector (), and the direction of the point is a 2D vector (). Mildenhall et al. (2020) advocate expressing the viewing direction as a 3D Cartesian unit vector .

These 5D points serve as the input to the MLP. **This field of rays with 5D points is referred to as the neural radiance field in the paper**.

The MLP network predicts each input point’s color and volume density . Color refers to the () content of the point. The volume density can be interpreted as the differential probability of a ray terminating at an infinitesimal particle at that point.

The MLP architecture is displayed in **Figure 13.**

An important point to note here is that:

We encourage the representation to be multiview consistent by restricting the network to predict the volume density as a function of only the location while allowing the RGB color to be predicted as a function of both locations and viewing direction.

With all that theory out of the way, we can start building the NeRF architecture in TensorFlow. So, let’s open the file `pyimagesearch/nerf.py`

and start digging.

# import the necessary packages from tensorflow.keras.layers import Dense from tensorflow.keras.layers import concatenate from tensorflow.keras import Input from tensorflow.keras import Model

We begin with importing our necessary packages on **Lines 2-5**.

def get_model(lxyz, lDir, batchSize, denseUnits, skipLayer): # build input layer for rays rayInput = Input(shape=(None, None, None, 2 * 3 * lxyz + 3), batch_size=batchSize) # build input layer for direction of the rays dirInput = Input(shape=(None, None, None, 2 * 3 * lDir + 3), batch_size=batchSize) # creating an input for the MLP x = rayInput for i in range(8): # build a dense layer x = Dense(units=denseUnits, activation="relu")(x) # check if we have to include residual connection if i % skipLayer == 0 and i > 0: # inject the residual connection x = concatenate([x, rayInput], axis=-1) # get the sigma value sigma = Dense(units=1, activation="relu")(x) # create the feature vector feature = Dense(units=denseUnits)(x) # concatenate the feature vector with the direction input and put # it through a dense layer feature = concatenate([feature, dirInput], axis=-1) x = Dense(units=denseUnits//2, activation="relu")(feature) # get the rgb value rgb = Dense(units=3, activation="sigmoid")(x) # create the nerf model nerfModel = Model(inputs=[rayInput, dirInput], outputs=[rgb, sigma]) # return the nerf model return nerfModel

Next, on **Lines 7-46,** we create our MLP model in the function called `get_model`

. This method takes in the following inputs:

`lxyz`

: the number of dimensions used for positional encoding of the xyz coordinates`lDir`

: the number of dimensions used for positional encoding of the direction vector`batchSize`

: the batch size of the data`denseUnits`

: the number of units in each layer of MLP`skipLayer`

: the layer at which we want the skip connection

On **Lines 9-14,** we define the `rayInput`

and the `dirInput`

layers. Next, we create the MLP with the skip connection (**Lines 17-25**).

To align with the paper (*multiview consistency*), only the `rayInput`

is passed through the model to produce `sigma`

(volume density) and a feature vector on **Lines 28-31**. Finally, the feature vector is concatenated with the `dirInput`

(**Line 35**) to produce color (**Line 39**).

On **Lines 42 and 43**,** **we build the `nerfModel`

using the Keras functional API. Finally, we return the `nerfModel`

on **Line 46.**

**Volume Rendering**

In this section, we study how to achieve volume rendering. We use the predicted color and volume density from the MLP to render the 3D scene.

The predictions from the network are plugged into the classical volume rendering equation to derive the color of one particular point. For example, the equation for the same is given below:

Sounds complicated?

Let us break this equation down into simple parts.

- The term is the color of the point of the object.
- is the ray that is fed into the network where the variable stands for the following:
- as the origin of the ray point
- is the direction of the ray
- is the set of uniform samples between the near and far points used for the integral

- is the volume density which can also be interpreted as the differential probability of the ray terminating at the point .
- is the color of the ray at the point

These are the building blocks of the equation. Apart from these, there is another term

This represents the transmittance along the ray from near point to the current point . Think of this as a measure of how much the ray can penetrate the 3D space to a certain point.

Now when we have all the terms together, we can finally make sense of the equation.

The color of an object in the 3D space is defined as the sum over of () the transmittance (), volume density (), the color of the current point () and the direction of the ray sampled for all points existing between the near () and far () of the viewing plane.

Let’s look at how to express this in code. First, we will look at the `render_image_depth`

in the `pyimagesearch/utils.py`

file.

def render_image_depth(rgb, sigma, tVals): # squeeze the last dimension of sigma sigma = sigma[..., 0] # calculate the delta between adjacent tVals delta = tVals[..., 1:] - tVals[..., :-1] deltaShape = [BATCH_SIZE, IMAGE_HEIGHT, IMAGE_WIDTH, 1] delta = tf.concat( [delta, tf.broadcast_to([1e10], shape=deltaShape)], axis=-1) # calculate alpha from sigma and delta values alpha = 1.0 - tf.exp(-sigma * delta) # calculate the exponential term for easier calculations expTerm = 1.0 - alpha epsilon = 1e-10 # calculate the transmittance and weights of the ray points transmittance = tf.math.cumprod(expTerm + epsilon, axis=-1, exclusive=True) weights = alpha * transmittance # build the image and depth map from the points of the rays image = tf.reduce_sum(weights[..., None] * rgb, axis=-2) depth = tf.reduce_sum(weights * tVals, axis=-1) # return rgb, depth map and weights return (image, depth, weights)

On **Lines 15-42**, we are building a `render_image_depth`

function which takes as inputs:

`rgb`

: the red-green-blue color matrix of the ray points`sigma`

: the volume density of the sample points`tVals`

: the sample points

It produces the volume-rendered image (`image`

), its depth map (`depth`

), and the weights (required for hierarchical sampling).

- On
**Line 17**, we reshape`sigma`

for ease of calculation. Next, we calculate the space (`delta`

) between adjacent`tVals`

(**Lines 20-23**). - Next we create
`alpha`

using`sigma`

and`delta`

**Line 26).** - We create the transmittance and weight vector (
**Lines 33-35**)**.** - On
**Lines****38 and 39**, we create the image and depth map.

Finally, we return `image`

, `depth`

, and `weights`

on **Line 42**.

**Photometric Loss**

We refer to the loss function used by NeRF as the photometric loss. This is computed by comparing the colors of the synthesized image with the ground-truth image. Mathematically this can be expressed as:

where is the real image and is the synthesized image. This function, when applied to the entire pipeline, is still fully differentiable. This allows us to train the model parameters () using backpropagation.

**Breather**

Let’s take a moment here to realize how far we have come. Take a deep breath like our friend in **Figure 14.**

We have learned about computer graphics and their fundamentals in the first part of our blog series. In this tutorial, we have taken those concepts and applied them to 3D scene representation. Here we have:

- Built an
**image**and a**ray dataset**from the given`json`

files. **Sampled points**from the rays using the random sampling strategy.- Passed these points into the
**NeRF MLP.** **Rendered a novel image**using the color and volume density predicted by the MLP.- Established a loss function (
**photometric loss**) with which we will optimize the parameters of the MLP.

These steps are sufficient to train a NeRF model and render novel views. However, this vanilla architecture will eventually produce renderings of low quality. To mitigate these issues, Mildenhall et al. (2020) propose additional enhancements.

In the next section, we will learn about these enhancements and their implementation using TensorFlow.

**Enhancing NeRF**

Mildenhall et al. (2020) propose two methods to enhance the renderings from NeRF.

- positional encoding
- hierarchical sampling

**Positional Encoding**

Positional Encoding is a popular encoding format used in architectures like transformers. Mildenhall et al. (2020) justify using this to better render high-frequency features such as *texture* and *details*.

Rahaman et al. (2019) suggest that deep networks are biased toward learning low-frequency functions. To bypass this problem NeRF proposes mapping the input vector to a higher dimensional representation. Since the 5D input space is the position of the points, we are essentially encoding the positions from which it gets the name.

Let’s say we have 10 positions indexed as . The indices are in the decimal system. If we encode the digits in the binary system, we will get something, as shown in **Figure 15**.

The binary system is an easy encoding system. The only problem we face here is that the binary system is filled with **zeros**, making it a sparse representation. We would want to make this system *continuous* and *compact*.

The encoding function used in NeRF is as follows:

To draw a parallel between the binary and the NeRF encoding, let’s look at **Figure 16**.

The sine and cosine functions make the encoding continuous, and the term makes it similar to the binary system.

A visualization of the positional encoding function is given in **Figure 17**. The blue line depicts the cosine component, while the red line is the sine component.

We can create this fairly simply in a function called `encoder_fn`

in the `pyimagesearch/encode.py`

file.

# import the necessary packages import tensorflow as tf def encoder_fn(p, L): # build the list of positional encodings gamma = [p] # iterate over the number of dimensions in time for i in range(L): # insert sine and cosine of the product of current dimension # and the position vector gamma.append(tf.sin((2.0 ** i) * p)) gamma.append(tf.cos((2.0 ** i) * p)) # concatenate the positional encodings into a positional vector gamma = tf.concat(gamma, axis=-1) # return the positional encoding vector return gamma

We start with importing `tensorflow`

(**Line 2**). On **Lines 4-19**, we define the encoder function, which takes in the following parameters:

`p`

: position of each element to be encoded`L`

: the dimension into which the encoding will take place

On **Line 6**, we define a list that will hold the positional encoding. Next, we iterate over dimensions and append the encoded values into the list (**Lines 9-13**). **Lines 16-19** are used to convert the same list into a tensor and finally return it.

**Hierarchical Sampling**

Mildenhall et al. (2020) found another problem with the original structure. The random sampling method would sample `N`

points along each camera ray. This means we don’t have any prior understanding of where it should sample. That ultimately leads to an inefficient rendering.

They propose the following solution to remedy this:

- Build two identical NeRF MLP models, the
**coarse**and**fine**network. - Sample a set of points along the camera ray using the random sampling strategy, as shown in
**Figure 12**. These points will be used to query the coarse network. - The output of the coarse network is used to produce a more informed sampling of points along each ray. These samples are biased towards the more relevant parts of the 3D scene.

To do this, we rewrite the color equation:

As a weighted sum of all sample colors .

where the term . - The weights, when normalized, produce a piecewise-constant probability density function.

The entire procedure of turning the weights into a probability density function is visualized in**Figure 18.**

- From the probability density function, we sample the second set of locations using the inverse transform sampling method, as shown in
**Figure 19**.

- Now we have both and set of sampled points. We send these points to the
**fine network**to produce the final rendered color of the ray.

This process of converting weights to a new set of sample points can be expressed through a function called `sample_pdf`

. First, let’s refer to the `utils.py`

file inside the `pyimagesearch`

folder.

def sample_pdf(tValsMid, weights, nF): # add a small value to the weights to prevent it from nan weights += 1e-5 # normalize the weights to get the pdf pdf = weights / tf.reduce_sum(weights, axis=-1, keepdims=True) # from pdf to cdf transformation cdf = tf.cumsum(pdf, axis=-1) # start the cdf with 0s cdf = tf.concat([tf.zeros_like(cdf[..., :1]), cdf], axis=-1) # get the sample points uShape = [BATCH_SIZE, IMAGE_HEIGHT, IMAGE_WIDTH, nF] u = tf.random.uniform(shape=uShape) # get the indices of the points of u when u is inserted into cdf in a # sorted manner indices = tf.searchsorted(cdf, u, side="right") # define the boundaries below = tf.maximum(0, indices-1) above = tf.minimum(cdf.shape[-1]-1, indices) indicesG = tf.stack([below, above], axis=-1) # gather the cdf according to the indices cdfG = tf.gather(cdf, indicesG, axis=-1, batch_dims=len(indicesG.shape)-2) # gather the tVals according to the indices tValsMidG = tf.gather(tValsMid, indicesG, axis=-1, batch_dims=len(indicesG.shape)-2) # create the samples by inverting the cdf denom = cdfG[..., 1] - cdfG[..., 0] denom = tf.where(denom < 1e-5, tf.ones_like(denom), denom) t = (u - cdfG[..., 0]) / denom samples = (tValsMidG[..., 0] + t * (tValsMidG[..., 1] - tValsMidG[..., 0])) # return the samples return samples

This code snippet has been inspired by the official NeRF implementation. On **Lines 44-86**, we create a function called `sample_pdf`

that takes in the following parameters:

`tValsMid`

: the midpoints between two adjacent`tVals`

`weights`

: the weights used in the volume rendering function`nF`

: number of points used by the fine model

On **Lines 46-49**, we define the probability density function from the weights and then convert the same into a cumulative distribution function (cdf). This is then converted back into sample points for the fine model using inverse transform sampling (**Lines 52-86**).

We recommend this supplementary reading material to understand hierarchical sampling better.

**Credits**

This tutorial was inspired by the work of Mildenhall et al. (2020).

### What's next? We recommend PyImageSearch University.

**Course information:**

84 total classes • 114+ hours of on-demand code walkthrough videos • Last updated: February 2024

★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled

**I strongly believe that if you had the right teacher you could master computer vision and deep learning.**

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s *not* the case.

All you need to master computer vision and deep learning is for someone to explain things to you in *simple, intuitive* terms. *And that’s exactly what I do*. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to *successfully* and *confidently* apply computer vision to your work, research, and projects. Join me in computer vision mastery.

**Inside PyImageSearch University you'll find:**

- ✓
**86 courses**on essential computer vision, deep learning, and OpenCV topics - ✓
**86 Certificates**of Completion - ✓
**115+ hours**of on-demand video - ✓
**Brand new courses released**, ensuring you can keep up with state-of-the-art techniques*regularly* - ✓
**Pre-configured Jupyter Notebooks in Google Colab** - ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to
**centralized code repos for**on PyImageSearch*all*540+ tutorials - ✓
**Easy one-click downloads**for code, datasets, pre-trained models, etc. - ✓
**Access**on mobile, laptop, desktop, etc.

**Summary**

We have gone through the core concepts proposed in the paper NeRF and also implemented them using TensorFlow.

We can recall what we have learned so far in the following steps:

- Building the image and ray dataset for 5D scene representation
- Sample points from the rays using any of the sampling strategies
- Passing these points through the NeRF MLP model
- Volume rendering based on the output of the MLP model
- Calculating the photometric loss
- Using positional encoding and hierarchical sampling to improve the quality of rendering

In next week’s tutorial, we will cover how to utilize all of these concepts to train the NeRF model. In addition, we will also render a 360-degree video of a 3D scene from 2D images.

We hope you enjoyed this week’s tutorial, and as always, you can download the source code and try it out yourself.

**Citation Information**

**Gosthipaty, A. R., and Raha, R. **“Computer Graphics and Deep Learning with NeRF using TensorFlow and Keras: Part 2,” *PyImageSearch*, 2021, https://pyimagesearch.com/2021/11/17/computer-graphics-and-deep-learning-with-nerf-using-tensorflow-and-keras-part-2/

@article{Gosthipaty_Raha_2021_pt2, author = {Aritra Roy Gosthipaty and Ritwik Raha}, title = {Computer Graphics and Deep Learning with {NeRF} using {TensorFlow} and {Keras}: Part 2}, journal = {PyImageSearch}, year = {2021}, note = {https://pyimagesearch.com/2021/11/17/computer-graphics-and-deep-learning-with-nerf-using-tensorflow-and-keras-part-2/}, }

**To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!**

#### Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a **FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning.** Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!

## Comment section

Hey, Adrian Rosebrock here, author and creator of PyImageSearch. While I love hearing from readers, a couple years ago I made the tough decision to no longer offer 1:1 help over blog post comments.

At the time I was receiving 200+ emails per day and another 100+ blog post comments. I simply did not have the time to moderate and respond to them all, and the sheer volume of requests was taking a toll on me.

Instead, my goal is to

do the most goodfor the computer vision, deep learning, and OpenCV community at large by focusing my time on authoring high-quality blog posts, tutorials, and books/courses.If you need help learning computer vision and deep learning, I suggest you refer to my full catalog of books and courses— they have helped tens of thousands of developers, students, and researchersjust like yourselflearn Computer Vision, Deep Learning, and OpenCV.Click here to browse my full catalog.