Neural radiance fields (NeRFs) are slowly becoming the next hot topic in the world of Deep Learning. Since they were originally proposed in 2020, there has been an explosion of papers, as can be seen from CVPR's 2022 submissions. Time magazine recently included a variation of NeRFs, called instant neural graphics primitives, in its best inventions of 2022 list. But what exactly are NeRFs and what are their applications?
In this article, I will try to demystify all the different terminologies such as neural fields, NeRFs, neural graphics primitives, etc. To give you a preview, they all more or less stand for the same thing, depending on who you ask. I will also explain how they work by analyzing the two most influential papers.
What is a neural field?
The term neural field was popularized by Xie et al. and describes a neural network that parametrizes a signal. This signal is usually a single 3D scene or object, but that's not mandatory. Neural fields can also be used to represent any type of signal (discrete or continuous), such as audio or images.
Their most popular use is in computer graphics applications such as image synthesis and 3D reconstruction, which is the main topic of this article.
Please note that neural fields have also been applied in other areas such as generative modeling, 2D image processing, robotics, medical imaging, and audio parameterization.
In most neural field variations, fully connected neural networks encode the properties of an object or scene. Importantly, one network needs to be trained to encode (capture) a single scene. Note that, in contrast with standard machine learning, the goal here is to overfit the neural network to a particular scene. In essence, neural fields embed the scene into the weights of the network.
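To make this concrete, here is a minimal sketch of what overfitting a network to a single signal looks like, using a 2D image as the signal. Everything here (layer sizes, learning rate, number of steps) is illustrative rather than taken from any particular paper: a small PyTorch MLP maps a pixel coordinate to an RGB color, and the image itself is the only training data.

```python
import torch
import torch.nn as nn

# A tiny coordinate-based MLP: pixel coordinate (x, y) -> color (r, g, b).
# All sizes and hyperparameters are illustrative; real neural fields are
# usually deeper and use positional encodings to capture high-frequency detail.
field = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3), nn.Sigmoid(),
)

image = torch.rand(256, 256, 3)  # stand-in for the single target image
ys, xs = torch.meshgrid(
    torch.linspace(0, 1, 256), torch.linspace(0, 1, 256), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (H*W, 2)
colors = image.reshape(-1, 3)                          # (H*W, 3)

optimizer = torch.optim.Adam(field.parameters(), lr=1e-3)
for step in range(2000):
    idx = torch.randint(0, coords.shape[0], (4096,))   # random batch of pixels
    loss = ((field(coords[idx]) - colors[idx]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After training, querying `field` at any (x, y) returns the color at that location.
```

After fitting, the network itself is the representation of the signal: it can be queried at any coordinate, including ones that fall between pixels.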
Why use neural fields?
3D scenes are typically stored using voxel grids or polygon meshes. On the one hand, voxel grids are very expensive to store in memory. On the other hand, polygon meshes can only represent hard surfaces, which makes them unsuitable for applications such as medical imaging.
Voxels vs Polygon meshes. Source: Wikipedia on Voxels, Wikipedia on Polygon Meshes
Neural fields have gained increasing popularity in computer graphics applications because they are very efficient and compact 3D representations of objects or scenes. Why? In contrast with voxels or meshes, they are differentiable and continuous. They can also have arbitrary dimensions and resolutions. Finally, they are domain agnostic and do not depend on the input of each task.
At this point, you may ask: where does the name neural fields come from?
What do fields stand for?
In physics, a field is a quantity defined for all spatial and/or temporal coordinates. It can be represented as a mapping from a coordinate $x$ to a quantity $y$, typically a scalar, a vector, or a tensor. Examples include gravitational fields and electromagnetic fields.
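For instance, the gravitational field of a point mass $M$ placed at the origin maps every position $\mathbf{x}$ to an acceleration vector:

$$\mathbf{g}(\mathbf{x}) = -\frac{GM}{\lVert \mathbf{x} \rVert^{3}}\,\mathbf{x}$$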
Next question you may ask: what are the steps to “learn” a neural field?
Steps to train a neural field
Following Xie et al., the typical process of computing a neural field can be formulated as follows:

1. Sample coordinates of a scene.
2. Feed them to a neural network to produce field quantities.
3. Sample the field quantities from the desired reconstruction domain of the problem.
4. Map the reconstruction back to the sensor domain (e.g. 2D RGB images).
5. Calculate the reconstruction error and optimize the neural network.
A typical neural field algorithm. Source: Xie et al.
For clarity, let's use some mathematical terms to denote the process. The reconstruction is a neural field, denoted as $\Phi : X \rightarrow Y$, which maps world coordinates $x_{recon} \in X$ to field quantities $y_{recon} \in Y$. The sensor is also a field, denoted as $\Omega: S \rightarrow T$, which maps sensor coordinates $x_{sens} \in S$ to measurements $t_{sens} \in T$. The two are linked by the forward mapping $F : (X \rightarrow Y) \rightarrow (S \rightarrow T)$.

As a result, we can solve the following optimization problem to calculate the neural field $\Phi$:

$$\Phi^{*} = \arg\min_{\Phi} \int_{S} \left\lVert F(\Phi)(x_{sens}) - \Omega(x_{sens}) \right\rVert \, \mathrm{d}x_{sens}$$
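Putting the five steps and the optimization problem together, a generic fitting loop might look roughly like this. It is only a sketch: `forward_map` and `sample_batch` are placeholders for whatever differentiable renderer and coordinate sampler the specific application provides.

```python
import torch

def fit_neural_field(field, forward_map, sample_batch, steps=10_000, lr=5e-4):
    """Generic neural-field fitting loop mirroring the five steps above.

    field:        neural network Phi mapping world coordinates to field quantities
    forward_map:  differentiable map F from field quantities to the sensor domain
    sample_batch: callable returning (world_coords, sensor_coords, measurements)
                  for a random batch; supplied by the specific application
    """
    optimizer = torch.optim.Adam(field.parameters(), lr=lr)
    for _ in range(steps):
        world_coords, sensor_coords, measurements = sample_batch()  # step 1
        field_quantities = field(world_coords)                      # step 2
        predicted = forward_map(field_quantities, sensor_coords)    # steps 3-4
        loss = ((predicted - measurements) ** 2).mean()             # step 5
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return field
```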
The table below (Xie et al.) illustrates different applications of neural fields alongside the reconstruction and sensor domains.
Examples of forward maps. Source: Xie et al.
Let's now analyze the most popular neural field architecture, called NeRFs, which solves the problem of view synthesis.
Neural Radiance Fields (NeRFs) for view synthesis
The most prominent neural field architecture is called Neural Radiance Fields or NeRFs. NeRFs were originally proposed to solve view synthesis: the task of generating novel views of a 3D object or scene, given a set of pictures taken from different angles (or views). View synthesis is almost equivalent to 3D reconstruction.
Multi-view 3D reconstruction. Source: Convex Variational Methods for Single-View and Space-Time Multi-View Reconstruction
Note that in order to fully understand NeRFs, one has to be familiar with many computer graphics concepts such as volumetric rendering and ray casting. In this section, I will try to explain them as efficiently as possible, but I will also leave a few extra resources to extend your research. If you are looking for a structured course to get started with computer graphics, Computer Graphics by UC San Diego is the best one as far as I know.
NeRFs and Neural fields terminology side by side
As I already mentioned, NeRFs are a special case of neural fields. For that reason, let's see a side-by-side comparison. Feel free to revisit this table once we explain NeRFs in order to draw the connection between them and neural fields.
| Neural Fields | Neural Radiance Fields (NeRF) |
| --- | --- |
| World coordinate $x_{recon} \in X$ | Spatial location $(x, y, z)$ |
| Field quantities $y_{recon} \in Y$ | Color $c=(r,g,b)$ |
| Field $\Phi : X \rightarrow Y$ | MLP |
| Sensor coordinates $x_{sens} \in S$ | 2D images |
| Measurements $t_{sens} \in T$ | Radiance |
| Sensor $\Omega: S \rightarrow T$ | Digital camera |
| Forward mapping $F : (X \rightarrow Y ) \rightarrow (S \rightarrow T)$ | Volume rendering |
The reason I decided to present neural fields first and NeRFs second is to make it clear that neural fields are a far more general framework.
NeRFs explained
NeRFs, as proposed by Mildenhall et al., accept a single continuous 5D coordinate as input, which consists of a spatial location $(x, y, z)$ and a viewing direction $(\theta, \phi)$. For each such input, the network outputs the volume density $\sigma$ and the view-dependent color (radiance) $c = (r, g, b)$ at that location.

The (probability) volume density indicates how much radiance (or luminance) is accumulated by a ray passing through $(x, y, z)$.
Neural Radiance Fields. Source: Mildenhall et al.
The power of the neural field is that it can output different representations for the same point when viewed from different angles. As a result, it can capture various lighting effects such as reflections and transparencies, making it ideal for rendering different views of the same scene. This makes it a much better representation than voxel grids or meshes.
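To give a feel for the architecture, here is a heavily simplified sketch of such a radiance field in PyTorch. The real NeRF network is deeper, encodes its inputs with positional encodings, injects the viewing direction only in the last layers, and makes the density depend on position alone; none of that is reproduced here.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Simplified radiance field: (spatial location, viewing direction) -> (density, color)."""

    def __init__(self, hidden=256):
        super().__init__()
        # 3 inputs for (x, y, z) plus 3 for a unit viewing direction
        # (equivalent to the angles (theta, phi)).
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, xyz, view_dir):
        out = self.mlp(torch.cat([xyz, view_dir], dim=-1))
        sigma = torch.relu(out[..., :1])    # volume density is non-negative
        rgb = torch.sigmoid(out[..., 1:])   # color in [0, 1]
        return sigma, rgb
```

Querying the same spatial location with two different viewing directions can return two different colors, which is exactly how the view-dependent effects mentioned above are captured.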
Training NeRFs
The problem with training these architectures is that the target density and color are not known. Therefore, we need a (differentiable) method to map them back to 2D images. These rendered images are then compared with the ground-truth images, formulating a rendering loss that we can use to optimize the network.
NeRF training process. Source: Mildenhall et al.
As shown in the image above, volume rendering is used to map the neural field's output back to the 2D image. The standard L2 loss can then be computed between the rendered pixels and the corresponding pixels of the input image, in an autoencoder-like fashion. Note that volume rendering is a very common process in computer graphics. Let's see in short how it works.
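Concretely, in a simplified form (ignoring NeRF's coarse/fine network split), for a batch of rays $R$ the loss compares the rendered color $\hat{C}(r)$ of each ray with the color $C(r)$ of the corresponding ground-truth pixel:

$$\mathcal{L} = \sum_{r \in R} \left\lVert \hat{C}(r) - C(r) \right\rVert_2^2$$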
Volume rendering
When sampling coordinates from the original images, we shoot a ray through each pixel and sample points along it at different timesteps, a process known as ray marching. Each sample point has a spatial location, a color, and a volume density: the spatial location (together with the ray's viewing direction) is the input to the neural field, while the color and the volume density are its outputs.
A ray is a function of its origin $o$, its direction $d$, and its samples at timesteps $t$. It can be formulated as $r(t) = o + td$.
Ray Marching. Source: Creating a Volumetric Ray Marcher by Ryan Brucks
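A minimal sketch of this sampling step is shown below. It samples uniformly along each ray between a near and a far bound; the actual NeRF implementation uses stratified sampling, which this sketch omits.

```python
import torch

def sample_points_along_rays(origins, directions, near, far, num_samples):
    """Generate 3D sample locations r(t) = o + t*d for a batch of rays.

    origins:    (num_rays, 3) ray origins o
    directions: (num_rays, 3) ray directions d
    near, far:  scalar bounds of the scene along each ray
    """
    t = torch.linspace(near, far, num_samples)  # (num_samples,)
    points = origins[:, None, :] + t[None, :, None] * directions[:, None, :]
    return points, t                            # points: (num_rays, num_samples, 3)
```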
To map the samples back to the image, all we have to do is integrate along these rays and accumulate the color of each pixel.
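In discrete form, this integration becomes a weighted sum: each sample's color is weighted by its opacity and by the transmittance accumulated in front of it along the ray. Here is a sketch of this compositing step, assuming `sigmas` and `rgbs` are the field's outputs at the sample points produced above:

```python
import torch

def composite_along_rays(sigmas, rgbs, t):
    """Numerical volume rendering: accumulate colors along each ray.

    sigmas: (num_rays, num_samples, 1) volume densities predicted by the field
    rgbs:   (num_rays, num_samples, 3) colors predicted by the field
    t:      (num_samples,) sample distances along each ray
    """
    deltas = t[1:] - t[:-1]                            # distances between samples
    deltas = torch.cat([deltas, deltas[-1:]], dim=0)   # pad the last interval
    alpha = 1.0 - torch.exp(-sigmas[..., 0] * deltas)  # opacity of each sample
    # Transmittance: how much light survives up to each sample along the ray.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans                            # (num_rays, num_samples)
    pixel_rgb = (weights[..., None] * rgbs).sum(dim=-2)
    return pixel_rgb                                   # (num_rays, 3)
```

Because every operation here is differentiable, gradients can flow from the rendered pixel colors all the way back into the network's weights, which is what makes the whole pipeline trainable end to end.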