Ray Tracing Realistic Cameras
This project was part of an independent study in which I, along with a few other students, extended a pre-existing ray tracer made by a PhD student, Kelly Jiang. What follows is the report I submitted for the project; it runs through the different camera systems I implemented, with a brief explainer and example results for each.
Introduction
Ray tracing has the unique advantage of easily simulating many different types of cameras. All we need to do is provide the appropriate outgoing ray from the camera system and let the ray tracer take care of the rest. The ability to simulate different types of cameras opens up many artistic possibilities when making rendered images or animations. For instance, one of my favorite artistic uses of a realistic camera system can be seen in Toy Story 4. In this movie they modeled several different types of cameras, including the simpler spherical lens camera (which is discussed later in this report) as well as more complex split-diopter (fig. 1) and anamorphic lenses. The YouTube channel Nerdwriter has a good explainer on the artistic impact of these lenses in this video. Of course, realistic camera simulations can also be useful for research. For example, the Dahl lab at Northwestern is currently in the process of simulating images from their bubble chamber. The cameras peer into the bubble chamber through a series of relay lenses which need to be accurately simulated for correct results.
This report will cover four different camera types that were implemented following chapter 6 of the book Physically Based Rendering: From Theory to Implementation (PBRT), available online here. For more information on any of these camera models, please read the corresponding section of the book. Note that unless otherwise stated, all images of the 3-sphere scene were taken from the same position and looking in the same direction, with a sample rate of 256 samples per pixel and the gamma set to 2. In addition, a border has been added to distinguish the edges of the images since the background is also white. Finally, full resolution images can be viewed from this Google Drive.
Pinhole Camera
This implementation of the pinhole camera model assumes that the film plane is in front of the pinhole. Rays originate from the location of the camera, and each ray's direction is determined by the position of a given pixel on the film plane. Spatial anti-aliasing is done by randomly choosing a position within the pixel to sample. The user can set the field of view and the aspect ratio (though the latter is usually determined automatically by the resolution), as well as the distance to the near clipping plane. The field of view and aspect ratio determine the size of the pixel grid that rays are shot towards and hence the range of angles covered by the camera.
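To make this concrete, here is a rough sketch of what this ray generation might look like. The types and names below (Vec3, Ray, GeneratePinholeRay, and so on) are purely illustrative rather than the actual code from our ray tracer; the field of view is treated as the vertical field of view, with the film plane placed one unit in front of the pinhole along the negative z-axis.

```cpp
#include <cmath>
#include <random>

struct Vec3 { double x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Generate a camera-space ray through pixel (px, py), jittered within the pixel
// for spatial anti-aliasing. The pinhole sits at the origin looking down -z.
Ray GeneratePinholeRay(int px, int py, int width, int height,
                       double fovDegrees, std::mt19937 &rng) {
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    double aspect = double(width) / double(height);
    double tanHalfFov = std::tan(0.5 * fovDegrees * M_PI / 180.0);

    // Jittered sample position inside the pixel, mapped to [-1, 1] on the film plane.
    double sx = (px + jitter(rng)) / width * 2.0 - 1.0;
    double sy = 1.0 - (py + jitter(rng)) / height * 2.0;

    Vec3 dir{ sx * tanHalfFov * aspect, sy * tanHalfFov, -1.0 };
    double len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    return Ray{ Vec3{0.0, 0.0, 0.0}, Vec3{dir.x / len, dir.y / len, dir.z / len} };
}
```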
In reality, we would expect pinhole cameras to produce really dim images or require long exposure times since only a few light rays will make it past the pinhole to produce the image. However, for this ray tracer we are explicitly only shooting rays that are simulated as passing through the pinhole. In addition, we do not have any exposure time settings for this camera (at least not for this branch of the project) so we would not be able to simulate that result regardless.
A defining characteristic of pinhole cameras is that, since they have an infinitesimal aperture, everything is in focus. We can see that explicitly in fig. 2.
Environment Camera
The environment camera operates on a similar principle to the pinhole camera except that, instead of getting the ray direction by drawing a vector from the camera origin to a point on the film plane, we now determine the ray direction via a point on a sphere surrounding the camera. The environment camera effectively maps the full 360 degree field of view from a point in the scene to a rectangular image. This also means that the environment camera captures all of the light incident on that point in the scene.
While it is not explicitly required, the aspect ratio for the resolution of the environment camera should be 2:1. This is because the environment camera maps 0 to 2Pi radians of azimuthal angle along the image's x axis and 0 to Pi radians of inclination angle along its y axis.
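A sketch of this mapping (again with illustrative names, and assuming a y-up camera coordinate system): a pixel's normalized film coordinates are converted to spherical angles and then to a direction on the unit sphere.

```cpp
#include <cmath>

// Reuses the Vec3 struct from the pinhole sketch above.
// s and t are the pixel's normalized film coordinates in [0, 1).
Vec3 EnvironmentDirection(double s, double t) {
    double phi   = 2.0 * M_PI * s;  // azimuthal angle: 0 to 2Pi across the image width
    double theta = M_PI * t;        // inclination angle: 0 to Pi down the image height
    return Vec3{ std::sin(theta) * std::cos(phi),
                 std::cos(theta),
                 std::sin(theta) * std::sin(phi) };
}
```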
Images produced using the environment camera can be viewed by projecting the image back onto a sphere and rotating a fixed camera. Doing so allows you to effectively “look around” at what a scene looks like from a given position. While this is not explicitly built into this renderer, an online tool such as this one can be used to view the image.
Thin Lens Camera
As the name suggests, a thin lens camera makes use of the thin lens approximation which models the lens system as a single lens with spherical profiles where the thickness of the lens is small compared to the radius of curvature for the profiles. It is also the first of the camera models that utilize a finite aperture.
The basic process for generating rays using this model can be seen in fig. 4. First, we trace a ray originating from a given pixel through the center of the aperture to determine where it hits the focal plane. The distance to the focal plane and the size of the aperture can be set by the user. Once we know where a ray passing through the center of the aperture would intersect the focal plane, we sample a point on the aperture and shoot a ray from that point towards the intersection point on the focal plane. Consequently, all objects on the focal plane are in focus, and objects off the focal plane will be increasingly out of focus the farther they are from it. See fig. 6 for examples of various focal distances.
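A minimal sketch of this two-step procedure is below. The names are hypothetical rather than our actual camera class; pinholeDir is assumed to be the direction a pinhole camera would have used for this pixel sample, pointing down the negative z-axis.

```cpp
#include <cmath>
#include <random>

// Reuses the Vec3 and Ray structs from the pinhole sketch above.
Vec3 Normalize(Vec3 v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{ v.x / len, v.y / len, v.z / len };
}

// pinholeDir: the direction a pinhole camera would use for this pixel sample.
// focalDistance and apertureRadius are the user-set parameters described above.
Ray GenerateThinLensRay(Vec3 pinholeDir, double focalDistance,
                        double apertureRadius, std::mt19937 &rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);

    // 1. Find where the central (pinhole) ray hits the plane of focus, which
    //    sits at z = -focalDistance in camera space.
    double tFocus = -focalDistance / pinholeDir.z;
    Vec3 pFocus{ pinholeDir.x * tFocus, pinholeDir.y * tFocus, -focalDistance };

    // 2. Sample a point on the circular aperture (uniform disk via polar sampling).
    double r = apertureRadius * std::sqrt(u(rng));
    double a = 2.0 * M_PI * u(rng);
    Vec3 origin{ r * std::cos(a), r * std::sin(a), 0.0 };

    // 3. Shoot the ray from the sampled aperture point towards the focus point.
    return Ray{ origin, Normalize(Vec3{ pFocus.x - origin.x,
                                        pFocus.y - origin.y,
                                        pFocus.z - origin.z }) };
}
```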
The blurriness of objects outside the plane of focus is determined by the size of the circle of confusion: points not on the focal plane are imaged to a disk on the film plane rather than to a single point. Objects appear sharply in focus if their circle of confusion is smaller than an individual pixel. The size of the circle of confusion is generally determined by the size of the aperture and how far the film plane is from where the lens brings the light to a focus; for a particular object it also depends on the object's distance from the lens. Notably, the change in the size of the circle of confusion is more dramatic for objects closer to the lens than the focal plane than for objects farther away (see fig. 5). The size of the circle of confusion determines the depth of field of the image: a lens with a large aperture has a shallower depth of field than one with a smaller aperture. For examples of this effect and the impact of aperture size, see fig. 7.
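To put a formula to this (a standard thin lens result, using positive object and image distances; see PBRT's thin lens section for a full treatment), similar triangles give the diameter of the circle of confusion for a point at depth z as

\[ d_c = \left| \frac{d_l \, f \, (z - z_f)}{z \, (z_f - f)} \right| \]

where d_l is the aperture diameter, f the focal length, and z_f the in-focus depth. The factor of z in the denominator is what produces the asymmetry noted above: a point at half the focus distance is blurred as much as a point infinitely far away, since d_c approaches d_l f / (z_f - f) as z goes to infinity.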
Realistic Camera
A key advantage of ray tracing over traditional rasterization methods is the ability to accurately model the transport of light through a scene. This can be extended to include accurate transport of light through a series of lenses as well. Consequently, it is possible to ray trace actual camera lenses and incorporate several realistic effects such as depth of field, aberration, distortion, and vignetting. Tracing real camera lenses is conceptually simple: if we assume our lens profiles are spherical, then we just need to trace a list of spheres and refract the ray at each intersection accordingly. And it turns out that this is almost exactly how the lens elements are simulated.
The lenses are treated as a series of spherical interfaces stored in a table (or more precisely, a vector of LensElementInterfaces), with the interfaces extending to the left of the film plane along the negative z-axis. Each entry in the table lists the radius of the spherical element, the thickness of the element (i.e. the distance along the z-axis from the previous element), the index of refraction of the medium to the right of the interface, and the diameter of the aperture. The radius of the element can be positive or negative, and its sign determines whether the element is concave or convex, i.e. whether we should return the closer or farther intersection point. Aperture stops (which block exiting rays) are represented as elements with 0 radius. The aperture diameter determines the radial extent from the z-axis at which rays are allowed to intersect the element before they are considered blocked. The aperture diameter of the first element is used to sample the possible directions that rays can be initialized in. The thickness of the first element is used to adjust the focus of the lens system.
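For reference, the per-interface record looks roughly like the sketch below. The field names follow PBRT's LensElementInterface; the version in our ray tracer may differ slightly, and the helper function is just an illustration of the convention described above.

```cpp
#include <vector>

// Sketch of the per-interface record, modeled on PBRT's LensElementInterface.
struct LensElementInterface {
    float curvatureRadius;  // signed radius of the spherical surface; 0 marks an aperture stop
    float thickness;        // spacing along the z-axis from the previous interface
    float eta;              // index of refraction of the medium to the right of the interface
    float apertureRadius;   // radial extent from the z-axis before a ray counts as blocked
};

// Accumulate interface positions starting from the film plane at z = 0 and
// extending toward negative z, so the first entry's thickness is the film-to-lens
// gap that gets adjusted to set the focus.
std::vector<float> InterfaceZPositions(const std::vector<LensElementInterface> &elements) {
    std::vector<float> zPositions;
    float z = 0.0f;
    for (const auto &e : elements) {
        z -= e.thickness;
        zPositions.push_back(z);
    }
    return zPositions;
}
```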
There are a number of performance enhancements we can use when ray tracing the lenses compared to ray tracing normal objects in our scene. For one, we know the order of the objects we are tracing, so we do not need to perform intersection tests against multiple objects or bounding boxes; we simply iterate through the lens elements in order. Also, we can use specialized ray-sphere intersection code for the case where all spheres are centered on the z-axis. This primarily saves us from having to do matrix transformations to move the elements into place. One other advantage of treating all lens elements as lying along the z-axis is that we can trace rays through the lenses in the lens coordinate system and then do a single matrix transformation from camera coordinates to world coordinates.
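The specialized intersection routine can be sketched as follows. This mirrors the quadratic solve in PBRT's IntersectSphericalElement, where the sign of the curvature radius (together with the ray direction) picks out the near or far hit; as before, the surrounding types are the illustrative ones from the earlier sketches.

```cpp
#include <algorithm>
#include <cmath>

// Reuses the Vec3 and Ray structs from the pinhole sketch above.
// radius is the signed curvature radius of the element and zCenter the z
// coordinate of its sphere center (the center always lies on the z-axis).
// On a hit, *t receives the parametric distance along the ray.
bool IntersectSphericalElement(double radius, double zCenter,
                               const Ray &ray, double *t) {
    // Solve |o + t*d - c|^2 = radius^2 with c = (0, 0, zCenter).
    Vec3 oc{ ray.origin.x, ray.origin.y, ray.origin.z - zCenter };
    double A = ray.dir.x * ray.dir.x + ray.dir.y * ray.dir.y + ray.dir.z * ray.dir.z;
    double B = 2.0 * (oc.x * ray.dir.x + oc.y * ray.dir.y + oc.z * ray.dir.z);
    double C = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - radius * radius;
    double disc = B * B - 4.0 * A * C;
    if (disc < 0.0) return false;

    double sqrtDisc = std::sqrt(disc);
    double t0 = (-B - sqrtDisc) / (2.0 * A);
    double t1 = (-B + sqrtDisc) / (2.0 * A);

    // Concave vs. convex (the sign of the radius) determines whether the closer
    // or farther intersection is the one the ray actually strikes.
    bool useCloser = (ray.dir.z > 0.0) ^ (radius < 0.0);
    *t = useCloser ? std::min(t0, t1) : std::max(t0, t1);
    return *t > 0.0;
}
```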
For these renders, each ray sample is given 10 attempts to exit the lens system before it is treated as having failed and is painted black. That is, for each starting position on the sensor, we sample the first element's aperture up to 10 times to determine the initial ray direction before giving up. The probability that a ray exits the lens system is determined by the size of the exit pupil, which dictates the range of positions from which a ray originating at a point on the sensor can leave the lens system. Consequently, lens systems with smaller exit pupils tend to produce darker images, since the current setup of the renderer treats any ray that fails to exit as black. One way to improve the efficiency of sampling exiting rays is to precompute the exit pupil as a function of radial distance from the center, exploiting the radial symmetry of the lens system. However, that has not been implemented for this report.
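The retry loop is roughly the following. This is again a sketch with hypothetical names: traceThroughLenses stands in for whatever routine walks a candidate ray through the interface table and reports whether it survives.

```cpp
#include <cmath>
#include <functional>
#include <random>

// Reuses the Vec3 and Ray structs and the Normalize helper from the sketches above.
// sensorPoint: the starting position on the film/sensor for this sample.
// rearApertureRadius / rearElementZ: aperture radius and z position of the first element.
bool SampleExitingRay(const Vec3 &sensorPoint, double rearApertureRadius,
                      double rearElementZ, std::mt19937 &rng,
                      const std::function<bool(const Ray &, Ray *)> &traceThroughLenses,
                      Ray *exitingRay) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (int attempt = 0; attempt < 10; ++attempt) {
        // Sample a point on the rear element's aperture disk to pick an initial direction.
        double r = rearApertureRadius * std::sqrt(u(rng));
        double a = 2.0 * M_PI * u(rng);
        Vec3 target{ r * std::cos(a), r * std::sin(a), rearElementZ };
        Ray candidate{ sensorPoint, Normalize(Vec3{ target.x - sensorPoint.x,
                                                    target.y - sensorPoint.y,
                                                    target.z - sensorPoint.z }) };
        if (traceThroughLenses(candidate, exitingRay)) return true;
    }
    return false;  // the renderer treats this sample as black
}
```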
The following figures showcase a variety of lens systems. One common feature of these images is that they have been cropped so that the image produced by the lenses just fits within the bounds of the sensor. To produce them, I manually adjusted the distance from the film to the first lens element to set the focus and then adjusted the size of the sensor to encapsulate the whole image.
Future Directions
The camera models presented in this report still just scratch the surface of the range of possible cameras and effects that can be simulated. For instance, the lens elements in this report were treated as purely spherical, which means we cannot model non-spherical aperture stops or simulate split-diopter lenses like the ones discussed in the introduction. Also, total internal reflection of rays is completely ignored, which is important for simulating effects such as lens flares. The current implementation also assumes that all wavelengths of light have the same index of refraction, which means effects such as chromatic aberration cannot be simulated properly.
There are also changes that could make tracing the lenses more efficient. I have already mentioned that the exit pupil is not being computed. Another improvement that is not currently working is automatic adjustment of the focus using the thick lens approximation.