Week 5: Perspective, Lighting



The perspective projection matrix is a 4x4 transform that converts from eye space to clip space, similar to the orthographic projection. The orthographic projection preserves distances and parallel lines, which is helpful for CAD applications or engineering sketches where it is important to maintain a consistent scale. For added realism, however, the perspective projection projects all the points in the scene through a center of projection located at the eye origin. We consider the final 2D image to be the projection of the points onto the near clipping plane of the perspective transform. We will see how this transform, with a little help from the OpenGL pipeline and homogeneous coordinates, makes objects that are farther away appear smaller in the final image.


The general parameters for a perspective matrix can be specified with the twgl.m4.frustum(left, right, bottom, top, near, far) function. All the parameters are with respect to the eye frame. The near and far parameters are interpreted as distances from the eye. Since the convention in eye space is that the viewer is looking down the \(-\hat{z}\) axis, the near and far values are negated internally by the frustum function. You should use 0 < near < far.

A slightly easier-to-use form for creating perspective matrices is the twgl.m4.perspective(fovy, aspect, near, far) function, where fovy is the vertical field of view in radians, aspect is the aspect ratio (width/height) of the viewing canvas, and near and far are the same as above. Internally, this can be converted to an equivalent frustum call using:

const top = Math.tan(fovy / 2) * near;
const bottom = -top;

const right = top * aspect;
const left = -right;
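The conversion above can be wrapped in a small function. This is a sketch only: the name `frustumBoundsFromPerspective` is hypothetical and is not part of twgl; it just returns the six values one would pass to `twgl.m4.frustum`.

```javascript
// Hypothetical helper (not a twgl function): turns perspective(fovy, aspect, near, far)
// parameters into the frustum(left, right, bottom, top, near, far) bounds.
function frustumBoundsFromPerspective(fovy, aspect, near, far) {
  const top = Math.tan(fovy / 2) * near; // half-height of the window on the near plane
  const bottom = -top;                   // the window is symmetric vertically
  const right = top * aspect;            // half-width = half-height * (width / height)
  const left = -right;                   // ... and symmetric horizontally
  return { left, right, bottom, top, near, far };
}

// Example: a 90 degree vertical field of view on a 2:1 canvas.
const bounds = frustumBoundsFromPerspective(Math.PI / 2, 2, 1, 100);
console.log(bounds); // top is approximately 1 and right approximately 2
```

Because perspective only takes a field of view and an aspect ratio, it always describes a window centered on the view axis; an off-center (asymmetric) window requires calling frustum directly.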

We will now look at how to build the perspective matrix for the frustum function, and see how it differs from our previous transforms.

Assume we want to convert a point in eye coordinates \((x_e,y_e,z_e)\) into perspective projected coordinates \((x_p,y_p,z_p)\) on the near clipping plane. The coordinates \((x_p,y_p,z_p)\) are not quite the clip coordinates output by the vertex shader, but are closely related as we will see later.

[Figure: X perspective]

The image above shows a point in eye coordinates projected through the origin onto the near clipping plane. Only the \(\hat{x}\) and \(\hat{z}\) axes are shown. Using similar triangles, we can determine that:

\[\frac{x_e}{z_e} = \frac{x_p}{z_p} = \frac{x_p}{-n}, \implies x_p = \frac{-n x_e}{z_e}\]

[Figure: Y perspective]

Repeating the same exercise in the \(\hat{y}-\hat{z}\) plane, we find:

\[\frac{y_e}{z_e} = \frac{y_p}{z_p} = \frac{y_p}{-n}, \implies y_p = \frac{-n y_e}{z_e}\]

So the coordinates of the projected point are \((x_p,y_p,z_p)\) = \((\frac{-nx_e}{z_e},\frac{-n y_e}{z_e},-n)\) and it seems like we are done. However, there are a few issues with this solution.
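A quick numerical check of the similar-triangles result, as a minimal sketch (the function name is hypothetical). It projects two points that differ only in depth; the farther point lands closer to the center of the near plane, which is exactly the "farther objects look smaller" effect.

```javascript
// Project an eye-space point through the eye (the origin) onto the near plane z = -n,
// using x_p = -n*x_e/z_e and y_p = -n*y_e/z_e from the similar triangles above.
function projectToNearPlane([xe, ye, ze], n) {
  // Points in front of the viewer have z_e < 0, since the eye looks down -z.
  return [(-n * xe) / ze, (-n * ye) / ze, -n];
}

const near = 1;
const closer = projectToNearPlane([2, 1, -4], near);  // [0.5, 0.25, -1]
const farther = projectToNearPlane([2, 1, -8], near); // [0.25, 0.125, -1]
console.log(closer, farther);
```

Doubling the distance halves the projected x and y offsets, while every projected point has the same \(z_p=-n\).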

  1. With \(z_e\) in the denominator, this is a non-linear transform. It seems impossible to express this as a linear matrix operation.

  2. We actually want to map our projected coordinates to clip coordinates, so that projected points at \(x_p=l\) and \(x_p=r\) map to \(x_c=-1\) and \(x_c=1\) (and similarly for \(y\), using \(b\) and \(t\)).

  3. We would like to preserve some depth information in \(z_p\) so we can still use the per-fragment depth test to resolve object occlusion.

Handling a non-linear transform

For the first issue, we can exploit the fourth component of our homogeneous coordinates to allow perspective transforms. Consider the matrix \(\mathbf{P}\) below:

\[\mathbf{P} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}\]

If we apply \(\mathbf{P}\) to the homogeneous point \((x_e,y_e,z_e,1)\), we get:

\[\mathbf{P} \begin{pmatrix} x_e \\ y_e \\ z_e \\ 1 \\ \end{pmatrix} = \begin{pmatrix} x_e \\ y_e \\ z_e \\ -z_e \\ \end{pmatrix}\]

Note that the transformed point now has a fourth component \(w=-z_e\), which is neither 0 (a vector) nor 1 (a point). The OpenGL pipeline automatically applies perspective division to the vertex shader's output, dividing all components of the point by \(w\) (which must be nonzero). For an ordinary point with \(w=1\) this has no effect, but here it divides by \(-z_e\), yielding the final transformed point:

\[\begin{pmatrix} x_e/(-z_e) \\ y_e/(-z_e) \\ z_e/(-z_e) \\ -z_e/(-z_e) \\ \end{pmatrix} = \begin{pmatrix} -x_e/z_e \\ -y_e/z_e \\ -1 \\ 1 \\ \end{pmatrix}\]

Note that after division, we have a geometric point, but our points have been transformed such that \(x_e\) and \(y_e\) have been scaled by \(-1/z_e\), which is the non-linear part of what we first derived. The other components of the transform can be computed using linear means.
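The matrix \(\mathbf{P}\) and the division step can be mimicked on the CPU in a few lines. This sketch stores the matrix as an array of rows for readability (an assumption; WebGL and twgl.m4 use flat column-major arrays), and the helper names are hypothetical.

```javascript
// The matrix P from above, stored as an array of rows.
const P = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, -1, 0],
];

// Multiply a 4x4 matrix (array of rows) by a 4-component column vector.
function mulMat4Vec4(m, v) {
  return m.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]);
}

// Perspective division: divide every component by w, as the pipeline does.
function perspectiveDivide([x, y, z, w]) {
  if (w === 0) throw new Error("w is 0: cannot divide");
  return [x / w, y / w, z / w, 1];
}

const eyePoint = [2, 1, -4, 1];          // homogeneous point (w = 1)
const clip = mulMat4Vec4(P, eyePoint);   // [2, 1, -4, 4]
const divided = perspectiveDivide(clip); // [0.5, 0.25, -1, 1]
console.log(clip, divided);
```

The x and y results match \(x_p = -n x_e / z_e\) and \(y_p = -n y_e / z_e\) with \(n=1\), while z collapses to -1 for every point, which is why depth needs separate treatment.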

Adjusting the scales

To convert to clip coordinates, we will map our projected points in the \(\hat{x}\) and \(\hat{y}\) directions to the range \([-1,1]\).

For example, we know that the clip coordinates in the \(\hat{x}\) direction change by two units over the same span in which the projected coordinates change by \(r-l\) units (right minus left). So we can write the clip coordinate as

\[x_c = \frac{2}{r-l} x_p + \beta\]

for some offset \(\beta\), which we determine using one of the known conversions, e.g., \(x_p=r \implies x_c=1\). Solving \(1 = \frac{2r}{r-l} + \beta\) gives \(\beta = -\frac{r+l}{r-l}\), so:

\[x_c = \frac{2 x_p}{r-l} - \frac{r+l}{r-l}\]

Repeating the same exercise in the \(\hat{y}\) direction gives us the conversion there.

\[y_c = \frac{2 y_p}{t-b} - \frac{t+b}{t-b}\]
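A sketch of this affine map, checked on an asymmetric window where \(l \ne -r\) so the offset term matters. The function name is hypothetical; the same function serves for y by passing \(b\) and \(t\) as the bounds.

```javascript
// Map a projected coordinate p into [-1, 1] given the window bounds [lo, hi]
// (l and r for x; b and t for y): 2p/(hi-lo) - (hi+lo)/(hi-lo).
function projectedToClip(p, lo, hi) {
  return (2 * p) / (hi - lo) - (hi + lo) / (hi - lo);
}

// Asymmetric example window: l = -1, r = 3.
console.log(projectedToClip(3, -1, 3));  // 1  (right edge)
console.log(projectedToClip(-1, -1, 3)); // -1 (left edge)
console.log(projectedToClip(1, -1, 3));  // 0  (center of [l, r])
```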

These expressions are still expressed in terms of the projected coordinates we computed earlier, not the original eye coordinates. Using \(x_p = \frac{-n x_e}{z_e}\) and \(y_p = \frac{-n y_e}{z_e}\) and getting everything over a common \(-z_e\) denominator results in:

\[x_c = \frac{1}{-z_e}\cdot \left( \frac{2n}{r-l} x_e + \frac{r+l}{r-l} z_e \right)\]
\[y_c = \frac{1}{-z_e}\cdot \left( \frac{2n}{t-b} y_e + \frac{t+b}{t-b} z_e \right)\]

Note that the inner terms are linear with respect to the eye coordinates. Combining them with the projection portion of the matrix we derived earlier, we now have:

\[\mathbf{M} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}\]

Note that to get the final clip coordinates derived above, OpenGL must apply the perspective division step after applying the matrix to the geometry.

Retaining depth information

For the final image display, the computation of the x and y positions is the most important, but for object occlusion we cannot simply map all z positions to the near clipping plane; we must retain some depth information. The third row of our perspective matrix has two generic parameters \(A\) and \(B\) that we will now determine to help us retain depth information. If we apply the matrix \(\mathbf{M}\) to a point in eye space we get, after perspective division:

\[z_c = \frac{A z_e + B}{-z_e}\]

We haven’t talked much about clip space or the closely related normalized device coordinates (NDC), but one slightly annoying feature is that the clipping volume is actually left-handed: the near clipping plane in front maps to \(z=-1\), while the far clipping plane maps to \(z=+1\). You almost never need to worry about this, because there is not much you actually do once points have been converted to clip space. The only place where it becomes relevant is in constructing the projection matrices. If we map \(z_e=-n\) to \(z_c=-1\) and \(z_e=-f\) to \(z_c=1\) and solve the two resulting equations for \(A\) and \(B\), we find:

\[A = -\frac{f+n}{f-n}, \qquad B = -\frac{2fn}{f-n}\]

Substituting these into \(\mathbf{M}\) gives the complete perspective matrix:

\[\mathbf{M} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}\]
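To check the full matrix, the sketch below builds \(\mathbf{M}\) from the six frustum parameters, applies it to eye-space points, and performs the perspective division. It stores the matrix as an array of rows for readability (an assumption; WebGL and twgl.m4 use flat column-major arrays), and all function names are hypothetical.

```javascript
// Build the perspective matrix M as an array of rows.
function frustumMatrix(l, r, b, t, n, f) {
  return [
    [(2 * n) / (r - l), 0, (r + l) / (r - l), 0],
    [0, (2 * n) / (t - b), (t + b) / (t - b), 0],
    [0, 0, -(f + n) / (f - n), (-2 * f * n) / (f - n)],
    [0, 0, -1, 0],
  ];
}

// Multiply a 4x4 matrix (array of rows) by a 4-component column vector.
function mulMat4Vec4(m, v) {
  return m.map(row => row.reduce((sum, mij, j) => sum + mij * v[j], 0));
}

// Apply M to an eye-space point (w = 1), then divide by w.
function eyeToNdc(m, [xe, ye, ze]) {
  const [x, y, z, w] = mulMat4Vec4(m, [xe, ye, ze, 1]);
  return [x / w, y / w, z / w];
}

const M = frustumMatrix(-1, 1, -1, 1, 1, 10);      // l, r, b, t, n, f
const nearCorner = eyeToNdc(M, [1, 1, -1]);         // top-right of the near plane
const farCorner = eyeToNdc(M, [10, 10, -10]);       // top-right of the far plane
const midDepth = eyeToNdc(M, [0, 0, -5.5]);         // halfway between the planes in z_e
console.log(nearCorner, farCorner, midDepth);
```

The near and far corners land (up to rounding) on corners of the \([-1,1]^3\) cube. The last point lies halfway between the near and far planes in eye space, yet its depth is \(9/11 \approx 0.82\), so the depth mapping is non-linear and spends more of its range close to the near plane.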

For more details, see the Song Ho Ahn (안성호) Perspective Matrix page. This page also contains info on the much simpler Orthographic projection derivation.


Today we will begin our discussion of lighting, an important topic for realism in 3D computer graphics. We will first cover a local illumination model suitable for the OpenGL pipeline we have been discussing so far. In the local illumination model, we illuminate each surface/triangle as if it were the only surface/triangle in the scene. This allows for fast computation in either the vertex or fragment shader. However, this model has limits: it cannot easily model shadows or reflections caused by the interaction of multiple objects in the scene. For these effects, we will explore a global illumination model as part of your midterm project.

Rendering equation

[Figure: Path Tracing]

To create realistic scenes, we must try to capture the interaction of light with surface materials. Consider a point \(p\) and a direction \(\vec{w_0}\) from \(p\) towards a viewer. The color, or light intensity, observed at \(p\) in the direction \(\vec{w_0}\) is given by the rendering equation:

\[L_o(p,\vec{w_0}) = L_e(p,\vec{w_0}) + \int_\Omega f_r(p,\vec{w_i},\vec{w_0} ) L_i(p,\vec{w_i}) \vec{w_i} \cdot \hat{n}\ d \omega_i\]

  • \(L_o(p,\vec{w_0})\): outgoing light from \(p\) in direction of \(\vec{w_0}\)

  • \(L_e(p,\vec{w_0})\): emitted light from \(p\) in direction of \(\vec{w_0}\)

  • \(\vec{w_i}\): A vector from \(p\) in direction of incoming light

  • \(L_i(p,\vec{w_i})\): Intensity of incoming light arriving at \(p\) from direction \(\vec{w_i}\) (i.e., traveling along \(-\vec{w_i}\))

  • \(\hat{n}\): A unit vector perpendicular to the surface at \(p\), the surface normal.

  • \(\vec{w_i} \cdot \hat{n}\): Lambertian weakening of incoming light intensity (the cosine of the angle between \(\vec{w_i}\) and \(\hat{n}\)).

  • \(f_r(p,\vec{w_i},\vec{w_0} )\): The bidirectional reflectance distribution function (BRDF) of the surface. Describes the material and how much of the incoming light from \(\vec{w_i}\) is reflected towards \(\vec{w_0}\).

  • \(\Omega\): A hemisphere centered at \(p\) and oriented in the direction of \(\hat{n}\).

  • \(\int_\Omega \cdots d\omega_i\): An integral over all incoming light directions hitting the surface at \(p\).

The rendering equation describes conservation of light energy. If you knew all the light sources in a scene and how every photon interacted with the materials, you could solve the rendering equation exactly and create a replica of a real scene. In practice this is infeasible, as we cannot trace the path of every photon computationally: one 800 lumen light bulb produces over \(10^{18}\) photons every second. Approaches to estimating the rendering equation include radiosity, path tracing, and photon mapping. We will explore path tracing more during the midterm project, though it departs somewhat from the pipeline approach we have been using so far. The basic idea is to trace paths from the viewer through the scene, see how the paths interact with surfaces and light sources, and estimate \(L_o(p,\vec{w_0})\) by accumulating contributions along each path.
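To make "estimating the integral" concrete, here is a minimal Monte Carlo sketch for a single point. It is not a path tracer: it assumes a Lambertian BRDF \(f_r = \rho/\pi\) (albedo \(\rho\)), the same incoming radiance \(L_i\) from every direction of the hemisphere, and no emission (\(L_e = 0\)), so the exact answer is \(\rho L_i\). Function names are hypothetical, and the small, widely used mulberry32 generator appears only because `Math.random` cannot be seeded.

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1), for reproducible results.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Monte Carlo estimate of the integral term of the rendering equation at one point,
// assuming a Lambertian BRDF, constant incoming radiance, and L_e = 0.
function estimateOutgoingRadiance(albedo, incomingRadiance, numSamples, rand) {
  const n = [0, 0, 1];           // surface normal
  const brdf = albedo / Math.PI; // Lambertian BRDF is a constant
  const pdf = 1 / (2 * Math.PI); // density of uniform hemisphere sampling
  let sum = 0;
  for (let i = 0; i < numSamples; i++) {
    // Uniform direction on the hemisphere around n: z = cos(theta) uniform in [0, 1),
    // phi uniform in [0, 2*pi).
    const z = rand();
    const radius = Math.sqrt(1 - z * z);
    const phi = 2 * Math.PI * rand();
    const wi = [radius * Math.cos(phi), radius * Math.sin(phi), z];
    const cosTheta = wi[0] * n[0] + wi[1] * n[1] + wi[2] * n[2]; // w_i . n
    sum += (brdf * incomingRadiance * cosTheta) / pdf;           // integrand / pdf
  }
  return sum / numSamples;
}

const estimate = estimateOutgoingRadiance(0.5, 1.0, 100000, mulberry32(42));
console.log(estimate); // close to the exact value albedo * L_i = 0.5
```

Averaging integrand/pdf over random directions converges to the integral; a path tracer applies the same idea recursively, using the estimate at one surface as the incoming light at the next.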

Phong Lighting

We will explore the simpler Blinn-Phong local illumination model first in the OpenGL pipeline before looking at more advanced global illumination models.

In this model we consider a simplified scene around a point \(p\), consisting of four unit vectors radiating out from \(p\).

[Figure: Phong reflection model]

  • \(\hat{v}\) a vector pointing from \(p\) toward the viewer.

  • \(\hat{n}\) a vector normal to the surface at \(p\).

  • \(\hat{l}\) a vector pointing from \(p\) toward a light source.

  • \(\hat{r}\) a vector pointing from \(p\) in the direction of \(\hat{l}\) reflected about \(\hat{n}\).

Note that \(\hat{v}\) plays the role of \(\vec{w_0}\) in the rendering equation, and \(\hat{l}\) plays the role of one particular \(\vec{w_i}\).
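In a shader these unit vectors are usually built from positions. The sketch below does the same in plain JavaScript; the positions are hypothetical values, all assumed to be in one shared coordinate frame (e.g., eye space), and the helper names are illustrative.

```javascript
// Helpers for 3-component vectors stored as arrays.
function sub(a, b) {
  return [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
}
function normalize(a) {
  const len = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
}

// Hypothetical scene values, all in the same coordinate frame.
const p = [0, 0, 0];         // surface point being lit
const eyePos = [0, 3, 4];    // viewer position
const lightPos = [0, 10, 0]; // point light position
const n = [0, 1, 0];         // surface normal at p, usually supplied with the geometry

const v = normalize(sub(eyePos, p));   // unit vector from p toward the viewer
const l = normalize(sub(lightPos, p)); // unit vector from p toward the light
console.log(v, l, n); // v = [0, 0.6, 0.8], l = [0, 1, 0]
```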

The Blinn-Phong (James F. Blinn 1977, Bui Tuong Phong 1975) model approximates the scene interaction with light by describing three components to the illumination of a point:

  1. Ambient illumination: Ambient light has no discernible direction and is scattered uniformly everywhere. It acts as background lighting.

  2. Diffuse illumination: Diffuse light has a source direction, but when it hits a rough surface, it scatters uniformly in all directions. The amount of contribution depends on the angle between the surface normal and the light direction. A surface facing away from a light source has no diffuse contribution.

  3. Specular illumination: Glossy objects have small but intense local reflections of light particularly when the view direction is close to the reflected direction of the light source. Think of the reflection of the sun off a glossy vehicle, window, or mirror.

The Blinn-Phong model evaluates each of these contributions separately and computes the final color as the sum of these contributions.

For each light in the scene, we define diffuse and specular intensities, \(i_d\) and \(i_s\). These can be single floating-point values or three-component vectors describing separate intensities for the red, green, and blue channels. Since ambient light does not come from a particular source, there is only one ambient light intensity \(i_a\) for the entire scene.


For each surface, we define the following material properties. Under the same lighting conditions, different materials can appear shiny, flat, rough, etc.

  • \(k_a\): The ambient reflection constant, or how much ambient light this surface reflects.

  • \(k_d\): The diffuse reflection constant

  • \(k_s\): The specular reflection constant. This relates to how much of the specular light is reflected, but a separate shininess parameter, \(\alpha\) controls how glossy the surface is.

  • \(\alpha\): The shininess constant. For high values, e.g., 100, the specular highlight becomes small and sharp, and the surface looks more mirror-like.

In practice, the reflection constants are typically three-component colors, e.g., \(k_a=(k_{ar}, k_{ag}, k_{ab})\), but for simplicity of discussion you can think of each as a single value between 0 and 1. Once you can run through the steps for a single component, the other two components are computed identically.


The ambient intensity observed by the viewer at a point \(p\) is just:

\[I_a = k_a i_a\]

Note this is independent of the viewer location or light location.

The diffuse component depends on the orientation of the surface relative to the direction of the light:

\[I_d = k_d i_d \mathrm{max}(\hat{l} \cdot \hat{n}, 0)\]

This gives our first gradient of color. Surfaces pointed away from the light source have no diffuse contribution, and the intensity gradually increases as the surface faces the light source more directly. Note that this model does not account for occlusion by other objects (shadows), including self-occlusion, nor does it account for light attenuation with distance. Because it ignores attenuation, it works best for very distant light sources, e.g., the sun.
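A minimal sketch of the diffuse term for one color channel, assuming \(\hat{l}\) and \(\hat{n}\) are already unit vectors (function names hypothetical). The three calls show full, partial, and zero contribution.

```javascript
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Diffuse term I_d = k_d * i_d * max(l . n, 0) for one color channel.
// Assumes l (toward the light) and n (surface normal) are unit vectors.
function diffuse(kd, id, l, n) {
  return kd * id * Math.max(dot(l, n), 0);
}

const n = [0, 0, 1];
const overhead = diffuse(0.8, 1.0, [0, 0, 1], n);   // light straight above: 0.8
const slanted = diffuse(0.8, 1.0, [0, 0.6, 0.8], n); // about 0.8 * 0.8 = 0.64
const behind = diffuse(0.8, 1.0, [0, 0, -1], n);     // light behind the surface: 0
console.log(overhead, slanted, behind);
```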

The specular contribution is defined by

\[I_s = k_s i_s \mathrm{max}(\hat{r} \cdot \hat{v}, 0)^\alpha\]

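Using vector projection from Week 4, we compute \(\hat{r}\) as:

\[\hat{r} = 2(\hat{l} \cdot \hat{n}) \hat{n} - \hat{l}\]

Strictly speaking, this \(\hat{r} \cdot \hat{v}\) specular term is Phong's original form. Blinn's modification, which gives the Blinn-Phong model its name, replaces \(\hat{r} \cdot \hat{v}\) with \(\hat{n} \cdot \hat{h}\), where \(\hat{h} = (\hat{l}+\hat{v})/\|\hat{l}+\hat{v}\|\) is the halfway vector between the light and view directions.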
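A sketch of the reflection formula and the \(\hat{r}\cdot\hat{v}\) specular term, for one color channel with unit vectors (function names hypothetical). Note that GLSL's built-in `reflect(I, N)` computes `I - 2.0 * dot(N, I) * N`, which expects the incident direction pointing toward the surface, so in a shader this \(\hat{r}\) would be `reflect(-l, n)`.

```javascript
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Reflect l (unit, pointing toward the light) about the unit normal n: r = 2 (l . n) n - l.
function reflectAboutNormal(l, n) {
  const s = 2 * dot(l, n);
  return [s * n[0] - l[0], s * n[1] - l[1], s * n[2] - l[2]];
}

// Specular term I_s = k_s * i_s * max(r . v, 0)^alpha for one color channel.
function specular(ks, iS, alpha, l, n, v) {
  const r = reflectAboutNormal(l, n);
  return ks * iS * Math.pow(Math.max(dot(r, v), 0), alpha);
}

const n = [0, 0, 1];
const l = [0.6, 0, 0.8];
const r = reflectAboutNormal(l, n);                   // about [-0.6, 0, 0.8]
const atMirror = specular(1, 1, 32, l, n, r);         // viewer along r: about 1
const overhead = specular(1, 1, 32, l, n, [0, 0, 1]); // 0.8^32, about 0.0008
const below = specular(1, 1, 32, l, n, [0, 0, -1]);   // viewer below the surface: 0
console.log(r, atMirror, overhead, below);
```

With \(\alpha = 32\), moving the viewer from the mirror direction \(\hat{r}\) to straight overhead drops the highlight from about 1 to \(0.8^{32} \approx 0.0008\), which is why large \(\alpha\) gives small, tight highlights.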

The final color intensity at \(p\) for a single light source is the total of the ambient, diffuse and specular contributions.

\[I_p = I_a + I_d + I_s\]

If there are multiple light sources, we compute the diffuse and specular components for each light source and accumulate the results; the ambient term is added only once. For \(m\) lights:

\[I_p = I_a + \sum_{i=1}^{m} \left( I_{d,i} + I_{s,i} \right)\]
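Putting the pieces together, this sketch evaluates \(I_p = I_a + \sum_i (I_{d,i} + I_{s,i})\) for RGB colors, applying the same single-channel formulas to each of the three channels. The material and light objects (`ka`, `kd`, `ks`, `alpha`, and per-light `l`, `id`, `is`) are hypothetical structures, light directions are assumed to be unit vectors pointing from \(p\) toward each light, and the specular term uses the \(\hat{r}\cdot\hat{v}\) form.

```javascript
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// r = 2 (l . n) n - l for unit vectors l and n.
function reflectAboutNormal(l, n) {
  const s = 2 * dot(l, n);
  return [s * n[0] - l[0], s * n[1] - l[1], s * n[2] - l[2]];
}

// Local illumination at one point, evaluated per RGB channel.
function shade(material, ambient, lights, n, v) {
  // Ambient term I_a, added once for the whole scene.
  const color = [0, 1, 2].map(c => material.ka[c] * ambient[c]);
  for (const light of lights) {
    const ln = Math.max(dot(light.l, n), 0);                          // diffuse factor
    const rv = Math.max(dot(reflectAboutNormal(light.l, n), v), 0);   // specular base
    const spec = Math.pow(rv, material.alpha);
    for (let c = 0; c < 3; c++) {
      color[c] += material.kd[c] * light.id[c] * ln   // I_d,i
                + material.ks[c] * light.is[c] * spec; // I_s,i
    }
  }
  return color;
}

const material = { ka: [0.1, 0.1, 0.1], kd: [0.5, 0.2, 0.2], ks: [0.3, 0.3, 0.3], alpha: 16 };
const ambient = [1, 1, 1];
const lights = [
  { l: [0, 0, 1], id: [1, 1, 1], is: [1, 1, 1] },  // directly overhead
  { l: [0, 0, -1], id: [1, 1, 1], is: [1, 1, 1] }, // behind the surface
];
const result = shade(material, ambient, lights, [0, 0, 1], [0, 0, 1]);
console.log(result); // approximately [0.9, 0.6, 0.6]
```

The second light sits behind the surface, so both of its terms vanish here. The specular term is not gated on \(\hat{l}\cdot\hat{n} > 0\), matching the formulas above; many implementations add that check so a light behind the surface cannot produce a highlight.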