Week 5: LookAt/View and Projection Matrices
Reading
Some LookAt references:
- gluLookAt() by Song Ho Ahn (안성호): good math and animations.
- LearnOpenGL LookAt: notes and camera controls. Uses glm, and the variable names are a bit confusing, but it is otherwise good.
The LookAt/Camera matrix
Today’s goal is to understand and derive the View matrix transform, a matrix that transforms world coordinates into eye coordinates. I refer to this matrix as the lookAt matrix because this is what many OpenGL toolkits call it.
The LookAt matrix in practice
We saw an abstract sketch of the lookAt matrix earlier. We want to describe a viewer’s location (an eye or camera) with three parameters:
- `eye`: the location of the viewer in world coordinates.
- `at`: the location of the viewer’s gaze in world coordinates.
- `up`: a vector in world coordinates that is roughly in the vertical direction in the eye frame. This does not have to be terribly precise, but it should not be parallel to the direction of the gaze.
In the Week 04 Demo, I have modified an earlier 3D demo to support multiple viewpoints. By varying the `viewpoint` slider, the program computes a different `view` matrix. In the vertex shader, this matrix is a uniform that is multiplied between the projection matrix and the model matrix.
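For reference, the multiplication order in the vertex shader looks roughly like the sketch below; the uniform and attribute names (`uProjection`, `uView`, `uModel`, `aPosition`) are placeholders and may differ from the demo's actual source.

```js
// Sketch of a vertex shader with the view matrix between projection and model.
// The uniform/attribute names here are assumptions, not the demo's real names.
const vertexShaderSource = `
  uniform mat4 uProjection;
  uniform mat4 uView;
  uniform mat4 uModel;
  attribute vec4 aPosition;

  void main() {
    // model -> world -> eye -> clip: model, then view, then projection
    gl_Position = uProjection * uView * uModel * aPosition;
  }
`;
```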
See `getView(time)` in the .js source for some examples of how to set the views. In the first case, we set the eye at `[0,0,20]` and look at the center of our scene with the up vector being the `y` direction in the world. This is similar to our original setup from last week.
If we change the eye to `[0,0,-20]` and keep everything else the same, we see the same scene, but from the opposite side. Note that this is purely a function of the `view` transform. We are not changing the `ortho` or `model` transform.
A more elaborate change is the top-down view. We move the eye to `[0,20,0]`, keeping the `at` the same. Since we are now looking down the vertical axis in the world, we need to change our `up` vector to define what is locally up in the eye frame. Almost anything that is not parallel to the gaze direction could work here. I chose the `+z` axis, `[0,0,1]`.
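As a sketch of how these three cases might be written, here they are using the gl-matrix library's `mat4.lookAt(out, eye, center, up)`; the course demo's actual `getView(time)` and its `viewpoint` handling may be organized differently.

```js
import { mat4 } from "gl-matrix";

// Sketch: return a view matrix for one of the three viewpoints described above.
function getView(viewpoint) {
  const view = mat4.create();
  const at = [0, 0, 0];        // always look at the center of the scene
  if (viewpoint === 0) {
    // front view: similar to last week's fixed setup
    mat4.lookAt(view, [0, 0, 20], at, [0, 1, 0]);
  } else if (viewpoint === 1) {
    // same scene from the opposite side; only the view transform changes
    mat4.lookAt(view, [0, 0, -20], at, [0, 1, 0]);
  } else {
    // top-down view: up must change since we now look down the world y axis
    mat4.lookAt(view, [0, 20, 0], at, [0, 0, 1]);
  }
  return view;
}
```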
Deriving the LookAt matrix
The lookAt matrix is a flexible tool for moving about a scene or observing from different viewpoints. In any first-person game where you are navigating a world, the animation is typically done by modifying the lookAt matrix and repositioning the camera/viewer in the scene.
We will now look at how to derive the lookAt matrix. Fundamentally, this is a change of frame problem. We want to convert from world coordinates to eye coordinates. The input parameters will allow us to express the basis vectors and origin of the eye frame as a linear combination of the basis vectors and origin of the world frame without too much hassle. This will guide us towards the correct lookAt matrix.
We will define the eye frame in terms of the following basis vectors:
- \(\hat{n}\): A unit vector pointing from `at` towards `eye`. This is conceptually the local \(+z\) axis in the eye frame.
- \(\hat{r}\): A unit vector pointing to the right of the view from the eye to the target. This is conceptually the local \(+x\) axis in the eye frame. We choose \(\hat{r}\) to be perpendicular to \(\hat{n}\).
- \(\hat{u}\): A unit vector perpendicular to both \(\hat{r}\) and \(\hat{n}\) and roughly in the same direction as \(\vec{up}\).
The origin \(P_0\) of the eye frame will be, not surprisingly, the point defined by the `eye` input parameter.
Following the outline of the change of basis exercise of Week 03, we will start by defining the eye frame basis vectors in terms of the world frame basis vectors. Even though we are provided \(\vec{up}\) as a vector, we will start with \(\hat{n}\).
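Concretely (a sketch: subtract the two world-coordinate points and normalize the result):

\[
\hat{n} = \frac{\mathit{eye} - \mathit{at}}{\lVert \mathit{eye} - \mathit{at} \rVert}
\]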
Since the coordinates of `eye` and `at` are in world coordinates, the coefficients computed in this manner are the coefficients of \(\hat{n}\) in world coordinates. What are the coefficients of \(\hat{n}\) in eye coordinates? Recall you can use the dot product to compute the length of a vector.
Next up is the right vector, for which it seems we have no information. But since we have computed \(\hat{n}\) and are given \(\vec{up}\), we can compute a vector perpendicular to both with the cross product.
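With the ordering chosen so that \(\hat{r}\) points to the right of the view in a right-handed world frame (a sketch; the handedness convention is an assumption):

\[
\hat{r} = \frac{\vec{up} \times \hat{n}}{\lVert \vec{up} \times \hat{n} \rVert}
\]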
At this point, the vectors \(\hat{n}, \hat{r}\), and \(\vec{up}\) form a basis, but not necessarily an orthonormal basis, as it is possible for \(\vec{up}\) to not be perpendicular to \(\hat{n}\). We can create a new vector \(\hat{u}\) which forms an orthonormal basis as follows:
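One way to write this step, consistent with the frame above:

\[
\hat{u} = \hat{n} \times \hat{r}
\]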
Note that unlike for \(\hat{n}\) and \(\hat{r}\), I did not perform an explicit normalization step here. Why not?
To fully transition between the frames, I must also express the eye position \(P_0\) in terms of the world basis and origin \(Q_0\), but this is just the `eye` coordinates themselves.
Just like our week 3 change of frame, we now have the eye frame expressed as coefficients in the world frame.
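Written out explicitly (a sketch: stack the eye frame as \(\mathbf{v} = (\hat{r}, \hat{u}, \hat{n}, P_0)^T\) and the world frame as \(\mathbf{w}\), and write \(r_x, r_y, r_z\) for the world coordinates of \(\hat{r}\), \(e_x, e_y, e_z\) for those of the eye point, and so on), the coefficients collect into a matrix \(\mathbf{M}\) with \(\mathbf{v} = \mathbf{M}\mathbf{w}\):

\[
\mathbf{M} =
\begin{pmatrix}
r_x & r_y & r_z & 0\\
u_x & u_y & u_z & 0\\
n_x & n_y & n_z & 0\\
e_x & e_y & e_z & 1
\end{pmatrix}
\]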
If a point/vector is expressed with coordinates \(\mathbf{a}^T=(a_1, a_2, a_3, a_4)\) in the world frame \(\mathbf{w}\) and coordinates \(\mathbf{b}^T=(b_1, b_2, b_3, b_4)\) in the eye frame \(\mathbf{v}\), we can convert back and forth using the matrix \(\mathbf{M}^T\) and its inverse as follows:
\(\mathbf{a}^T\mathbf{w} = \mathbf{b}^T\mathbf{v} = \mathbf{b}^T\mathbf{M}\mathbf{w} \implies \mathbf{a}^T = \mathbf{b}^T\mathbf{M}\)
To convert to world coordinates from view coordinates, use
\(\mathbf{a} = \mathbf{M}^T\mathbf{b}\).
To convert from world coordinates to view coordinates, use
\(\mathbf{b} = \mathbf{M}^{-T}\mathbf{a}\).
In this case, we actually want the second version, \(\mathbf{b} = \mathbf{M^{-T}}\mathbf{a}\), meaning we need the inverse of the transpose of \(\mathbf{M}\).
But let’s look at \(\mathbf{M}^T\) a little more closely.
The matrix \(\mathbf{M}^T\) can be expressed as the product of two matrices \(\mathbf{T}\) and \(\mathbf{R}\), where \(\mathbf{T}\) is a translation matrix and \(\mathbf{R}\) is a generic rotation matrix. A generic rotation matrix has the property that all of its columns/rows are orthonormal to each other. These special matrices are known as orthogonal or unitary matrices and have the special property that their inverse is their transpose. The inverse of a translation matrix is a translation in the opposite direction. Using these general linear algebra properties, we can compute the inverse of \(\mathbf{M}^T\) without too much work.
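As a sketch under the conventions above, writing \(\mathbf{T}(\vec{e})\) for the matrix that translates by \(\vec{e}\):

\[
\mathbf{M}^T = \mathbf{T}(\vec{e})\,\mathbf{R} =
\begin{pmatrix}
1 & 0 & 0 & e_x\\
0 & 1 & 0 & e_y\\
0 & 0 & 1 & e_z\\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
r_x & u_x & n_x & 0\\
r_y & u_y & n_y & 0\\
r_z & u_z & n_z & 0\\
0 & 0 & 0 & 1
\end{pmatrix},
\]

\[
\mathbf{M}^{-T} = \mathbf{R}^{-1}\,\mathbf{T}(\vec{e})^{-1} = \mathbf{R}^{T}\,\mathbf{T}(-\vec{e}) =
\begin{pmatrix}
r_x & r_y & r_z & -\hat{r}\cdot\vec{e}\\
u_x & u_y & u_z & -\hat{u}\cdot\vec{e}\\
n_x & n_y & n_z & -\hat{n}\cdot\vec{e}\\
0 & 0 & 0 & 1
\end{pmatrix},
\]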
where \(\vec{e} = (e_x, e_y, e_z)\) is the vector from the world origin to the eye origin, i.e., the `eye` position in world coordinates.
This powerful matrix \(\mathbf{M}^{-T}\), which can be computed from three relatively easy-to-interpret parameters, allows us to position a viewer or camera in the scene and transform the displayed image to the point of view of the viewer. As part of lab 5, you will be asked to extend your solar system to include camera controls and this view matrix in your shader pipeline.
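Pulling the pieces together, here is a minimal JavaScript sketch of the construction, assuming small hand-rolled vector helpers rather than any particular library, and producing the column-major array layout that WebGL uniforms expect:

```js
// Minimal vector helpers (assumptions for this sketch, not from the demo's source).
const sub = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a) => {
  const len = Math.sqrt(dot(a, a));
  return [a[0] / len, a[1] / len, a[2] / len];
};

// lookAt: build the view matrix from eye, at, and up.
// Returned as a column-major 16-element array, the layout WebGL uniforms expect.
function lookAt(eye, at, up) {
  const n = normalize(sub(eye, at));   // n-hat: from at towards eye (+z of eye frame)
  const r = normalize(cross(up, n));   // r-hat: right vector (+x of eye frame)
  const u = cross(n, r);               // u-hat: already unit length (why?)
  return [
    r[0], u[0], n[0], 0,               // first column
    r[1], u[1], n[1], 0,
    r[2], u[2], n[2], 0,
    -dot(r, eye), -dot(u, eye), -dot(n, eye), 1,
  ];
}
```

A call like `gl.uniformMatrix4fv(viewLoc, false, new Float32Array(lookAt([0, 0, 20], [0, 0, 0], [0, 1, 0])))` would then feed the result to the vertex shader as the view uniform (`uView` in the earlier sketch).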
Given this matrix and the ability to position a viewer anywhere in the scene, how could you implement the following motions?
- Move the camera closer to the scene?
- Turn to look at something to the right of the screen?
- Tilt and look up?
- Slide/pan left or right?
You can describe how you would adjust your call to `lookAt` (in world coordinates), or how you would adjust the lookAt matrix directly (in eye coordinates).
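As one illustration of the first kind of adjustment (a sketch, reusing the `sub` and `normalize` helpers from the earlier `lookAt` sketch): moving the camera closer can be done by sliding `eye` toward `at` along the gaze direction before calling `lookAt`. Other approaches are just as reasonable.

```js
// Dolly in: move the eye a distance t toward at along the gaze direction.
function dollyIn(eye, at, t) {
  const gaze = normalize(sub(at, eye));          // direction from eye toward at
  return [eye[0] + t * gaze[0], eye[1] + t * gaze[1], eye[2] + t * gaze[2]];
}

// e.g. view = lookAt(dollyIn([0, 0, 20], [0, 0, 0], 0.5), [0, 0, 0], [0, 1, 0]);
```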