Observations about the Z-Buffer

Any game-engine currently on the market, uses the GPU of your computer – or your tablet – to do most of the work of rendering 3D scenes to a 2D screen, that also represents a virtual camera-position. There are two constants about this process which th game-engine defines, which are the closest distance at which fragments are allowed to be rendered, which I will name ‘clip-near’, and the maximum distance rendering is to be extended to, which I will name ‘clip-far’.

Therefore, what some users might expect, is that the Z-buffer, which determines the final outcome of the occlusion of the fragments, should contain a simple value from [ clip-near … clip-far ) . However, this is not truly how the Z-buffer works. And the reason why has to do with its origins. The Z-buffer belonging to the earliest rendering-hardware was only a 16-bit value, associated with each output pixel! And so a system needed to be developed that could use this extremely low resolution, according to which distances closer to (clip-near) would be spaced closer together, and according to which distance closer to (clip-far) could receive a smaller number of Z-values, since at that distance, the ability of the player even to distinguish differences in distances, was also diminished.

And so the way hardware-rendering began, was in this Z-buffer-value representing a fractional value between [ 0.0 … 1.0 ) . In other words, it was decided early-on, that these 16 bits followed a decimal point – even though they were ones and zeros – and that while (0) could be reached exactly, (1.0) could never be reached. And, because game-engine developers love to use 4×4 matrices, there could exist a matrix which defines conversion from the model-view matrix to the model-view-projection matrix, just so that a single matrix could minimally be sent to the graphics card for any one model to render, which would do all the necessary work, including to determine screen-positions and to determine Z-buffer-values.

The rasterizer is given a triangle to render, and rasterizes the 2D space between, to include all the pixels, and to interpolate all the parameters, according to an algorithm which does not need to be specialized, for one sort of parameter or another. The pixel-coordinates it generates are then sent to any Fragment Shader (in modern times), and three main reasons their number does not actually equal the number of screen-pixels are:

  1. Occlusion obviates the need for many FS-calls.
  2. Either Multi-Sampling or Super-Sampling tampers with the true number of fragments that need to be computed, and in the case of Multi-Sampling, in a non-constant way.
  3. Alpha Entities“, whose textures have an Alpha channel in addition to R, G, B per texel, are translucent and do not write the Z-buffer, thereby requiring that Entities behind them additionally be rendered.

And so there exists a projection-matrix which I can suggest which will do this (vertex-related) work:

 


| 1.0 0.0 0.0 0.0 |
| 0.0 1.0 0.0 0.0 |
| 0.0 0.0 1.0 0.0 |
| 0.0 0.0  a   b  |

a = clip-far / (clip-far - clip-near)
b = - (clip-far * clip-near) / (clip-far - clip-near)


 

One main assumption I am making, is that a standard, 4-component position-vector is to be multiplied by this matrix, which has the components named X, Y, Z and W, and the (W) component of which equals (1.0), just as it should. But as you can see, now, the output-vector has a (W) component, which will no longer equal (1.0).

The other assumption which I am making here, is that the rasterizer will divide (W) by (Z), once for every output fragment. This last request is not unreasonable. In the real world, when objects move further away from us, they seem to get smaller in the distance. Well in the game-world, we can expect the same thing. Therefore by default, we would already be dividing (X) and (Y) by (Z), to arrive at screen-coordinates from ( -1.0 … +1.0 ), regardless of what the real-world distances from the camera were, that also led to (Z) values.

This gives the game-engine something which photographic cameras fail to achieve at wide angles: Flat Field. The position from the center of the screen, becomes the tangent-function, of a view-angle from the Z-coordinate.

Well, to divide (X) by (Z), and then to divide (Y) by (Z), would actually be two GPU-operations, where to scalar-multiply the entire output-vector, including (X, Y, Z, W) by (1 / Z), would only be one GPU-operation.

Well in the example above, as (Z -> clip-far), the operation would compute:

 



W = a * Z + b

  = (clip-far * clip-far) / (clip-far - clip-near) -
    (clip-far * clip-near) / (clip-far - clip-near)

  = clip-far * (clip-far - clip-near) /
            (clip-far - clip-near)

  = clip-far

Therefore,
  (W / Z) = (W / clip-far) = 1.0


 

And, when (Z == clip-near), the operation would compute:

 



W = a * Z + b

  = (clip-far * clip-near) / (clip-far - clip-near) -
    (clip-far * clip-near) / (clip-far - clip-near)

  = 0.0


 

Of course I understand that a modern graphics card will have a 32-bit Z-buffer. But then all that needs to be done, for backwards-compatibility with the older system, is to receive a fractional value that has 32 bits instead of 16.

Now, there are two main derivations of this approach, which some game engines offer as features, but which can be achieved just by feeding in a slightly different set of constants to a matrix, which the GPU can work with in an unchanging way:

  • Rendering to infinite world coordinates,
  • Orthogonal camera-views.

The values that are needed for the same matrix will be:

Continue reading Observations about the Z-Buffer

Whether the Columns of Matrices have a Natural Order

This article is meant for readers who, like me, have studied Linear Algebra and who, like me, are curious about Quantum Mechanics.

Are the columns of matrices in a given, natural order, as we write them? Well, if we are using the matrix as a rotation matrix in CGI – i.e. its elements are derived from the trig functions of Euler Angles – then the column order depends, on the order in which we have labeled coordinates to be X, Y and Z. We are not free to change this order in the middle of our calculations, but if we decide that X, Y and Z are supposed to form a different set, then we need to use different matrices as well.

OTOH, We also know that a matrix can be an expression of a system of simultaneous equations, which can be solved manually through Gauss-Jordan Elimination on the matrix. If we have found that our system has infinitely many solutions, then we are inclined to say that certain variables are the “Leading Variables” while the others are the “Free Variables”. It is being taught today, that the Free Variables can also be made our parameters, so that the set of values for the Leading Variables follows from those parameters. But wait. Should it not be arbitrary for certain combinations of variables, which follows from which?

The answer is, that if we simply use Gauss-Jordan Elimination, and if two variables are connected as having possibly infinite combinations of values, then it will always be the variables stated earlier in the equations which become the Leading, and the ones stated later in the equations will become the Free Variables. We could restate the entire equations with the variables in some other order, and then surely enough, the variable that used to be a Free one will have become a new Leading one, and vice-versa. (And if we do so, the parametric equations for the other Leading variables will generally also change.)

The order of the columns, has become the order of discovery.

This could also have ramifications for Quantum Mechanics, where matrices are sometimes used. QM used matrices at first, in an effort to be empirical, and to acknowledge that we as Humans, can only observe a subset of the properties which particles may have. And then what happens in QM, is that some of the matrices used are computed to have Eigenvalues, and if those turn out to be real numbers, they are also thought to correspond to observable properties of particles, while complex Eigenvalues are stated – modestly enough – not to correspond to observable properties of the particle.

Even though this system seems straightforward, it is not foolproof. A Magnetic North Pole corresponds according to Classical Principles, to an angle from which an assumed current is always flowing arbitrarily, clockwise or counter-clockwise. It should follow then, that from a different perspective, a current which was flowing clockwise before, should always be flowing counter-clockwise. And yet according to QM, monopoles should exist.

Continue reading Whether the Columns of Matrices have a Natural Order

A Cheapo Idea To Throw Out There, On Digital Oversampling

In This Posting, I elaborated at length, about Polynomial Approximation that is not overdetermined, but rather exactly defined, by a set of unknown (y) values along a set of known time-coordinates (x). Just to summarize, If the sample-time-points are known to be arbitrary X-coordinates 0, 1, 2 and 3, then the matrix (X1) can state the powers of these coordinates, and If additionally the vector (A) stated the coefficients of a polynomial, then the product ( X1 * A ) would also produce the four y-values as vector (Y).

X1 can be computed before the algorithm is designed, and its inverse, ( X1^-1 ), would be such that ( X1^-1 * Y = A ). Hence, given a prepared matrix, a linear multiplication can derive a set of coefficients easily from a set of variable y-values.

Well this idea can be driven further. There could be another arbitrary set of x-coordinates 1.0, 1.25, 1.5, 1.75 , which are meant to be a 4-point interpolation within the first set. And then another matrix could be prepared before the algorithm is committed, called (X2), which states the powers of this sequence. And then ( X2 * A = Y' ), where (Y') is a set of interpolated samples.

What follows from this is that ( X2 * X1^-1 * Y = Y' ). But wait a moment. Before the algorithm is ever burned onto a chip, the matrix ( X2 * X1^-1 ) can be computed by Human programmers. We could call that our constant matrix (X3).

So a really cheap interpolation scheme could start with a vector of 4 samples (Y), and derive the 4 interpolated samples (Y') just by doing one matrix-multiplication ( X3 * Y = Y' ). It would just happen that

Y'[1] = Y[2]

And so we could already guess off the top of our heads, that the first row of X3 should be equal to ( 0, 1, 0, 0 ).

While this idea would certainly be considered obsolete by standards today, it would correspond roughly to the amount of computational power a single digital chip would have had in real-time, in the late 1980s… ?

I suppose that an important question to ask would be, ‘Aside from just stating that this interpolation smooths the curve, what else does it cause?’ And my answer would be, that Although it Allows for (undesirable) Aliasing of Frequencies to occur during playback, when the encoded ones are close to the Nyquist Frequency, If the encoded Frequencies are about 1/2 that or lower, Very Little Aliasing will still take place. And so, over most of the audible spectrum, this will still act as a kind of low-pass filter, although over-sampling has taken place.

Dirk

Continue reading A Cheapo Idea To Throw Out There, On Digital Oversampling