I have previously written in depth about the rendering pipeline, by which 3D graphics are rendered to a 2D, perspective view, as part of computer games, or as part of other applications that require 3D in real time. But one problem with my writing in depth might be, that people fail to see the relevance in the words, if the word count goes beyond 500 words.

So I’m going to try to summarize it more briefly.

Vertex positions in 3D can be rotated and translated using matrices. Matrices can be composited, meaning that if a sequence of multiplications of position vectors by known matrices accomplishes what we want, then a multiplication by a single, derived matrix can accomplish the same thing.
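As a minimal sketch – plain Python with invented helper names, not any real graphics API – the following verifies that applying two rotation matrices one after the other gives the same result as multiplying by their composite once:

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices, given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def rot_z(a):
    """3x3 rotation about the Z axis by angle a, in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    """3x3 rotation about the X axis by angle a, in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

p = [1.0, 2.0, 3.0]
# Apply the two rotations one after the other...
step_by_step = mat_vec(rot_x(0.5), mat_vec(rot_z(0.25), p))
# ...or composite them into a single, derived matrix first.
composite = mat_mul(rot_x(0.5), rot_z(0.25))
one_shot = mat_vec(composite, p)
```

Both results agree to within floating-point rounding, which is the whole point of compositing: the per-vertex cost is one matrix multiplication, no matter how many transformations went into the composite.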

Under DirectX 9 or OpenGL 2.x, 3D objects consisted of vertices that formed triangles. The positions and normal vectors of those vertices were transformed and rotated, respectively, and the vertices additionally possessed texture coordinates; all of these attributes could be processed by “Vertex Pipelines”. The output from the Vertex Pipelines was then rasterized and interpolated, and fed to “Pixel Pipelines”, which performed per-screen-pixel computations on the interpolated values, and on how these values were applied to Texture Images which were sampled.
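The per-vertex half of that work can be sketched as follows. This is an illustration of the idea in Python, not actual shader code, and the function and parameter names are my own invention: positions get the full transformation, normals only the rotation, and texture coordinates are passed through for later interpolation.

```python
def vertex_shader(position, normal, texcoord, rotation, translation):
    """Sketch of per-vertex processing: positions are rotated and
    translated, normals are only rotated, and texture coordinates
    pass through unchanged, to be interpolated during rasterization."""
    out_pos = [sum(rotation[i][k] * position[k] for k in range(3)) + translation[i]
               for i in range(3)]
    out_nrm = [sum(rotation[i][k] * normal[k] for k in range(3)) for i in range(3)]
    return out_pos, out_nrm, texcoord

# Trivial usage: an identity rotation plus a translation of (10, 20, 30).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out_pos, out_nrm, out_uv = vertex_shader(
    [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], (0.25, 0.75),
    identity, [10.0, 20.0, 30.0])
```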

All this work was done by dedicated graphics hardware – what is now known as a GPU – not by software.

One difference that exists today is that the specialization of GPU cores into Vertex and Pixel Pipelines no longer exists. Due to something called the Unified Shader Model, any one GPU core can act either as a Vertex Shader or as a Pixel Shader, and powerful GPUs possess hundreds of cores.

So the practical question arises, of how any of this applies to 2D applications, such as Desktop Compositing. The answer is that it has always been possible to render a single rectangle, as though it were oriented in a 3D coordinate system. This rectangle, which is also referred to as a “Quad”, first gets tessellated, which means that it receives a diagonal subdivision into two triangles, which still reference the same 4 vertices as before.
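Tessellating a Quad amounts to little more than an index buffer – the sketch below shows the two triangles as triples of indices into the same 4 vertices, split along one diagonal:

```python
# The Quad's 4 corner vertices (x, y), each listed exactly once.
quad_vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Tessellation: two triangles defined as triples of indices into
# the vertex list above, subdividing the Quad along the 0-2 diagonal.
triangle_indices = [(0, 1, 2), (0, 2, 3)]
triangles = [[quad_vertices[i] for i in tri] for tri in triangle_indices]
```

No new vertices are created; the two triangles simply share the two corners that lie on the diagonal.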

When an application receives a drawing surface, onto which it draws its GUI – using CPU time – the corners of this drawing surface have 2D texture coordinates that are combinations of ( 0 ) and ( +1 ) . The drawing surfaces themselves can be input to the GPU as though they were Texture Images. And the 4 vertices that define the position of the drawing surface on the desktop can simply result from a matrix, which is much simpler than any matrix would have needed to be, that performed rotation in 3D etc., before a screen positioning could be formed from it. Either way, the Vertex Program only needs to multiply the (notional) positions of the corners of a drawing surface by a single matrix, before a screen position results. In the 2D case, this matrix does not need to be computed from complicated trig functions.
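Such a 2D compositing matrix might look like the following sketch – just a scale and a translation, with no trig involved; the names `window_matrix` and `transform` are my own, not any compositor’s API:

```python
def window_matrix(x, y, w, h):
    """4x4 matrix that scales a unit Quad to w x h and translates it
    to (x, y) on the desktop -- no trig functions in the 2D case."""
    return [[float(w), 0.0, 0.0, float(x)],
            [0.0, float(h), 0.0, float(y)],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(M, v):
    """Multiply a 4x4 matrix by a 4-element vector."""
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

# The unit Quad's corners double as its texture coordinates:
# combinations of 0 and +1.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
M = window_matrix(200, 100, 640, 480)
screen = [transform(M, [u, v, 0.0, 1.0])[:2] for (u, v) in corners]
# 'screen' now holds the window's corner positions on the desktop.
```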

And the GPU renders the scene to a frame-buffer, just as it rendered 3D games.

A valid question the reader might have would be of the form:

‘Formally, a transformation of coordinates requires a rotation, which may be represented by a 3×3 matrix multiplication, followed by a translation, which would just be the addition of another 3-element vector. This operation cannot be composited.’

Something which is done differently in practice from how it is in theory, is that 4-element vectors are used, together with 4×4 matrices. If the vector is a position vector, then its 4th element is simply set equal to ( +1.0 ) .

When this vector is multiplied by a 4×4 matrix, the last row of which is

{ 0.0 0.0 0.0 +1.0 }

then the 4th element of the vector being equal to ( +1.0 ) will simply assure that the 4th column of the same matrix is *added* to the result that was obtained when the first 3 elements of the vector were multiplied by the inner 3×3 of the matrix. Hence, the inner 3×3 will define a rotation, and the 4th column, except for its 4th element, will define a translation.

Finally, because the 4th element of the vector is also multiplied by the last element in the matrix, the output vector’s 4th element will simply equal ( +1.0 ) again.

This results in a representation of both rotation and translation by a single 4×4 matrix, which takes effect when the correct type of 4-element vector is multiplied by it. Therefore, multiple transformations can be composited into a single matrix.
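This can be verified numerically. The sketch below – plain Python, with invented helper names – composites a rotation and a translation into one 4×4 matrix, whose last row is { 0 0 0 +1 }, and applies it to a position vector whose 4th element is ( +1.0 ):

```python
import math

def mat_mul(A, B):
    """Multiply two 4x4 matrices, given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(A, v):
    """Multiply a 4x4 matrix by a 4-element vector."""
    return [sum(A[i][k] * v[k] for k in range(4)) for i in range(4)]

def rotation4(a):
    """Rotation about Z, embedded in the inner 3x3 of a 4x4 matrix."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation4(tx, ty, tz):
    """Translation stored in the 4th column; inner 3x3 is identity."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

p = [1.0, 0.0, 0.0, 1.0]  # position vector: 4th element is +1.0
# Composite 'rotate, then translate' into a single matrix.
M = mat_mul(translation4(5.0, 6.0, 7.0), rotation4(math.pi / 2))
q = mat_vec(M, p)
# Rotating (1,0,0) by 90 degrees about Z gives (0,1,0); adding the
# translation (5,6,7) gives (5,7,7), and the 4th element is +1.0 again.
```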

Dirk