Musing about Deferred Shading.

One of the subjects which fascinate me is Computer-Generated Images, CGI, and specifically, the kind that renders a 3D scene to a 2D perspective. But that subject is still rather vast. One could narrow it by first suggesting an interest in the hardware-accelerated form of CGI, which is also referred to as “Raster-Based Graphics”, and which works differently from ‘Ray-Tracing’. And after that, a further specialization can be made, into a modern form of it, known as “Deferred Shading”.

What happens with Deferred Shading is, that an entire scene is Rendered To Texture, but in such a way that, in addition to surface colours, separate output images also hold normal-vectors, and a distance-value (a depth-value), for each fragment of this initial rendering. And then, the resulting ‘G-Buffer’ can be put through post-processing, which results in the final 2D image. What advantages can this bring?

  • It allows for a virtually unlimited number of dynamic lights,
  • It allows for ‘SSAO’ – “Screen Space Ambient Occlusion” – to be implemented,
  • It allows for more-efficient reflections to be implemented, in the form of ‘SSR’s – “Screen-Space Reflections”.
  • (There could be more benefits.)

One fact which people should be aware of, given traditional strategies for computing lighting, is, that by default, the fragment shader would need to perform a separate computation for each light source that strikes the surface of a model. An exception to this has been possible with some game engines in the past, where a virtually unlimited number of static lights can be incorporated into a level map, by being baked in, as additional light-maps. But when it comes to computing dynamic lights – lights that can move and change intensity during a 3D game – there have traditionally been limits to how many of those may illuminate a given surface simultaneously. This was defined by how complex a fragment shader could be made, procedurally.

(Updated 1/15/2020, 14h45 … )

(As of 01/07/2020 … )

Well, if the entire scene is first rendered to a G-Buffer, then light sources can be implemented as objects of sorts, the volume of which encompasses the entire area that they can illuminate. Correspondingly, if they are to illuminate any given screen-pixel, then the coordinates of that screen-pixel can be transformed into the U,V coordinates in the G-Buffer, which the unlit scene has already been rendered to. For each of the fragments, belonging to the light entity, the surface-normal of the G-Buffer can be fetched…

But what the strategy requires from the GPU, in addition to complex rendering, is that the fragment shader have the ability to write depth-values to the output buffer, in full precision, such as, in 32-bit floating-point format. If depth-values were simply written out in the non-linear format that was used ages ago, in 16-bit Z-buffers, then one result would be, that two distances could be compared, to determine whether one of them is further than the other, but that nothing else could be done with the resulting format. Instead, the depth-values rendered by the fragment shader must be such, that they can in fact be multiplied by screen coordinates, which range (-1.0 .. +1.0), so that a full set of view coordinates can be expanded from the G-Buffer, simply from the U,V coordinates of one of its pixels. I.e., they should be linear.

Newer graphics hardware also allows for 3-vectors of 16-bit floating-point numbers to be written to the G-Buffer. (:3)

That expanded set of view coordinates can be used by the fragment shader invocation, together with the per-pixel surface-normal, to apply the illumination of the implied light source.

(Content deleted 01/10/2020, 17h00 because it was based on a false assumption.)

The fact should not be overlooked that according to the way normal-vectors are usually mapped, more-positive Z-values are actually facing the camera, so that positions ‘in front of the camera, where they would be seen’, correspond to negative Z-values… (:2)

(End of Update, 01/08/2020, 10h10 . )

(Edit 01/11/2020, 14h35 : The number of possible lights is equal, to the maximum number of iterations that a single fragment shader invocation can perform within a loop, as long as each iteration only requires that a cheap computation be performed. Merely computing the distance between the centre of one light-source in an array, and the point of the scene geometry that follows from a G-Buffer coordinate set, squared, and determining whether that distance exceeds the radius of the light-source volume, squared, would be an example of a cheap computation.

At the same time, I’ve discovered that the main way that still exists, to implement a ‘camera-shader’, aka a ‘world shader’, is, first to render the scene to a texture, which in this case is mapped to the G-Buffer, but to map this texture image to a quad, which is positioned in such a way in front of another view, that the second mapping is a 1:1 mapping, from pixels first rendered, to pixels rendered again. This last detail means for example, that the tangent of the angle, of the edges of the quad, from the centre of the second camera-Z-axis, will be exactly equal to (1.0), and that therefore, the arc of the second rendering will be 90° both horizontally and vertically.

The vertices of this ‘dummy quad’ can already have as attributes, texture coordinates of (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), and (1.0, 1.0). That way, any explicit conversion of clip-space coordinates to texture, U,V coordinates, can be avoided, as the interpolation registers in the GPU will already interpolate them.)
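What such a lighting pass might look like can be sketched as a fragment shader, applied to the ‘dummy quad’. This is only a minimal sketch, under several assumptions: the G-Buffer is taken to hold linear, positive depth in one texture, view-space normals in another, and surface colours in a third. All the names used here (‘gDepth’, ‘gNormal’, ‘gAlbedo’, ‘multy’, ‘lightPos’, ‘lightColor’, ‘lightRadius’, ‘MAX_LIGHTS’) are hypothetical, and the falloff formula is an arbitrary choice.

```glsl
#version 330 core
// Sketch of a deferred lighting pass, drawn as one full-screen quad.
// All uniform and sampler names are hypothetical.

#define MAX_LIGHTS 32

in vec2 TexCoords;            // (0,0)..(1,1), interpolated from the quad's vertices
out vec4 FragColor;

uniform sampler2D gDepth;     // linear view-space depth, positive in front of the camera
uniform sampler2D gNormal;    // view-space normals, stored as floating-point values
uniform sampler2D gAlbedo;    // unlit surface colours

uniform float multx;          // tan(horizontal FOV / 2)
uniform float multy;          // tan(vertical FOV / 2), assumed to be supplied separately
uniform vec3  ambient;        // omnipresent lighting

uniform int   numLights;
uniform vec3  lightPos[MAX_LIGHTS];     // centres of the lights, in view coordinates
uniform vec3  lightColor[MAX_LIGHTS];
uniform float lightRadius[MAX_LIGHTS];

void main() {
    float depth  = texture(gDepth, TexCoords).r;
    vec3  normal = normalize(texture(gNormal, TexCoords).xyz);
    vec3  albedo = texture(gAlbedo, TexCoords).rgb;

    // Expand the U,V coordinates and the linear depth into view coordinates.
    vec2 ndc = TexCoords * 2.0 - 1.0;
    vec3 pos = vec3(ndc.x * multx * depth, ndc.y * multy * depth, -depth);

    vec3 lit = ambient * albedo;
    for (int i = 0; i < numLights && i < MAX_LIGHTS; ++i) {
        vec3  toLight = lightPos[i] - pos;
        float d2 = dot(toLight, toLight);
        float r2 = lightRadius[i] * lightRadius[i];
        if (d2 >= r2) continue;   // the cheap test: squared distance against squared radius

        float falloff = 1.0 - d2 / r2;   // arbitrary falloff, reaching zero at the radius
        vec3  dir     = toLight * inversesqrt(max(d2, 1e-8));
        float lambert = max(dot(normal, dir), 0.0);
        lit += albedo * lightColor[i] * lambert * falloff;
    }
    FragColor = vec4(lit, 1.0);
}
```

Here, every light is tested for every pixel, which corresponds to the loop-based limit described above. Rendering each light as its own volume would instead restrict the work to the pixels that the light can actually reach.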

Simultaneously, a world shader applied to the G-Buffer can cause some amount of ambient, omnipresent lighting to exist.


There exists a known phenomenon in the real world, by which some quantifiable fraction of the light striking surfaces comes from sources too numerous to account for, thus being called ‘ambient’ in CGI, but that, as soon as parts of a surface, which are represented as fragments in CGI, are recessed behind the rest of the surface, those parts of a surface will be lit less, appearing darker, and giving a better perception of rough, detailed surfaces having depth in fact. Those parts of the surface are called ‘occluded’, and the efforts to simulate this are called “Screen Space Ambient Occlusion”.

With scenes rendered with deferred shading, this can be simulated rather easily, as a post-processing effect applied to the G-Buffer. What can be done is that, around each current point being fetched from the G-buffer, a fixed rectangle of texels can be read, which can be referred to as its ‘potential neighbouring points’. For each of these secondary texels, due to the ability to expand their parameters into real, view-space coordinates, two important parameters can be computed, that will be real numbers:

  1. The real distance of the neighbouring point, from the current point, which may decrease the influence that the neighbouring point has on its lighting…
  2. The dot-product of the normalized direction vector, of the neighbouring point relative to the current point, with the normal vector of the current point. The more positive that dot-product becomes, the more the neighbouring point will occlude the current point.

Different numerical means exist to combine this information, either in computationally less-expensive ways, or in ways that mirror Physics better, but always, as a collective result from the entire patch of texels, affecting one current fragment.

And, because of that, an additional capability that SSAO demands of the GPU, is that each of its fragment shader invocations be able to run loops. That might sound trivial. But in fact, this requires some advanced version of shader language, as the oldest hardware was not able to do so.
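The two parameters listed above could be combined as in the following sketch, which is only one of the computationally cheaper possibilities, and not a physically accurate one. The names ‘gDepth’, ‘gNormal’, ‘texelSize’, ‘multy’ and ‘aoRadius’ are assumptions, as are the 5×5 patch size and the linear distance weighting.

```glsl
#version 330 core
// Sketch of a simple SSAO pass over a fixed 5x5 patch of G-Buffer texels.
// All uniform and sampler names are hypothetical.

in vec2 TexCoords;
out float AmbientFactor;      // 1.0 = fully lit by ambient light, 0.0 = fully occluded

uniform sampler2D gDepth;     // linear view-space depth, positive in front of the camera
uniform sampler2D gNormal;    // view-space normals
uniform vec2  texelSize;      // 1.0 / resolution of the G-Buffer
uniform float multx;          // tan(horizontal FOV / 2)
uniform float multy;          // tan(vertical FOV / 2)
uniform float aoRadius;       // view-space distance, beyond which neighbours are ignored

const int HALF = 2;           // half-width of the patch

// Expand U,V coordinates into view coordinates, using the stored depth.
vec3 viewPos(vec2 uv) {
    float depth = texture(gDepth, uv).r;
    vec2 ndc = uv * 2.0 - 1.0;
    return vec3(ndc.x * multx * depth, ndc.y * multy * depth, -depth);
}

void main() {
    vec3 p = viewPos(TexCoords);
    vec3 n = normalize(texture(gNormal, TexCoords).xyz);

    float occ = 0.0;
    for (int j = -HALF; j <= HALF; ++j) {
        for (int i = -HALF; i <= HALF; ++i) {
            if (i == 0 && j == 0) continue;
            vec3  d    = viewPos(TexCoords + vec2(i, j) * texelSize) - p;
            float dist = length(d);
            if (dist < 1e-5) continue;
            // 1. The real distance decreases the influence of the neighbour.
            float weight = max(1.0 - dist / aoRadius, 0.0);
            // 2. The dot-product of the direction with the normal of the current point.
            float facing = max(dot(d / dist, n), 0.0);
            occ += weight * facing;
        }
    }
    float samples = float((2 * HALF + 1) * (2 * HALF + 1) - 1);
    AmbientFactor = 1.0 - clamp(occ / samples, 0.0, 1.0);
}
```

The resulting factor would then be multiplied with the ambient term only, in whatever pass applies the lighting.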


The rendering of real-time or ‘in-scene’ reflections from many angles around a current point traditionally required either, that the rest of the scene be rendered to a single plane first, which would be sampled by a computed reflection vector, or, that this be done 6 times, to complete an ‘environment cube’, which would be updated in real-time, but which could in turn be sampled by a ‘camera-space reflection vector’, computed just as though being applied to a static, baked-in environment cube.

This meant that the reflections would either need to be confined, as coming from one general direction, or be very expensive computationally.

What can be done, when deferred shading is exploited for its ability to simulate reflections, is that the current fragment’s view coordinates be used as a starting point, and that the back-traced angle of a reflected ray can be computed, from the normal vector, together with the notion of a camera-direction or Z-axis, and that an approximation of reflection can be derived through something which is called “Ray Marching”. The goal is to find at what distance, along the back-traced ray, it intersects with a point defined by the G-buffer. But the fact is also known, that the solution to this problem does not exist analytically, and that it must be found by successive approximation.

One side effect of this is, that only visual details already rendered to the present view, can appear in reflections. No back-sides or hidden surfaces can appear there.

But then, one way in which Ray Marching can be implemented requires, that an initial guess be applied, of at what distance from the starting point, along the back-traced ray, an intersection with the G-Buffer’s implied view-space coordinates might take place. Whether that distance needs to be increased or decreased can be defined, by whether the depth dereferenced from the G-Buffer is greater or smaller, than that of the projected point, along the ray, transformed directly into view-space coordinates.

The degree by which the distance along this back-traced ray is to be changed can probably be computed in one out of several ways. But one way could be, by multiplying the previous distance by the G-Buffer depth, and dividing it by the transformed depth. (:1)

And this set of instructions can just be executed an arbitrary number of times, such as perhaps 10. After that, the similarity of the two depth-values can be compared, and if they are close enough, and if the distance projected is slightly greater than zero, the surface-colour from the G-Buffer can in fact be applied as a reflected colour.

This poses two problems:

  • If the initial guess, of what distance along the ray is to be tested first, is wrong by a large margin of error, a final success is not guaranteed.
  • The method I just suggested, to adjust the distance along the ray, offers no protection against the possibility that a point in view coordinates may result, which no longer falls within the display area, and which therefore also no longer falls within the U,V G-Buffer coordinates, that range [0.0 .. 1.0) . If this should happen, the biggest challenge in shader design might be, to find a way to communicate that the ray resulted in a failure, without complicating the logic with which shaders are programmed. Thus, if a conditional test can be achieved, of whether U or V have gone out of range, then the response could be just to set the distance projected to zero.
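The successive approximation described above, together with the out-of-range test, might be sketched as follows. This is a sketch under assumptions, not a tested implementation: the names ‘gDepth’, ‘gNormal’, ‘gAlbedo’, ‘multy’, ‘initialGuess’ and ‘tolerance’ are hypothetical, and the incident direction is taken to be the direction from the camera to the fragment, rather than the camera Z-axis.

```glsl
#version 330 core
// Sketch of Screen-Space Reflections by successive approximation.
// All uniform and sampler names are hypothetical.

in vec2 TexCoords;
out vec4 FragColor;

uniform sampler2D gDepth;     // linear view-space depth, positive in front of the camera
uniform sampler2D gNormal;    // view-space normals
uniform sampler2D gAlbedo;    // surface colours
uniform float multx;          // tan(horizontal FOV / 2)
uniform float multy;          // tan(vertical FOV / 2)
uniform float initialGuess;   // first distance to try along the ray
uniform float tolerance;      // accepted relative mismatch between the two depths

// Expand U,V coordinates into view coordinates, using the stored depth.
vec3 viewPos(vec2 uv) {
    float depth = textureLod(gDepth, uv, 0.0).r;
    vec2 ndc = uv * 2.0 - 1.0;
    return vec3(ndc.x * multx * depth, ndc.y * multy * depth, -depth);
}

// Transform a view-space point back into G-Buffer U,V coordinates.
vec2 toUV(vec3 q) {
    float depth = -q.z;
    return vec2(q.x / (multx * depth), q.y / (multy * depth)) * 0.5 + 0.5;
}

void main() {
    vec3 p = viewPos(TexCoords);
    vec3 n = normalize(textureLod(gNormal, TexCoords, 0.0).xyz);
    vec3 r = reflect(normalize(p), n);   // the camera sits at the origin of view space

    float s = initialGuess;              // distance along the ray
    float buffDepth = 0.0;
    float rayDepth  = 1.0;
    vec2  hitUV     = TexCoords;

    for (int k = 0; k < 10; ++k) {
        vec3 q = p + r * s;
        rayDepth = -q.z;
        if (rayDepth <= 0.0) { s = 0.0; break; }   // the point fell behind the camera
        hitUV = toUV(q);
        if (any(lessThan(hitUV, vec2(0.0))) ||
            any(greaterThanEqual(hitUV, vec2(1.0)))) {
            s = 0.0;                               // out of range: signal the failure
            break;
        }
        buffDepth = textureLod(gDepth, hitUV, 0.0).r;
        s *= buffDepth / rayDepth;                 // the adjustment suggested above
    }

    vec3 reflected = vec3(0.0);
    if (s > 1e-4 && abs(buffDepth - rayDepth) < tolerance * rayDepth) {
        reflected = textureLod(gAlbedo, hitUV, 0.0).rgb;
    }
    FragColor = vec4(reflected, 1.0);
}
```

Setting the distance to zero doubles as the failure signal, so that the final test needs no separate flag.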


As with earlier methods, for each light-source, depth-maps would need to be rendered, in order for complicated shadows to result, and for this reason, the method is not strongest if the light-sources must also be omnidirectional. However, if the required depth-maps can first be rendered, when the fragments are rendered to the G-Buffer, those fragments can also have their coordinates transformed to the U,V coordinates of the depth-map.

Yet, in most cases, omnidirectional light-sources are resolved through surfaces just facing away from them, and thereby, remaining unlit.

(Update 01/11/2020, 17h10 : )

I’ve just learned that the application of matrices, for rendering depth-maps, and then, for computing shadows, cannot take place once per fragment, according to how OpenGL runs code in the shaders. It needs to take place once per vertex, in the vertex shader. The following article explains how to do it:

What this means is that shadow-mapping needs to be carried out at a logical point, before the G-Buffer is rendered to, and that its efficiency will only be as good, as when regular, forward-rendering took place.

Presumably, there would be a separate channel in the G-Buffer, that stores the degree of shadows or illumination, with which the fragments are to be rendered to the final camera view.

This also implies that the maximum number of directional lights in any one view, will be limited to how many the fragment shader allows for – a small number, presumably not exceeding 4?

(End of Update, 1/11/2020, 17h10 . )

Objects With Transparency:

By far, the greatest weakness of this approach is, its failure to support transparency. If a scene is to add such objects, then they must be rendered ‘after’ the opaque elements, thus appearing ‘in front of them’. And in such cases, if the G-Buffer depth-values are closer, than those of the fragments, of the ‘Alpha-Entities’, then the fragments of the latter must be killed.

But in cases where this is done, hypothetically, Ray-Marching can be used again, to simulate refraction.
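The killing of alpha-entity fragments that lie behind the opaque scene could be sketched as the following fragment shader, for the pass that renders the transparent objects. It assumes that the vertex shader passes along a linear depth, comparable to what was stored in the G-Buffer. The names ‘gDepth’, ‘alphaTexture’ and ‘viewPortSize’ are hypothetical.

```glsl
#version 330 core
// Sketch: render an 'Alpha-Entity' in front of the opaque scene held in the G-Buffer.
// All uniform and sampler names are hypothetical.

in float Depth;               // linear view-space depth of this fragment
in vec2  TexCoords;
out vec4 FragColor;

uniform sampler2D gDepth;         // linear depth of the opaque scene
uniform sampler2D alphaTexture;   // colour and alpha of the transparent object
uniform vec2 viewPortSize;        // viewport width and height, in pixels

void main() {
    vec4 colour = texture(alphaTexture, TexCoords);

    // Look up the opaque depth stored for the same screen pixel.
    vec2  uv = gl_FragCoord.xy / viewPortSize;
    float opaqueDepth = texture(gDepth, uv).r;

    // If the opaque scene is closer than this fragment, kill the fragment.
    if (opaqueDepth < Depth) discard;

    FragColor = colour;
}
```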

(Update 01/11/2020, 15h10 : )

I’ve been inspired again, by articles which I found on the Web, as to what the simplest way is, to cull objects and surfaces, that are more distant from the virtual camera-position, i.e., deeper, than fragments belonging to a different buffer, from the buffer which is currently being rendered to. And a solution that seems to work well is that, even though OpenGL will not give programmers access to the Z-buffer in general, in a way useful for content development, OpenGL offers the use of a ‘blitting function’, ‘glBlitFramebuffer’, that can copy whatever Z-buffer values existed in one framebuffer, to the Z-buffer of a second, so that alpha entities – objects with transparency – that are to be rendered to the second output buffer, will be culled, according to whether they were further from the virtual camera position, than objects first rendered to the G-Buffer.

The fact must be taken into consideration, however, that although this blitting function is called from the CPU, the copy itself is carried out by the GPU. It still amounts to one additional full-frame copy for every frame rendered, and it requires that the two depth-buffers have matching formats, so that its use is not free.

I suppose that the fact could also be mentioned that, while writing to the Z-buffer can be switched on and off arbitrarily for certain objects, which is most-useful for objects with transparency, Z-buffer sensitivity is assumed to remain ‘On’. Thus, what will follow is that, even though the Z-buffer values will occlude parts of an entity that has alpha values less than 1.0, that entity can be switched, not to write its own depth-values to the Z-buffer, when in fact its fragments have been rendered.


I have in fact seen one example, where a prepackaged game-engine switches Z-buffer sensitivity ‘Off’ as well, so that in a region of the scene covered by this entity, the distant sky will become visible. This game-engine had as one of its assumptions, that content designers would surround their entire game level with a sky-cube, or with some other sky-representation, which was to be taken to be at maximum renderable depth, but that a closer surface belonging to the level geometry should actually just represent an opening in that geometry, through which the sky should be visible, even though the character was in some sort of ‘enclosed, indoor’ situation…

Yet, that example also has nothing at all to do with Deferred Shading, nor with rendering alpha entities ‘in front of’ the surface implied by an existing G-Buffer.

(End of Update, 01/11/2020, 15h10 . )



Admittedly, when it comes to ‘SSR’, my first-order attempt to suggest, how the distance can be adjusted by which a point needs to be projected along a direction-vector, in order eventually to intersect with points implied by a G-Buffer, will suffer from another problem. If the surface implied by the G-Buffer is angled greatly with respect to the camera-axis, then the suggested differences will eventually become too great to allow the distances to converge. One way to solve this problem could be, to make an initial estimate of the change, and then to multiply that by the absolute, of the Z-component, of the normal-vector, stored in the G-Buffer:


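//  s : distance along the back-traced ray;  Buff.Z : depth fetched from the G-Buffer;
//  Transf.Z : depth of the projected point;  Buff.N.Z : Z-component of the stored normal.
ds = ((Buff.Z / Transf.Z) - 1.0) * abs(Buff.N.Z);
s *= ds + 1.0;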



(Update 01/09/2020, 7h15 : )


I have given some thought in the past, to the question of why, in certain cases, positive Z-values are facing the camera-position, while in other cases, the programming of shaders can proceed, on the assumption that they are positioned ‘in front of’ the camera-position.

My best guess, as to why this does not cause problems in shader coding would be, the possibility that a matrix which gets used in the vertex shader is named the ‘Model-View Matrix’, but actually refers to the ‘Model-View-Projection Matrix’. According to the first posting which I linked to above, the ‘Model-View-Projection Matrix’ should be setting up the scene to be rendered to clip-space, but can also be used as a substitute for the Model-View Matrix, as long as both the horizontal and the vertical arc of the view are equal to 90°.

And in that case, the matrix in question will also have taken care of ‘flipping’ the Z-values. If the vertex shader makes no further mention of different matrices that can be used, chances are that the Model-View-Projection Matrix is actually being used.


(Edited 01/09/2020, 17h25 : )

It’s certainly possible to define a set of camera axes, in which X points ‘to the Right’, Y points ‘Up’, and Z points ‘Forwards’. However, this would be a Left-Handed Coordinate System. Most Model Editors and other Scientific Software will use a Right-Handed Coordinate System. If the camera axes use a RHCS, then the Z-axis would follow as pointing ‘towards the viewer’, and positions ‘in front of the camera’ would have negative Z-values.

While it’s easy to define a matrix which will convert a geometry from an RHCS to an LHCS, no combination of rotations and translations will do so. That matrix would need to arise otherwise than through a composited series of rotations and translations. And then, the determinant of its inner 3×3 would be a negative determinant.
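As a small worked example of the last point, and assuming that the conversion is done simply by negating Z (which is only one of several possible choices), the matrix and the determinant of its inner 3×3 would be:

```latex
M =
\begin{pmatrix}
1 & 0 &  0 & 0 \\
0 & 1 &  0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 &  0 & 1
\end{pmatrix},
\qquad
\det\left(M_{3 \times 3}\right) = 1 \cdot 1 \cdot (-1) = -1
```

Every rotation matrix has a determinant of +1, and the determinant of a product is the product of the determinants, so that no sequence of rotations can yield -1. Translations do not affect the inner 3×3 at all.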


(Update 01/10/2020, 7h30 : )

I suppose that what would really matter then, if the Model-View and the Model-View-Projection Matrix are one and the same, for a specific rendering engine, is whether the representation of the Z-axis also matches that, of the Normal Matrix, and the Inverse View Matrix. Additionally, the Light-Source Direction Vectors, for use outside Deferred Shading, would also need to match.



(Update 01/11/2020, 23h25 : )


I suppose that a type of question which could arise would be, ‘What pixel-format in the G-Buffer is better, for Positions, A 3-vector of 16-bit, floating-point numbers, or, A single 32-bit floating-point number, meant to be multiplied by the screen-coordinates? …’


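//  tan(HORZFOV/2)
uniform float multx;
//  Horizontal Pixel Size / Vertical Pixel Size
uniform float pixel_aspect;
//  Viewport width and height, in pixels (assumed to be supplied by the application)
uniform vec2 viewPortSize;

//  Fetch the 32-bit floating-point number 'fragDepth'
//  from the G-Buffer.

vec3 fragPosition;
//  Screen coordinates, which range (-1.0 .. +1.0)
fragPosition.xy = ((gl_FragCoord.xy * 2.0) / viewPortSize.xy) - vec2(1.0, 1.0);
fragPosition.xy = fragPosition.xy * fragDepth * multx;
//  Correct Y for the aspect ratio of the view
fragPosition.y = fragPosition.y * viewPortSize.y / viewPortSize.x / pixel_aspect;
//  Positions in front of the camera have negative Z-values
fragPosition.z = - fragDepth ;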


(Code tweaked 1/15/2020, 14h45.)

And I can think of two reasons, for which the single, 32-bit Depth pixel might be a better choice:

  1. I would consider, which of the two formats was implemented in the earliest graphics cards,
  2. Certain prefabricated rendering engines, which did not arise because the content designer coded OpenGL himself, only provide one (relevant) matrix to the coder of vertex shaders, that matrix being named ‘ModelView…’ , but actually referring to the Model-View-Projection Matrix.

In order to set ‘‘ correctly, the vertex shader would need to transform the model coordinates into view coordinates first, and it would be the Model-View Matrix responsible for doing so. If none has been provided, the following would still work:


#version 330 core

layout (location = 0) in vec3 aPos;
layout (location = 1) in vec3 aNormal;
layout (location = 2) in vec2 aTexCoords;

out VS_OUT {
    float Depth;
    vec3 Normal;
    vec2 TexCoords;
} vs_out;

uniform mat4 ModelViewProjMatrix;
uniform mat3 NormalMatrix;

void main() {
    vec4 pos = ModelViewProjMatrix * vec4(aPos, 1.0);
    //  With a standard perspective projection, the W-component of the
    //  clip-space position equals the (positive) view-space depth.
    vs_out.Depth = pos.w;
    vs_out.Normal = NormalMatrix * aNormal;
    vs_out.TexCoords = aTexCoords;
    gl_Position = pos;
}


On the same subject, a 3-vector of 8-bit integers has long been an available format of normal-maps, where each of the integers represents a floating-point value from [-1.0 .. +1.0], but gets stored as RGB channel-values, in (fractional) offset form:


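//  'texel.rgb', as sampled in a shader, already ranges [0.0 .. 1.0]; rescale it to byte-values first.
fragNormal = (texel.rgb * 255.0 - vec3(128.0)) / 127.0;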


Obviously, those old normal-maps were not as precise as the presently available 3-vectors of 16-bit floating-point values. But if the newer format is to be avoided once, it’s to be avoided in every case…


