In the past, when I was writing about hardware-accelerated graphics (i.e., graphics rendered by the GPU), such as in this article, I chose phrasing according to which the Fragment Shader eventually computes the color-values of pixels ‘to be sent to the screen’. I felt that this over-simplification could make my topics a bit easier to understand at the time.
A detail which I had deliberately left out was that the rendering target may not be the screen in any given context. What happens is that memory-allocation, even the allocation of graphics-memory, is still carried out by the CPU, not the GPU. And ‘a shader’ is just another way to say ‘a GPU program’. In the case of a “Fragment Shader”, what this GPU program does can be visualized better as shading, whereas in the case of a “Vertex Shader”, it just consists of computations that affect coordinates, and may therefore be referred to just as easily as ‘a Vertex Program’. Separately, there exists an OpenGL extension which allows such a program to be written in the ARB assembly language, and a program written that way is also referred to as a Vertex Program. ( :4 )
The CPU sets up the context within which the shader is supposed to run, and one of the elements of this context is a buffer, to which the given Fragment Shader is to render its pixels. The CPU sets this up, just as it sets up the 2D texture images from which the shader fetches texels.
The rendering target of a given shader-instance may be, ‘what the user finally sees on his display’, or it may not. Under OpenGL, the rendering target could just be a Framebuffer Object (an ‘FBO’), which has also been set up by the CPU as an available texture-image, from which another shader-instance samples texels. The result of that would be Render To Texture (‘RTT’).
But an impression I also got from modern graphics cards was that their designers have reshaped them into an organization which more closely resembles a sub-computer, with its own GPU and its own memory (‘VRAM’), in which the memory is more homogeneous in its organization than would have been the case with graphics cards from the late 1990s or early 2000s. And one result of that would be, that the rendering-target which the graphics hardware finally rasterizes to the display device, continuously, is just one out of numerous rendering-targets, with few special features.
I suppose the only question left to ask would be, whether the framebuffer which is sent to the display device needs to be formatted in some special way, in which the others are not. And I believe the answer is No. It could be that because OpenGL is not the primary platform, rather being a secondary hardware-design-consideration after DirectX, this one buffer may not be an FBO, but rather needs to be of the ‘pbuffer’ variety, specifically on an O/S based on OpenGL. But OpenGL has traditionally been able to render to ‘pbuffer’ as easily, as it has more-recently been able to render to an FBO.
And Yes, it follows that if Render-To-Texture is taking place, the rendering target must nevertheless possess its own (per-pixel) Z-buffer, in addition to the per-pixel color values. But I think that in the case of an FBO, the programmer and thereby the CPU have greater latitude in specifying these multiple planes of pixels, such that they may have formats which no longer ‘make sense’ for use as visual displays. ( :1 )
One such format would be a depth-buffer, which states an accurate 32-bit depth value per pixel, which is important with deferred rendering, where accurate depth-values are ostensibly different from the depth-values which are normally stored in the Z-buffer. The assumption with deferred rendering is, that some sort of post-processing needs to take place on the resulting output, finally to result in something visual. And this post-processing could then also be a CPU-based stage, more-similar to ray-tracing, than to how a GPU-based Fragment Shader creates raster-based output-pixels.
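To illustrate why an accurate depth value differs from what the Z-buffer normally holds, here is a minimal sketch. It assumes a standard OpenGL perspective projection and the default depth range of [0, 1]; the function name `linearize_depth` and the near and far clip distances are only illustrative, not taken from any particular engine.

```python
def linearize_depth(d: float, near: float, far: float) -> float:
    """Convert a non-linear Z-buffer value in [0, 1] into a positive
    eye-space distance, assuming a standard perspective projection."""
    z_ndc = 2.0 * d - 1.0  # window depth [0, 1] -> NDC depth [-1, 1]
    return (2.0 * near * far) / (far + near - z_ndc * (far - near))


# Hypothetical clip planes: half of the Z-buffer's range is spent on
# distances that are very close to the near plane.
print(linearize_depth(0.0, 0.1, 100.0))  # distance at the near plane
print(linearize_depth(0.5, 0.1, 100.0))  # still very close to the camera
print(linearize_depth(1.0, 0.1, 100.0))  # distance at the far plane
```

Under these assumptions, a stored value of 0.5 corresponds to a distance of less than 0.2 units, which suggests why a separate, linear depth layer can be preferable as input to a later stage.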
And another case in which the render-target format would not make sense visually, deliberately as set up by the CPU, would be in the case of ‘R2VB’.
1: ) I believe that when using an FBO, the programmer is allowed to set up an arbitrary number of parallel layers, as long as certain system-defined layers, such as the Z-buffer, are provided.
In the case of deferred rendering, these layers must as a minimum include:
- An accurate depth-buffer.
- A per-pixel normal-vector, which has its own signed X, Y and Z components.
- A per-pixel color-value, which has its own, unsigned R, G and B components.
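The three layers listed above can be pictured as one record per pixel. The following is only a sketch of such a layout on the CPU side; the names `GBufferTexel`, `make_gbuffer`, `encode_normal` and `decode_normal` are hypothetical. The encoding step is an added assumption: it applies only if the chosen render-target format offers unsigned channels, in which case signed components are commonly remapped from [-1, 1] to [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GBufferTexel:
    depth: float  # accurate depth value
    normal: Vec3  # signed X, Y and Z components
    color: Vec3   # unsigned R, G and B components in [0, 1]


def make_gbuffer(width: int, height: int) -> List[List[GBufferTexel]]:
    """Allocate an empty G-buffer: far depth, camera-facing normal, black."""
    return [[GBufferTexel(1.0, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0))
             for _ in range(width)] for _ in range(height)]


def encode_normal(n: Vec3) -> Vec3:
    """Remap signed [-1, 1] components to unsigned [0, 1] for storage."""
    return tuple(0.5 * c + 0.5 for c in n)


def decode_normal(e: Vec3) -> Vec3:
    """Undo encode_normal when the layer is read back."""
    return tuple(2.0 * c - 1.0 for c in e)
```

With a signed or floating-point render-target format, the encode and decode steps would not be needed.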
Beyond that, the question could be asked, how specular highlights are supposed to be implemented, with deferred rendering. And I suspect that by default, they are not. The main advantage of the deferred rendering in practice, is the ability to include an almost-unlimited number of dynamic – i.e. moving – light-sources, where each light-source is direct-rendered as the post-processing component, but where each light-source draws data from the output-buffers produced when the deferred rendering stage did its work. ( :2 )
Another practical advantage of deferred rendering, is the ability to implement SSAO.
However, if deferred rendering were to implement specular highlights, then additional per-pixel values would need to be output to the FBO, which include:
- Gloss – The power-function with which the specular highlight gets narrowed,
- The amount of specular highlight,
- The amount of metallicity in the specular highlight, which is also the degree with which a surface filters the specular highlight through its own surface-color.
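How the three per-pixel parameters listed above might be combined can be sketched as follows. This uses the Blinn-Phong half-vector model as a stand-in, which is my assumption and not something any particular engine is confirmed to do; the function and parameter names are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length, or a zero vector if v has no length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def specular_term(normal, to_light, to_eye, light_color,
                  surface_color, gloss, amount, metallicity):
    """Blinn-Phong style highlight driven by per-pixel parameters.

    gloss       -- exponent that narrows the highlight
    amount      -- overall strength of the highlight
    metallicity -- 0.0 leaves the highlight white, 1.0 filters it
                   fully through the surface color
    """
    n = normalize(normal)
    half = normalize(tuple(l + e for l, e in
                           zip(normalize(to_light), normalize(to_eye))))
    intensity = amount * max(dot(n, half), 0.0) ** gloss
    tint = tuple((1.0 - metallicity) * 1.0 + metallicity * s
                 for s in surface_color)
    return tuple(intensity * lc * t for lc, t in zip(light_color, tint))
```

In a deferred renderer, `gloss`, `amount` and `metallicity` would be read from additional layers of the FBO at the pixel being lit, rather than being passed once per model.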
It has often been a weakness in how specular highlights were managed, that a content-designer was only able to specify all these parameters once per entity, i.e. once per model, so that models designed to represent people would have an exaggeratedly-plastic look, if any part of their surface was supposed to be plastic-like. As an alternative, if these parameters were defined once per pixel, a single model could appear plastic-like in some places, matte in other places, and metallic in other places… This is a valid consideration for deferred rendering scenarios, but one to which I have never seen any attention given.
The required number of output-layers would be possible as far as the OpenGL protocol is concerned, but would also require some intense work on the part of game-engine designers to implement, rather than on the part of content-designers really, and may also be constrained by how powerful real graphics-cards are.
2: ) AFAIK, the simplest way to achieve that, is to define each light-source as possessing a spherical geometry, the extent of which is not supposed to be visible to the end-user, but the 3D radius of which also defines the range of the light-source. This will cause fragments to be output, that are disk-shaped either on the display, or on any other post-processing stage which might still follow. And a Fragment Shader can then interrogate the depth-buffer from the deferred rendering example, based on matching 2D positions, to determine ‘real distance’, as well as everything else needed, finally to render the deferred example.
Additionally, close ‘2D distance’ to the light source’s output-position, can cause a visual representation of the light-source itself to be rendered.
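The per-fragment work described in this note can be sketched roughly as below. The sketch assumes that the surface position has already been reconstructed from the depth-buffer at the matching 2D position, and it uses a simple linear falloff towards the sphere's radius, which is my own choice for illustration. All names are hypothetical.

```python
import math


def point_light_contribution(surface_pos, surface_normal, light_pos,
                             light_radius, light_color, albedo):
    """Diffuse light added to one pixel covered by the light's sphere.

    surface_pos and surface_normal are assumed to come from the deferred
    stage's output layers; surface_normal is assumed to be unit length.
    """
    to_light = tuple(l - p for l, p in zip(light_pos, surface_pos))
    distance = math.sqrt(sum(c * c for c in to_light))
    # The sphere's radius doubles as the range of the light-source.
    if distance >= light_radius or distance == 0.0:
        return (0.0, 0.0, 0.0)
    direction = tuple(c / distance for c in to_light)
    n_dot_l = max(sum(n * d for n, d in zip(surface_normal, direction)), 0.0)
    falloff = 1.0 - distance / light_radius  # assumed linear falloff
    return tuple(a * lc * n_dot_l * falloff
                 for a, lc in zip(albedo, light_color))
```

Each light-source would run this for the disk of fragments its sphere produces, and the results would be accumulated additively into the output.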
3: ) In practice, deferred rendering examples often also include shadow-mapping. To understand how this is possible, one must understand that the shadow-map itself, exists as a separate 2D image, from the depth-map used to compute each shadow-map.
In this case, the depth-map is computed per shadow-casting light-source, but in a way that precedes the deferred rendering, being performed directly on the 3D geometries, and ignoring their screen-space representation. Yet, the resulting amount of shadow cast, is passed to the post-processing stage as a combined shadow-map. And one convenient way to do this, could be by using the stencil buffer to communicate the resulting shadow-map.
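The comparison that turns a per-light depth-map into an amount of shadow can be sketched as follows. This is a minimal version under my own assumptions: the depth-map is a plain 2D list indexed as `depth_map[y][x]`, the coordinates `u` and `v` are the fragment's position as seen from the light, in [0, 1], and the small `bias` is a commonly used guard against self-shadowing. None of the names come from a real API.

```python
def shadow_factor(depth_map, u, v, fragment_depth_from_light, bias=0.005):
    """Return 0.0 if the fragment is in shadow, 1.0 if it is lit."""
    height = len(depth_map)
    width = len(depth_map[0])
    # Nearest-texel lookup, clamped to the edges of the depth-map.
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    nearest_occluder = depth_map[y][x]
    # Shadowed when something else was closer to the light.
    if fragment_depth_from_light - bias > nearest_occluder:
        return 0.0
    return 1.0
```

The returned values, gathered per screen pixel and combined across lights, would form the separate shadow-map image that gets passed on to the post-processing stage.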
4: ) In most of my articles, I assume hardware-rendering, especially when the subject is a consumer-grade graphics card. The existence of software-rendering must also be acknowledged, and because that also exists, Software Shaders also exist. But as the technology currently stands, a software shader must be programmed as such from the outset, in order to run. There is no magic available in today’s technology, which will allow code that was specifically written to run on the GPU, to run on the CPU.
Software-Rendering usually takes the form of (backwards-) ray-tracing, but may also take the form of (forwards-traced) photonic rendering, and some combination thereof can give realism closer to photo-realism, than present hardware-rendering can. This is also why Hollywood will often use software-rendering in the production of movies.
But then the reason for which software-rendering has not taken over all of computer graphics, is that when it exists by itself, it usually cannot run in real-time. Ray-tracing a scene can take 100 times as much time as the playing-time of the scene, even when performed on powerful hardware. And there exist applications which absolutely need to render in real-time, such as games, but also desktop compositing, and various other applications.
In my opinion, the example that I gave above of ‘hybrid rendering’, in which deferred rendering is first performed by the GPU, but in which the post-processing of the scene in screen-space is then CPU-based, is the best candidate for applying software-rendering in real-time. It offers a normal-vector per-pixel, in a situation where to compute the camera-space reflection vector is especially cheap computationally. And this reflection vector can then be used to sample a starry sky, or environment maps of various kinds, as needed, including static background images.
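As a rough sketch of that CPU-side step: given the per-pixel normal-vector and the direction from the camera to the pixel, the reflection vector follows from the standard formula R = I - 2(N·I)N. The lookup into an equirectangular (latitude-longitude) environment image is my own assumption about how the background might be stored, and both function names are hypothetical.

```python
import math


def reflect(incident, normal):
    """Reflect the incident direction about a unit-length normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))


def equirect_uv(direction):
    """Map a unit direction to (u, v) in an equirectangular image.

    Assumes the camera looks down -Z with +Y up, so that the direction
    (0, 0, -1) lands in the centre of the image.
    """
    x, y, z = direction
    u = 0.5 + math.atan2(x, -z) / (2.0 * math.pi)
    v = 0.5 - math.asin(max(-1.0, min(1.0, y))) / math.pi
    return u, v


# A surface facing the camera reflects the view direction straight back.
r = reflect((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
print(r, equirect_uv(r))
```

The resulting (u, v) pair could then be used to fetch a texel from a starry sky or any other static background image.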
5: ) Personally, I don’t really have access to the highest-price software. But at one point in time, I experimented with a Demo Program named Houdini Apprentice, and therefore also obtained a view, of the kind of technology Professionals take for granted, which I feel I need to explain, in order for my own account to be 100% consistent with what the high-end software does.
Because Software Shaders and GPU-based Shaders each need to be completely written as such, Advanced systems exist, which possess a GPU-shader, as well as a Ray-Tracing Algorithm, each of which is meant to be the equivalent to what one of the other kind does.
And so the assumption of this software could be, that an eventual Software-Rendered version of a scene is to be created – at great computational cost. But then this sort of software also has an Editing Viewport, which provides a preview of what its user is creating, In Real-Time.
This arrangement can leave me, as well as other people, scrambling to find ways to implement or denote an effect in real-time, on the GPU, which we know can eventually be provided at full quality using Ray-Tracing and complicated software. Houdini is capable of both simulating and graphically rendering volume-based fluids, such as fire or water, as well as displaying a diagrammatic preview of the result in real-time. This observation by itself once left me without an explanation for how ‘an Iso-Surface’ could be implemented on the GPU. It is not a standard assumption, that GPUs are capable of volume-based Physics simulations.
This workflow is partially duplicated in how Blender works, which might provoke the following question: ‘Why would somebody decide to accomplish a project using Blender – a free program – if a higher-quality solution exists in a proprietary program?’ And the main reason would be, that by the time a content-author has a product he wishes to sell, or simply to distribute, Houdini Apprentice will require he change the license he is working under, to a paid-for license.
If the content-author has decided from the outset to stick with FOSS, then the Blender software-developers will not try to force him to switch to a paid-for version of their software, even if the user does want to distribute his work.