Some trivia about how GPU registers are organized.

I have written about the GPU – the Graphics Processing Unit – at length. And just going by what I have written so far, my readers might think that its registers are defined the same way as those of any main CPU. On the contrary, GPU registers are organized differently at the hardware level, in a way optimized for raster-based graphics output.

Within GPU / graphics-oriented / shader coding, there exists a type of language which is ‘closest to the machine’, and which is a kind of ‘Assembly Language for GPUs’, called “ARB”. Few shader-designers actually use it anymore, most preferring a high-level language instead, such as ‘HLSL’ for the DirectX platform, or ‘GLSL’ for the OpenGL platform… Yet, especially since drivers have been designed that use the GPU for general-purpose (data-oriented) programming, it might be good to glance at what ARB defines.

And so one major difference that exists between main CPU registers and GPU registers by default, is that each GPU register is organized as a 4-element vector of 32-bit, floating-point numbers. The GPU is designed at the hardware level to be able to perform certain Math operations on the entire 4-element vector in one step, if need be. And within ARB, a notation exists by which the register name can be given a dot, and can then be followed by such pieces of text as:

  • .xyz – Referring to the set of the first 3 elements (for scene or model coordinates),
  • .uv – Referring to the set of the first 2 elements (for textures),
  • .rst – Referring to the set of the first 3 elements – again (for 3D textures, volume-texture coordinates).

Also, notations exist in which the order of these elements gets switched around. Therefore, if the ARB code specifies this:

  • r0.uv

It specifies not only register (0), but the first 2 of the 32-bit, floating-point elements within (r0), in their natural order.
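Since few people write ARB assembly directly anymore, here is a minimal sketch of the same swizzling idea expressed in GLSL instead – keeping in mind that GLSL spells its component-sets as (x, y, z, w), (r, g, b, a) and (s, t, p, q), and that the variable names below are only for illustration:

    #version 120

    // A hypothetical fragment shader, meant only to demonstrate swizzles.
    void main()
    {
        vec4 v = vec4(1.0, 2.0, 3.0, 4.0);  // one register-like, 4-element vector

        vec3 pos      = v.xyz;   // the first 3 elements (scene or model coordinates)
        vec2 texCoord = v.st;    // the first 2 elements (2D texture coordinates)
        vec3 volCoord = v.stp;   // the first 3 elements again (volume-texture coordinates)
        vec4 reversed = v.wzyx;  // the same 4 elements, with their order switched around

        // The hardware can apply an operation to all 4 elements in one step:
        gl_FragColor = reversed * 0.25;
    }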

This picture needs to be refined somewhat, before it accurately represents modern GPU registers.

Firstly, I have written elsewhere on my blog that, as data passes from a Vertex Shader to a Fragment Shader, that data – which may contain texture coordinates by default, but which can really consist of virtually any combination of values – needs to be interpolated (:1), so that the interpolated value is what the FS uses to render one pixel to the screen. This interpolation is carried out by specialized hardware in a GPU core group, and for that reason some upper limit exists on how many such registers can be interpolated.
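As a sketch of what gets interpolated – assuming legacy, OpenGL 2.x-era GLSL and a variable name of my own choosing – a Vertex Shader writes one such value per vertex, and the Fragment Shader then receives it already interpolated, for its one pixel:

    #version 120

    // --- Vertex Shader: outputs one 'varying' per vertex. The number of such
    // interpolated values the hardware supports is capped (queried under
    // OpenGL 2.x as GL_MAX_VARYING_FLOATS).
    varying vec2 texCoord;
    void main()
    {
        texCoord    = gl_MultiTexCoord0.st;
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    }

    #version 120

    // --- Fragment Shader (a separate file): reads the same 'varying',
    // but as a value interpolated for this one pixel.
    varying vec2 texCoord;
    uniform sampler2D image;
    void main()
    {
        gl_FragColor = texture2D(image, texCoord);
    }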

(Updated 5/04/2019, 23h35 … )


A bit of my personal history, experimenting in 3D game design.

I was wide-eyed and curious. Well before the year 2000, I only owned Windows-based computers, purchased most of my software for money, and also purchased a license of 3D Game Studio, some version of which is still being sold today. The version that I purchased back then used the ‘A4’ game engine; all the 3DGS versions have a game engine designated by the letter ‘A’ and a number.

That version of 3DGS was based on DirectX 7, DirectX being Microsoft’s own API. DirectX 7 still had, as one of its capabilities, the ability to switch back into software-mode, even though it was perhaps one of the earliest APIs that offered hardware-rendering – provided, that is, that the host machine had a graphics card capable of hardware-rendering.

I created a simplistic game using that engine, which had no real title, but which I simply referred to as my ‘Defeat The Guard Game’. And in so doing I learned a lot.

The API referred to as OpenGL offers what the DirectX versions offer. But because Microsoft has the primary say in how graphics hardware is to be designed, OpenGL versions are frequently just catching up to what the latest DirectX versions have to offer. There is a loose correspondence in version numbers.

Shortly after the year 2000, I upgraded to a 3D Game Studio version with their ‘A6’ game engine. This engine was based on DirectX 9.0c, which was also standard with Windows XP. It no longer offered any possibility of software rendering, but it gave the customers of this software product their first opportunity to program shaders. And because I played with the ‘A6’ game engine for a long time, in addition to owning a PC that ran Windows XP for a long time, the capabilities of DirectX 9.0c became etched in my mind. However, as fate would have it, I never actually created anything significant with this game engine version – only snippets of code designed to test various capabilities.


Quickie: How 2D Graphics is just a Special Case of 3D Graphics

I have previously written in depth about the rendering pipeline, by which 3D graphics are rendered to a 2D, perspective view, as part of computer games, or as part of other applications that require 3D in real time. But one problem with my writing in depth might be that people fail to see the relevance of the words, if the word-count goes beyond 500. :-)

So I’m going to try to summarize it more briefly.

Vertex positions in 3D can be rotated and translated using matrices. Matrices can be composited, meaning that if a sequence of multiplications of position-vectors by known matrices accomplishes what we want, then a multiplication by a single, derived matrix can accomplish the same thing.
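As a sketch in GLSL – with uniform names of my own choosing, and assuming that the application has pre-multiplied the three matrices on the CPU – the two forms below produce the same result:

    #version 120

    // Hypothetical vertex shader, demonstrating matrix composition.
    uniform mat4 projection;
    uniform mat4 view;
    uniform mat4 model;
    uniform mat4 composite;   // assumed to have been set to (projection * view * model)

    void main()
    {
        // Three multiplications by known matrices...
        vec4 a = projection * (view * (model * gl_Vertex));

        // ...or one multiplication by a single, derived matrix:
        vec4 b = composite * gl_Vertex;

        gl_Position = b;      // (a) and (b) are the same position
    }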

Under DirectX 9 or OpenGL 2.x, 3D objects consisted of vertices that formed triangles; the positions and normal-vectors of those vertices were transformed and rotated, respectively, and the vertices additionally possessed texture-coordinates. All of these could be processed by “Vertex Pipelines”. The output from the Vertex Pipelines was then rasterized and interpolated, and fed to “Pixel Pipelines”, which performed per-screen-pixel computations on the interpolated values, and on how those values were applied to the Texture Images being sampled.

All this work was done by dedicated graphics hardware, which is now known as a GPU. It was not done by software.

One difference that exists today is that the specialization of GPU cores into Vertex- and Pixel-Pipelines no longer exists. Due to something called the Unified Shader Model, any one GPU core can act either as a Vertex Shader or as a Pixel Shader, and powerful GPUs possess hundreds of cores.

So the practical question arises of how any of this applies to 2D applications, such as Desktop Compositing. And the answer would be that it has always been possible to render a single rectangle, as though it were oriented in a 3D coordinate system. This rectangle, also referred to as a “Quad”, first gets tessellated, meaning that it receives a diagonal subdivision into two triangles, which still reference the same 4 vertices as before.

When an application receives a drawing surface, onto which it draws its GUI – using CPU-time – the corners of this drawing surface have 2D texture coordinates that are combinations of (0) and (+1). The drawing-surfaces themselves can be fed to the GPU as though they were Texture Images. And the 4 vertices that define the position of the drawing surface on the desktop can simply result from a matrix that is much simpler than any matrix performing rotation in 3D etc. would have needed to be, before a screen-position could be derived from it. Either way, the Vertex Program only needs to multiply the (notional) positions of the corners of a drawing surface by a single matrix, before a screen-position results. In the 2D case, this matrix does not need to be computed from complicated trig functions.
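A minimal sketch of such a Vertex Program follows, in GLSL, with the uniform name being my own assumption – the point being that one simple matrix positions the Quad, while the corner texture coordinates are simply passed through:

    #version 120

    // Hypothetical vertex shader for compositing one application's drawing surface.
    uniform mat4 surfacePlacement;   // assumed: only scales and translates the Quad on the desktop
    varying vec2 texCoord;

    void main()
    {
        // The corners carry texture coordinates that are combinations of 0 and 1.
        texCoord = gl_MultiTexCoord0.st;

        // One matrix multiplication per corner; no trig functions were needed
        // to build 'surfacePlacement' in the 2D case.
        gl_Position = surfacePlacement * gl_Vertex;
    }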

And the GPU renders the scene to a frame-buffer, just as it rendered 3D games.


More about Framebuffer Objects

In the past, when I was writing about hardware-accelerated graphics – i.e., graphics rendered by the GPU – such as in this article, I chose the phrasing according to which the Fragment Shader eventually computes the color-values of pixels ‘to be sent to the screen’. I felt that this over-simplification could make my topics a bit easier to understand at the time.

A detail which I had deliberately left out was that the rendering target may not be the screen in any given context. What happens is that memory-allocation, even the allocation of graphics memory, is still carried out by the CPU, not the GPU. And ‘a shader’ is just another way to say ‘a GPU program’. In the case of a “Fragment Shader”, what this GPU program does can better be visualized as shading, whereas in the case of a “Vertex Shader”, it just consists of computations that affect coordinates, and may therefore just as easily be referred to as ‘a Vertex Program’. Separately, there exists a graphics-card extension that allows the language to be the ARB language, which may also be referred to as defining a Vertex Program. ( :4 )

The CPU sets up the context within which the shader is supposed to run, and one of the elements of this context is a buffer, to which the given Fragment Shader is to render its pixels. The CPU sets this up, just as it sets up the 2D texture images from which the shader fetches texels.

The rendering target of a given shader-instance may be ‘what the user finally sees on his display’, or it may not. Under OpenGL, the rendering target could just be a Framebuffer Object (an ‘FBO’), which has also been set up by the CPU as an available texture-image, from which another shader-instance samples texels. The result would be Render To Texture (‘RTT’).
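A minimal sketch of that second shader-instance’s Fragment Shader follows, in GLSL, assuming that the sampler named below has been bound by the CPU to the texture image which the first pass rendered into, by way of the FBO:

    #version 120

    // Hypothetical fragment shader for the *second* pass of Render To Texture.
    uniform sampler2D renderedScene;   // assumed: bound to the FBO's colour attachment
    varying vec2 texCoord;

    void main()
    {
        // The first pass's output is sampled as though it were any other texture image.
        gl_FragColor = texture2D(renderedScene, texCoord);
    }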
