I have written about the GPU – the Graphics Processing Unit – at length. And going only by what I have written so far, my readers might think that its registers are defined the same way as those of any main CPU. On the contrary, GPU registers are organized differently at the hardware level, in a way that is optimized for raster-based graphics output.
Within GPU / graphics-oriented / shader coding, there exists a type of language which is ‘closest to the machine’, and which is a kind of ‘Assembler Language for GPUs’, usually just called “ARB” (after the OpenGL Architecture Review Board, which defined it in the extensions ‘ARB_vertex_program’ and ‘ARB_fragment_program’). Few shader-designers actually use it anymore, instead using a high-level language, such as ‘HLSL’ for the DirectX platform, or ‘GLSL’ for the OpenGL platform… Yet, especially since drivers have been designed that use the GPU for general-purpose (data-oriented) programming, it might be good to glance at what ARB defines.
And so one major difference between main CPU registers and GPU registers is that, by default, each GPU register is organized as a 4-element vector of 32-bit, floating-point numbers. The GPU is designed at the hardware level to be able to perform certain math operations on the entire 4-element vector, in one step if need be. And within ARB, a notation exists by which the register name can be followed by a dot, and then by such pieces of text as:
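- .xyz – Referring to the set of the first 3 elements (for scene or model coordinates),
- .xy – Referring to the set of the first 2 elements (for 2D texture coordinates, which are often called ‘u’ and ‘v’),
- .rgb – Referring to the set of the first 3 elements – again (this time read as colour channels; ARB fragment programs accept the letters ‘rgba’ as aliases for ‘xyzw’).
The ARB grammar itself only recognizes the letters ‘xyzw’ and, in fragment programs, ‘rgba’. Texture-oriented letters such as ‘stpq’ belong to GLSL, which names the third texture coordinate ‘p’ rather than ‘r’, to avoid a clash with ‘red’.
Also, notations exist in which the order of these elements gets switched around (‘swizzling’). Therefore, if the ARB code specifies this:
- r0.xy
It is specifying not only register (r0), but the first 2, 32-bit, floating-point elements, within (r0), in their natural order.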
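To make the idea of a 4-element register and its dot-suffix more concrete, here is a minimal sketch in Python. It is only a model, not ARB code: the class name `Register` and the methods `select` and `mul` are hypothetical names of my own, and the sketch assumes that only the letters x, y, z, w and r, g, b, a are accepted in a suffix.

```python
from dataclasses import dataclass

# Map each suffix letter to an element index. The letters r, g, b, a are
# treated as aliases for x, y, z, w (an assumption of this sketch).
_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3,
          "r": 0, "g": 1, "b": 2, "a": 3}


@dataclass(frozen=True)
class Register:
    """Hypothetical model of one GPU register holding four floats.

    Python floats are 64-bit, so this mimics the layout, not the precision.
    """
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    w: float = 0.0

    def select(self, suffix: str) -> tuple:
        """Return the elements named by a suffix such as 'xy' or 'wzyx'."""
        elems = (self.x, self.y, self.z, self.w)
        return tuple(elems[_INDEX[c]] for c in suffix)

    def mul(self, other: "Register") -> "Register":
        """Element-wise multiply, standing in for one 4-wide instruction."""
        return Register(self.x * other.x, self.y * other.y,
                        self.z * other.z, self.w * other.w)


r0 = Register(1.0, 2.0, 3.0, 4.0)
print(r0.select("xy"))    # first two elements, in their natural order
print(r0.select("wzyx"))  # all four elements, with the order reversed
print(r0.mul(Register(2.0, 2.0, 2.0, 2.0)))  # whole vector in one call
```

The point of the sketch is that the suffix only chooses which elements are seen, and in what order, while the register itself always remains 4 elements wide.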
This description needs to be refined somewhat, before it gives an accurate picture of modern GPU registers.
Firstly, I have written elsewhere on my blog that data passes from a Vertex Shader to a Fragment Shader. That data may contain texture coordinates by default, but can really consist of virtually any combination of values. It needs to be interpolated (:1) between the vertices, so that the interpolated value gets used by the FS, to render one pixel to the screen. This interpolation is carried out by specialized hardware in a GPU core group, and for that reason, some upper limit exists on how many such registers can be interpolated.
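The interpolation step can be sketched as follows, again in Python and only as an illustration. The function name `interpolate`, the constant `MAX_INTERPOLATED` and its value of 8 are hypothetical; real limits depend on the hardware and driver. The sketch also assumes plain linear blending with barycentric weights, whereas actual GPUs normally apply a perspective correction as well.

```python
# Hypothetical cap on how many 4-element outputs the hardware will interpolate.
# The value 8 is made up for illustration; real limits are hardware-dependent.
MAX_INTERPOLATED = 8


def interpolate(v0, v1, v2, weights):
    """Blend one 4-element output from three vertices.

    `weights` are barycentric weights (they sum to 1) that say how close
    the fragment is to each of the three vertices of the triangle.
    """
    w0, w1, w2 = weights
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))


def interpolate_all(outputs, weights):
    """Interpolate every per-vertex output, enforcing the assumed cap."""
    if len(outputs) > MAX_INTERPOLATED:
        raise ValueError("too many interpolated registers for this sketch")
    return [interpolate(v0, v1, v2, weights) for v0, v1, v2 in outputs]


# Texture coordinates written by the Vertex Shader at three vertices.
uv0 = (0.0, 0.0, 0.0, 1.0)
uv1 = (1.0, 0.0, 0.0, 1.0)
uv2 = (0.0, 1.0, 0.0, 1.0)

# One fragment that lies closer to the third vertex than to the other two.
print(interpolate(uv0, uv1, uv2, (0.25, 0.25, 0.5)))
```

Each additional output register means one more blend of this kind per fragment, which is one way to picture why the number of interpolated registers is capped.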
(Updated 5/04/2019, 23h35 … )