Please note that this posting does not describe
- Android GPUs, or
- Integrated graphics chips in PCs and laptops, which use shared memory.
I am writing about the big graphics cards which power-users and gamers install in their PCs, which plug into a dedicated bus-slot, and which by themselves can cost as much as some entire computers.
The way those are organized physically, they possess one or more GPUs and DDR Graphics RAM (GRAM), which loosely correspond to the CPU and RAM on the motherboard of your PC.
The GPU itself contains registers, which are essentially of two types:
- Per-core, and
- Shared among a group of cores.
When coding shaders for 3D games, the GPU-registers do not fulfill the same function as addresses in GRAM. The addresses in Graphics RAM typically store texture images, vertex arrays in their various formats, and index buffers, as well as frame-buffers for the output. In other words, the GRAM typically stores model-geometry and 2D or 3D images. The registers on the GPU are typically used as temporary storage-locations for the work of shaders, which are themselves loaded onto the GPU separately, after the device-driver has compiled them.
A major feature which the designers of graphics cards have given them is that the PC's hardware address space extends onto the graphics card, in such a way that much of the card's memory actually has hardware-addresses as well, just as the RAM on the motherboard does.
This might not include the GPU-registers that are specific to one core, but I think it does include the shared GPU-registers.
When shaders are compiled, the variables declared in their source-code, which could be GLSL for OpenGL or HLSL for DirectX, are typically assigned to GPU-registers, and what matters to the part of the device-driver that manages shaders is the register-number each variable received, whether per-core or shared.
There is an extremely limited, assembly-like instruction set for the GPU, which was once meant to allow shaders to be programmed in a language resembling Assembler Language, and which gets referred to as ARB, after the OpenGL ARB_vertex_program and ARB_fragment_program extensions. The full set of operations the GPU-cores support goes beyond that specification, and ARB is not used much anymore for the coding of shaders. But within such assembly-like shader code, we would see that we are specifying registers for the shader to use by a letter, which identifies the kind of register (for example per-core or shared), followed by a register number. So, for example:
- Per-core register zero exists separately on every core.
- Shared register zero exists once for every core-group, so even a single GPU has several of them, since one GPU could easily have 8 core-groups.
Each set of register-numbers goes up to some maximum. Shader-developers might therefore think that the most important piece of information about a variable they declared is which register-number it was assigned. In reality, this is often of no interest to anything other than the shader-compiler, which is part of the device-driver system.
What can be much more interesting in high-end GPU-computing is the hardware-address of each object stored on the graphics hardware. This can extend to the shared registers on all the GPUs, does extend to the GRAM, but does not reasonably extend to per-core registers.
Once we start to program with OpenCL or CUDA, we are given a C-like syntax with function prototypes, which require parameters to be of specific types. According to its prototype, a function may expect the address of a shared location. If it does, just as in C or C++, we can often write the name of the object in question and precede it with an ampersand (&).
Some people might be quick to point out that in some cases they can omit the &, and that the context of their language implies whether the value or the address of the object is required. Part of that is ordinary C, since the name of an array already decays into the address of its first element. Beyond that, I would point out, such cases are just an example of high-level GPU-code not adhering to the more-strict aspects of how C works.
If we have written shader-code and not CUDA, then one interesting question to ask might be how to communicate the hardware-address of a shared GPU-register to the CPU, so that the CPU can read and write it at a known memory address. The answer is not as simple as the question. It requires that we first write specific code in our shader which extracts the address and outputs it to a memory location that the CPU can read by more-traditional means.
When we feed a shader to a device-driver, it does not commonly report back to our OpenGL application what the memory-address of everything was. More commonly, the device-driver returns either a success-code or an error-code. Since it is easy to make mistakes when writing shaders, these error-codes actually help coders discover their mistakes. Additionally, 3D applications typically keep track, through handles the API gives them, of the buffers and texture-images in GRAM which they must write to and read from. Buffers can also be reused as textures on the same GPU, without the CPU having to copy (or Blit) their pixels from one to the other.
Using OpenCL or CUDA accesses a different part of the device-driver than OpenGL or DirectX does. And while it was once true that GPU-computing platforms used features meant for graphics, bent for use in computing, by now GPU-computing has become a primary feature in itself.