The PC Graphics Cards have specifically been made Memory-Addressable.

Please note that this posting does not describe

  • Android GPUs, or
  • Graphics Chips on PCs and Laptops, which use shared memory.

I am writing about the big graphics cards which power-users and gamers install into their PCs, which have a special bus-slot, and which cost as much money in themselves, as some computers cost.

The way those are organized physically, they possess one or more GPU, and , which loosely correspond to the CPU and RAM on the motherboard of your PC.

The GPU itself contains registers, which are essentially of two types:

  • Per-core, and
  • Shared

When coding shaders for 3D games, the GPU-registers do not fulfill the same function, as addresses in . The addresses in typically store texture images, vertex arrays in their various formats, and index buffers, as well as frame-buffers for the output. In other words, the typically stores model-geometry and 2D or 3D images. The registers on the GPU are typically used as temporary storage-locations, for the work of shaders, which are again, separately loaded onto the GPUs, after they are compiled by the device-drivers.

A major feature which the designers of graphics cards have given them, is to extend the system memory of the PC onto the graphics card, in such a way that most of its memory actually has hardware-addresses as well.

This might not include the GPU-registers that are specific to one core, but I think does include shared GPU-registers.

Continue reading The PC Graphics Cards have specifically been made Memory-Addressable.

“Hardware Acceleration” is a bit of a Misnomer.

The term gets mentioned quite frequently, that certain applications offer to give the user services, with “Hardware Acceleration”. This terminology can in fact be misleading – in a way that has no consequences – because computations that are hardware-accelerated, are still being executed according to software that has either been compiled or assembled into micro-instructions. Only, those micro-instructions are not to be executed on the main CPU of the machine.

Instead, those micro-instructions are to be executed either on the GPU, or on some other coprocessor, which provide the accelerating hardware.

Often, the compiling of code meant to run on a GPU – even though the same, in theory as regular software – has its own special considerations. For example, this code often consists of only a few micro-instructions, over which great care must be taken to make sure that they run correctly on as many GPUs as possible. I.e., when we are coding a shader, is often a main paradigm. And the possibility crops up often in practice, that even though the code is technically correct, it does not run correctly on a given GPU.

I do not really know how it is with SIMD coprocessors.

But this knowledge would be useful to have, in order to understand this posting of mine.

Of course, there exists a major contradiction to what I just wrote, in OpenCL and CUDA.

Continue reading “Hardware Acceleration” is a bit of a Misnomer.

Some GPU Stats about Two Of My Computers

I own a Windows 7 tower-computer I name ‘Mithral’, which has an NVIDIA GeForce GTX460 graphics card. That was state-of-the-art around 2011. I read that its GPU was identical to that of the GTX470, except that the GPU was supposed to possess 8 core-groups. In the factory, they tested the GPUs, and if they found that one of the core-groups was defective, they used a laser to deactivate that one, and sold the graphics card for a lower price, as a GTX460. According to the first screen-shot, which was obtained using “GPU-Z”, it has 7 * 48 = 336 cores.

I also own a Linux-based laptop named ‘Klystron’, with a nonspecific AMD / ATI chipset – both CPU and GPU – which was state-of-the-art around 2013. The second and third attachment seem to show that it possesses 6 * 64 = 384 cores. The second screen-shot was obtained using “KInfoCenter”, and the last text-quotation was obtained from the OpenCL toolkit installed on the same laptop.

Continue reading Some GPU Stats about Two Of My Computers


The concept seems rather intuitive, by which a single object or entity can be translucent. But another concept which is less intuitive, is that the degree to which it is so can be stated once per pixel, through an alpha-channel.

Just as every pixel can possess one channel for each of the three additive primary colors: Red, Green and Blue, It can possess a 4th channel named Alpha, which states on a scale from [ 0.0 … 1.0 ] , how opaque it is.

This does not just apply to the texture images, whose pixels are named texels, but also to Fragment Shader output, as well as to the pixels actually associated with the drawing surface, which provide what is known as destination alpha, since the drawing surface is also the destination of the rendering, or its target.

Hence, there exist images whose pixels have a 4channel format, as opposed to others, with a mere 3-channel format.

Now, there is no clear way for a display to display alpha. In certain cases, alpha in an image being viewed is hinted by software, as a checkerboard pattern. But what we see is nevertheless color-information and not transparency. And so a logical question can be, what the function of this alpha-channel is, which is being rendered to.

There are many ways in which the content from numerous sources can be blended, but most of the high-quality ones require, that much communication takes place between rendering-stages. A strategy is desired in which output from rendering-passes is combined, without requiring much communication between the passes. And alpha-blending is a de-facto strategy for that.

By default, closer entities, according to the position of their origins in view space, are rendered first. What this does is put closer values into the Z-buffer as soon as possible, so that the Z-buffer can prevent the rendering of the more distant entities as efficiently as possible. 3D rendering starts when the CPU gives the command to ‘draw’ one entity, which has an arbitrary position in 3D. This may be contrary to what 2D graphics might teach us to predict.

Alas, alpha-entities – aka entities that possess alpha textures – do not write the Z-buffer, because if they did, they would prevent more-distant entities from being rendered. And then, there would be no point in the closer ones being translucent.

The default way in which alpha-blending works, is that the alpha-channel of the display records the extent to which entities have been left visible, by previous entities which have been rendered closer to the virtual camera.

Continue reading Alpha-Blending