The concept by which a single object or entity can be translucent seems rather intuitive. A less intuitive concept is that the degree to which it is so can be stated once per pixel, through an alpha-channel.
Just as every pixel can possess one channel for each of the three additive primary colors, Red, Green and Blue, it can possess a fourth channel named Alpha, which states on a scale from [ 0.0 … 1.0 ] how opaque it is.
This does not just apply to the texture images, whose pixels are named texels, but also to Fragment Shader output, as well as to the pixels actually associated with the drawing surface, which provide what is known as destination alpha, since the drawing surface is also the destination of the rendering, or its target.
Hence, there exist images whose pixels have a 4-channel format, as opposed to others with a mere 3-channel format.
Now, there is no clear way for a display to display alpha. In certain cases, software hints at the alpha in an image being viewed by showing a checkerboard pattern. But what we see is nevertheless color-information and not transparency. And so a logical question is what the function of this alpha-channel is, when it belongs to the surface being rendered to.
There are many ways in which the content from numerous sources can be blended, but most of the high-quality ones require that much communication take place between rendering-stages. A strategy is desired in which output from rendering-passes is combined without requiring much communication between the passes. And alpha-blending is a de-facto strategy for that.
3D rendering starts when the CPU gives the command to ‘draw’ one entity, which has an arbitrary position in 3D. By default, closer entities, according to the position of their origins in view space, are rendered first. This may be contrary to what 2D graphics might teach us to predict, where more-distant layers are painted first. What the 3D ordering does is put closer values into the Z-buffer as soon as possible, so that the Z-buffer can prevent the rendering of the more distant entities as efficiently as possible.
Alas, alpha-entities – aka entities that possess alpha textures – do not write the Z-buffer, because if they did, they would prevent more-distant entities from being rendered. And then, there would be no point in the closer ones being translucent.
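The interplay between the Z-buffer and alpha-entities can be sketched in a few lines of Python. This is only a toy model of one scanline, not how any real GPU or driver is written; the function name `render` and the fragment tuples are hypothetical, chosen for illustration.

```python
import math

def render(fragments, width):
    """Depth-test (x, depth, is_translucent, name) fragments on one scanline.

    Fragments are assumed to arrive nearest-first, as described above.
    """
    zbuf = [math.inf] * width   # nearest opaque depth seen so far, per pixel
    drawn = []                  # names of fragments that passed the depth test
    for x, depth, translucent, name in fragments:
        if depth >= zbuf[x]:
            continue            # hidden behind something opaque: rejected
        drawn.append(name)
        if not translucent:
            zbuf[x] = depth     # only opaque fragments write the Z-buffer
    return drawn, zbuf

drawn, zbuf = render(
    [(0, 1.0, True, "glass"), (0, 5.0, False, "wall"), (0, 9.0, False, "tree")],
    width=1,
)
print(drawn)  # the glass and the wall are drawn, the tree is rejected
```

Had the glass written its depth of 1.0, the wall behind it would have been rejected as well, and the translucency of the glass would have served no purpose.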
The default way in which alpha-blending works is that the alpha-channel of the display records the extent to which entities have been left visible by previous entities, which have been rendered closer to the virtual camera.
Hence, if an alpha-texture is to be rendered, the destination alpha (of the screen-pixel) is multiplied by the alpha-value of the current texel or fragment, to determine how strongly the current fragment is still visible. The R, G, B color-vector is then modulated by this value, and the resulting color is added to the screen-pixel. After that, this weighted alpha is subtracted from that of the screen-pixel just rendered to, which amounts to multiplying the destination alpha by one minus the alpha of the fragment, so that it never falls below zero.
Of course, once screen-alpha reaches zero, no more-distant fragments will render to the same pixel.
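The accumulation just described can be written out as a minimal Python sketch for a single screen-pixel. The function name `composite_front_to_back` is hypothetical, and one assumption is made: the destination alpha is reduced by the fragment's alpha weighted by the visibility still remaining, which keeps it within [ 0.0 … 1.0 ].

```python
def composite_front_to_back(fragments):
    """Blend ((r, g, b), alpha) fragments for one pixel, nearest first.

    Returns the accumulated color and the destination alpha that is left.
    """
    color = [0.0, 0.0, 0.0]
    remaining = 1.0                     # destination alpha: visibility still left
    for rgb, alpha in fragments:
        if remaining <= 0.0:
            break                       # nothing more distant can show through
        weight = remaining * alpha      # how strongly this fragment is visible
        for i in range(3):
            color[i] += weight * rgb[i] # modulate the color and add it
        remaining -= weight             # same as remaining *= (1.0 - alpha)
    return tuple(color), remaining

# Half-opaque red in front of opaque blue, with opaque green behind both.
print(composite_front_to_back([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
    ((0.0, 1.0, 0.0), 1.0),
]))
```

In this example the green fragment contributes nothing, because the opaque blue one has already brought the destination alpha to zero.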
One aspect of alpha-blending which can make it more interesting is the fact that non-default blending modes can be specified, again without requiring complex algorithms. Modes which use source- and destination-alpha in different ways are predefined, as well as modes which make the result subtractive instead of additive, modes that leave the destination-alpha set to some arbitrary value for later entities to process according to their own logic, etc. So this scheme can do more than just accumulate constantly-diminishing visibility due to opacity.
I think that one way in which the output from more than one GPU-core group can be combined is via alpha-blending, but I am not sure of this.
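To make the difference between such modes concrete, here is a small Python sketch of three of them, applied to one source color and one destination color. The function `blend` and its mode names are hypothetical and only illustrate the arithmetic. In OpenGL, for comparison, modes of this kind are selected through `glBlendFunc` and `glBlendEquation` rather than written out by hand.

```python
def blend(src_rgb, src_a, dst_rgb, mode="over"):
    """Combine a source color with a destination color under one blend mode."""
    pairs = list(zip(src_rgb, dst_rgb))
    if mode == "over":          # source mattes (partly covers) the destination
        return tuple(src_a * s + (1.0 - src_a) * d for s, d in pairs)
    if mode == "additive":      # source only ever brightens the destination
        return tuple(min(1.0, d + src_a * s) for s, d in pairs)
    if mode == "subtractive":   # source only ever darkens the destination
        return tuple(max(0.0, d - src_a * s) for s, d in pairs)
    raise ValueError(f"unknown blend mode: {mode}")

glow = (0.0, 0.5, 1.0)          # a bluish source color
wallpaper = (0.5, 0.5, 0.5)     # mid-grey destination
for mode in ("over", "additive", "subtractive"):
    print(mode, blend(glow, 0.5, wallpaper, mode))
```

Note that the ‘over’ result pulls the red channel of the grey background down towards the source color, whereas the additive result leaves it untouched and only raises the other two channels.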
And one goal towards which Alpha-Blending is often applied, even under Linux, is “Desktop Compositing”. This is a case in which output is generated by unrelated programs, and in which there is often no Vertex-Algorithm. In other words, it is a 2D application of the Fragment Shader logic that is more often associated with 3D graphics, rendering to rectangles on the screen that have no 3D orientation. The output gets sent to the same screen-pixels, effectively allowing output from several programs to overlap.
The result, then, is widgets that seem to have gloss or shine, or that otherwise display overlaid effects, at low cost to the CPU. Another common result is a type of glow which surrounds my currently-active window, to tell me which window that is. This glow is translucent in front of the desktop wallpaper and other desktop elements, but its mode has been set to ‘Additive’, so that instead of matting the background, it seems to light it up as if in front of it.
The KDE / Plasma desktop on my older computer ‘Phoenix’, which is also my server, uses the Wayland Compositor but, contrary to what the Wiki states, has not replaced the X-server with it. The X-server is still running.
But one way Linux used to fake the existence of Desktop Compositing was by giving individual applications access to the part of the X-server wallpaper image which their windows occupied, so that X windows could use software to blend this BG image into their own content, without any Desktop Compositing actually being active.
One way in which Linux users were once able to detect this was by moving the application window, and watching the background which seemed to lie translucently behind it move with the window. When we let go of the window with our mouse, the BG repositioned itself into its correct place, consistent with the wallpaper once more.
With modern Linux and real Compositing, this no longer happens.
Note: It is entirely possible to run a Fragment Shader without any Vertex Shader, taking input from one or more Texture Images and outputting to a 2D rectangle on the screen. This is also referred to as 2D hardware acceleration. Additionally, there used to exist the “Fixed-Function Pipeline”, for which no explicit VS code was written, but which assumed that a standard set of Vertex Attributes was given, for 3D rendering. But if we write a Vertex Shader, we must also attach a Fragment Shader. And if we write a Geometry Shader, we must also attach a Vertex Shader.
Under DirectX 11, the use of a Fixed-Function Pipeline is no longer permitted, and one reason seems to be the fact that modern graphics hardware would need to implement what the FFP used to do by means of a standard VS anyway, which consumes a GPU core. So aside from that GPU core out of many, modern graphics cards lack the additional logic-circuits that once made a Vertex Pipeline unnecessary for much of their rendering.
These are the corresponding screenshots, on my newer laptop named ‘Klystron’:
dirk@Klystron:~$ clinfo
Number of platforms: 2
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (1912.5)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 MESA 11.1.1
  Platform Name: Clover
  Platform Vendor: Mesa
  Platform Extensions: cl_khr_icd

(...)

  Platform Name: Clover
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Max compute units: 6
  Max work items dimensions: 3
    Max work items: 256
    Max work items: 256
    Max work items: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 2
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 2
  Max clock frequency: 0Mhz
  Address bits: 32
  Max memory allocation: 268435456
  Image support: No
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: No
    Round to +ve and infinity: No
    IEEE754-2008 fused multiply-add: No
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 1073741824
  Constant buffer size: 268435456
  Max number of constant args: 13
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 0
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x7f2ec0657620
  Name: AMD ARUBA (DRM 2.43.0, LLVM 3.5.0)
  Vendor: AMD
  Device OpenCL C version: OpenCL C 1.1
  Driver version: 11.1.1
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 MESA 11.1.1
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64
On that machine, I seem to have 6 * 64 = 384 cores.
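For what it is worth, this estimate can be reproduced from the ‘clinfo’ text itself with a few lines of Python. The sketch assumes that the two factors are the ‘Max compute units’ field and the ‘Kernel Preferred work group size multiple’ field; that reading is my assumption about where the 6 and the 64 come from, and the function name `estimate_cores` is hypothetical.

```python
import re

def estimate_cores(clinfo_text):
    """Multiply two fields of clinfo output to get a rough core count.

    Assumption: cores = compute units * preferred work-group size multiple.
    """
    units = re.search(r"Max compute units:\s*(\d+)", clinfo_text)
    lanes = re.search(
        r"Kernel Preferred work group size multiple:\s*(\d+)", clinfo_text
    )
    if units is None or lanes is None:
        raise ValueError("required clinfo fields not found")
    return int(units.group(1)) * int(lanes.group(1))

sample = """
Max compute units: 6
Kernel Preferred work group size multiple: 64
"""
print(estimate_cores(sample))
```

This is only a rough figure, since the driver reports these values for its own scheduling purposes and not as a count of physical cores.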
This is the corresponding screenshot on my Windows 7 tower-computer ‘Mithral’:
I seem to have 7 * 48 = 336 cores on this machine.