The concept that a single object or entity can be translucent seems rather intuitive. A less intuitive concept is that the degree to which it is translucent can be stated once per pixel, through an alpha channel.

Just as every pixel can possess one channel for each of the three additive primary colors – Red, Green and Blue – it can possess a fourth channel, named Alpha, which states on a scale of [0.0 … 1.0] how opaque the pixel is.
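As a small sketch of this, an 8-bit-per-channel RGBA pixel can be mapped onto that [0.0 … 1.0] scale. The function name here is hypothetical, just for illustration:

```python
def rgba8_to_float(r, g, b, a):
    """Map 8-bit channel values [0..255] onto the [0.0 .. 1.0] scale.

    The fourth component is alpha: 1.0 means fully opaque,
    0.0 means fully transparent.
    """
    return tuple(c / 255.0 for c in (r, g, b, a))

# A fully opaque red pixel, and its opacity:
opaque_red = rgba8_to_float(255, 0, 0, 255)
```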

This applies not just to texture images, whose pixels are named texels, but also to Fragment Shader output, as well as to the pixels actually associated with the drawing surface. The latter provide what is known as destination alpha, since the drawing surface is also the destination, or target, of the rendering.

Hence, there exist images whose pixels have a 4-channel format, as opposed to others with a mere 3-channel format.

Now, there is no clear way for a display to display alpha. In certain cases, software hints at the alpha in an image being viewed, as a checkerboard pattern. But what we see is nevertheless color information, not transparency. And so a logical question would be: what is the function of this alpha channel that is being rendered to?

There are many ways in which content from numerous sources can be blended, but most of the high-quality ones require that much communication take place between rendering stages. A strategy is desired in which output from rendering passes is combined without requiring much communication between the passes, and alpha-blending is the de facto strategy for that.

By default, closer entities, according to the position of their origins in view space, are rendered first. This puts closer values into the Z-buffer as soon as possible, so that the Z-buffer can prevent the rendering of more-distant entities as efficiently as possible. 3D rendering starts when the CPU gives the command to ‘draw’ one entity, which has an arbitrary position in 3D. This may be contrary to what 2D graphics might teach us to expect.

Alas, alpha-entities – i.e., entities that possess alpha textures – do not write to the Z-buffer, because if they did, they would prevent more-distant entities from being rendered. And then there would be no point in the closer ones being translucent.
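This division of labor can be sketched as follows. The two-group arrangement, with opaque entities sorted nearest-first to prime the Z-buffer and alpha entities handled in a later pass (engines commonly sort that group farthest-first), is a common convention rather than something any one API mandates; the data layout here is purely illustrative:

```python
def draw_order(entities):
    """Sketch of a draw-order policy.

    entities: list of (name, view_space_depth, has_alpha) tuples.
    Opaque entities are drawn front-to-back, so early Z-buffer writes
    can reject hidden fragments; alpha entities go in a later group
    and do not write the Z-buffer.
    """
    opaque = [e for e in entities if not e[2]]
    alpha = [e for e in entities if e[2]]
    opaque.sort(key=lambda e: e[1])    # nearest first, for the Z-buffer
    alpha.sort(key=lambda e: -e[1])    # farthest first, a common convention
    return opaque + alpha

scene = [("far_wall", 10.0, False), ("near_box", 2.0, False),
         ("glass", 5.0, True), ("smoke", 8.0, True)]
```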

The default way in which alpha-blending works is that the alpha channel of the drawing surface records the extent to which entities have been left visible, by previous entities rendered closer to the virtual camera.

Hence, when an alpha-texture is to be rendered, the destination alpha (of the screen pixel) is multiplied by the alpha value of the current texel or fragment, to determine how strongly the current fragment is still visible. The (R, G, B) color vector is then modulated by this value, and the resulting color added to the screen pixel. After that, the current fragment’s alpha is subtracted from that of the screen pixel just rendered to.

Of course, once screen-alpha reaches zero, no more-distant fragments will render to the same pixel.
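The arithmetic just described can be sketched per pixel as follows. This follows the text literally, assuming destination alpha starts at 1.0 (fully visible) and is clamped at zero; note that many treatments instead multiply the destination alpha by (1 − source alpha) rather than subtracting:

```python
def blend_front_to_back(dst, src):
    """One front-to-back blending step, as described above.

    dst: screen pixel (r, g, b, a); 'a' is destination alpha, the
         visibility left over by closer fragments already rendered.
    src: incoming fragment (r, g, b, a).
    """
    visibility = dst[3] * src[3]        # how strongly this fragment shows
    r = dst[0] + src[0] * visibility    # modulated color is added
    g = dst[1] + src[1] * visibility
    b = dst[2] + src[2] * visibility
    a = max(0.0, dst[3] - src[3])       # clamp, so alpha never goes negative
    return (r, g, b, a)

# Front-to-back: half-transparent red, then opaque green, then opaque blue.
px = (0.0, 0.0, 0.0, 1.0)
px = blend_front_to_back(px, (1.0, 0.0, 0.0, 0.5))
px = blend_front_to_back(px, (0.0, 1.0, 0.0, 1.0))
px = blend_front_to_back(px, (0.0, 0.0, 1.0, 1.0))  # screen alpha is 0: no effect
```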

One aspect of alpha-blending that makes it more interesting is the fact that non-default blending modes can be specified, again without requiring complex algorithms. Modes that use source and destination alpha in different ways are predefined, as well as modes that make the result subtractive instead of additive, modes that leave the destination alpha set to some arbitrary value – for later entities to process according to their own logic – and so on. So this scheme can do more than just accumulate constantly-diminishing visibility due to opacity.
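One way to picture such a fixed blend stage: the result is (source × source-factor) combined with (destination × destination-factor), where the factors and the operation are each chosen from a small predefined menu, with no custom code running per pixel. This is a simplified sketch, with hypothetical names; real APIs offer a longer menu of factors:

```python
def blend(src, dst, src_factor, dst_factor, op):
    """Combine src and dst (r, g, b, a) using predefined factors and ops."""
    factors = {
        "one":                 lambda s, d: 1.0,
        "src_alpha":           lambda s, d: s[3],
        "one_minus_src_alpha": lambda s, d: 1.0 - s[3],
        "dst_alpha":           lambda s, d: d[3],
    }
    ops = {
        "add":      lambda a, b: a + b,
        "subtract": lambda a, b: a - b,   # the 'subtractive' family of modes
    }
    sf = factors[src_factor](src, dst)
    df = factors[dst_factor](src, dst)
    return tuple(ops[op](src[i] * sf, dst[i] * df) for i in range(4))

src, dst = (1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)
# Classic back-to-front 'over' matting:
over = blend(src, dst, "src_alpha", "one_minus_src_alpha", "add")
# 'Additive' mode, as in the window-glow effect described further down:
glow = blend(src, dst, "src_alpha", "one", "add")
```

The additive mode leaves the full destination color in place and only adds to it, which is why an additive overlay seems to light up the background instead of matting it.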

I suspect that one way in which the output from more than one GPU core group can be combined is via alpha-blending.

And one goal toward which alpha-blending is often applied, even under Linux, is “Desktop Compositing”. This is a case in which output is generated by unrelated programs; in which there is often no vertex algorithm – i.e., it is a 2D application of the Fragment Shader logic more often associated with 3D graphics, rendering to rectangles on the screen that have no 3D orientation – and in which the output gets sent to the same screen pixels, effectively allowing output from several programs to overlap.

And the result, then, is widgets that seem to have gloss or shine, or that otherwise display overlaid effects, at low cost to the CPU. Another common result is a type of glow surrounding my currently-active window, to tell me which window that is. This glow is translucent in front of the desktop wallpaper and other desktop elements, but its mode has been set to ‘Additive’, so that instead of matting the background, it seems to light up in front of it.

The KDE / Plasma desktop on my older computer ‘Phoenix’, which is also my server, uses the Wayland Compositor but, contrary to what the Wiki states, has not replaced the X-server with it; the X-server is still running.



But one way Linux used to fake the existence of Desktop Compositing was by giving individual applications access to the portion of the X-server wallpaper image that their windows occupied, so that X windows could use software to blend this background image into their own content, without any Desktop Compositing actually being active.

One way in which Linux users were once able to detect this was by moving an application window and watching the background that seemed to lie translucently behind it move with the window, until we let go of the window with the mouse, at which point the background repositioned itself into its correct place, consistent with the wallpaper once more.

With modern Linux and real Compositing, this no longer happens.


Note: It is entirely possible to run a Fragment Shader without any Vertex Shader, taking input from one or more texture images and outputting to a 2D rectangle on the screen. This is also referred to as 2D hardware acceleration. Additionally, there used to exist the “Fixed-Function Pipeline”, for which no explicit VS code was written, but which assumed that a standard set of vertex attributes was given for 3D rendering. But if we write a Vertex Shader, we must also attach a Fragment Shader. And if we write a Geometry Shader, we must also attach a Vertex Shader.

Under DirectX 11, use of a Fixed-Function Pipeline is no longer permitted, and one reason seems to be that modern graphics hardware would need to implement what the FFP used to do using a standard VS anyway, which consumes a GPU core. So aside from that one GPU core out of many, modern graphics cards lack the additional logic circuits that once made a vertex pipeline unnecessary for much of their rendering.

This is the corresponding clinfo output on my newer laptop, named ‘Klystron’:





dirk@Klystron:~$ clinfo
Number of platforms:                             2
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.1 MESA 11.1.1
  Platform Name:                                 Clover
  Platform Vendor:                               Mesa
  Platform Extensions:                           cl_khr_icd


  Platform Name:                                 Clover
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Max compute units:                             6
  Max work items dimensions:                     3
    Max work items[0]:                           256
    Max work items[1]:                           256
    Max work items[2]:                           256
  Max work group size:                           256
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 2
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     4
  Native vector width double:                    2
  Max clock frequency:                           0Mhz
  Address bits:                                  32
  Max memory allocation:                         268435456
  Image support:                                 No
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               No
    Round to +ve and infinity:                   No
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    None
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            1073741824
  Constant buffer size:                          268435456
  Max number of constant args:                   13
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    0
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x7f2ec0657620
  Name:                                          AMD ARUBA (DRM 2.43.0, LLVM 3.5.0)
  Vendor:                                        AMD
  Device OpenCL C version:                       OpenCL C 1.1 
  Driver version:                                11.1.1
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.1 MESA 11.1.1
  Extensions:                                    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64


On that machine, I seem to have 6 * 64 = 384 cores.
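That figure reads two of the clinfo fields together, on the assumption that each compute unit hosts one 64-lane group (the preferred work-group size multiple, i.e., the wavefront width on this AMD GPU):

```python
# Both values taken from the clinfo listing above.
max_compute_units = 6        # "Max compute units"
wavefront_width = 64         # "Kernel Preferred work group size multiple"

# Assumed estimate: one 64-lane wavefront per compute unit.
cores = max_compute_units * wavefront_width
```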

This is the corresponding screenshot on my Windows 7 tower-computer ‘Mithral’:


I seem to have 7 * 48 = 336 cores on this machine.



8 thoughts on “Alpha-Blending”


  2. With everything that seems to be building inside this specific subject material, a significant percentage of viewpoints are relatively refreshing. Nevertheless, I apologize, but I cannot give credence to your whole strategy, radical though it is. It appears to me that your remarks are not totally justified, and that in actuality you are yourself not really totally convinced of your assertion. In any case, I did enjoy reading it.

    1. I actually need to admit that the posting you commented on contains a major error. When alpha-blending is applied in real-time, hardware-accelerated graphics, the same convention is applied as in the software-rendered case, where the more-distant entities are rendered first. I tried to comment on this in a follow-up posting. Yet even this second posting of mine was not 100% correct.

      When game development uses real-time 3D, it usually divides the scene into more than one rendering group: one for non-alpha entities, and a later rendering group for alpha entities. This allows the alpha-entities to be rendered back-to-front, or in some other arbitrary order, even though the opaque scene elements of the earlier rendering group are still rendered front-to-back, to increase speed through the use of the Z-buffer.

      But I think it’s good that you pointed out that there were some weaknesses in what I wrote.

