A clarification about (Linux) Mesa / Nouveau Drivers

Two of the subjects which I like to blog about, are direct-rendering and Linux graphics drivers.

Well in This Earlier Posting, I had essentially written, that on the Debian 9 , Debian /Stretch computer I name ‘Plato’, I have the ‘Mesa’ Drivers installed, and that therefore, that computer cannot benefit from OpenCL, massively-parallel GPU-computing.

What may confuse some readers about this is the fact that elsewhere on the Internet, there is speak about ‘Nouveau’ Drivers, but less so about Mesa Drivers.

‘Mesa’, which I referred to, is a Debian set of meta-packages, that is all open-source. It installs several drivers, and selects the drivers based on which graphics hardware we may have. But, because ‘Plato’ does in fact have an nVidia graphics card, the Mesa package automatically selects the Nouveau drivers, which is one of the drivers it contains. Hence, when I wrote about using the Mesa Drivers, I was in fact writing about the Nouveau Drivers.

One of the reasons I have to keep using these Nouveau Drivers, is the fact that presently, ‘Plato’ is extremely stable. There would be some performance-improvements if I was to switch to the proprietary drivers, but making the transition can be a nightmare. It involves black-lists, etc..

Another reason for me to keep using the Nouveau Drivers, is the fact that unlike how it was years ago, today, those drivers support real OpenGL 3, hardware-rendering. Therefore, I’m already getting partial benefit from the hardware-rendering which the graphics card has, while using the open-source driver.

The only two things which I do not get, is OpenCL or CUDA computing capabilities, as Nouveau does not support that. Therefore, anything which I write about that subject, will have to remain theoretical for now.

I suppose that on my laptop ‘Klystron’, because I have the AMD chip-set more-correctly installed, I could be using OpenCL…

Also, ‘Plato’ is not fully a ‘Kanotix’ system. When I installed ‘Plato’, I borrowed a core system from Kanotix, before Kanotix was ready for Debian / Stretch. This means that certain features which Kanotix would normally have, which make it easier to switch between graphics drivers, are not installed on ‘Plato’. And that really makes the idea daunting, to try to switch…



One way, in which my earlier description of CUDA was out of touch, with the real-world implementation.

One of the subjects which many programmers have been studying, is not only, how to write highly parallel code, but how to write that code for the GPU, since the GPU is also the most-readily-available highly-parallel processor. In fact, any user with a powerful graphics card, may already have the basis to program using CUDA or using OpenCL.

I had written an earlier posting, in which I ended up trying to devise a way, by which the compiler of the synthesized C or C++, would detect that each variable is being used as ‘rvalues’ or ‘lvalues’ in different parts of a loop, and by which the compiler would then choose, to allocate a local register, allocate a shared register, or to make a local copy of a value once provided in a shared register.

According to what I think I’ve learned, this thinking was erroneous, simply because a CUDA or an OpenCL compiler, does not take this responsibility off the hands of the coder. In other words, the coder needs to declare explicitly and each time, whether a variable is to be allocated in a local or a shared register, and must also keep track of how his code can change the value in a shared register, from other threads than the current thread, which may produce errors in how the current thread computes.

But, a command which CUDA offers, and which needs to exist, is a ‘__syncthreads()’ function, which suspends the current thread, until all the threads running in one core-group have executed the ‘__sycnthreads()’ instruction, after which point, they may all resume again.

One fact which disappoints about the real ‘__syncthreads()’ instruction is, that it offers little in the way of added capabilities. One thing which I had written this function may do however, is actually give the CPU a chance to run briefly, in a way not obvious to CUDA code.

But then there exist capabilities which a CUDA or an OpenCL programmer might want, which have no direct support from the GPU, and one of those capabilities might be, to lock an arbitrary object, so that the current thread can perform some computation which reads the object – after having obtained a lock on it – and which then writes changes to the object, before giving up its lock on it.

(Updated 04/19/2018 : )

Continue reading One way, in which my earlier description of CUDA was out of touch, with the real-world implementation.

Finding Out, How Many GPU Cores we have, Under Linux

One question which I see written about often on the Web, is how to find out certain stats about our GPU, under Linux. Under Windows, we had GUI-based programs such as ‘GPU-Z’, etc., but under Linux, the information can be just a bit harder to find.

I think that one tool which helps, is to have ‘OpenCL’ installed, as well as the command-line utility ‘clinfo’, which exists as one out of several packages, and as an actual, resulting command-name.

If we’re serious about programming our GPU, then having a GUI won’t help us much. We’d need to get dirty with code in that case, and then to have text-based solutions is suitable. But, if we’re just spectators in this sport, then two stats we may nevertheless want to know are:

  1. How many GPU-Core-Groups do we have – since GPU-Cores are organized as Groups, and
  2. How many actual Shader-Cores do we have in each Group?

Interestingly, the grouping of shader-cores, also represents how many vector-processors such GPU-computing tools as OpenCL see. And so, on the computer which I name ‘Klystron’, which is running Debian / Jessie, when typing in these commands as user, I get the following results:


dirk@Klystron:~$ clinfo | grep units
  Max compute units:                             4
  Max compute units:                             6
dirk@Klystron:~$ clinfo | grep multiple
  Kernel Preferred work group size multiple:     1
  Kernel Preferred work group size multiple:     64


This needs some explaining. On ‘Klystron’, I have the proprietary, AMD packages for OpenCL installed, since that computer has both an AMD CPU and a Radeon GPU. And this means that the OpenCL version will be able to carry out computing on both. And so I have the stats for both.

In this case, the second entries reveal that I have 6×64 cores on the GPU.

Continue reading Finding Out, How Many GPU Cores we have, Under Linux

The PC Graphics Cards have specifically been made Memory-Addressable.

Please note that this posting does not describe

  • Android GPUs, or
  • Graphics Chips on PCs and Laptops, which use shared memory.

I am writing about the big graphics cards which power-users and gamers install into their PCs, which have a special bus-slot, and which cost as much money in themselves, as some computers cost.

The way those are organized physically, they possess one or more GPU, and DDR Graphics RAM, which loosely correspond to the CPU and RAM on the motherboard of your PC.

The GPU itself contains registers, which are essentially of two types:

  • Per-core, and
  • Shared

When coding shaders for 3D games, the GPU-registers do not fulfill the same function, as addresses in GRAM. The addresses in Graphics RAM typically store texture images, vertex arrays in their various formats, and index buffers, as well as frame-buffers for the output. In other words, the GRAM typically stores model-geometry and 2D or 3D images. The registers on the GPU are typically used as temporary storage-locations, for the work of shaders, which are again, separately loaded onto the GPUs, after they are compiled by the device-drivers.

A major feature which the designers of graphics cards have given them, is to extend the system memory of the PC onto the graphics card, in such a way that most of its memory actually has hardware-addresses as well.

This might not include the GPU-registers that are specific to one core, but I think does include shared GPU-registers.

Continue reading The PC Graphics Cards have specifically been made Memory-Addressable.