Finding Out, How Many GPU Cores we have, Under Linux

One question which I see written about often on the Web, is how to find out certain stats about our GPU, under Linux. Under Windows, we had GUI-based programs such as ‘GPU-Z’, etc., but under Linux, the information can be just a bit harder to find.

I think that one tool which helps, is to have ‘OpenCL’ installed, as well as the command-line utility ‘clinfo’, which exists both as one out of several related packages, and as the name of the resulting command.

If we’re serious about programming our GPU, then having a GUI won’t help us much. We’d need to get our hands dirty with code in that case, and text-based tools are then more suitable. But, if we’re just spectators in this sport, then two stats we may nevertheless want to know are:

  1. How many GPU-Core-Groups do we have – since GPU-Cores are organized as Groups, and
  2. How many actual Shader-Cores do we have in each Group?

Interestingly, the number of core-groups is also the number of vector-processors – ‘compute units’ – that such GPU-computing tools as OpenCL see. And so, on the computer which I name ‘Klystron’, which is running Debian / Jessie, when typing in these commands as a regular user, I get the following results:

dirk@Klystron:~$ clinfo | grep units
  Max compute units:                             4
  Max compute units:                             6
dirk@Klystron:~$ clinfo | grep multiple
  Kernel Preferred work group size multiple:     1
  Kernel Preferred work group size multiple:     64
dirk@Klystron:~$

This needs some explaining. On ‘Klystron’, I have the proprietary AMD packages for OpenCL installed, since that computer has both an AMD CPU and a Radeon GPU. This means that the OpenCL installation is able to carry out computing on both, and so the stats for both devices are listed.

In this case, the first line of each pair describes the CPU, while the second lines reveal that I have 6 × 64 = 384 shader-cores on the GPU.
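For readers who would rather query this from their own code than grep the output of ‘clinfo’, below is a minimal sketch in C, which only uses standard OpenCL 1.x calls (clGetPlatformIDs, clGetDeviceIDs, clGetDeviceInfo). It is an illustration, not production code: error checking is omitted, and the fixed array sizes are assumptions about a system with at most 8 platforms and 8 devices per platform.

/* list_units.c – print each OpenCL device's name and its number of
   compute units (i.e., core-groups).
   Hypothetical build command: gcc list_units.c -o list_units -lOpenCL  */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = {0};
            cl_uint units = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(units), &units, NULL);
            printf("%s: %u compute units\n", name, units);
        }
    }
    return 0;
}

Note that the ‘preferred work group size multiple’ – the 64 above – cannot be read from the device alone; it is a per-kernel property, queried with clGetKernelWorkGroupInfo() after a kernel has actually been compiled, which is one reason ‘clinfo’ is so convenient.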

Continue reading Finding Out, How Many GPU Cores we have, Under Linux

The PC Graphics Cards have specifically been made Memory-Addressable.

Please note that this posting does not describe

  • Android GPUs, or
  • Graphics Chips on PCs and Laptops, which use shared memory.

I am writing about the big graphics cards which power-users and gamers install into their PCs, which occupy a special bus-slot, and which, by themselves, cost as much as some complete computers.

The way those are organized physically, they possess one or more GPUs, as well as dedicated graphics memory, which loosely correspond to the CPU and the RAM on the motherboard of your PC.

The GPU itself contains registers, which are essentially of two types:

  • Per-core, and
  • Shared

When coding shaders for 3D games, the GPU-registers do not fulfill the same function as addresses in graphics memory. The addresses in graphics memory typically store texture images, vertex arrays in their various formats, and index buffers, as well as frame-buffers for the output. In other words, the graphics memory typically stores model-geometry and 2D or 3D images. The registers on the GPU are typically used as temporary storage-locations for the work of shaders, which are in turn loaded separately onto the GPU, after they have been compiled by the device-drivers.

A major feature which the designers of graphics cards have given them, is to extend the system memory of the PC onto the graphics card, in such a way that most of its memory actually has hardware-addresses as well.

This might not include the GPU-registers that are specific to one core, but I think it does include the shared GPU-registers.
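One way to see this addressability at work from user space, without touching the driver directly, is to map a device buffer into the host’s address space through OpenCL. The following is a minimal sketch, with error handling omitted; whether the mapping goes through a true hardware aperture into the card’s own memory, or through a staging copy in system RAM, is a decision the driver makes, so this illustrates the addressing model rather than the physical wiring.

/* map_buffer.c – obtain an ordinary host pointer into a buffer that the
   graphics driver manages.  Hypothetical build: gcc map_buffer.c -lOpenCL  */
#include <string.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    /* A 1 MiB buffer that lives under the driver's control. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, 1 << 20, NULL, &err);

    /* Ask for an ordinary host pointer covering that buffer. */
    void *ptr = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                   0, 1 << 20, 0, NULL, NULL, &err);
    if (ptr != NULL) {
        memset(ptr, 0, 1 << 20);   /* plain pointer arithmetic works here */
        clEnqueueUnmapMemObject(q, buf, ptr, 0, NULL, NULL);
    }

    clFinish(q);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}

Independently of any programming API, the apertures through which the card’s memory appears in the host’s physical address space can also be seen in the ‘Memory at …’ lines that ‘lspci -v’ prints for the graphics card.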

Continue reading The PC Graphics Cards have specifically been made Memory-Addressable.

“Hardware Acceleration” is a bit of a Misnomer.

It gets mentioned quite frequently that certain applications offer the user services with “Hardware Acceleration”. This terminology can in fact be misleading – in a way that has no practical consequences – because computations that are hardware-accelerated are still being executed according to software, which has either been compiled or assembled into micro-instructions. Only, those micro-instructions are not executed on the main CPU of the machine.

Instead, those micro-instructions are executed either on the GPU, or on some other coprocessor, which provides the accelerating hardware.

Often, the compiling of code meant to run on a GPU – even though it is, in theory, the same as regular software – has its own special considerations. For example, this code often consists of only a few micro-instructions, over which great care must be taken to make sure that they run correctly on as many GPUs as possible. I.e., when we are coding a shader, keeping the code short and portable is often a main paradigm. And the possibility crops up often in practice, that even though the code is technically correct, it does not run correctly on a given GPU.
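To illustrate just how short such device code tends to be, here is a hypothetical OpenCL kernel (OpenCL C is essentially C with a few extra qualifiers); the entire body compiles down to a handful of GPU instructions, yet it still has to behave identically under every vendor’s compiler:

/* A classic SAXPY kernel: y[i] = a * x[i] + y[i].
   Each work-item (roughly, one shader-core invocation) handles one element. */
__kernel void saxpy(__global float *y,
                    __global const float *x,
                    const float a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}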

I do not really know how it is with SIMD coprocessors.

But this knowledge would be useful to have, in order to understand this posting of mine.

Of course, there exists a major contradiction to what I just wrote, in OpenCL and CUDA.

Continue reading “Hardware Acceleration” is a bit of a Misnomer.

Some GPU Stats about Two Of My Computers

I own a Windows 7 tower-computer I name ‘Mithral’, which has an NVIDIA GeForce GTX460 graphics card. That was state-of-the-art around 2011. I read that its GPU was closely related to that of the GTX470, and that the chip was manufactured with 8 core-groups. In the factory, they tested the GPUs, and if one of the core-groups was found to be defective, they deactivated that group with a laser and sold the graphics card at a lower price, as a GTX460. According to the first screen-shot, which was obtained using “GPU-Z”, it has 7 * 48 = 336 cores.

I also own a Linux-based laptop named ‘Klystron’, with an AMD / ATI chipset – both its CPU and its GPU – which was state-of-the-art around 2013. The second and third attachments seem to show that it possesses 6 * 64 = 384 cores. The second screen-shot was obtained using “KInfoCenter”, and the last text-quotation was obtained from the OpenCL toolkit installed on the same laptop.

Continue reading Some GPU Stats about Two Of My Computers