One question which I often see asked on the Web is how to find out certain stats about our GPU under Linux. Under Windows, we had GUI-based programs such as ‘GPU-Z’, but under Linux, the information can be a bit harder to find.
I think that one thing which helps is to have ‘OpenCL’ installed, along with the command-line utility ‘clinfo’, which is provided by one of several packages, and which is also the name of the resulting command.
If we’re serious about programming our GPU, then having a GUI won’t help us much. We’d need to get our hands dirty with code in that case, and text-based solutions are suitable for that. But if we’re just spectators in this sport, then two stats we may nevertheless want to know are:
- How many GPU-Core-Groups do we have – since GPU-Cores are organized as Groups, and
- How many actual Shader-Cores do we have in each Group?
Interestingly, the number of shader-core groups is also the number of vector-processors (‘compute units’) that GPU-computing tools such as OpenCL see. And so, on the computer which I name ‘Klystron’, which is running Debian / Jessie, when I type in these commands as a regular user, I get the following results:
dirk@Klystron:~$ clinfo | grep units
Max compute units: 4
Max compute units: 6
dirk@Klystron:~$ clinfo | grep multiple
Kernel Preferred work group size multiple: 1
Kernel Preferred work group size multiple: 64
dirk@Klystron:~$
This needs some explaining. On ‘Klystron’, I have the proprietary AMD packages for OpenCL installed, since that computer has both an AMD CPU and a Radeon GPU. This means that this OpenCL version is able to carry out computing on both, and so I get the stats for both: the first entry of each pair belongs to the CPU, and the second to the GPU.
In this case, the second entries reveal that I have 6×64 = 384 cores on the GPU.
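The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample text is copied from the ‘Klystron’ session, the helper names `values` and `cores_per_device` are made up for this example, and it assumes, as the reasoning above does, that the n-th ‘units’ line and the n-th ‘multiple’ line describe the same device.

```python
import re

# Sample text copied from the 'Klystron' session above; on a live system
# this string would come from running `clinfo` instead.
SAMPLE = """\
Max compute units: 4
Max compute units: 6
Kernel Preferred work group size multiple: 1
Kernel Preferred work group size multiple: 64
"""

def values(text, label):
    """Return the integers that follow `label` on each matching line, in order."""
    pattern = re.compile(re.escape(label) + r":?\s+(\d+)\s*$")
    found = []
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            found.append(int(match.group(1)))
    return found

def cores_per_device(text):
    """Pair the n-th 'units' value with the n-th 'multiple' value (assumed
    to belong to the same device) and multiply them."""
    units = values(text, "Max compute units")
    multiples = values(text, "Preferred work group size multiple")
    return [(u, m, u * m) for u, m in zip(units, multiples)]

for units, multiple, total in cores_per_device(SAMPLE):
    print(f"{units} x {multiple} = {total}")
```

To try this against a real machine, the sample string could be replaced with the text returned by `subprocess.run(["clinfo"], capture_output=True, text=True).stdout`, provided ‘clinfo’ is installed and prints its labels in the same form.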
But if I try the same experiment on the recently-installed computer ‘Plato’, which is running Debian / Stretch, I only get partial results, with the error-message ‘invalid source’:
dirk@Plato:~$ clinfo | grep units
=== CL_PROGRAM_BUILD_LOG ===
invalid source
Max compute units 7
dirk@Plato:~$ clinfo | grep multiple
=== CL_PROGRAM_BUILD_LOG ===
invalid source
Preferred work group size multiple invalid source
dirk@Plato:~$
It can tell me how many Groups I have (7), but not how many Cores there are in each group…
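A script that reads these two stats would need to cope with the second one being unavailable. The following is a minimal sketch of that idea, assuming the label wording shown in the ‘Plato’ session (no colon, and no leading ‘Kernel’ as on ‘Klystron’); the function name `read_int_field` is hypothetical and not part of ‘clinfo’.

```python
# Lines copied from the 'Plato' session above.
SAMPLE = """\
=== CL_PROGRAM_BUILD_LOG ===
invalid source
Max compute units 7
=== CL_PROGRAM_BUILD_LOG ===
invalid source
Preferred work group size multiple invalid source
"""

def read_int_field(text, label):
    """Return the integer that follows `label`, or None when the field is
    missing or holds something other than a number (e.g. 'invalid source')."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(label):
            value = stripped[len(label):].lstrip(": \t")
            return int(value) if value.isdigit() else None
    return None

units = read_int_field(SAMPLE, "Max compute units")
multiple = read_int_field(SAMPLE, "Preferred work group size multiple")

if units is not None and multiple is not None:
    print(f"{units} groups x {multiple} cores = {units * multiple}")
else:
    # Only report what was actually obtained; do not guess the missing value.
    print(f"groups: {units}, cores per group: unknown")
```

Returning `None` rather than 0 keeps a failed query from being mistaken for a real measurement.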
The first-order explanation for this would be that the ‘clinfo’ command is itself a mini-OpenCL program, with code written in C for the GPU, but which merely fetches the stats. Apparently, on ‘Plato’, some of the source-code contained in ‘clinfo’ is not recognized by the OpenCL framework.
A closer look at the problem explains it better. On ‘Klystron’, my OpenCL installation is the proprietary AMD version of the packages, including ‘amd-clinfo’, while on ‘Plato’, I only have the Mesa drivers installed, even for OpenCL.
For this type of thing, it’s of course always better to be using the proprietary drivers. But in order to install the NVIDIA OpenCL drivers on ‘Plato’, I’d also need to have several G.P. graphics drivers installed, the proprietary as well as the generic ones, as I do on ‘Klystron’. This means I’d also need to install the ‘…alternatives…’ packages on ‘Plato’, which presently allow me to switch back and forth between drivers on ‘Klystron’.
For the moment, I think that would threaten what I have going with ‘Plato’ too much, which has been a very stable experience. But this also means that with the Mesa drivers alone, I cannot rely on doing any OpenCL computing, since even the demands of the generic ‘clinfo’ package go beyond what those OpenCL drivers can compile.
dirk@Plato:~$ clinfo
Number of platforms 1
  Platform Name Clover
  Platform Vendor Mesa
  Platform Version OpenCL 1.1 Mesa 13.0.6
  Platform Profile FULL_PROFILE
  Platform Extensions cl_khr_icd
  Platform Extensions function suffix MESA
  Platform Name Clover
Number of devices 1
  Device Name NVC4
  Device Vendor NVIDIA
  Device Vendor ID 0x10de
  Device Version OpenCL 1.1 Mesa 13.0.6
  Driver Version 13.0.6
  Device OpenCL C Version OpenCL C 1.1
  Device Type GPU
  Device Profile FULL_PROFILE
  Max compute units 7
  Max clock frequency 512MHz
  Max work item dimensions 3
  Max work item sizes 1024x1024x64
  Max work group size 1024
=== CL_PROGRAM_BUILD_LOG ===
invalid source
  Preferred work group size multiple invalid source
  Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 0 / 0 (n/a)
    float 4 / 4
    double 2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support (n/a)
  Single-precision Floating-point support (core)
    Denormals No
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero No
    Round to infinity No
    IEEE754-2008 fused multiply-add No
    Support is emulated in software No
    Correctly-rounded divide and sqrt operations No
  Double-precision Floating-point support (cl_khr_fp64)
    Denormals Yes
    Infinity and NANs Yes
    Round to nearest Yes
    Round to zero Yes
    Round to infinity Yes
    IEEE754-2008 fused multiply-add Yes
    Support is emulated in software No
    Correctly-rounded divide and sqrt operations No
  Address bits 64, Little-Endian
  Global memory size 1099511627776 (1024GiB)
  Error Correction support No
  Max memory allocation 1099511627776 (1024GiB)
  Unified memory for Host and Device Yes
  Minimum alignment for any data type 128 bytes
  Alignment of base address 1024 bits (128 bytes)
  Global Memory cache type None
  Image support No
  Local memory type Local
  Local memory size 49152 (48KiB)
  Max constant buffer size 65536 (64KiB)
  Max number of constant args 15
  Max size of kernel argument 4096 (4KiB)
  Queue properties
    Out-of-order execution No
    Profiling Yes
  Profiling timer resolution 0ns
  Execution capabilities
    Run OpenCL kernels Yes
    Run native kernels No
  Device Available Yes
  Compiler Available Yes
  Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64
NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
  clCreateContext(NULL, ...) [default] Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
    Platform Name Clover
    Device Name NVC4
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
    Platform Name Clover
    Device Name NVC4
ICD loader properties
  ICD loader Name OpenCL ICD Loader
  ICD loader Vendor OCL Icd free software
  ICD loader Version 2.2.11
  ICD loader Profile OpenCL 2.1
dirk@Plato:~$
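Since the difference between the two machines comes down to which OpenCL implementation is installed, it can be useful to pull just the identifying fields out of a listing like the one above. What follows is a rough sketch, not a robust parser: the sample lines are copied from the ‘Plato’ output, the names `LABELS` and `summarize` are hypothetical, and the check for the string "Mesa" assumes the vendor field reads exactly as it does here.

```python
# A few identifying lines copied from the 'Plato' listing above.
SAMPLE = """\
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 13.0.6
Device Name NVC4
Device Vendor NVIDIA
Device Vendor ID 0x10de
"""

# Labels of interest; the value is whatever follows the label on the line.
LABELS = ("Platform Name", "Platform Vendor", "Platform Version",
          "Device Name", "Device Vendor")

def summarize(text):
    """Collect the first value seen for each label in LABELS."""
    info = {}
    for line in text.splitlines():
        stripped = line.strip()
        for label in LABELS:
            # Keeping only the first hit stops 'Device Vendor ID ...' from
            # overwriting the earlier 'Device Vendor ...' line.
            if stripped.startswith(label + " ") and label not in info:
                info[label] = stripped[len(label):].strip()
    return info

info = summarize(SAMPLE)
uses_mesa = info.get("Platform Vendor") == "Mesa"

print(f"{info['Device Name']} ({info['Device Vendor']}) "
      f"via {info['Platform Name']} / {info['Platform Vendor']}")
print("Mesa OpenCL platform" if uses_mesa else "Other OpenCL platform")
```

A listing with more than one platform, such as the one on ‘Klystron’, would need the fields grouped per platform; this sketch only keeps the first occurrence of each.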