## Some trivia about how GPU registers are organized.

I have written about the GPU – the Graphics Processing Unit – at length. And just going by what I have written so far, my readers might think that its registers are defined the same way as those of any main CPU. On the contrary, GPU registers are organized differently at the hardware level, in a way optimized for raster-based graphics output.

Within GPU / graphics-oriented / shader coding, there exists a type of language which is ‘closest to the machine’, a kind of ‘assembly language for GPUs’, called “ARB”. Few shader-designers actually use it anymore, instead using a high-level language, such as ‘HLSL’ for the DirectX platform, or ‘GLSL’ for the OpenGL platform… Yet, especially since drivers have been designed that use the GPU for general-purpose (data-oriented) programming, it might be good to glance at what ARB defines.

And so one major difference that exists between main CPU registers and GPU registers by default, is that each GPU register is organized as a 4-element vector of 32-bit, floating-point numbers. The GPU is designed at the hardware level to be able to perform certain Math operations on the entire 4-element vector in one step, if need be. And within ARB, a notation exists by which the register name can be given a dot, and can then be followed by such pieces of text as:

• .xyz – Referring to the set of the first 3 elements (for scene or model coordinates),
• .uv – Referring to the set of the first 2 elements (for textures),
• .rst – Referring to the set of the first 3 elements – again (for 3D textures, volume-texture coordinates).

Also, notations exist in which the order of these elements gets switched around. Therefore, if the ARB code specifies this:

• r0.uv

It is specifying not only register 0, but the first two 32-bit, floating-point elements within (r0), in their natural order.
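ARB code itself only runs on a GPU, but, just as a sketch, the swizzle behaviour described above can be modelled in a few lines of Python. The class and its methods are my own invention, purely for illustration, and I follow the alias letters exactly as listed above (.uv, .rst):

```python
class Reg:
    """A toy model of one GPU register: a 4-element vector of floats."""
    # Each alias letter maps onto an element position 0..3:
    ALIASES = {'x': 0, 'y': 1, 'z': 2, 'w': 3,   # scene / model coordinates
               'u': 0, 'v': 1,                   # 2D texture coordinates
               'r': 0, 's': 1, 't': 2}           # volume-texture coordinates

    def __init__(self, *elements):
        self.e = list(elements)   # the 4 floating-point elements

    def swizzle(self, pattern):
        """Select elements by letter, in any order, as in 'r0.uv'."""
        return [self.e[self.ALIASES[c]] for c in pattern]

    def mul(self, other):
        """Component-wise multiply - the sort of operation the GPU
        performs on the entire 4-element vector in one step."""
        return Reg(*[a * b for a, b in zip(self.e, other.e)])

r0 = Reg(1.0, 2.0, 3.0, 4.0)
print(r0.swizzle('uv'))    # first 2 elements, natural order: [1.0, 2.0]
print(r0.swizzle('xyz'))   # first 3 elements: [1.0, 2.0, 3.0]
print(r0.swizzle('zyx'))   # same elements, order switched: [3.0, 2.0, 1.0]
```

On real hardware, the swizzle costs nothing extra; it is just a wiring pattern applied as the register is read.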

This observation needs to be refined somewhat, before it amounts to an accurate representation of modern GPU registers.

Firstly, I have written elsewhere on my blog that, as data passes from a Vertex Shader to a Fragment Shader, that data – which may contain texture coordinates by default, but which can really consist of virtually any combination of values – needs to be interpolated (:1), so that the interpolated value gets used by the FS to render one pixel to the screen. This interpolation is carried out by specialized hardware in a GPU core group, and for that reason some upper limit exists on how many such registers can be interpolated.
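As a simplified sketch – ignoring the perspective correction that real GPUs also apply – that interpolation stage can be thought of as a weighted blend of each vertex's output, using the pixel's barycentric weights within its triangle. The function below is my own illustration of the idea, not actual GPU code:

```python
def interpolate(per_vertex, weights):
    """Blend one attribute vector per triangle vertex into one
    per-pixel value, using barycentric weights that sum to 1."""
    length = len(per_vertex[0])
    return [sum(w * vec[i] for w, vec in zip(weights, per_vertex))
            for i in range(length)]

# Texture coordinates output by the Vertex Shader, at 3 vertices:
uv = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# A pixel sitting exactly on vertex 1 takes that vertex's value:
print(interpolate(uv, [0.0, 1.0, 0.0]))   # [1.0, 0.0]

# A pixel halfway between vertices 1 and 2 gets the average:
print(interpolate(uv, [0.0, 0.5, 0.5]))   # [0.5, 0.5]
```

The same blend is applied to every interpolated register, which is why the hardware only needs to budget a fixed number of them per fragment.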

(Updated 5/04/2019, 23h35 … )

## ArrayFire 3.5.1 Compiled, using GCC 4.9.2 and CUDA – Success!

A project which has preoccupied me for several days has been to get the API named “ArrayFire” custom-compiled – specifically, version 3.5.1 of that API – with CUDA support enabled. This poses a special problem because, when we use the Debian / Stretch repositories to install the CUDA Run-Time, we are limited to installing version 8.0.44. This run-time is too old to be compatible with the standard GCC / CPP / C++ compiler-set available under the same distribution of Linux. Therefore, it can be hard for users and Debian Maintainers alike to build projects that use CUDA and that are compatible with -Stretch.

Why use ArrayFire? Because writing kernels for parallel computing on the GPU is hard. ArrayFire is supposed to make the idea more accessible to people like me, who have no specific training in writing highly parallel code.

When we install ArrayFire from the repositories, OpenCL support is provided, but not CUDA support.

I’ve hatched a plan to install an alternative compiler set on the computer I name ‘Phosphene’ – GCC / CPP / C++ version 4.9.2 – and to be able to switch my compilers to that version, so that maybe, in the future, I can compile projects that do use CUDA, and that are sophisticated enough to require a full compiler-suite to build.

What I found was that, contrary to how it went the previous time, this time I scored a success. Not only did the project compile without errors, but a specific demo that ships with the source-code version – one which uses the ability of CUDA to pass graphics to the OpenGL rendering of the GPU – also ran without errors…

So now I can be sure that the tool-chain which I’ve installed is up to the task of compiling highly complex CUDA projects.

(Update 5/03/2019, 7h30 : )

## Compatibility GCC Installed

One of the more frustrating facts about Debian / Stretch is that its maintainers have broken with tradition by no longer providing any compatibility versions of the main compilers – GCC, CPP and C++ – which provide C and C++ language support, useful in 90% (+) of all programming that takes place. Instead, Debian / Stretch provides GCC / CPP / C++ version 6.3 alone. What I had already written about was that the version of the CUDA Run-Time and Toolkit available from the standard repositories has remained v8.0.44 for the time being. This CUDA version does not support version 6 of CC or C++, because version 6 of these compilers is too recent!

One way in which power-users could try to remedy this situation would be to install some sort of compatibility version of CC / C++, even though none is offered in the standard repositories. But, when we try to custom-compile, let’s say, GCC v5.3 – which would already be low enough for CUDA 8.0.44 to support – we find that GCC 6.3 is just plain unable to compile GCC 5.3, no matter what.

And so another way to solve the same problem can be, to add the old Debian Jessie / Oldstable repositories to our sources list, and then just to install from there.

I find this to be an extremely bad idea.

First of all, Debian differs from Ubuntu in that Debian never provided GCC 5.3. In Debian / Jessie, what we got was GCC 4.8, or maybe v4.9. But more importantly, simply sandwiching two incompatible repositories together can create a fatal set of problems.

What I was finally able to do was just to download roughly a dozen packages as binaries from the Debian Repository Web-site, which finally provided GCC, CPP and C++ v4.8. Along the way, I ran into the error message, numerous times, that dependencies could not be satisfied, because under Debian, neither ‘/usr/bin/gcc’ nor ‘/usr/bin/c++’ is provided by a single binary package. Each is provided by packages that depend, in turn, on other packages which are also not in the repositories.

Further, once the power-user has in fact installed the binaries – after making sure that none of their file-names overlap – he must also create a system of Debian Alternatives that allows him to switch between compilers easily. The problem with that is that, because under Debian / Stretch no provision was ever made by the Package Maintainers for such alternatives to exist, automatic mechanisms have also not been provided to install ‘Link Groups’. The Link Groups ‘cc’, ‘cpp’ and ‘c++’ exist, but only in such a way as to provide one executable each.

As I was doing my best to install the Link Groups, I made a mistake which simply overwrote ‘/usr/bin/gcc’ with a different symlink, and which therefore forced me to (1) delete the link-group, and (2) reinstall GCC 6.3 from the package manager. After that, a new attempt to set up the link-groups succeeded:


```
dirk@Phosphene:~$ su
Password:
root@Phosphene:/home/dirk# update-alternatives --config cc
There are 2 choices for the alternative cc (providing /usr/bin/cc).

  Selection    Path               Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-6      20        auto mode
  1            /usr/bin/gcc-4.8    10        manual mode
  2            /usr/bin/gcc-6      20        manual mode

Press <enter> to keep the current choice[*], or type selection number:
root@Phosphene:/home/dirk# update-alternatives --config cpp
There are 2 choices for the alternative cpp (providing /usr/bin/cpp).

  Selection    Path               Priority   Status
------------------------------------------------------------
* 0            /usr/bin/cpp-6      20        auto mode
  1            /usr/bin/cpp-4.8    10        manual mode
  2            /usr/bin/cpp-6      20        manual mode

Press <enter> to keep the current choice[*], or type selection number:
root@Phosphene:/home/dirk# update-alternatives --config c++
There are 2 choices for the alternative c++ (providing /usr/bin/c++).

  Selection    Path               Priority   Status
------------------------------------------------------------
* 0            /usr/bin/g++-6      20        auto mode
  1            /usr/bin/g++-4.8    10        manual mode
  2            /usr/bin/g++-6      20        manual mode

Press <enter> to keep the current choice[*], or type selection number:
root@Phosphene:/home/dirk# exit
exit
dirk@Phosphene:~$
```



Note: The first link above, named ‘cc’, has a corresponding slave-link named ‘gcc’, thereby forming the only real ‘Link Group’. The others are just plain Links.

I am reasonably certain that none of these link-groups is broken. But what my reader should be able to infer from what I’ve written is that it would be a hare-brained attempt to duplicate what I’ve done, based entirely on this blog posting.

(Edit 5/03/2019, 11h45 : )

Just to prove how hare-brained this idea really is: I have since uninstalled those alternative compilers, replaced them with the GCC / CPP / C++ tool-chain, version 4.9, and made that part of the update-alternatives system as above! (:3)

(End of Edit.)

So what does this provide me with (hopefully)?

(Updated 5/02/2019, 12h15 … )

## A bit of my personal history, experimenting in 3D game design.

I was wide-eyed and curious. And well before the year 2000, I only owned Windows-based computers, purchased most of my software for money, and also purchased a license of 3D Game Studio, some version of which is still being sold today. The version that I purchased back then was using the ‘A4’ game engine, where all the 3DGS versions have a game engine designated by the letter ‘A’ followed by a number.

That version of 3DGS was based on DirectX 7 (Microsoft owns and uses DirectX). DirectX 7 still had, as one of its capabilities, the ability to switch back into software mode, even though it was perhaps one of the earliest APIs that offered hardware rendering – provided, that is, that the host machine had a graphics card capable of hardware rendering.

I created a simplistic game using that engine, which had no real title, but which I simply referred to as my ‘Defeat The Guard Game’. And in so doing I learned a lot.

The API which is referred to as OpenGL offers what the DirectX versions offer. But because Microsoft has the primary say in how the graphics hardware is to be designed, OpenGL versions are frequently just catching up to what the latest DirectX versions have to offer. There is a loose correspondence in version numbers.

Shortly after the year 2000, I upgraded to a 3D Game Studio version with their ‘A6’ game engine. This was a game engine based on DirectX 9.0c – also standard with Windows XP – which no longer offered any possibility of software rendering, but which gave the customers of this software product their first opportunity to program shaders. And because I was playing with the ‘A6’ game engine for a long time, in addition to owning a PC that ran Windows XP for a long time, the capabilities of DirectX 9.0c became etched in my mind. However, as fate would have it, I never actually created anything significant with this game-engine version – only snippets of code designed to test various capabilities.