## There can be curious gaps, in what some people understand.

One of the concepts which once dominated CGI was, that textures assigned to 3D models needed to include a “Normal-Map”, so that even early in the days of 3D gaming, textured surfaces would seem to have ‘bumps’, and these normal-maps were more significant, than displacement-maps – i.e., height- or depth-maps – because shaders were actually able to compute lighting subtleties more easily, using the normal-maps. But additionally, it was always quite common that ordinary 8x8x8 (R,G,B) texel-formats needed to store the normal-maps, just because images could more-easily be prepared and loaded with that pixel-format. (:1)

The old-fashioned way to code that was, that the 8-bit integer (128) was taken to symbolize (0.0), that (255) was taken to symbolize a maximally positive value, and that the integer (0) was decoded to (-1.0). The reason for this, AFAIK, was the use by the old graphics cards, of the 8-bit integer, as a binary fraction.

In the spirit of recreating that, and, because it’s sometimes still necessary to store an approximation of a normal-vector, using only 32 bits, the code has been offered as follows:


Out.Pos_Normal.w = dot(floor(normal * 127.5 + 127.5), float3(1 / 256.0, 1.0, 256.0));

float3 normal = frac(Pos_Normal.w * float3(1.0, 1 / 256.0, 1 / 65536.0)) * 2.0 - 1.0;



There’s an obvious problem with this backwards-emulation: It can’t seem to reproduce the value (0.0) for any of the elements of the normal-vector. And then, what some people do is, to throw their arms in the air, and to say: ‘This problem just can’t be solved!’ Well, what about:


//  Assumed:
normal = normalize(normal);

Out.Pos_Normal.w = dot(floor(normal * 127.0 + 128.5), float3(1 / 256.0, 1.0, 256.0));



A side effect of this will definitely be, that no uncompressed value belonging to the interval [-1.0 .. +1.0] will lead to a compressed series of 8 zeros.

Mind you, because of the way the resulting value was now decoded again, the question of whether zero can actually result, is not as easy to address. And one reason is the fact that, for all the elements except the first, additional bits after the first 8 fractional bits, have not been removed. But that’s just a problem owing to the one-line decoding that was suggested. That could be changed to:


float3 normal = floor(Pos_Normal.w * float3(256.0, 1.0, 1 / 256.0));
normal = frac(normal * (1 / 256.0)) * (256.0 / 127.0) - (128.0 / 127.0);



Suddenly, the impossible has become possible.

N.B.  I would not use the customized decoder, unless I was also sure, that the input floating-point value, came from my customized encoder. It can easily happen that the shader needs to work with texture images prepared by an external program, and then, because of the way their channel-values get normalized today, I might use this as the decoder:


float3 normal = texel.rgb * (255.0 / 128.0) - 1.0;



However, if I did, a texel-value of (128) would still be required, to result in a floating-point value of (0.0)

(Updated 5/10/2020, 19h00… )

## Some trivia about how GPU registers are organized.

I have written about the GPU – the Graphics Processing Unit – at length. And just going by what I wrote so far, my readers might think that its registers are defined the same way, as those of any main CPU. But to the contrary, GPU registers are organized differently at the hardware level, in a way most-optimized for raster-based graphics output.

Within GPU / graphics-oriented / shader coding, there exists a type of language which is ‘closest to the machine’, and which is a kind of ‘Assembler Language for GPUs’, that being called “ARB“. Few shader-designers actually use it anymore, instead using a high-level language, such as ‘HLSL’ for the DirectX platform, or such as ‘GLSL’ for the OpenGL platform… Yet, especially since drivers have been designed that use the GPU for general-purpose (data-oriented) programming, it might be good to glance at what ARB defines.

And so one major difference that exists between main CPU registers, and GPU registers by default, is that each GPU register is organized into a 4-element vector of 32-bit, floating-point numbers. The GPU is designed at the hardware level, to be able to perform certain Math operations on the entire 4-element vector, in one step if need be. And within ARB, a notation exists by which the register name can be given a dot, and can then be followed by such pieces of text as:

• .xyz – Referring to the set of the first 3 elements (for scene or model coordinates),
• .uv – Referring to the set of the first 2 elements (for textures),
• .rst – Referring to the set of the first 3 elements – again (for 3D textures, volume-texture coordinates).

Also, notations exist in which the order of these elements gets switched around. Therefore, if the ARB code specifies this:

• r0.uv

It is specifying not only register (0), but the first 2, 32-bit, floating-point elements, within (r0), in their natural order.

This observation needs to be modified somewhat, before an accurate representation of modern GPU registers has been defined.

Firstly, I have written elsewhere on my blog, that as data passes from a Vertex Shader to a Fragment Shader, that data, which may contain texture coordinates by default, but which can really consist of virtually any combination of values, needs to be interpolated (:1), so that the interpolated value gets used by the FS, to render one pixel to the screen. This interpolation is carried out by specialized hardware in a GPU core group, and for that reason, some upward limit exists, on how many such registers can be interpolated.

(Updated 5/04/2019, 23h35 … )

## ArrayFire 3.5.1 Compiled, using GCC 4.9.2 and CUDA – Success!

A project which has fixated me for several days has been, to get the API which is named “ArrayFire” custom-compiled, specifically version 3.5.1 of that API, to enable CUDA support. This poses a special problem because when we use the Debian / Stretch repositories to install the CUDA Run-Time, we are limited to installing version 8.0.44. This run-time is too old, to be compatible with the standard GCC / CPP / C++ compiler-set available under the same distribution of Linux. Therefore, it can be hard for users and Debian Maintainers alike to build projects that use CUDA, and which are compatible with -Stretch.

Why use ArrayFire? Because writing kernels for parallel computing on the GPU is hard. ArrayFire is supposed to make the idea more-accessible to people like me, who have no specific training in writing highly parallel code.

When we install ArrayFire from the repositories, OpenCL support is provided, but not CUDA support.

I’ve hatched a plan by which I can install an alternative compiler set on the computer I name ‘Phosphene’, that being GCC / CPP / C++ version 4.9.2, and with which I can switch my compilers to that version, so that maybe, I can compile projects in the future, that do use CUDA, and that are sophisticated enough to require a full compiler-suite to build.

What I’ve found was that contrarily to how it went the previous time, this time I’ve scored a success.   Not only did the project compile without errors, but then, a specific Demo that ships with the source-code version, that uses the ability of CUDA to pass graphics to the OpenGL rendering of the GPU, also ran without errors…

So now I can be sure that the tool-chain which I’ve installed, is up to the task of compiling highly-complex CUDA projects.

(Update 5/03/2019, 7h30 : )

## Compatibility GCC Installed

One of the more frustrating facts about Debian / Stretch is that its maintainers have broken with tradition, by no longer providing any compatibility versions of the main compilers, GCC, CPP and C++, which provide CC and C++ -language support, useful in 90% (+) of all programming that takes place. Instead, Debian / Stretch provides GCC / CPP / C++ version 6.3 alone. What I had already written about was, that the version of the CUDA Run-Time and Toolkit available from the standard repositories, has remained v8.0.44 for the time being. This CUDA Version does not support CC or C++ version 6 because version 6 of these compilers is too high!

One way in which power-users could try to remedy this situation would be, to install some sort of compatibility version of CC / C++, even though none is offered in the standard repositories. But, when we try to custom-compile let’s say, GCC v5.3, which would already be low enough for CUDA 8.0.44 to support, we find that GCC 6.3 is just plain unable to compile GCC 5.3, no matter what.

And so another way to solve the same problem can be, to add the old Debian Jessie / Oldstable repositories to our sources list, and then just to install from there.

I find this to be an extremely bad idea.

First of all, Debian differs from Ubuntu, in that Debian never provided GCC 5.3. In Debian / Jessie, what we got was GCC 4.8, or maybe even v4.9. But more importantly, simply sandwiching two incompatible repositories together can create a fatal set of problems.

What I was finally able to do, was just to download roughly a dozen packages as binaries, from the Debian Repository Web-site, which finally provided GCC, CPP and C++ v4.8. The path I took required that I run into the error message numerous times that dependencies could not be satisfied, because under Debian, neither ‘/usr/bin/gcc’ nor ‘/usr/bin/c++’ are provided by a single, binary package. Each is provided by packages, that depend uniquely on other packages, that are also not in the repositories.

Further, once the power-user has in fact installed binaries, after making sure that none of their file-names overlap, he must also create a system of Debian Alternatives, that allow him to switch between compilers easily. The problem with that is the fact that because, under Debian / Stretch, no provision was ever made by Package Maintainers for alternatives to exist, automatic mechanisms have also not been provided, to install ‘Link Groups’. The Link Groups ‘cc’, ‘cpp’ and ‘c++’ exist, but only in such a way as to provide one executable each.

As I was doing my best to install the Link Groups, I made a mistake, which simply over-wrote ‘/usr/bin/gcc’ with a different symlink, and which therefore forced me to (1) delete the link-group, and (2) reinstall GCC 6.3 from the package manager. After that, a new attempt to set up the link-groups succeeded:


dirk@Phosphene:~$su Password: root@Phosphene:/home/dirk# update-alternatives --config cc There are 2 choices for the alternative cc (providing /usr/bin/cc). Selection Path Priority Status ------------------------------------------------------------ * 0 /usr/bin/gcc-6 20 auto mode 1 /usr/bin/gcc-4.8 10 manual mode 2 /usr/bin/gcc-6 20 manual mode Press to keep the current choice[*], or type selection number: root@Phosphene:/home/dirk# update-alternatives --config cpp There are 2 choices for the alternative cpp (providing /usr/bin/cpp). Selection Path Priority Status ------------------------------------------------------------ * 0 /usr/bin/cpp-6 20 auto mode 1 /usr/bin/cpp-4.8 10 manual mode 2 /usr/bin/cpp-6 20 manual mode Press to keep the current choice[*], or type selection number: root@Phosphene:/home/dirk# update-alternatives --config c++ There are 2 choices for the alternative c++ (providing /usr/bin/c++). Selection Path Priority Status ------------------------------------------------------------ * 0 /usr/bin/g++-6 20 auto mode 1 /usr/bin/g++-4.8 10 manual mode 2 /usr/bin/g++-6 20 manual mode Press to keep the current choice[*], or type selection number: root@Phosphene:/home/dirk# exit exit dirk@Phosphene:~$



Note: The first link above, named ‘cc’, has a corresponding slave-link named ‘gcc’, thereby forming the only real ‘Link Group’. The others are just plain Links.

I am reasonably certain that none of these link-groups are broken. But what my reader should be able to infer from what I’ve written, is that It would be a hare-brained attempt, to duplicate what I’ve done, entirely based on this blog posting.

(Edit 5/03/2019, 11h45 : )

Just to prove how hare-brained this idea really is, I just uninstalled the alternative compilers, and replaced them with the GCC / CPP / C++ tool-chain, version 4.9, and made that part of the update-alternatives system as above! (:3)

(End of Edit.)

So what does this provide me with (hopefully)?

(Updated 5/02/2019, 12h15 … )