## Trying to turn an ARM-64 -based, Android-hosted, prooted Linux Guest System, into a software development platform.

In a preceding posting I described, how I had used an Android app that does not require or benefit from having ‘root’, to install a Linux Guest System on a tablet, that has an ARM-64 CPU, which is referred to more precisely as an ‘aarch64-linux-gnu’ architecture. The Android app sets up a basic Linux system, but the user can use apt-get to extend it – if he chose a Debian 10 / Buster -based system as I did. And then, for the most part, the user’s ability to run software depends on how well the Debian package maintainers cross-compiled their packages to ‘AARCH64′. Yet, on some occasions, even in this situation, a user might want to write and then run his own code.

To make things worse, the main alternative to a pure text interface, is a VNC Session, based on ‘TightVNC’, by the choice of the developers of this app. On a Chromebook, I chose differently, by setting up a ‘TigerVNC’ desktop instead, but on this tablet, the choice was up to the Android developers alone. What this means is, that the Linux applications are forced to render purely in software mode.

Many factors work against writing one’s own code, that include, the fact that executables will result, that have been compiled for the ‘ARM’ CPU, and linked against Linux libraries!

But one of the immediate handicaps could be, that the user might want to program in Python, but can’t get any good IDEs to run. Every free IDE I could try would segfault, and I don’t even believe that these segfaults are due to problems with my Python libraries. The IDEs were themselves written in Python, using Qt5, Gtk3 or wxWidgets modules. These types of libraries are as notorious as the Qt5 Library, for relying on GPU acceleration, which is nowhere to be found, and one reason I think this is most often the culprit, is the fact that one of the IDE’s – “Eric” – actually manages to report with a gasp, that it could not create an OpenGL rendering surface – and then Segfaults. (:3)

(Edit 9/15/2020, 13h50: )

I want to avoid any misinterpretations of what I just wrote. This does not happen out of nowhere, because an application developer decided to build his applications using ‘python3-pyqt5′ etc… When I give the command:


# apt install eric



Doing so pulls in many dependencies, including an offending package. (:1) Therefore, the application developer who wrote ‘Eric’ not only chose to use one of the Python GUI libraries, but chose to use OpenGL as well.

Of course, after I next give the command to remove ‘eric’, I also follow up with the command:


# apt autoremove



Just so that the offending dependencies are no longer installed.

(End of Edit, 9/15/2020, 13h50.)

Writing convoluted code is more agreeable, if at the very least we have an IDE in front of us, that can highlight certain syntax errors, and scan includes for code completion, etc. (:2)

Well, there is a Text Editor cut out for that exact situation, named “CudaText“. I must warn the reader though, that there is a learning curve with this text editor. But, just to prove that the AARCH64-ported Python 3.7 engine is not itself buggy, the text editor’s plug-in framework is written in Python 3, and as soon as the user has learned his first lesson in how to configure CudaText, the plug-in system comes to full life, and without any Segfaults, running the Guest System’s Python engine. I think CudaText is based on Gtk2.

This might just turn out to be the correct IDE for that tablet.

(Updated 9/19/2020, 20h10… )

## There can be curious gaps, in what some people understand.

One of the concepts which once dominated CGI was, that textures assigned to 3D models needed to include a “Normal-Map”, so that even early in the days of 3D gaming, textured surfaces would seem to have ‘bumps’, and these normal-maps were more significant, than displacement-maps – i.e., height- or depth-maps – because shaders were actually able to compute lighting subtleties more easily, using the normal-maps. But additionally, it was always quite common that ordinary 8x8x8 (R,G,B) texel-formats needed to store the normal-maps, just because images could more-easily be prepared and loaded with that pixel-format. (:1)

The old-fashioned way to code that was, that the 8-bit integer (128) was taken to symbolize (0.0), that (255) was taken to symbolize a maximally positive value, and that the integer (0) was decoded to (-1.0). The reason for this, AFAIK, was the use by the old graphics cards, of the 8-bit integer, as a binary fraction.

In the spirit of recreating that, and, because it’s sometimes still necessary to store an approximation of a normal-vector, using only 32 bits, the code has been offered as follows:


Out.Pos_Normal.w = dot(floor(normal * 127.5 + 127.5), float3(1 / 256.0, 1.0, 256.0));

float3 normal = frac(Pos_Normal.w * float3(1.0, 1 / 256.0, 1 / 65536.0)) * 2.0 - 1.0;



There’s an obvious problem with this backwards-emulation: It can’t seem to reproduce the value (0.0) for any of the elements of the normal-vector. And then, what some people do is, to throw their arms in the air, and to say: ‘This problem just can’t be solved!’ Well, what about:


//  Assumed:
normal = normalize(normal);

Out.Pos_Normal.w = dot(floor(normal * 127.0 + 128.5), float3(1 / 256.0, 1.0, 256.0));



A side effect of this will definitely be, that no uncompressed value belonging to the interval [-1.0 .. +1.0] will lead to a compressed series of 8 zeros.

Mind you, because of the way the resulting value was now decoded again, the question of whether zero can actually result, is not as easy to address. And one reason is the fact that, for all the elements except the first, additional bits after the first 8 fractional bits, have not been removed. But that’s just a problem owing to the one-line decoding that was suggested. That could be changed to:


float3 normal = floor(Pos_Normal.w * float3(256.0, 1.0, 1 / 256.0));
normal = frac(normal * (1 / 256.0)) * (256.0 / 127.0) - (128.0 / 127.0);



Suddenly, the impossible has become possible.

N.B.  I would not use the customized decoder, unless I was also sure, that the input floating-point value, came from my customized encoder. It can easily happen that the shader needs to work with texture images prepared by an external program, and then, because of the way their channel-values get normalized today, I might use this as the decoder:


float3 normal = texel.rgb * (255.0 / 128.0) - 1.0;



However, if I did, a texel-value of (128) would still be required, to result in a floating-point value of (0.0)

(Updated 5/10/2020, 19h00… )

## Some trivia about how GPU registers are organized.

I have written about the GPU – the Graphics Processing Unit – at length. And just going by what I wrote so far, my readers might think that its registers are defined the same way, as those of any main CPU. But to the contrary, GPU registers are organized differently at the hardware level, in a way most-optimized for raster-based graphics output.

Within GPU / graphics-oriented / shader coding, there exists a type of language which is ‘closest to the machine’, and which is a kind of ‘Assembler Language for GPUs’, that being called “ARB“. Few shader-designers actually use it anymore, instead using a high-level language, such as ‘HLSL’ for the DirectX platform, or such as ‘GLSL’ for the OpenGL platform… Yet, especially since drivers have been designed that use the GPU for general-purpose (data-oriented) programming, it might be good to glance at what ARB defines.

And so one major difference that exists between main CPU registers, and GPU registers by default, is that each GPU register is organized into a 4-element vector of 32-bit, floating-point numbers. The GPU is designed at the hardware level, to be able to perform certain Math operations on the entire 4-element vector, in one step if need be. And within ARB, a notation exists by which the register name can be given a dot, and can then be followed by such pieces of text as:

• .xyz – Referring to the set of the first 3 elements (for scene or model coordinates),
• .uv – Referring to the set of the first 2 elements (for textures),
• .rst – Referring to the set of the first 3 elements – again (for 3D textures, volume-texture coordinates).

Also, notations exist in which the order of these elements gets switched around. Therefore, if the ARB code specifies this:

• r0.uv

It is specifying not only register (0), but the first 2, 32-bit, floating-point elements, within (r0), in their natural order.

This observation needs to be modified somewhat, before an accurate representation of modern GPU registers has been defined.

Firstly, I have written elsewhere on my blog, that as data passes from a Vertex Shader to a Fragment Shader, that data, which may contain texture coordinates by default, but which can really consist of virtually any combination of values, needs to be interpolated (:1), so that the interpolated value gets used by the FS, to render one pixel to the screen. This interpolation is carried out by specialized hardware in a GPU core group, and for that reason, some upward limit exists, on how many such registers can be interpolated.

(Updated 5/04/2019, 23h35 … )

## ArrayFire 3.5.1 Compiled, using GCC 4.9.2 and CUDA – Success!

A project which has fixated me for several days has been, to get the API which is named “ArrayFire” custom-compiled, specifically version 3.5.1 of that API, to enable CUDA support. This poses a special problem because when we use the Debian / Stretch repositories to install the CUDA Run-Time, we are limited to installing version 8.0.44. This run-time is too old, to be compatible with the standard GCC / CPP / C++ compiler-set available under the same distribution of Linux. Therefore, it can be hard for users and Debian Maintainers alike to build projects that use CUDA, and which are compatible with -Stretch.

Why use ArrayFire? Because writing kernels for parallel computing on the GPU is hard. ArrayFire is supposed to make the idea more-accessible to people like me, who have no specific training in writing highly parallel code.

When we install ArrayFire from the repositories, OpenCL support is provided, but not CUDA support.

I’ve hatched a plan by which I can install an alternative compiler set on the computer I name ‘Phosphene’, that being GCC / CPP / C++ version 4.9.2, and with which I can switch my compilers to that version, so that maybe, I can compile projects in the future, that do use CUDA, and that are sophisticated enough to require a full compiler-suite to build.

What I’ve found was that contrarily to how it went the previous time, this time I’ve scored a success.   Not only did the project compile without errors, but then, a specific Demo that ships with the source-code version, that uses the ability of CUDA to pass graphics to the OpenGL rendering of the GPU, also ran without errors…

So now I can be sure that the tool-chain which I’ve installed, is up to the task of compiling highly-complex CUDA projects.

(Update 5/03/2019, 7h30 : )