Some Suggested Code

In This Earlier Posting, I first wrote some observations about Bluetooth pairing, but then branched out into the question of whether a Diffie-Hellman Key Exchange could be made easier to compute, if it were simplified to use a 32-bit modulus. My assumption was that a 64-bit by 32-bit divide instruction is cheap on the CPU, while arbitrary-precision integer operations are relatively expensive, and actually cause some observable lag on CPUs which I’ve used.
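To make that premise concrete, here is a minimal sketch of modular exponentiation with a 32-bit modulus, which is the core operation of a Diffie-Hellman exchange. The function name `modpow32` is hypothetical, and this is only an illustration of the assumption stated above, not code taken from the earlier posting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: (base ^ exponent) mod modulus, by square-and-multiply.
   Both factors of every product are below 2^32, so the product fits in
   64 bits, and each reduction is a 64-bit by 32-bit remainder. */
uint32_t modpow32(uint32_t base, uint32_t exponent, uint32_t modulus)
{
    assert(modulus != 0);
    uint64_t result = 1 % modulus;      /* also handles modulus == 1 */
    uint64_t square = base % modulus;
    while (exponent != 0) {
        if (exponent & 1u)
            result = (result * square) % modulus;
        square = (square * square) % modulus;
        exponent >>= 1;
    }
    return (uint32_t)result;
}
```

With the toy parameters p = 23 and g = 5, and the secret exponents 4 and 3, `modpow32(modpow32(5, 4, 23), 3, 23)` and `modpow32(modpow32(5, 3, 23), 4, 23)` both yield 18, which is the shared value each side would compute.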

And so, because I don’t want to present only theory, in a form that some people may not be able to visualize, what I did next was to write a C++ program, which actually only uses C features, which assumes that the user only has a 32-bit CPU, and which nevertheless performs a 64-bit by 64-bit division.

This has now been tested and verified.

One problem in writing this code is the fact that, depending on whether the divisor, which is formatted as a 64-bit field, contains an actual 64-bit, 32-bit, 24-bit, or 16-bit value, a different procedure needs to be selected, and even this fixed-precision format cannot assume that the bits are always positioned in the correct place.
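For readers who want something to visualize right away, the following is a minimal sketch of a 64-bit by 64-bit division that only touches 32-bit words. It is not the program which this posting describes: instead of selecting a procedure according to the width of the divisor, it uses plain binary shift-and-subtract long division, which is slower but needs no case distinction. The type `u64pair` and all the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit words. */
typedef struct { uint32_t hi, lo; } u64pair;

/* Returns nonzero if a >= b. */
static int pair_ge(u64pair a, u64pair b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

/* Returns a - b, with the borrow carried from the low word to the high word. */
static u64pair pair_sub(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}

/* Shifts a left by one bit and places 'in' into the vacated lowest bit. */
static u64pair pair_shl1(u64pair a, uint32_t in)
{
    u64pair r;
    r.hi = (a.hi << 1) | (a.lo >> 31);
    r.lo = (a.lo << 1) | (in & 1u);
    return r;
}

/* Returns n / d and stores n % d in *rem. One dividend bit is consumed per
   step, from the most significant downward. The partial remainder only
   holds dividend bits consumed so far, so it stays below 2^63 before each
   shift, and the shift cannot overflow. */
u64pair pair_divmod(u64pair n, u64pair d, u64pair *rem)
{
    assert(d.hi != 0 || d.lo != 0);
    u64pair q = {0, 0};
    u64pair r = {0, 0};
    for (int i = 63; i >= 0; --i) {
        uint32_t bit = (i >= 32) ? (n.hi >> (i - 32)) & 1u
                                 : (n.lo >> i) & 1u;
        r = pair_shl1(r, bit);
        q = pair_shl1(q, 0);
        if (pair_ge(r, d)) {
            r = pair_sub(r, d);
            q.lo |= 1u;
        }
    }
    *rem = r;
    return q;
}
```

The loop always runs 64 times, regardless of how small the divisor is, which is the cost that a procedure specialized for the width of the divisor would avoid.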

I invite people to look at this sample-code:

(Update 06/10/2018, 23h30 : )

I needed to correct mistakes which I made in the same piece of code. However, I presently know the code to be correct.

Just to test my premises, I’m going to assume that the following division is to be carried out, erroneously as a simple division, but assuming a word-size of 32 bits:


Getting Steam to run with proprietary nVidia.

According to this earlier posting, I had switched my Debian / Stretch, Debian 9 -based computer named ‘Plato’ from the open-source ‘Nouveau’ drivers, which are delivered via ‘Mesa’ packages, to the ‘proprietary nVidia drivers’, because the latter offer more power, in several ways.

But then one question which we’d want answered is how to get “Steam” to run. From the Linux package manager alone, the available games are slim pickings, while through a Steam membership, we can buy Linux versions of at least some powerful games, meaning, pay for them with money.

But, when I naively tried to launch Steam, which used to launch, I only got a message-box which said that Steam could not find the 32-bit version of ‘’ – and then Steam died. This temporary result ‘makes sense’, because I had only installed the default, 64-bit libraries that go with the proprietary packages. Steam is a 32-bit application by default, and I have a multi-arch setup, as a prerequisite.

And so my next project became to add a 32-bit interface to the rendering system, alongside the existing, 64-bit one.

The steps that I took assume that I had previously chosen to install the ‘GLVND’ version of the GLX binaries, and unless the reader has done the same, the following recipe will be incorrect. However, the ‘GLVND’ packages which I initially installed are not listed in the posting linked to above; they belonged to the suggested packages, which I wrote I had written down on paper, and then added to the command-line which transformed my graphics system.

When I installed the additional, 32-bit libraries, I did get a disturbing error message, but my box still runs.


Why OpenShot will Not Run on my Linux Tablet

In This earlier posting, I had written that, although I had already deemed it improbable that this sort of Linux application would run on my Linux tablet, I would nevertheless try, and see if I could get such a thing to run. And as I wrote, I had considerable problems with ‘LiVES’, where, even if I had gotten the stuttering preview-playback under control, I could not have put LiVES into multi-tracking mode, thereby rendering the effort futile. I had also written that on my actual Linux laptop, LiVES just runs ~perfectly~.

And so a natural question which might come next would be, ‘Could OpenShot be made to run in that configuration?’ And the short answer is No.

‘OpenShot’, as well as ‘KDEnlive’, use a media library named ‘mlt’, but which is also referred to as ‘MeLT’, to perform their video compositing actions. I think that the main problem with my Linux tablet, when asked to run such applications, is that it is only a 32-bit quad-core, and an ARM CPU at that. The ARM CPUs are designed in such a way, that they are optimal when running Dalvik Bytecode, which I just learned has been succeeded by ART, through the interpreter and compiler that Android provides, and in certain cases, at running Codecs in native code, which are specialized. They do not have ‘MMX’ extensions etc., because they are RISC-Chips.

When we try to run CPU-intensive applications on an ARM CPU that have been compiled in native code, we suffer from an additional performance penalty.

The entire ‘mlt’ library is already famous, for requiring a high amount of CPU usage, in order to be versatile in applying effects to video time-lines. There have been stuttering issues, when trying to get it to run on ‘real Linux computers’, though not mine. My Linux laptop is a 64-bit quad-core, AMD-Family CPU, with many extensions. That CPU can handle what I throw at it.

What I also observe when trying to run OpenShot on my Linux tablet, is that if I right-click on an imported video-clip, and then left-click on Preview, the CPU usage is what it is, and I already get some mild stuttering / crackling of the audio. But if I then drag that clip onto a time-line, and ask the application to preview the time-line, the CPU usage is double what it would otherwise be, and I get severe playback-slowdown, as well as audio-stuttering.

In itself, this result ‘makes sense’, because even if we have not elected to put many effects into the time-line, the processing that takes place, when we preview it, is as it would be, if we had put an arbitrary number of effects. I.e., the processing is inherently slow, for the eventuality that we’d put many effects. So slow, that the application doesn’t run on a 32-bit, ARM-quad-core, when compiled in native code.

(Updated 10/09/2017 : )


64-bit FORTH

Before I describe a 64-bit FORTH version, I need to explain something about the more-established, general 32-bit version of this low-level language. The 32-bit FORTH had an accepted method of storing 64-bit, so-called ‘double-width’ numbers, in two positions on the stack, with the most-significant word ‘on top’, and the less-significant word in second position from the ‘top’. Correspondingly, ‘normal’ 32-bit FORTH possesses special operators that can either perform full, double-width arithmetic, which treats two consecutive stack-positions as defining a single number, or mixed-width arithmetic, in which two single-width numbers can lead to a double-width product, or by which a double-width number can be divided by a single-width, to arrive at a single-width quotient, and optionally, also to arrive at a single-width modulus / remainder.
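For readers who think more readily in C than in FORTH, the two mixed-width operations just described can be sketched as follows. The function names are hypothetical, and the sketch only covers the unsigned case; it illustrates the idea, and is not a rendering of any particular FORTH implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Two single-width (32-bit) numbers lead to a double-width (64-bit)
   product, handed back as two 32-bit words. On a FORTH stack, *high
   would be the word 'on top'. */
void mul_32x32(uint32_t a, uint32_t b, uint32_t *low, uint32_t *high)
{
    uint64_t product = (uint64_t)a * b;
    *low  = (uint32_t)product;
    *high = (uint32_t)(product >> 32);
}

/* A double-width number is divided by a single-width one, giving a
   single-width quotient and remainder. The quotient only fits into
   32 bits if high < divisor, so that is checked as a precondition. */
void divmod_64by32(uint32_t low, uint32_t high, uint32_t divisor,
                   uint32_t *remainder, uint32_t *quotient)
{
    assert(divisor != 0 && high < divisor);
    uint64_t dividend = ((uint64_t)high << 32) | low;
    *quotient  = (uint32_t)(dividend / divisor);
    *remainder = (uint32_t)(dividend % divisor);
}
```

Multiplying 0xFFFFFFFF by itself this way gives the words 0x00000001 (low) and 0xFFFFFFFE (high), and dividing that pair by 0xFFFFFFFF gives back the quotient 0xFFFFFFFF with a remainder of 0.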

This is a fashion in which 32-bit CPUs have generally been able to perform 64-bit arithmetic, partially. And if the reader is not familiar with how this is accessible under FORTH, I can suggest This External Article as a source of reference.

But, if the reader has installed the 64-bit GNU FORTH, which is also just called ‘gforth’ under Linux, then I should call to his attention that now, each stack-position is capable of holding a 64-bit number, and that all the operators which would otherwise be available for 32-bit numbers are available for those numbers, with no special naming.

The following is a small text-session-clip, that illustrates how this works:


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
1 cells . 8  ok
hex  ok
$0123456789ABCDEF $100000000 /mod .s <2> 89ABCDEF 1234567  ok
$20 lshift + .s <1> 123456789ABCDEF  ok
dup  ok
m* .s <2> -235A1DF76F0D5ADF 14B66DC33F6AC  ok
d. 14B66DC33F6ACDCA5E20890F2A521  ok


So the ‘/mod’, the ‘lshift’ and the ‘+’ operators are spelled exactly as they would have been for 32-bit FORTH, but operate on potential 64-bit numbers. ‘gforth’ still preserves the double-width operators with the special naming, but in this case, double-width actually means 128-bit ! Its implementation of the standard Fetch operator, which is still named ‘@’, now fetches a 64-bit value from RAM. And I have already documented this slight incompatibility in This Earlier Posting.

If we can assume that our source-code is to be compiled on 64-bit FORTH, we can just perform 64-bit operations on single stack-positions, at will.

It should also be noted that in FORTH comments and stack-traces, the topmost stack-position is written on the right-hand side of the textual list. The apparent negative number above, in the second stack position from the logical top, after ‘ m* .s ‘ , is the result of the most-significant bit of that word being a (1) and not a (0). By convention, in signed integers, this signals that a negative number is meant, using two’s complement. And this is still the case, in hexadecimal. But, because this word is the less-significant of the two listed, its most-significant bit will no longer be the most-significant, after it has been combined with the other word, thereby again forming a positive number when printed out as a single 128-bit, signed integer.
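The same effect can be reproduced in C. The sketch below takes the less-significant word from the session above, whose bit pattern is DCA5E20890F2A521 (the last 16 hexadecimal digits printed by ‘d.’), and formats it the way a signed, single-width printout would. The function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Magnitude of a 64-bit word when it is read as a two's complement,
   signed integer: if the top bit is set, the magnitude is 2^64 - word. */
uint64_t cell_magnitude(uint64_t cell)
{
    return (cell >> 63) ? (uint64_t)0 - cell : cell;
}

/* Writes the word as signed hexadecimal, with a leading '-' whenever
   the most-significant bit is a 1. */
void format_cell_signed_hex(uint64_t cell, char *out, size_t out_size)
{
    assert(out_size >= 18);   /* sign + 16 hex digits + terminator */
    snprintf(out, out_size, "%s%" PRIX64,
             (cell >> 63) ? "-" : "", cell_magnitude(cell));
}
```

Passing 0xDCA5E20890F2A521 to `format_cell_signed_hex` produces the text -235A1DF76F0D5ADF, which matches what ‘.s’ displayed. The bit pattern itself never changed; only the interpretation of its top bit did.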

(Edit 07/31/2017 : )

One fact which I have blatantly ignored in my own coding, was that the way in which I chose to separate a single numeric value into two bit-fields – through a modulus-division – is not the most efficient in terms of how many CPU-clock-cycles it consumes. A preferable way to do the same thing, is by using ‘rshift’, and then masking.
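Both ways of separating the value can be put side by side in C. This is a minimal sketch for unsigned values, with hypothetical function names; note that an optimizing C compiler will typically turn an unsigned division by a power of two into a shift by itself, which an interactive FORTH cannot be assumed to do.

```c
#include <assert.h>
#include <stdint.h>

/* The way used in the session above: one division with remainder,
   the counterpart of '$100000000 /mod'. */
void split_by_division(uint64_t value, uint64_t *low, uint64_t *high)
{
    *high = value / 0x100000000ULL;
    *low  = value % 0x100000000ULL;
}

/* The preferable way: a right-shift for the upper field,
   and a mask for the lower field. */
void split_by_shift(uint64_t value, uint64_t *low, uint64_t *high)
{
    *high = value >> 32;
    *low  = value & 0xFFFFFFFFULL;
}

/* Puts the two fields back together, the counterpart of '$20 lshift +'. */
uint64_t join_halves(uint64_t low, uint64_t high)
{
    assert(low <= 0xFFFFFFFFULL && high <= 0xFFFFFFFFULL);
    return (high << 32) + low;
}
```

For 0x0123456789ABCDEF, both functions give 0x89ABCDEF as the lower field and 0x01234567 as the upper field, the same two values that the session shows after ‘/mod’.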

The reason for this is the fact that when a CPU is instructed to left-shift or right-shift a binary register-content, doing so takes up no more than about 1 clock-cycle per bit-position shifted, and on CPUs that have a barrel shifter, a single clock-cycle regardless of the distance. What people may not realize, is that although addition and subtraction can easily be performed in one step by logic-circuits, multiplication and division may not be, assuming a generic, general-purpose CPU. To multiply two 64-bit numbers actually means to perform up to 64 conditional additions, each depending on the value of one bit. And to divide a 64-bit number by another actually means to perform up to 64 conditional subtractions, each depending on the outcome of a comparison. Maybe for 32-bit or 16-bit registers, we don’t care. But by the time we’re using 64-bit numbers, it penalizes our CPU-load twice as much.
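The multiplication described here can be written out as a loop. The following sketch assumes the simplest possible scheme, one step per bit, which is not necessarily how any given CPU implements its multiply instruction. The function name is hypothetical, and the result is truncated to the low 64 bits of the product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a * b, by shift-and-add.
   Each of the 64 bits of b decides whether a shifted copy of a gets
   added. *additions receives how many additions were performed. */
uint64_t multiply_shift_add(uint64_t a, uint64_t b, unsigned *additions)
{
    assert(additions != 0);
    uint64_t product = 0;
    *additions = 0;
    for (int bit = 0; bit < 64; ++bit) {
        if ((b >> bit) & 1u) {
            product += a << bit;
            ++*additions;
        }
    }
    return product;
}
```

The loop examines all 64 bit-positions even when few of them lead to an addition, so a 64-bit operand costs twice as many steps as a 32-bit operand would.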
