Performed another experiment, with Genetic Algorithms.

When working with Genetic Algorithms, I know of essentially two types. In the first type, a mutation algorithm generates a unit which is itself an algorithm. A Windows-based program which worked in this way, a long time ago, was named ‘Discipulus’.

Another type of Genetic Algorithm, which can also be described as an ‘Evolutionary Programming’ example, is one which generates a unit that is really just an arbitrary array of data, for which an externally-defined program must determine a fitness-level, so that the mutation algorithm can try to find units which achieve the greatest possible fitness. An example of such a system, which is still maintained today, is named ‘µGP3’.
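
To make that second type more concrete, what follows is a minimal sketch, written as a Bash script for illustration only: a mutation-only search with a population of one, which is about the simplest caricature of such a system. The unit is just an array of 8 numbers, and the ‘fitness’ function is merely a hypothetical stand-in for whatever external program would rate a real unit:

#!/bin/bash

# Sketch only: evolve an array of 8 numbers toward maximum fitness,
# where 'fitness' stands in for an externally-defined fitness-program.

LEN=8

fitness() {
  # Hypothetical stand-in: reward units whose values sum close to 1000.
  local sum=0 v d
  for v in "$@"; do sum=$(( sum + v )); done
  d=$(( sum - 1000 ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  echo $(( 10000 - d ))
}

# Start from a random parent-unit.
PARENT=()
for (( i=0; i<LEN; i++ )); do PARENT[i]=$(( RANDOM % 256 )); done
BEST=$(fitness "${PARENT[@]}")

for (( gen=0; gen<1000; gen++ ))
do
  # Mutate one randomly-chosen element of a copy of the parent.
  CHILD=( "${PARENT[@]}" )
  i=$(( RANDOM % LEN ))
  CHILD[i]=$(( RANDOM % 256 ))

  # Keep the child only if the external test rates it as fitter.
  F=$(fitness "${CHILD[@]}")
  if [ "$F" -gt "$BEST" ]
  then
    PARENT=( "${CHILD[@]}" )
    BEST=$F
  fi
done

echo "Best fitness: $BEST"
echo "Unit: ${PARENT[*]}"

A real system such as ‘µGP3’ is of course far more elaborate, with crossover and a genuine population; the sketch only shows the division of labor, between the mutation algorithm and an external fitness-test.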

I would guess that something like ‘µGP3’ is more useful in Engineering, where a deterministic approach can be taken to determine how well a hypothetical machine would work, after being tweaked by the Evolved Data-Set, but according to a set of rules which is not known to have an inverse.

‘Discipulus’ might be of greater use in AI, where the Genetic Algorithm is assumed to take a range of input parameters, and is required either to take an action based on those, or to arrive at an interpretation of them. For that, the AI is trained using numerous examples of input-value-sets, where a most-accurate result is known for each (training) set of simultaneous input-values. In the case of ‘Discipulus’, there exist two types of training exercises: Approximation and Classification. A real-world example where such a form would be useful is in the computerized recognition of faces, or of shapes from other sorts of images.

Actually, I think that the way facial recognition works in practice today is that a 2D Fourier Transform is computed of a rectangle, the dimensions of which in pixels have been tweaked, in such a way that the conformity of this Fourier Transform to known Fourier Transforms pretty well guarantees that a given face will be recognized.

But other examples may exist, in which the relationship between Input Values and Output Values is essentially of an initially-unknown nature. And then, even if we might not want to embed an actual GA into our AI, the use of GAs may provide some insight into how Input Values are in fact related to Output Values – through Human interpretation of the GAs which result.


Debiasing Bit-Streams

One subject which has received a lot of attention in popular computing is how to generate random numbers that are not merely old-fashioned pseudo-random numbers, but that are truly random, and of sufficiently good quality to be used in cryptography.

In order to achieve this goal, there exist certain packages such as

  • ‘bit-babbler’ (Requires specialized USB-keys that can be bought.)
  • ‘rng-tools’ (Assumes a special CPU-feature, which only the most recent CPUs will have.)
  • ‘haveged’ (Is only present in Debian / Jessie repos, and later. Not present in Debian / Lenny repos.)
  • ‘randomsound’ (An age-old solution, present in all the older repos.)

An interesting observation about ‘rng-tools’ that I will make is that, as of kernel-version 3.17, this daemon doesn’t strictly need to be installed, because the kernel has intrinsic support for the CPU’s random-number generator, if one is detected. The higher kernel-versions will incorporate its output into ‘/dev/random’ automatically, but not exclusively. Whether the reader has this H/W feature can be determined by installing the package ‘cpuid’ and then running:

cpuid | grep DRAND

The only real advantage which might remain to using ‘rng-tools’ would be the configurability of this package for user-defined sources of random data. In other words, a clever user could write a custom script which looks for random data anywhere to his liking, which hashes that data, and which then offers the hash as a source in his configuration of ‘rng-tools’.
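
On a Debian-based system, one hedged way to register such a source, assuming the packaging still reads the ‘HRNGDEVICE’ setting from its defaults-file, would be to point the daemon at a named pipe, such as the ‘/opt/entropy’ which the script below maintains:

# /etc/default/rng-tools  (excerpt; a sketch, not a full configuration)
# Point rngd at the named pipe which the script below keeps filled.
HRNGDEVICE=/opt/entropy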

If the user has the ALSA sound-system installed, then the following script might work:


#!/bin/bash

# Record 1 second of 16-bit audio-noise. Replace 'user' with the main
# user's username, so that arecord runs inside that user's sound-session.
NOISE_CMD="sudo -u user arecord --format=S16_LE --duration=1 -q"

# Start from a fresh named pipe.
if [ -e /opt/entropy ]
then
  rm -f /opt/entropy || exit 1
fi

mkfifo /opt/entropy || exit 1

# Hold the pipe open read-write on FD 3, so that writes below never block
# for lack of a reader, and so that reads never see EOF.
exec 3<> /opt/entropy

# Seed the pipe once from the kernel, so that early consumers don't stall.
head -c 256 /dev/urandom >/opt/entropy

sleep 300

while true
do

  # Probe whether the pipe still holds data, by consuming one byte.
  read -n1 -t 1 FIFO_HAS </opt/entropy

  if [ $? -eq 0 ]
  then
    # The pipe is non-empty; check again shortly.
    sleep 2

  else
    # Verify that the sound-device can be accessed without error.
    $NOISE_CMD >/dev/null || exit 1

    # Replenish: hash 8 consecutive seconds of audio-noise, writing
    # 64 bytes of SHA-512 output into the pipe each time.
    n=1
    while [ $n -le 8 ]
    do
      $NOISE_CMD | \
        shasum -b -a 512 - | cut -c -128 | xxd -r -p >/opt/entropy
      n=$(( n+1 ))
    done
  fi
done



Because this script would run as root, it’s important, especially on PulseAudio-based systems, to replace ‘user’ above with the main user’s username, so that ‘arecord’ can actually reach the sound-device through that user’s session.

(Edit 01/04/2018 : One tricky fact which needs to be considered, if the above script is to be of any practical use, is that it needs to be started before ‘rng-tools’ is, so that when the latter daemon starts, the object ‘/opt/entropy’ will already exist.

If my script were simply to delete an existing object by that name and create a new one, after ‘rng-tools’ was already running, then the latter daemon would be left with an invalid handle, for the deleted object. The newly-created pipe would not replace it, within the usage by ‘rng-tools’.

By now I have debugged the script and tested it, and it works 100%.

I leave it running all day, even though I have no use for the generated ‘/opt/entropy’.

Further, I’ve added a test, before running the command 8 times, to verify that accessing the sound-device does not result in an error-condition. The default behavior of ‘arecord’ is to block, which is ideal for me, because it means that if the device is merely busy, my script’s invocation of ‘arecord’ will simply wait until the device is available again. )
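
On a systemd-based install, one hedged way to enforce that start-ordering, assuming the script were saved as ‘/usr/local/sbin/entropy-fifo.sh’ (a hypothetical path) and that the daemon’s unit is in fact named ‘rng-tools.service’, would be a small unit-file:

[Unit]
Description=Keep /opt/entropy topped up with sound-card noise
Before=rng-tools.service

[Service]
ExecStart=/usr/local/sbin/entropy-fifo.sh

[Install]
WantedBy=multi-user.target

‘Before=’ only declares the ordering; the script itself must still create the pipe promptly on start-up, which it does, before its initial 5-minute sleep.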

(Edit 01/06/2018 : If an external application starts to read from the named pipe at an unpredicted point in time, and depletes it, the maximum amount of time the above script will wait, before replenishing the pipe, is 5 seconds, assuming that the CPU has the necessary amount of idle-time.

The script may spend 2 seconds ‘sleep’ing, then 1 second trying to read a byte from the pipe, then 1 second testing the command that’s supposed to generate the noise, and then 1 more second actually generating the first block of random bits, out of 8 consecutive blocks. Those worst-case delays add up: 2 + 1 + 1 + 1 = 5 seconds.



I should also add that the OSS sound-system still exists, although its use has largely been made obsolete on mainstream PCs by either ‘PulseAudio’ or ‘ALSA’, which coexist just fine. But there is a simple reason to avoid trying to access the device-files of the sound-device directly:

On-board sound-devices usually default to 8-bit mode. And in that mode, the probability is much too high that sequences of samples will actually have an unchanging value, because there is a limit to how noisy the analog amplifiers which feed the A/D converter can actually be; their noise-floor can easily stay below one 8-bit LSB. And so it is critical that the sound-device be switched into some 16-bit mode. This is much easier to do using the ‘arecord’ command than it would be with direct access to device-files; a one-line example follows this note.

Given nothing but sequences of 8-bit sound-samples, it should come as no surprise if, eventually, the resulting hash-codes actually repeat. :-) )
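
For example, the following one-liner, assuming a default ALSA capture-device, records one second of 16-bit samples as raw data, with no direct programming of device-files needed:

arecord --format=S16_LE --rate=8000 --duration=1 -q -t raw >/tmp/sample.raw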



I’m going to focus my attention on ‘randomsound’ during the rest of this posting, even though on most of my computers, I have better solutions installed. What the package ‘randomsound’ does is put the sound-input device, which presumably has a typical, cheap A/D converter, into Unsigned, 16-bit, Mono, 8kHz mode, and extract only the Least-Significant Bit from each sound-sample. In order for that to work well, the bits extracted in this way need to be “debiased”.

What this means is that on old sound-cards, the A/D converter is so bad that even the LSB does not have a probability of exactly 50% of being either a 1 or a 0. Thus, the bits obtained from the hardware in this way first need to be measured, for the probability that a given bit is a 1, and must then be processed further in a way that compensates for a non-50% probability, before bits are derived which are supposedly random.
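
As a hedged sketch of that measuring step, assuming ALSA’s ‘arecord’ and a default capture-device, the following pipeline estimates the probability that the LSB of a 16-bit, little-endian sample is a 1, by examining every low-order byte of the raw stream:

arecord --format=S16_LE --rate=8000 --duration=5 -q -t raw | \
  od -An -tu1 -v | tr -s ' ' '\n' | grep -v '^$' | \
  awk 'NR % 2 == 1 { ones += $1 % 2; n++ } END { printf "P(1) = %.4f\n", ones / n }'

A result far from 0.5000 would reveal the sort of bias that ‘randomsound’ needs to compensate for.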

I’ve given some private thought on how the debiasing of this bit-stream might work, and can think of two ways:

  1. A really simple way, that only uses the XOR function between the current input-bit and the most-recent output-bit (sketched below),
  2. A more-complex way, based on the idea that if 8 biased bits are combined arbitrarily into a byte, on the average, the value of this byte will not equal 128.

The first approach would have as a disadvantage that, if the bias was strong, a sequence of bits would result which would still not be adequately random: each output-bit would remain correlated with its neighbors, even though the overall share of 1s tends toward 50%.
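
Still, here is a minimal sketch of that first approach, operating on a stream of textual ‘0’ / ‘1’ characters; the script-name and the textual bit-format are only assumptions, for illustration:

#!/bin/bash

# xor_debias.sh (hypothetical name): read '0'/'1' characters on stdin,
# and emit each input-bit XORed with the most-recent output-bit.

PREV=0
while read -r -n1 BIT
do
  # Skip anything that isn't a textual bit (e.g., newlines).
  [ "$BIT" = "0" ] || [ "$BIT" = "1" ] || continue

  OUT=$(( BIT ^ PREV ))
  printf '%d' "$OUT"
  PREV=$OUT
done
echo

Piping ‘echo 11111011’ into it produces ‘10101101’, but feeding it a long, unbroken run of 1s illustrates the weakness just mentioned: the output settles into the strictly-alternating pattern 1010…, which is no more random than the input was.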
