Debiasing Bit-Streams

One subject which has received a lot of attention in popular computing, is how to generate random numbers, that are not just old-fashioned pseudo-random numbers from the old days, but that are truly random, and of sufficiently good quality to be used in cryptography.

In order to achieve this goal, there exist certain packages such as

  • ‘bit-babbler’ (Requires specialized USB-keys that can be bought.)
  • ‘rng-tools’ (Assumes a special CPU-feature, which only the most recent CPUs will have.)
  • ‘haveged’ (Is only present in Debian / Jessie repos, and later. Not present in Debian / Lenny repos.)
  • ‘randomsound’ (An age-old solution, present in all the older repos.)

An interesting observation about ‘rng-tools’ that I will make, is that as of Kernel-version 3.17, this daemon doesn’t strictly need to be installed, because the kernel has intrinsic support for the CPU, random-number generator, if any is detected. The higher kernel-versions will incorporate its output into ‘/dev/random’ automatically, but not exclusively. Whether the reader has this H/W feature can be determined by installing the package ‘cpuid’ and then running:

cpuid | grep DRAND

The only real advantage which might remain to using ‘rng-tools‘, would be the configurability of this package for user-defined sources of random data. In other words, a clever user could write a custom script, which looks for random data anywhere to his liking, which hashes that, and which then offers the hash as a source in his configuration of ‘rng-tools’.

If the user has the ALSA sound-system installed, then the following script might work:



NOISE_CMD="sudo -u dirk arecord --file-type=raw --format=S16_LE --duration=1 -q"

if [ -e /opt/entropy ]
  rm -f /opt/entropy || exit 1

mkfifo /opt/entropy || exit 1
exec 3<> /opt/entropy

head -c 256 /dev/urandom >/opt/entropy

sleep 300

while true

  read -n1 -t 1 FIFO_HAS </opt/entropy

  if [ $? -eq 0 ]
    sleep 2

    $NOISE_CMD >>/dev/null || exit 1

    while [ $n -le 8 ]
      $NOISE_CMD | \
        shasum -b -a 512 - | cut -c -128 | xxd -r -p >/opt/entropy
      n=$(( n+1 ))


Because this script would run as root, especially on PulseAudio-based systems, it’s important to replace ‘user’ above with the main user’s username.

(Edit 01/04/2018 : One tricky fact which needs to be considered, if the above script is to be of any practical use, is that it needs to be started before ‘rng-tools’ is, so that when the later daemon starts, the object ‘/opt/entropy’ will already exist.

If my script was just to delete an existing object by that name, and create a new one, after ‘rng-tools’ was running, then the later script will simply be left with an invalid handle, for the deleted object. The newly-created pipe will not replace it, within the usage by ‘rng-tools’.

By now I have debugged the script and tested it, and it works 100%.

I leave it running all day, even though I have no use for the generated ‘/opt/entropy’ .

Further, I’ve added a test, before running the command 8 times, to verify that accessing the sound device does not result in an error-condition. The default-behavior of ‘arecord’ is blocking, which is ideal for me, because it means that if the device is merely busy, my script’s invocation of ‘arecord’ will simply wait, until the device is available again. )

(Edit 01/06/2018 : If an external application starts to read from the named pipe at an unpredicted point in time, and depletes it, the maximum amount of time the above script will wait, before replenishing the pipe, is 5 seconds, assuming that the CPU has the necessary amount of idle-time.

The script may spend 2 seconds ‘sleep’ing, then, 1 second trying to read a byte from the pipe, then, 1 second testing the function that’s supposed to generate the ‘Noise’, and then 1 more second, actually generating the first block of random bits, out of 8 consecutive blocks.


I should also add, that the OSS sound system still exists, although its use has largely been made obsolete on mainstream PCs, due to either ‘PulseAudio’, or ‘ALSA’, which coexist just fine. But there is a simple reason to avoid trying to access the device-files of the sound-device directly:

On-board sound devices usually default to 8-bit mode. And in that mode, the probability is much too high, that sequences of samples will actually have an unchanging value, because there is actually a limit to how poor the behavior of the analog amplifiers can be, that act as input to the A/D converter. And so it is critical that the sound-device be switched into some 16-bit mode. This is much easier to do using the ‘arecord’ command, than it would be with direct access to device-files.

Given nothing but sequences of 8-bit sound-samples, it should come as no surprise if eventually, the resulting hash-codes actually repeat. :-)  )


I’m going to focus my attention on ‘randomsound’ during the rest of this posting, even though on most of my computers, I have better solutions installed. What the package ‘randomsound’ does, is put the sound-input device, that presumably has a typical, cheap A/D converter, into Unsigned, 16-bit, Mono, 8kHz mode, and to extract only the Least-Significant Bit from each sound sample. In order for that to work well, the bits extracted in this way need to be “debiased”.

What this means is that on old sound cards, the A/D converter is so bad, that even the LSB does not have a probability of exactly 50%, of being either a 1 or a 0. Thus, the bits obtained from the hardware in this way need to be measured first, for what the probability is of this bit being 1, and must then be processed further in a way that compensates for a non-50% probability, before bits are derived, which are supposedly random.

I’ve given some private thought, on how the debiasing of this bit-stream might work, and can think of two ways:

  1. A really simple way, that only uses the XOR function, between the current input-bit, and the most-recent output-bit,
  2. A more-complex way, based on the idea that if 8 biased bits are combined arbitrarily into a byte, on the average, the value of this byte will not equal 128.

The first approach would have as a disadvantage, that if the bias was strong, a sequence of bits would result, which would still not be adequately random.

Continue reading Debiasing Bit-Streams