Debiasing Bit-Streams

One subject which has received a lot of attention in popular computing, is how to generate random numbers, that are not just old-fashioned pseudo-random numbers from the old days, but that are truly random, and of sufficiently good quality to be used in cryptography.

In order to achieve this goal, there exist certain packages such as

  • ‘bit-babbler’ (Requires specialized USB-keys that can be bought.)
  • ‘rng-tools’ (Assumes a special CPU-feature, which only the most recent CPUs will have.)
  • ‘haveged’ (Is only present in Debian / Jessie repos, and later. Not present in Debian / Lenny repos.)
  • ‘randomsound’ (An age-old solution, present in all the older repos.)

An interesting observation about ‘rng-tools’ that I will make, is that as of Kernel-version 3.17, this daemon doesn’t strictly need to be installed, because the kernel has intrinsic support for the CPU, random-number generator, if any is detected. The higher kernel-versions will incorporate its output into ‘/dev/random’ automatically, but not exclusively. Whether the reader has this H/W feature can be determined by installing the package ‘cpuid’ and then running:

cpuid | grep DRAND

The only real advantage which might remain to using ‘rng-tools‘, would be the configurability of this package for user-defined sources of random data. In other words, a clever user could write a custom script, which looks for random data anywhere to his liking, which hashes that, and which then offers the hash as a source in his configuration of ‘rng-tools’.

If the user has the ALSA sound-system installed, then the following script might work:

 

#!/bin/bash

NOISE_CMD="sudo -u user arecord --format=S16_LE --duration=1 -q"

if [ -e /opt/entropy ]
then
  rm -f /opt/entropy || exit 1
fi

mkfifo /opt/entropy || exit 1
exec 3<> /opt/entropy

head -c 256 /dev/urandom >/opt/entropy

sleep 300

while true
do

  read -n1 -t 1 FIFO_HAS </opt/entropy

  if [ $? -eq 0 ]
  then
    sleep 2

  else
    $NOISE_CMD >>/dev/null || exit 1

    n=1
    while [ $n -le 8 ]
    do
      $NOISE_CMD | \
        shasum -b -a 512 - | cut -c -128 | xxd -r -p >/opt/entropy
      n=$(( n+1 ))
    done
  fi
done


 

Because this script would run as root, especially on PulseAudio-based systems, it’s important to replace ‘user’ above with the main user’s username.

(Edit 01/04/2018 : One tricky fact which needs to be considered, if the above script is to be of any practical use, is that it needs to be started before ‘rng-tools’ is, so that when the later daemon starts, the object ‘/opt/entropy’ will already exist.

If my script was just to delete an existing object by that name, and create a new one, after ‘rng-tools’ was running, then the later script will simply be left with an invalid handle, for the deleted object. The newly-created pipe will not replace it, within the usage by ‘rng-tools’.

By now I have debugged the script and tested it, and it works 100%.

I leave it running all day, even though I have no use for the generated ‘/opt/entropy’ .

Further, I’ve added a test, before running the command 8 times, to verify that accessing the sound device does not result in an error-condition. The default-behavior of ‘arecord’ is blocking, which is ideal for me, because it means that if the device is merely busy, my script’s invocation of ‘arecord’ will simply wait, until the device is available again. )

(Edit 01/06/2018 : If an external application starts to read from the named pipe at an unpredicted point in time, and depletes it, the maximum amount of time the above script will wait, before replenishing the pipe, is 5 seconds, assuming that the CPU has the necessary amount of idle-time.

The script may spend 2 seconds ‘sleep’ing, then, 1 second trying to read a byte from the pipe, then, 1 second testing the function that’s supposed to generate the ‘Noise’, and then 1 more second, actually generating the first block of random bits, out of 8 consecutive blocks.


 

I should also add, that the OSS sound system still exists, although its use has largely been made obsolete on mainstream PCs, due to either ‘PulseAudio’, or ‘ALSA’, which coexist just fine. But there is a simple reason to avoid trying to access the device-files of the sound-device directly:

On-board sound devices usually default to 8-bit mode. And in that mode, the probability is much too high, that sequences of samples will actually have an unchanging value, because there is actually a limit to how poor the behavior of the analog amplifiers can be, that act as input to the A/D converter. And so it is critical that the sound-device be switched into some 16-bit mode. This is much easier to do using the ‘arecord’ command, than it would be with direct access to device-files.

Given nothing but sequences of 8-bit sound-samples, it should come as no surprise if eventually, the resulting hash-codes actually repeat. :-)  )


 

I’m going to focus my attention on ‘randomsound’ during the rest of this posting, even though on most of my computers, I have better solutions installed. What the package ‘randomsound’ does, is put the sound-input device, that presumably has a typical, cheap A/D converter, into Unsigned, 16-bit, Mono, 8kHz mode, and to extract only the Least-Significant Bit from each sound sample. In order for that to work well, the bits extracted in this way need to be “debiased”.

What this means is that on old sound cards, the A/D converter is so bad, that even the LSB does not have a probability of exactly 50%, of being either a 1 or a 0. Thus, the bits obtained from the hardware in this way need to be measured first, for what the probability is of this bit being 1, and must then be processed further in a way that compensates for a non-50% probability, before bits are derived, which are supposedly random.

I’ve given some private thought, on how the debiasing of this bit-stream might work, and can think of two ways:

  1. A really simple way, that only uses the XOR function, between the current input-bit, and the most-recent output-bit,
  2. A more-complex way, based on the idea that if 8 biased bits are combined arbitrarily into a byte, on the average, the value of this byte will not equal 128.

The first approach would have as a disadvantage, that if the bias was strong, a sequence of bits would result, which would still not be adequately random.

The second approach seems better to me, but will also require that 8x as many samples be read in from the sound-card, than bits are to be added to the entropy pool.

The way my own, mental version of the debiasing could work, is that at first, 256x 8 bits could be read from the sound card, just to calibrate one block of random bits. The byte-values could be averaged, to arrive at a temporary ‘bias’-value.

Then, the same 256x 8 bits could be read in from the sound card, so that each of the resulting bytes is compared with the value ‘bias’, and if the byte read-in is greater than that value, a single 1 could be derived. Otherwise, a single 0 would be derived.

The disadvantage of doing things the harder way would be, that 2048 samples of sound would need to be input, to produce only 256 bits of truly random noise.

Dirk

 

Print Friendly, PDF & Email

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

This site uses Akismet to reduce spam. Learn how your comment data is processed.