## Generating a Germain Prime efficiently, using Python and an 8-core CPU.

I have spent extensive time, as well as previous blog postings, exploring the subject of how to generate a Germain prime (number), in the Python (3.5) programming language, but taking advantage of parallel computing to some slight degree.

Before I go any further with this subject, I’d like to point out that generally, for production-ready applications, Python is not the best language to use. The main reason is the fact that Python is an interpreted language, even though many modern interpreted languages are compiled into bytecode before being interpreted. This makes a Python script slower by nature, than very well-written C or even C++. But what I aim to do is to use Lego-Blocks to explore this exercise, yet, to use tools which would be used professionally.

The main way I am generating prime numbers is, to start with a pseudo-random, 512-bit number (just as a default, the user can specify different bit-lengths), and then to attempt to divide this number by a list of known, definite primes, that by now only extend to 4096 (exclusively, of course), in an attempt to disprove that number prime. In about 1/8 of all cases, the number survives this test, after which I undertake a more robust, Miller-Rabin approach to try disproving it prime 192 times, probabilistically. If the number has survived these tests, my program assumes it to be prime.

Even though I’ve never been told this, the probability of a non-prime Candidate surviving even a single Miller-Rabin Test, or Witness, is very small, smaller than 1/8. This could be due to the fact that the Witness in question is raised to a high, odd exponent, in its 512-bit modulus etc., after which it would be squared some number of times. Because the first Candidate in the modulus of 4 is 3, that one actually gets squared a subsequent total of zero times. And after each exponentiation, the result could be any number in the modulus, with the possible exception of zero. It needs to become either (1) or (n-1) in the modulus of (n), for the Candidate to survive the test. (:1)

Further, there is no need for the numbers that get used as witnesses, which are pseudo-random, to be of the same, high, cryptographic quality of pseudo-random, as the candidate is, which is being tested.

But there is a sub-category of prime numbers which have recently been of greater interest to me, which is known as the Germain prime numbers, such that the Totient of that main candidate, divided by two, should also be prime. And so, if the density of prime numbers roughly equal to (n) is (1 / log(n)), and if we can assume a uniform distribution of random numbers, then the probability of finding a Germain prime is roughly (1 / (log (n))2), assuming that our code was inefficient enough actually to test all numbers. The efficiency can be boosted by making sure that the random number modulo 4 equals 3.

But one early difficulty I had with this project was, that if I needed to start with a new pseudo-random number for each candidate, on my Linux computers, I’d actually break ‘/dev/urandom’ ! Therefore, the slightly better approach which I needed to take next was, to make my initial random number the pseudo-random one, and then just to keep incrementing it by 4, until the code plodded into a Germain prime.

Even when all this is incorporated into the solution I find that with Python, I need the added advantage of parallel computing. Thus, I next learned about the GIL – The Global Interpreter Lock – and the resulting pitfalls of multi-threaded Python, which is not concurrent. Multi-threading under Python tells the O/S to allocate CPU cores as usual, but then only allows 1 core to be running at any one time! But, even under those constraints, I found that I was benefiting from the fact that my early code was able to test at least 2 candidates for primality simultaneously, those being the actual candidate, as well as its Totient divided by 2. And as soon as either candidate was disproved prime, testing on the other could be stopped quickly. This reduced the necessary number of computations dramatically, to make up for the slowness of multi-threaded Python, and I felt that I was on the right track.

The only hurdle which remained was, how to convert my code into multi-processing code, no longer merely multi-threaded, while keeping the ability for two processes, then, to send each other a shutdown-command, as soon as the present process disproved its number to be prime.

(Updated 6/01/2019, 17h35 … )

## A Hypothetical Way, to Generate Bigger Random Numbers, using the GMP Library.

In recent days I wrote some Python scripts, which generate 1024-bit prime numbers. But the next stage in my own thinking is, to try to accomplish the same thing in C++, using the GMP Multi-Precision Library, because GMP seems to be a well-supported and overall favorite C++ Multi-Precision Librrary. But when I explored this subject further, I noticed something which surprised me:

GMP is still using the ‘Linear Congruent algorithm’, as its main source of strong, pseudo-random numbers. The reason this fact surprises me is the fact that the Linear Congruent algorithm was invented as early as in the 1970s, as a cheap way to achieve pseudo-randomness, that would be good enough for games to surprise players, but which was never meant to provide crypto-quality random numbers. Actually, back in the 1970s, the registers on which this algorithm was used, may have been 16-bit or 32-bit registers, while today they are 256-bit registers, for which reason a careful and random-looking choice for the two constants is important. In fact, GMP defines the following functions, to initialize a ‘state_t’ object, to become a Linear Congruent RNG:

int gmp_randinit_lc_2exp_size (gmp_randstate_t state, mp_bitcnt_t
size)

void gmp_randinit_lc_2exp (gmp_randstate_t state, const_mpz_t a,
unsigned long c, mp_bitcnt_t m2exp)

For people who did not know, the generality of the algorithm is:

m2exp == 2 * size

X := aX + c mod 2m2exp

The first of the two initializations above uses the ‘size’ parameter, in order to look up in a static, known table, what the ‘ideal’ values for the constants (a) and (c) are, to achieve maximum randomness. The second initialization allows the programmer to specify those constants himself, and poses no restrictions on what ‘m2exp’ will be.

One of the first approaches a cryptographic programmer might want to pursue, in order to generate a prime number eventually, is to read some random bits from the device-file ‘/dev/random’ (on a Linux computer), use the first initialization above, which will lead to an RNG, and then seed this RNG once from the system-provided random number, with which the programmer can then suggest both prime candidates and witnesses to determine whether the candidates are prime, until one prime number is ‘proven’.

But I see a potential ambition for any programmer who may want to go that route:

• Given that (a) and (c) are to be chosen from a known table, this presents a vulnerability, because a hypothetical attacker against this crypto-system may use these constants to gain knowledge about the internal state of the ‘state_t’ object, and therefore become aware of a limited number of prime numbers that can result, thereby narrowing his attack against eventual public keys, by only trying to prime-factorize or otherwise decrypt, using the narrowed set of primes.
• Even if the constants (a) and (c) are secure in nature and not themselves hacked, the table presently only extends to a ‘size’ of 128 bits, which will actually mean that the modulus ‘m2exp’ is 2256. And so, ‘the maximum amount of randomness’ – i.e., the Entropy – which even a 2048-bit public-key modulus can achieve, will be 256 bits. And this would also mean that the strength of the key-pair is only equivalent to a 128-bit, symmetrical AES key, regardless of how complex it is.
• Some programmers might actually want to work with a modulus of 2512.

At the same time, there are reasons why the obvious solution, just to read all random bits from the device-file ‘/dev/urandom’, poses its own problems. One of the reasons is the fact that potentially, 300 (+) prime-number candidates may need to be generated, each of which will be 1024 bits long, and tested 200 (+) times, and that the quality of the randomness ‘/dev/urandom’ provides under those conditions may also be sub-optimal, because that source, too, is pseudo-random, and will only become minimally based on the physically-measured randomness which ‘/dev/random’ represents. And yet, ‘/dev/random’ will typically block if more than ~2048 bits are to be read from it.

I can think of an approach to solving this problem, which may overcome most of the hurdles…

(Updated 5/12/2019, 22h30 … )

## Performed another experiment, with Genetic Algorithms.

When working with Genetic Algorithms, I know of essentially two types. One is an example, where a mutation algorithm generates a unit, which is itself an algorithm. A Windows-based program which used to work in this way, a long time ago, was named ‘Discipulus‘.

Another type of Genetic Algorithm which can exist, which can also be described as an ‘Evolutionary Programming’ example, is one which generates a unit, which is really just an arbitrary array of data, for which an externally-defined program must determine a fitness-level, so that the mutation algorithm can try to find units, which achieve greatest possible fitness. An example of such a system, which is still maintained today, is named ‘µGP3‘.

I would guess that something like ‘µGP3′ is more useful in Engineering, where a deterministic approach can be taken to determine how well a hypothetical machine would work, which was tweaked by the Evolved Data-Set, but according to a set of rules, which is not known to have an inverse.

‘Discipulus’ might be of greater use in AI, where the Genetic Algorithm is assumed to take a range of input parameters, and is required either to take an action based on those, or to arrive at an interpretation of those parameters, for which the AI was trained using numerous examples of input-value-sets, and for which a most-accurate result is known for each (training) set of simultaneous input-values. In the case of ‘Discipulus’, there exist two types of training exercises: Approximation, or Classification. And a real-world example where such a form would be useful, is in the computerized recognition of faces. Or of shapes, from other sorts of images.

Actually, I think that the way facial recognition works in practice today is, that a 2D Fourier Transform is computed of a rectangle, the dimensions of which in pixels have been tweaked, but in such a way that the conformity of the Fourier Transform to known Fourier Transforms pretty well guarantees that a given face is to be recognized.

But other examples may exist, in which the relationship between Input variables and Output Values is essentially of an initially-unknown nature. And then, even if we might not want to embed an actual GA into our AI, the use of GAs may provide some insight, as to how Input Values are in fact related to Output Values – through Human Interpretation of the GAs which result.

## Debiasing Bit-Streams

One subject which has received a lot of attention in popular computing, is how to generate random numbers, that are not just old-fashioned pseudo-random numbers from the old days, but that are truly random, and of sufficiently good quality to be used in cryptography.

In order to achieve this goal, there exist certain packages such as

• ‘bit-babbler’ (Requires specialized USB-keys that can be bought.)
• ‘rng-tools’ (Assumes a special CPU-feature, which only the most recent CPUs will have.)
• ‘haveged’ (Is only present in Debian / Jessie repos, and later. Not present in Debian / Lenny repos.)
• ‘randomsound’ (An age-old solution, present in all the older repos.)

An interesting observation about ‘rng-tools’ that I will make, is that as of Kernel-version 3.17, this daemon doesn’t strictly need to be installed, because the kernel has intrinsic support for the CPU, random-number generator, if any is detected. The higher kernel-versions will incorporate its output into ‘/dev/random’ automatically, but not exclusively. Whether the reader has this H/W feature can be determined by installing the package ‘cpuid’ and then running:

cpuid | grep DRAND

The only real advantage which might remain to using ‘rng-tools‘, would be the configurability of this package for user-defined sources of random data. In other words, a clever user could write a custom script, which looks for random data anywhere to his liking, which hashes that, and which then offers the hash as a source in his configuration of ‘rng-tools’.

If the user has the ALSA sound-system installed, then the following script might work:

#!/bin/bash

NOISE_CMD="sudo -u user arecord --format=S16_LE --duration=1 -q"

if [ -e /opt/entropy ]
then
rm -f /opt/entropy || exit 1
fi

mkfifo /opt/entropy || exit 1
exec 3<> /opt/entropy

head -c 256 /dev/urandom >/opt/entropy

sleep 300

while true
do

read -n1 -t 1 FIFO_HAS </opt/entropy

if [ $? -eq 0 ] then sleep 2 else$NOISE_CMD >>/dev/null || exit 1

n=1
while [ $n -le 8 ] do$NOISE_CMD | \
shasum -b -a 512 - | cut -c -128 | xxd -r -p >/opt/entropy
n=\$(( n+1 ))
done
fi
done




Because this script would run as root, especially on PulseAudio-based systems, it’s important to replace ‘user’ above with the main user’s username.

(Edit 01/04/2018 : One tricky fact which needs to be considered, if the above script is to be of any practical use, is that it needs to be started before ‘rng-tools’ is, so that when the later daemon starts, the object ‘/opt/entropy’ will already exist.

If my script was just to delete an existing object by that name, and create a new one, after ‘rng-tools’ was running, then the later script will simply be left with an invalid handle, for the deleted object. The newly-created pipe will not replace it, within the usage by ‘rng-tools’.

By now I have debugged the script and tested it, and it works 100%.

I leave it running all day, even though I have no use for the generated ‘/opt/entropy’ .

Further, I’ve added a test, before running the command 8 times, to verify that accessing the sound device does not result in an error-condition. The default-behavior of ‘arecord’ is blocking, which is ideal for me, because it means that if the device is merely busy, my script’s invocation of ‘arecord’ will simply wait, until the device is available again. )

(Edit 01/06/2018 : If an external application starts to read from the named pipe at an unpredicted point in time, and depletes it, the maximum amount of time the above script will wait, before replenishing the pipe, is 5 seconds, assuming that the CPU has the necessary amount of idle-time.

The script may spend 2 seconds ‘sleep’ing, then, 1 second trying to read a byte from the pipe, then, 1 second testing the function that’s supposed to generate the ‘Noise’, and then 1 more second, actually generating the first block of random bits, out of 8 consecutive blocks.

I should also add, that the OSS sound system still exists, although its use has largely been made obsolete on mainstream PCs, due to either ‘PulseAudio’, or ‘ALSA’, which coexist just fine. But there is a simple reason to avoid trying to access the device-files of the sound-device directly:

On-board sound devices usually default to 8-bit mode. And in that mode, the probability is much too high, that sequences of samples will actually have an unchanging value, because there is actually a limit to how poor the behavior of the analog amplifiers can be, that act as input to the A/D converter. And so it is critical that the sound-device be switched into some 16-bit mode. This is much easier to do using the ‘arecord’ command, than it would be with direct access to device-files.

Given nothing but sequences of 8-bit sound-samples, it should come as no surprise if eventually, the resulting hash-codes actually repeat.   )

I’m going to focus my attention on ‘randomsound’ during the rest of this posting, even though on most of my computers, I have better solutions installed. What the package ‘randomsound’ does, is put the sound-input device, that presumably has a typical, cheap A/D converter, into Unsigned, 16-bit, Mono, 8kHz mode, and to extract only the Least-Significant Bit from each sound sample. In order for that to work well, the bits extracted in this way need to be “debiased”.

What this means is that on old sound cards, the A/D converter is so bad, that even the LSB does not have a probability of exactly 50%, of being either a 1 or a 0. Thus, the bits obtained from the hardware in this way need to be measured first, for what the probability is of this bit being 1, and must then be processed further in a way that compensates for a non-50% probability, before bits are derived, which are supposedly random.

I’ve given some private thought, on how the debiasing of this bit-stream might work, and can think of two ways:

1. A really simple way, that only uses the XOR function, between the current input-bit, and the most-recent output-bit,
2. A more-complex way, based on the idea that if 8 biased bits are combined arbitrarily into a byte, on the average, the value of this byte will not equal 128.

The first approach would have as a disadvantage, that if the bias was strong, a sequence of bits would result, which would still not be adequately random.