In recent days I wrote some Python scripts, which generate 1024-bit prime numbers. But the next stage in my own thinking is, to try to accomplish the same thing in C++, using the GMP Multi-Precision Library, because GMP seems to be a well-supported and overall favorite C++ Multi-Precision Librrary. But when I explored this subject further, I noticed something which surprised me:

GMP is still using the ‘Linear Congruent algorithm’, as its main source of strong, pseudo-random numbers. The reason this fact surprises me is the fact that the Linear Congruent algorithm was invented as early as in the 1970s, as a cheap way to achieve pseudo-randomness, that would be good enough for games to surprise players, but which was never meant to provide crypto-quality random numbers. Actually, back in the 1970s, the registers on which this algorithm was used, may have been 16-bit or 32-bit registers, while today they are 256-bit registers, for which reason a careful and random-looking choice for the two constants is important. In fact, GMP defines the following functions, to initialize a ‘state_t’ object, to become a Linear Congruent RNG:

int gmp_randinit_lc_2exp_size (gmp_randstate_t state, mp_bitcnt_t

size)

void gmp_randinit_lc_2exp (gmp_randstate_t state, const_mpz_t a,

unsigned long c, mp_bitcnt_t m2exp)

For people who did not know, the generality of the algorithm is:

m2exp == 2 * size

X := aX + c mod 2^{m2exp}

The first of the two initializations above uses the ‘size’ parameter, in order to look up in a static, known table, what the ‘ideal’ values for the constants (a) and (c) are, to achieve maximum randomness. The second initialization allows the programmer to specify those constants himself, and poses no restrictions on what ‘m2exp’ will be.

One of the first approaches a cryptographic programmer might want to pursue, in order to generate a prime number eventually, is to read some random bits from the device-file ‘/dev/random’ (on a Linux computer), use the first initialization above, which will lead to an RNG, and then seed this RNG once from the system-provided random number, with which the programmer can then suggest both prime candidates and witnesses to determine whether the candidates are prime, until one prime number is ‘proven’.

But I see a potential ambition for any programmer who may want to go that route:

- Given that (a) and (c) are to be chosen from a known table, this presents a vulnerability, because a hypothetical attacker against this crypto-system may use these constants to gain knowledge about the internal state of the ‘state_t’ object, and therefore become aware of a limited number of prime numbers that can result, thereby narrowing his attack against eventual public keys, by only trying to prime-factorize or otherwise decrypt, using the narrowed set of primes.
*Even if*the constants (a) and (c) are secure in nature and not themselves hacked, the table presently only extends to a ‘size’ of 128 bits, which will actually mean that the modulus ‘m2exp’ is 2^{256}. And so, ‘the maximum amount of randomness’ – i.e., the Entropy – which even a 2048-bit public-key modulus can achieve, will be 256 bits. And this would also mean that the strength of the key-pair is*only*equivalent to a 128-bit, symmetrical AES key, regardless of how complex it is.- Some programmers might actually want to work with a modulus of 2
^{512}.

At the same time, there are reasons why the obvious solution, just to read all random bits from the device-file ‘/dev/urandom’, poses its own problems. One of the reasons is the fact that potentially, 300 (+) prime-number candidates may need to be generated, each of which will be 1024 bits long, and tested 200 (+) times, and that the quality of the randomness ‘/dev/urandom’ provides under those conditions may also be sub-optimal, because that source, too, is pseudo-random, and will only become minimally based on the physically-measured randomness which ‘/dev/random’ represents. And yet, ‘/dev/random’ will typically block if more than ~2048 bits are to be read from it.

I can think of an approach to solving this problem, which may overcome most of the hurdles…

(Updated 5/12/2019, 22h30 … )

(As of 10/05/2018 … )

In addition to the ‘Linear Congruent algorithm’, GMP offers the ‘Mersenne Twister algorithm’, which is known to be insecure, but which nevertheless provides some layer of randomization, and which is fast to compute. Its period of repetition still greatly exceeds the number of times any one prime-number candidate will be tested. The way such an RNG is initialized with GMP is like this:

void gmp_randinit_mt (gmp_randstate_t state)

What I would suggest should happen is, that first, two ‘state_t’ objects should be created, named ‘rng1′ and ‘rng2′. rng1 should be initialized as a Mersenne Twister RNG, using the third of the function prototypes I provided above. Then, a custom-function can be made to read the desired number of random bits (*i.e., 512*), from ‘/dev/random’, and only once per prime number generated. These bits can be referred to as the ‘global_seed’, and should be used to seed ‘rng1′. Then, the second prototype above – the one that requires the programmer provide the constants (a) and (c) – can be used to initialize the object named ‘rng2′, to the Linear Congruent algorithm. The value of (a) required to do so can be a *quantity of 512 bits* output from ‘rng1′, but should be OR’ed bit-wise, with the numeral (1) before being used as (a).

(5 626 616 701 960 718 857) could simply be used as (c). It’s a 63-bit prime number. And finally, the previously-read ‘global_seed’ can be used again, to seed ‘rng2′.

After that, the programmer can go about his business, of providing prime-number candidates, always using ‘rng2′, but then testing each candidate in a function, that uses ‘rng1′ to determine the witnesses, within the required ranges, to test the primeness of the generated candidates as many times as needed. This can take place, as usual, until a candidate is sufficiently attested to being prime. (:2)

One fact which has been observed about the Linear Congruent algorithm is, that its lower bits exhibit poor randomness. For this reason, when random numbers are actually generated, only the higher half of the available modulus is used. If the number of bits requested exceeds (m2exp / 2), the function simply invokes itself several times, and concatenates the results.

For such reasons, and the fact that the degree of randomness will reach an eventual limit in spite of best efforts, I wouldn’t recommend a ‘m2exp’ of more than 512 bits. Also, 512 bits will deplete the computer’s entropy pool less, to seed, than say, 1024 bits would.

(Update 10/06/2018, 18h50 : )

About the Mersenne Twister algorithm, I must admit that I do not know for the moment, how exactly it works. But there is an article about it, which was written in the context that the language Python was being used, which gave me some brief overview about the algorithm.

A key statement in this article is:

>>>

The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

<<<

The question which this statement poses is, in the context of pseudo-random numbers, ‘What constitutes a completely deterministic random-number generator? Aren’t they all deterministic by nature?’ And so I would offer another observation to answer that question:

As far back as the 1960s, a method was suggested to generate pseudo-random numbers, which stated to ‘Square the previous random number, and then to take the middle of that square, as the next random number.’ In other words, if 32-bit random numbers were needed, square the last one, right-shift the resulting bits by 16, and put everything back into the modulus of 2^{32}.

The problems with this approach do not include, that the next state of the generator, would be totally dependent on the previous state. *All* the pseudo-random number generators have *that* as a property. The real problem with this ancient suggestion would be twofold:

- It makes the actual state of the generator visible in every random number it outputs,
- It seems to lack another variable. Each output depends on the previous output, but not on arbitrary constants such as (a) or (c) above. Therefore, all the outputs are predictable from just knowing one previous output.

Additionally, there was a weakness in how some, low-quality random number generators used to work back in the 1970s, IIRC, which was that the Linear Congruent method was being used, but instead of only outputting the higher half of the state, the entire 32-bit register was output each time. And so a problem which still existed was:

- It makes the actual state of the generator visible in every random number it outputs.

Not only that, but if a small range of integers was actually required, such as a choice between 4 from [0 .. 3], because the most-standard way in which those could be determined from a 32-bit integer was:

out = X mod 2^{2}

The values of (‘out’) actually generated were impacted, by how low-quality the randomness of the least-significant bits of (X) were. Thus, it might have actually made more sense back then to compute:

out = X // 2^{16} mod 2^{2}

Really, the main reason *I’d* have not to use the Mersenne Twister algorithm, to suggest prime-number candidates would be, that because the version of it which I’m familiar with, only generates 53-bit floats as its basis, which means that if a number 1024 bits long was requested, this algorithm would probably only be trusted to produce a 32-bit integer in one shot, and *would need to be* asked to do so, 32 times, just so that the concatenated result was a 1024 bit suggestion. And if one accepted that, one could just as easily accept a Linear Congruent generator, that had been initialized with (size = 32). Since this posting assumes that (size = 128) *is not good enough*, there is no reason to think that (size = 32) would be good enough.

Further, if the implementation of the Mersenne Twister algorithm was *low-quality enough*, this floating-point number might simply be multiplied by the range *once*, the floor function applied, and any lowest value in the requested range added to the result. (:1) And I guess that in that case, the unsuitability for cryptographic purposes would be obvious enough.

That would mean, that if a random number was going to be picked out of a 1024-bit range, then the resulting ‘stride’, between integers that could possibly be selected, would actually be on the order of ~2^{971}. *Odd as well as even integers should still end up being selected*. But, because I have already concluded that *some* 1024-bit numbers were prime, simply by performing 256 tests on them, I’m not sure that my probabilistic method of doing so would become any stronger, if the entire 1024-bit range became available, as witnesses, for testing. (:3)

(Update 10/08/2018, 0h30 : )

1:)

The way in which such a computation would be carried out using GMP is probably different, from how it might be done using Python. It could be that (a ∈ [2 .. n-1]) can be used as a witness to test whether (n) is prime. The methods available in Python make this painless to achieve. But using GMP, if the range of the integer output is [0 .. n – 1], then a direct porting of the previous method would involve subtracting (2), performing the pseudo-random number generation, and then adding (2) to the result again. In this multi-precision arithmetic, addition or subtraction of even trivial numbers such as (2) to and from 1024-bit numbers, costs significant CPU operations. In addition, these subtractions and additions might need to be performed, for every witness (a) chosen.

Using GMP, it might actually make more sense to apply:

```
a = 0
while (a ≤ 2^32) { choose a ∈ [0 .. n) }
choose b ∈ [0 .. 2^32)
a -= b
```

I suppose that when (a) initially results as less than or equal to (2^{32}), this wastes some time. But if we can assume that (n ~= 2^{1024}), this will only happen infrequently, so that as many witnesses are chosen, this latter approach is more likely *to save* time.

(Update 10/06/2018, 20h40 : )

2:)

There would be an additional observation to make about this approach. The possibility, however slim, could arise, that the value of (a) used to seed ‘rng2′ could be such, that no actual prime numbers can result from that instance of ‘rng2′, even though it is capable of producing candidates. In such a case, because ‘rng2′ is still generating values that were only seeded once, from the system’s ‘real’ source of entropy, an eventual need might arise to query ‘/dev/random’ again, and thus to reseed everything, if the count of failed candidates exceeds some threshold.

And so a useful parameter to know, would be what the average number of attempts should be, to find a prime number of length (n) in bits. It has been stated that for an integer (m), the probability of primeness is ( 1 / ln(m) ). But since the number of bits needed to store an integer (m) is already (log_{2}(m) == n), the answer to that is ( ½n ln(2) ) attempts, where ( ln(t) ) expresses the natural logarithm of (t), assuming that only odd numbers are resulting in candidates, and assuming that the only reason for rejection, was provable non-primeness. This reflects a base-change in the logarithm. Hence, the threshold to set might be ( n ln(2) ) attempts.

When (n = 1024), this would be a threshold of (710).

(Update 10/08/2018, 7h10 : )

3:)

As is often the case, there exists A WiKiPedia Article, About the Mersenne Twister algorithm. This article offers further insight.

While the article describes the algorithm as using a 32-bit or a 64-bit integer, often, such integers can be presented as binary fractions, by actual implementations, which are in turn output as floating-point numbers. Therefore, I see no contradiction really in what this article writes, and what the earlier article wrote about the Python implementation. The earlier article seemed to suggest, that it had as its basis a 53-bit, double-precision floating-point number. But it might just as easily only have been using the 32 MSBs of that floating-point number.

If I was to guess that the version of the Mersenne Twister algorithm used by GMP had a 64-bit basis, instead of a 53- or a 32-bit basis, *If a single iteration* of this algorithm was used to generate an integer in the range of [0 .. 2^{1024}), it would just mean that the stride between possible outputs would become (2^{960} integer values).

I now think that with GMP the programmer can choose whether he or she wants a single iteration of this algorithm, or multiple iterations to be applied, by selecting which of the following two functions to use, when generating actual output:

void mpz_urandomm (mpz_t rop, gmp_randstate_t state, const_mpz_t n)

void mpz_urandomb (mpz_t rop, gmp_randstate_t state, mp_bitcnt_t n)

If the version used in GMP was the 64-bit version, then the number of iterations required to output a bit-count of 1024, would be 16. *The ‘total amount of randomness’ would then be capped at 64 bits, according to the size of the seed*.

(Update 10/08/2018, 8h55 : )

I should add, that If the goal of the exercise was to take an example like this, in which there are exaggeratedly few integer values from the RNG, and to transform those into the range [0 .. n] , *where stress is on the range being inclusive*, there are straightforward ways to achieve that, in arbitrary-precision integer Math. One way would be to take the integer output from the RNG and multiply it by (2n), and then, if that integer had as basis, say, a modulus of (2^{32}), to divide the result by (*2 ^{32} – 1*), using integer division. Next, (1) could be added to the result, and then (1/2) computed, again, using integer arithmetic. This is roughly equivalent to adding (0.5) to the final result.

But then one side-effect of this would be, that if the range was more modest, such as maybe (n = 10), the probability of an actual (0) or an actual (10) occurring, would be (1/2) the probability with which the other numbers in the range occur. *If the stress was on providing “uniform” probabilities*, then the only way I can see to proceed would be, to multiply by (n + 1), to divide by (2^{32}). But in that case, if (n >> 2^{32}), the result of (n) might never occur.

(Update 10/09/2018, 19h55 : )

* A Major Weakness in This Approach*:

The most important weakness I see in this approach is related to the relatively small number of bits used to seed the Mersenne Twister RNG (usually, 32). This would lead to the situation that there can only be ~ 4 billion possible values, as the coefficient (a), of the Linear Congruent RNG which I chose to label ‘rng2′.

What can ‘break’ some RNGs, is a set of parameters that cause convergence of the outputs, towards some subset of values that the hypothetical attacker could predict, *without* getting to see the history of outputs, or the initial seed. And some of the ~ 4 billion coefficients produced as above, may lead to this. An attacker could compute *them all*, and then ‘hope’ that his target has ended up using such a defective one.

But if the programmer is keen on removing this vulnerability as well, there are 512 bits of entropy in (‘global_seed’) !

I do not recommend just XOR’ing the entire ‘global_seed’ into the generated coefficient (a), because, the first operation would be to multiply this seed, as the current (X), with the (a) that was just XOR’ed. The result of that could be even more-predictable, than the result would have been, to multiply by the unmodified version of (a).

But what should be somewhat safe to do, is that the higher half of ‘global_seed’ could be XOR’ed into the higher half of (a). I think that the benefit of that would be, that the result of multiplying both higher halves, will always produce output ‘left-shifted beyond the modulus of the generator state’. But, the product of the lower half of the previous (X) with the higher half of (a) would affect the numbers generated, without causing this coincidence. And the result should be a workable version of (a), that the attacker cannot predict. That version of (a) has an equal probability of being a defective one; it just wouldn’t be a predictable, defective one.

(Update 10/10/2018, 8h35 : )

There is a detail about the Linear Congruent algorithm which I did not previously explain, on the assumption that my reader already knows it. But maybe, because the reader may not, I’ll explain it now.

Assuming that the algorithm has a workable coefficient which I labeled (a), the randomness of its outputs will stem from the randomness of *the initial seed value*, which I labeled ‘global_seed’, not, from the coefficient (a).

One step which improves the likelihood of this functioning with a pseudo-random (a), was just to set the least-significant bit of (a), as I explained directly at the beginning of this posting. The reason for that is, the fact of the LSB being set, will preserve some factor of ‘global_seed’ being represented in (X) for a potentially infinite number of iterations. *Approximately, the Linear Congruent algorithm is computing some power of (a), which will then also act, approximately, as the factor of (X)*.

However, some probability of a non-workable coefficient (a) occurring stays uniform, regardless of what methods are used to generate (a). For example, if the lower half of (a) was just to consist of all-zeroes, except for that one LSB being set, we’d have a non-workable version of (a). The main defense I’ve suggested against that was, ~~to limit the number of times that an attempt is made to generate a prime number, before reseeding~~.

(Update 10/11/2018, 14h20 : )

There is a rational basis not to think, that the concatenated outputs from even a 32-bit Mersenne Twister RNG, would produce a coefficient named (a), that will be defective in some way, as in lacking randomness.

According to the WiKi-article I linked to above, the Mersenne Twister algorithm will not fall into a rut, let’s say of always outputting 32-bit fields of zero, because it did so once. The reason for this is the fact that the state of ‘rng1′ consists of the past 624 outputs – in the 32-bit implementation – and not just, of the last 1 or 2 outputs. In other words, the chance that this algorithm will generate 32 bits of zeroes, is just as high as it generating any other 32-bit value, and just as high, of such a 32-bit value occurring in a truly random way.

So, If we assume that all possible 512-bit coefficients (a) are to be generated, and that there exist 4 billion of them, then the odds that any one 32-bit field would consist entirely of zeroes, will again be, 1 in 4 billion. But, because the 512-bit value that results would be a concatenation of 16 such fields, the chances that any 512-bit output contains one, are 16 out of 4 billion: ~16 possible outputs will contain such a field, and ~8 of them in such a way, that this field belongs to the lower half.

The odds that any possible, 512-bit output will actually contain *two* such fields, should be exponentially smaller. And thus it seems likely that none of the 4 billion possible coefficients named (a) will actually be broken examples.

Therefore, the step should never actually become necessary, which I outlined above, to mitigate the dangers, from a broken ‘rng2′.

(Update 10/12/2018, 6h50 : )

According to what I read, the Mersenne Twister algorithm must not be seeded with the 32-bit value or the 64-bit value zero.

Because the GMP library can set an arbitrary-precision integer, from an unsigned long integer, instead of always from a string, it should come as no additional computational expense, to find the maximum between one such integer, and the value (1). Therefore, a minimum of (1) can be made the seed easily, without OR’ing the arbitrary-precision seed with (1). Therefore, the full range of 4 billion seed-values can be kept, more or less, even over the need not to seed with the value (0).

(Update 10/13/2018, 13h10 : )

Because I can find no public documentation which would answer the question with certainty, I should warn my reader, that the possibility exists, that GMP only implements a 32-bit Mersenne Twister algorithm, even when the library has been compiled to run on a 64-bit CPU. This would happen because To implement a 64-bit Mersenne Twister algorithm, would actually require that the source code be rearranged to some extent. And so the approach which I would recommend to seed it would be:

```
unsigned long int mt_seed_np = 1;
mt_seed_np = mpz_get_ui( global_seed );
if (! (mt_seed_np & 0xFFFFFFFF)) {
mt_seed_np++;
}
gmp_randseed_ui(rng1, mt_seed_np);
```

(Update 5/12/2019, 22h30 : )

Conversations I’ve had about the subject of the last few paragraphs, with certain people, suggest that they find it confusing. *I can hypothesize* that an algorithm exists, which will generate a single integer as output, that includes the integers being used to define the range, both when that range is smaller than the bit-length of the PRNG being used to generate output, as well as when the range has greatly more bits. And it could do so, based on a single application of the PRNG. But doing so would essentially be a question of conditionally executing one out of two subroutines, depending on whether the range specified in the API was a 32-bit range, for example, or a range with more than 32 bits for each integer output.

Further, if the output range has more bits than the PRNG does, it’s no longer necessary or beneficial to add the equivalent of (1/2) to the output as described above. I had only suggested doing so in order to achieve results congruent with what would be expected, when the output range has fewer bits than the PRNG did.

But I suspect that what the official libraries do instead is to accept that the interval output is ultimately a half-open interval defined as [ 0 .. n-1 ) , unless numerous outputs are to be concatenated. Based on that, an API that ~~promises~~ to include (n-1), will just add (1) to (n-1) when defining the range, even though in the case where (n) has far more bits than the PRNG did, (n-1) will still not occur in the output.

Some people were so confused by this subject, that *they thought the problem would only take place if floating-point numbers are being used*. So because the output is not formatted as a floating-point number, *some people felt that this behaviour will just not take place*. The problem with this idea is the fact that a floating-point number is really just a binary fraction, with a variable power of (2), which will therefore have the same limits to its precision that an integer will have, with the same number of bits. This behaviour will take place either way, if the output is expected to have more bits of precision than the PRNG did, and if the output is expected to be derived from a single application of the PRNG.

This idea can be reinforced by doing a little thought-experiment, in which the PRNG has as maximum value (2^{16}-1), which is a series of 16 ones. In one example the strategy works, to apply a multiplier of (2^{16}), when the maximum output integer we want to include is (2^{16}-1). But accordingly, if we use a multiplier of *(2 ^{16}+1)* when the maximum output integer we want to include is

*(2*, then this strategy fails:

^{16})

```
1111111111111111
* 1 0000000000000000
= 1111111111111111 0000000000000000
// 1 0000000000000000
= 1111111111111111
Success!
But,
1111111111111111
* 1 0000000000000001
= 1111111111111111 1111111111111111
// 1 0000000000000000
= 1111111111111111
Fail!
```

Dirk