# What I’ve learned about RSA Encryption and Large Prime Numbers – How To Generate

One of the ways in which I function, is to write down thoughts in this blog, that may seem clear to me at first, but which, once written down, require further thought and refinement.

I’ve written numerous times about Public Key Cryptography, in which the task needs to be solved, to generate 1024-bit prime numbers – or maybe even, larger prime numbers – And I had not paid much attention to the question, of how exactly to do that efficiently. Well only yesterday, I read a posting of another blogger, that inspired me. This blogger explained in common-sense language, that a probabilistic method exists to verify whether a large number is prime, that method being called “The Miller-Rabin Test”. And the blogger in question was named Antoine Prudhomme.

This blogger left out an important part in his exercise, in which he suggested some working Python code, but that would be needed if actual production grade-code was to generate large prime numbers for practical cryptography. He left out the eventual need, to perform more than just one type of test, because this blogger’s main goal was to explain the one method of testing, that was his posting subject.

I decided to modify his code, and to add a simple Fermat Test, simply because (in general,) to have two different probabilistic tests, reduces the chances of false success-stories, even further than Miller-Rabin would reduce those chances by itself. But Mr. Prudhomme already mentioned that the Fermat Test exists, which is much simpler than the Miller-Rabin Test. And, I added the step of just using a Seive, with the known prime numbers up to 65535, which is known not to be prime itself. The combined effect of added tests, which my code performs prior to applying Miller-Rabin, will also speed the execution of code, because I am applying the fastest tests first, to reduce the total number of times that the slower test needs to be applied, in case the candidate-number could in fact be prime, as not having been eliminated by the earlier, simpler tests. Further, I tested my code thoroughly last night, to make sure I’ve uploaded code that works.

Here is my initial, academic code:

http://dirkmittler.homeip.net/text/Generate_Prime_3.py

(Corrected 10/03/2018, 23h20 … )

(Updated 10/08/2018, 9h25 … )

(As of 09/28/2018, 17h05 : )

The original algorithm seemed to waste most of its time – about 5 seconds or so – just computing the array of the first 6542 known primes, less than 65535. For a production-grade program, this time can be eliminated, by making that array a static constant, and compiling and linking it in to the object-code statically. This would be especially feasible, because each of the elements is a 16-bit integer.

(Update 09/29/2018, 19h40 : )

I have changed the way in which the functions ‘smallprimes()’ and ‘sift_primes()’ work, to make them more consistent with the way Python stores lists, and therefore also to make them faster. The script now requires only about 2.4 seconds to compute its initial list of primes, on my machine. I have replaced the code linked to above, with the updated code.

However, the complete program still seems to require about 5 seconds to compute a 1024-bit prime, on my machine. The fact that I’ve decided to test each candidate up to 256 times may contribute to that.

Additionally, I’ve examined the Miller-Rabin Test more closely, to understand it, and have come to an additional conclusion about it: It also eliminates Fermat-Test failures. I.e., these two tests do overlap, in that no candidate should exist, which the Miller-Rabin Test confirms as probable primes, but which the Fermat Test will fail. Therefore, the only real gain I made in performing the Fermat Test first, was to reduce the number of times the Miller-Rabin Test needs to be performed, thereby improving speed. And this improvement in speed also presents a justification to perform 256 tests instead of 128, the latter part of which might reduce the probability of a so-called ‘Carmichael Number’, a Miller-Rabin Test Liar…

Understanding the Miller-Rabin Test:

The reader could eventually hap into the question of why at all, the Miller-Rabin Test, as it’s presented, works. And there is a key observation about the test, which will help explain it:

Given a tentative prime (n), it would follow that the totient is (n-1), which the Miller-Rabin Test factorizes into an odd factor (r), and a power (s) of the factor (2). The key to understanding the code is to observe, that (j) is initialized to (1), but that the squaring-loop will continue, as long as (j < s). This means that the inner loop gets executed (s – 1) times !

Well, if it was true that the only trivial square root was (1), then one absurd conclusion would be, that a number (a^r) which has yet to be squared (s) times, must have started out as being (1), so that after having been squared (s) times, the result will still be (1). But, because the two possible, trivial square-roots are in fact {-1,1} , it can happen that after being squared only (s-1) times, the result is (-1). After the result is then squared one more time, the final result (which Miller-Rabin does not bother to compute) will be (1).

What seems to follow, is that (-1) will have non-trivial square roots eventually (at least in modulus arithmetic; nobody is suggesting that imaginary numbers are being used here), so that an initial value of (a^r) may start out as non-trivial, but which when squared (s) times at the latest, will result in (1). In fact, the result of (-1) may happen, before (a^r) has even been squared (s-1) times, just because (a^r) may already start out as being a square, in the modulus of (n). And so the test needs to exit in such cases, without rejecting the presumed totient of (n-1), for those examples of (a).

Just to name an example, we could be trying to test whether our famous case of (2^16 + 1) is prime, which can also be written (65537). Its totient is (2^16), or, (65536). (r=1, s=16). Miller-Rabin will try to square (a) 15 times, not 16, and after doing so 15 times at the latest, the result will be (-1). This implies that after the 16th squaring, the result will be (1). But the intermediate result was not (1), at any time before it was (-1). And so the test succeeds.

One vulnerability of the Miller-Rabin Test:

According to what I just wrote, there follows a vulnerability in the Miller-Rabin Test, in its original form. For some value of (a), it can happen that (a^r mod n == 1), that is, without the test having squared (a^r) even once. In such cases, the test will fail to prove that (n) is not prime. Unfortunately, the Fermat Test will fail to do so as well, since it will follow that (a^(r(2^s)) mod n == 1). This does not mean that (n) is not prime, for which reason the test ‘succeeds’, thus allowing additional tests to determine whether (n) is prime or not.

One reason this can happen is the possibility that (a) was square, which, if (r) was hypothetically as small as (5), would already mean that (a^r) will be square five times over. Thus, if we were to imagine (some number) being squared 5 times, we could imagine that at some point that does not exist literally, (-1) could have been reached, and then (1), followed eventually by (a^r). The problem is, we don’t know whether (-1) was ever a part of this virtual sequence.

The conventional solution to this problem would have been, to perform a larger number of tests (k), so that eventually, some value of (a) will arise, for which (a^r != 1).

I have now modified the test, so that If (a^r mod n == 1), For more than a certain number of cases tested, that is, (k/2) times, the assumption will be that this is also true for all values of (a), and the candidate of (n) which did this will be failed. And I have replaced the code linked to above, with the result of this modification. Even if the candidate was in fact prime, this fact needs to be witnessed by a certain number of values for (a), that produce meaningful results. And then, to replace (n) also causes more tests to be performed.

(Update 09/30/2018, 12h10 : )

I suppose that there’s another hypothetical way in which the Miller-Rabin Test could fail. After being squared zero or more times, (a^r) could become (-1) for some invalid reason. That is, a non-trivial value of (a) raised to some exponent, could yield (-1), even though (n) is not prime. It was never a situation which Miller-Rabin would have been able to detect, and for that, I also see no reason to raise any sort of exception, if this already happens with (a^r). If it happened, say, with (a^(2r)), after (a^r) was explicitly squared once, then the test would not have raised any exceptions either. The only condition for which the test can reject the candidate (n), is a value for (a), such that (1) is reached before (-1) was reached.

I believe that my only defense against this problem, is actually to carry out a sufficient number of tests, again, using random values for (a).

In fact, If a candidate (n) was ever tested, which is prime, but the totient of which, that must be even, only has a single power of two, such that (s = 1), then, almost every time the Miller-Rabin Test was performed, (a^r mod n) should in fact be (n – 1), which I’ve been assuming is the modulus equivalent of (-1). In that case, this test is only as strong as the simple Fermat Test. And for that reason the fact represents a reassurance, that the Miller-Rabin Test can also be performed using a single exponentiation, thereby making it only as expensive computationally, as the Fermat Test would be.

Now, If one was really stubborn, one could reject all such cases, even though sometimes, (n) was really prime, just because the available test for primeness would not be stronger than the Fermat Test.

(Observation added 10/01/2018, 18h40 : )

If I may quote Antoine Prudhomme:

>>>

There are some composite numbers that satisfies the Fermat’s little theorem for all possible values of a. As you guessed, these numbers are called … Carmichael numbers (so much suspense…).

The first 3 are 561, 1105 and 1729.

There are only 255 Carmichael numbers < 10⁸, and 20138200 < 10²¹ !

<<<

The three examples which Mr. Prudhomme writes literally, are all examples in which (n – 1) is divisible by (4). There is an easy trick to testing this visually, which is, after one has subtracted (1) from each number, the last two digits in Base-10 must be divisible by (4). Hence:

• 60 / 4 == 15
• 04 / 4 == 1
• 28 / 4 == 7

And so luck would seem to have it in each case, that for the Miller-Rabin Test, (s >= 2). So the fact that this test can ‘catch those liars’, while the simple Fermat Test cannot, remains consistent with what I wrote above.

Furthermore, the fact that Antoine Prudhomme gives exact answers, for integers up to 1021, suggests that prime numbers up to that size are known with certainty, for example, through the use of a seive. In itself, this is not a spectacular achievement by today’s standards, that assume powerful computers can be used to factorize, say, 1021. But this number being an upper bound to what Mr. Prudhomme wrote, still has significance today.

A fact which my reader may not be aware of is that in Mathematics, the fact that certain patterns seem to hold, up to the value 1021, is not enough to prove that such patterns remain true for much-larger numbers. So according to Mr. Prudhomme, the (unmodified) Miller-Rabin Test may be sufficient, to find all the ‘liars’ up to that value. And according to me, all the liars that can be found this way are probably of the form that:

(n – 1 mod 4 == 0)

Well, there exists a Historical Example, in “Euler’s Conjecture”, which states that for (k=4), there are no cases, where:

x1k + x2k + x3k == x4k

In Euler’s day, this might have been an appropriate assumption. But according to modern knowledge, the first counter-example is:

26824404 + 153656394 + 187967604 = 206156734

Each of these terms is larger than 1021, and the counter-example requires the use of computers to find. Well, in cryptography, the goal is to generate prime-numbers (n ~= 21024), which is approximately equivalent to (10300), hence, equivalent to ‘a Googol Cubed’. There could, for all I know, be counter-examples, to the unmodified Miller-Rabin Test always discovering a Fermat Test Liar, as well as to the Fermat Test Liar always being of the form I wrote above

Actually, I also know that the first counter-examples to Euler’s Conjecture, ever found, were to a form of it, in which the exponent was (5), not (4). But the example I gave above, is the smallest-known case, when the exponent is (4).

(Update 10/02/2018, 8h00 : )

The probability of an integer being square, modulo n:

Through simple experimentation I seem to have discovered, that the probability with which:

a^r mod n == 1

On the assumption that r, s and n are as defined in the preceding paragraphs of my posting,

Is equal to (1/2) if (s = 1), is equal to (1/4) if (s = 2), is equal to (1/8) if (s = 3), etc..

I take this situation to be equivalent, to (a) being square modulo (n), and additionally, not being (0) or (1), which my program will not suggest as witnesses.

This observation has helped me better determine the threshold, at which my program should reject (n), over this condition.

(Update 10/02/2018, 15h05 : )

In order to test the last premise I wrote above further, I have now added two assertions to my Python script, and have been able to run the script again, 10 times in a row, without causing any error messages. However, because (a) is being chosen at random out of [2, n-1] a limited number of times, the frequency with which the statement is true, that:

a^r mod n == 1

Is also sensitive to some random fluctuations. Yet, because the finding which I just wrote above directly contradicts This finding, from elsewhere on the Web, I felt I needed to test my own finding anyway, by making such non-deterministic assertions, which is normally a bad practice in programming. These assertions will at least be repeatable 90% of the time…

(Erratum 10/03/2018, 14h55 : )

It seems that I have fallen into the trap of mistaking:

(p -> q)

For:

(q -> p)

On the assumption that r, s and n are as defined in the preceding paragraphs of my posting,

The condition that:

a^r mod n == 1

Does Imply that (a) is square. But the reverse is not true. This is why the condition which I was interested in, can occur less-frequently, than (a) being square.

Furthermore, one of the assertions in my script, which I was testing my condition with, was:


assert is_prime(113, 256)



Which picks 256 random values of (a) in the appropriate range as witnesses. While it’s true that 112 is divisible by 4, it’s also true that this same number is divisible by 16. Hence, by asserting the above, my code has verified dozens of times, that the probability of my condition was on the order of (1/16), not even (1/4).

(Update 10/03/2018, 23h20 : )

I can elaborate on this idea of squareness:

To recap, (n) is a presumed prime number, and a1 and a2 can be integers modulo (n). But, let it be that:

a2 = a12 mod n

The following situation defines (r) and (s):

(n – 1) = r(2s)

Where (r) is odd.

Strangely, the following two statements follow from Fermat’s Little Theorem:

(a1r)(2s) mod n == 1

(a2r)(2s) mod n == 1

The second statement above is implied by the first, and by the fact that:

12 mod n == 1

If (s = 1), it will follow that:

a2r mod n == 1

But, there could be yet another witness (a3), such that:

a3 = a22 mod n

If (s = 2), it will follow that:

a3r mod n == 1

I would induce that the probability of:

ar mod n == 1

Is approximately equal to

ps

Where (p) is the probability that:

a2 == a12 mod n

In other words, if (s = 2) and (a) is randomly chosen, this condition will be true, when (a) is square, and when the root of (a) is also square

Hence,

p = ½

(Update 10/08/2018, 9h25 : )

I have updated the script, and one of the main features which I have now implemented, is a cryptographically-strong random number generator, that uses the system source of randomness, which under Linux, turns out to be the device-file ‘/dev/urandom’, instead of always using the ‘Mersennes Twister’. This is the updated script:

http://dirkmittler.homeip.net/text/Generate_Prime_11.py

I think that another way in which this script differs from the earlier example, is in the possibility that this one may need to be run using Python 3.

Also, while the previous version of this script would run fine on a 64-bit machine, it would have produced sub-optimal results on a 32-bit machine, and the reason for that is the fact that (sys.maxsize) is usually (231 – 1), on a 32-bit machine, and not (232 – 1). This version should run fine on 32-bit machines.

Enjoy,

Dirk