## What I’ve learned about RSA Encryption and Large Prime Numbers – How To Generate

One of the ways in which I function, is to write down thoughts in this blog, that may seem clear to me at first, but which, once written down, require further thought and refinement.

I’ve written numerous times about Public Key Cryptography, in which the task needs to be solved, to generate 1024-bit prime numbers – or maybe even, larger prime numbers – And I had not paid much attention to the question, of how exactly to do that efficiently. Well only yesterday, I read a posting of another blogger, that inspired me. This blogger explained in common-sense language, that a probabilistic method exists to verify whether a large number is prime, that method being called “The Miller-Rabin Test”. And the blogger in question was named Antoine Prudhomme.

This blogger left out an important part in his exercise, in which he suggested some working Python code, but that would be needed if actual production grade-code was to generate large prime numbers for practical cryptography. He left out the eventual need, to perform more than just one type of test, because this blogger’s main goal was to explain the one method of testing, that was his posting subject.

I decided to modify his code, and to add a simple Fermat Test, simply because (in general,) to have two different probabilistic tests, reduces the chances of false success-stories, even further than Miller-Rabin would reduce those chances by itself. But Mr. Prudhomme already mentioned that the Fermat Test exists, which is much simpler than the Miller-Rabin Test. And, I added the step of just using a Seive, with the known prime numbers up to 65535, which is known not to be prime itself. The combined effect of added tests, which my code performs prior to applying Miller-Rabin, will also speed the execution of code, because I am applying the fastest tests first, to reduce the total number of times that the slower test needs to be applied, in case the candidate-number could in fact be prime, as not having been eliminated by the earlier, simpler tests. Further, I tested my code thoroughly last night, to make sure I’ve uploaded code that works.

Here is my initial, academic code:

http://dirkmittler.homeip.net/text/Generate_Prime_3.py

(Corrected 10/03/2018, 23h20 … )

(Updated 10/08/2018, 9h25 … )

## A Hypothetical form of Broadcast Encryption

I have pursued the mental exercise, of supposing that a group of (n) people might exist, who are to receive a broadcast, encrypted message, but in such a way that two of those recipients’ credentials are required, to decrypt that message. The assumed author of the message is a secure party, entrusted to keep all the encryption details of the system.

The basis of my exercise is, that RSA encryption and hybrid encryption may be used, with the twist, that as long as the modulus is the same for two transactions, a symmetrical key can be encrypted twice, in order to be decrypted twice. A formalized way to write this could be:

C = (T ^ E1) ^ E2 mod (p)(q)

T = (C ^ D2) ^ D1 mod (p)(q)

Where (p) and (q) are random 1024-bit prime numbers, (T) stands for the symmetrical encryption key, and (C) stands for the encrypted form of that key. Clearly, (p) and (q) would be filtered by the central party, such that neither (p-1) nor (q-1) are divisible by either (E1) or (E2), which are, 65537 and 32771 respectively.

My concept continues, that the central party associates a single prime number with each distributed recipient for the long term, and that the recipient is not allowed to know their own prime number. For any pair of recipients, a modulus (p)(q) follows, which the recipients store, for each other recipient, that the current recipient may eventually want to combine his key with.

(Corrected 09/22/2018, 18h00 … )

(As of 09/21/2018, 23h40 : )

## Finding the Multiplicative Inverse within a Modulus

The general concept in RSA Cryptography is, there exists a product of two prime numbers, call them (p) and (q), such that

(T ^ E) ^ D mod (p)(q) = T

where,

T is an original plaintext document,

E is an encryption key,

D is the corresponding decryption key,

E and D are successively-applied exponents, of T,

(p)(q) is the original modulus.

A famous Mathematician named Euler found, that in order for exponent operations on T to be consistent in the modulus (p)(q), the exponents’ product itself must be consistent in the modulus (p-1)(q-1). This latter value is also known as Euler’s Totient Product of (p)(q).

There is a little trick to this form of encryption, which I do not see explained often, but which is important. Since E is also the public key, a relatively small prime number is used in practice, that being (2^16 + 1), or 65537. D could be a 2048 or a 4096-bit exponent. The public key consists of the exponent E and the modulus (p)(q) packaged together.

Thus, when the key-pair is created, (p) and (q) must be known separately, so that D can be computed, and then (p-1)(q-1), which hopefully never touched the hard-drive, can be discarded, after D is saved, along with (p)(q) again, this time forming the private key.

Therefore, D must be the multiplicative inverse of E, in the modulus of (p-1)(q-1). But how does one compute that?

(Corrected 09/27/2018, 15h15 … )

## The Original RSA Trapdoor Function

I once ran into laypeople, who were able to understand what a modulus was – so that the output of a series of computations would never equal or exceed that modulus – and who were able to understand what the exponent function is, but who were incredulous, when I told them that it was possible to compute the result of raising a 2048-bit number, to a 2048-bit exponent, on the basis that the result only needs to fit inside a 2048-bit modulus.

I believe that the most common way in which this is done, is based on the assumption that two 2048-bit numbers can be multiplied, and the result brought back down to a 2048-bit modulus. This notion can be extended, to mean that a 2048-bit number can also be squared, and the result written in the 2048-bit modulus…

Well to achieve the exponent-function, one needs a base-register, an accumulator-register, and the exponent. The accumulator is initialized to the value (1).

If the exponent has 2048 bits, then the operation can be repeated 2048 times:

1. Square the value in the accumulator.
2. Left-Shift the Most-Significant Bit of the Exponent out, into a bit-register that can be examined.
3. If that bit-register is equal to (1) and not (0), multiply the base-register into the accumulator an extra time.

Because the Most-Significant Bit of the Exponent was shifted out first, its being (1) would mean that the value in the Accumulator was multiplied by the Base earlier, so that this exponent of the Base will also be squared, as a factor of the Accumulator, by the highest number of iterations, thus effectively raising the Base to the power of 1, 2, 4, 8, 16, 32, … 2^2047 , in combinations.

(Update 09/10/2018, 7h45 : )

I am well aware of the fact, that when Step 1 above is executed for the first time, it has no effect. In fact, depending on how many most-significant zeroes the exponent has, this could even repeat itself. One way in which this ‘waste’ of CPU cycles can be removed, is by changing Step 1 above to read:

1. If the value in the accumulator is Greater Than 1, square it.

But the problems with such a supposed ‘optimization’ when performed with a 2048-bit exponent would be, that the value in the accumulator needs to be compared with (1), 2048 times, and as long as the accumulator also has 2048 bits, depending on how it’s being stored, that each comparison needs to go through the entire accumulator. If we could assume that only a few most-significant bits in the exponent are in fact zeroes, such an optimization might actually slow the loop down.

(Updated 09/15/2018, 12h50 : )

IF the reader needs to implement this type of exponentiation based on an arbitrary-precision integer library, and on the assumption that the exponent may have an arbitrary length – i.e. be 17 bits long as easily as 2048 bits long, then the first variable to take into account would be, whether the arbitrary-precision integer library is applying the approach, to make the least-significant bits the first word linked to, such that a linked list would lead to the progressively more-significant bits indirectly. This could be seen as kind of an equivalent, to Little-Endian representation, except that it would exist with linked lists, instead of with contiguous bytes, that would form a fixed-length field.

If this is the case, the organization of an integer as a linked list is not optimal, for exponentiation. In such a case, primitive operations on the linked list would be helpful, that break it down into a least-significant word first, followed by the more-significant words. This operation would be Mathematically equivalent, to dividing by ( 2^(Word-Size) ), and additionally forming the remainder. However, to make this operation speedy, it is best implemented as a C or a C++ subroutine, even if the arbitrary-precision integer library exposes purely Mathematical operations, as an API.

If instead the application programmer is only able to access purely Mathematical operations from such a library, then another approach would be, first to count how many bits long the exponent is, let’s say by halving it repeatedly until we get zero. And while doing so, it’s also possible to construct a number, the bits of which are simply the bits of the exponent reversed.

If the library exposes functions that determine the quotient and remainder, of dividing by ( 2^(Word-Size) ), then the application programmer can do so first, and devise a subroutine, which only elects to exponentiate a base, with ( E < 2^(Word-Size) ). In that case, exponentiation by an arbitrary-length exponent, can be composited out of numerous operations to perform this smaller exponentiation ‘by chunks’. And again, the exponentiation would start with the most-significant chunk.

If we can’t do this, then an arbitrary-precision integer library can also test the least-significant bit repeatedly, as it halves the resulting, derived integer. But there would be a considerable performance penalty in doing that, if the API operation to divide a 2048-bit number by 2, has not also been optimized, to arrive solely at a remainder, in less than Asymptotic Time.

Dirk