## What I’ve learned about RSA Encryption and Large Prime Numbers – How To Generate

One of the ways in which I function, is to write down thoughts in this blog, that may seem clear to me at first, but which, once written down, require further thought and refinement.

I’ve written numerous times about Public Key Cryptography, in which the task needs to be solved, to generate 1024-bit prime numbers – or maybe even, larger prime numbers – And I had not paid much attention to the question, of how exactly to do that efficiently. Well only yesterday, I read a posting of another blogger, that inspired me. This blogger explained in common-sense language, that a probabilistic method exists to verify whether a large number is prime, that method being called “The Miller-Rabin Test”. And the blogger in question was named Antoine Prudhomme.

This blogger left out an important part in his exercise, in which he suggested some working Python code, but that would be needed if actual production grade-code was to generate large prime numbers for practical cryptography. He left out the eventual need, to perform more than just one type of test, because this blogger’s main goal was to explain the one method of testing, that was his posting subject.

I decided to modify his code, and to add a simple Fermat Test, simply because (in general,) to have two different probabilistic tests, reduces the chances of false success-stories, even further than Miller-Rabin would reduce those chances by itself. But Mr. Prudhomme already mentioned that the Fermat Test exists, which is much simpler than the Miller-Rabin Test. And, I added the step of just using a Seive, with the known prime numbers up to 65535, which is known not to be prime itself. The combined effect of added tests, which my code performs prior to applying Miller-Rabin, will also speed the execution of code, because I am applying the fastest tests first, to reduce the total number of times that the slower test needs to be applied, in case the candidate-number could in fact be prime, as not having been eliminated by the earlier, simpler tests. Further, I tested my code thoroughly last night, to make sure I’ve uploaded code that works.

Here is my initial, academic code:

http://dirkmittler.homeip.net/text/Generate_Prime_3.py

(Corrected 10/03/2018, 23h20 … )

(Updated 10/08/2018, 9h25 … )

## Some Suggested Code

In This Earlier Posting, I had written at first some observations about Bluetooth-pairing, but then branched out on the subject, of whether a Diffie-Hellman Key Exchange could be easier to compute, if it was somehow simplified into using a 32-bit modulus. Obviously, my assumption was that a 64-bit by 32-bit divide instruction would be cheap on the CPU, while arbitrary-precision integer operations are relatively expensive, and actually cause some observable lag on CPUs which I’ve used.

And so, because I don’t only want to present theory in a form that some people may not be able to visualize, what I did next was to write a C++ program, that actually only uses C, that assumes the user only has a 32-bit CPU, and yet that performs a 64-bit by 64-bit division.

This has now been tested and verified.

One problem in writing this code is the fact that, depending on whether the divisor, which is formatted as a 64-bit field, contains an actual 64-bit, 32-bit, 24-bit, or 16-bit value, a different procedure needs to be selected, and even this fixed-precision format cannot assume that the bits are always positioned in the correct place.

I invite people to look at this sample-code:

http://dirkmittler.homeip.net/text/divide64.cpp

(Update 06/10/2018, 23h30 : )

I needed to correct mistakes which I made in the same piece of code. However, I presently know the code to be correct.

Just to test my premises, I’m going to assume that the following division is to be carried out, erroneously as a simple division, but assuming a word-size of 32 bits:

## Hash Tables

There is already a lot of work published, that details numerous established ways to implement hash tables. The WiKiPedia article on the broad class of hash-tables, is already a good starting point to become familiar with them.

But just to make this interesting, I undertook the mental exercise of designing a hash table, which when grown, does not require that the contents of the original hash table be rearranged in any way, but in which the expansion is achieved, by simply appending zeroes to the original hash table, and then changing one of the parameters, with which the new, bigger hash table is accessed.

My exercise led me to the premise, that each size of my hash table be a power of two, and it seems reasonable to start with 2^16 entries, aka 65536 entries, giving the hash table an initial Power Level (L1) of 16. There would be a pointer I name P to a hash-table position, and two basic Operations to arrive at P:

1. The key can be hashed, by multiplying by the highest prime number smaller than 2^L1 – This would be the prime number 65521 – within the modulus of 2^L1, and yielding Position P0.
2. The current Position can have added the number A, which is intentionally some power of two lower than L1, within the modulus of 2^L, successively yielding P1, P2, P3, etc…

Because the original multiplier is a prime number lower than 2^L1, and the latter, the initial modulus of the hash table, a power of two, consecutive key-values will only lead to a repetition in P0 after 2^L key-values. But, because actual keys are essentially random, a repetition of P0 is possible by chance, and the resolution of the resulting hash-table collision is the main subject in the design of any hash table.

Default-Size Operation

By default, each position of the hash table contains a memory address, which is either NULL, meaning that the position is empty, or which points to an openly-addressable data-structure, from which the exact key can be retrieved again. When this retrieved exact key does not match the exact key used to perform the lookup, then Operation (2) above needs to be performed on P, and the lookup-attempt repeated.

But, because A is a power of 2 which also fits inside the modulus 2^L, we know that the values of P will repeat themselves exactly, and how small A is, will determine how many lookup-attempts can be undertaken, before P repeats, and this will also determine how many entries can be found at any one bucket. Effectively, if A = 2^14, then 2^L / A = 2^2, so that a series of 4 positions would form the maximum bucket-size. If none of those entries is NULL, the bucket is full, and if an attempt is made to insert a new entry to a full bucket, the hash-table must be expanded.

Finding the value for a key will predictably require, that all the positions following from one value of P0 be read, even if some of them were NULL, until an iteration of P reveals the original key.

It is a trivial fact that eventually, some of the positions in this series will have been written to by other buckets, because keys will be random again, thus leading to (their own values of P) which coincide with (current nA + P0) . But, because the exact key will be retrieved from non-NULL addresses before those are taken to have arisen from the key being searched for, those positions will merely reflect a performance-loss, not an accuracy-loss.

Because it is only legal, for 1 key to lead to a maximum of 1 value, as is the case with all hash tables, before an existing key can be set to a new value, the old key must be found and be deleted. In order for this type of hash table to enforce that policy, its own operation to insert a key would need to be preceded by an operation to retrieve it, which can either return a Boolean True value if the key exists, plus the position at which its address is located, or a Boolean False value if it does not exist, plus the position at which the first NULL address is located, at which a following operation could insert one, in a single step. I would be asking two operations to take place atomically, because this can pose further issues with multi-threaded access to the hash table.

(Edit 06/22/2017 : The premise here, in the case of a multi-threaded application, is that it should be possible to lock a global object, before the retrieval-function is called, as long as it is not already locked, but that if a thread gives the instruction to lock it when it already is, then that thread is put to sleep, until the other thread, which originally locked the object, unlocks it again.

I believe that Such an object is also known as a ‘Mutex’.

In my thought-exercise, this Mutex-object not being locked, should also be a signal to the retrieval-function, that the position returned, in case the key is not found, does not need to be of significance, specifically for the situation that the bucket could be full, in which case there should be no valid pointer to a NULL-address. )

Any defined operation to delete a key would follow according to the same logic – It must be found first, and if not found, cause the appropriate error-code to be returned. Any hash table with more than one entry for the same key is corrupted, but can conceivably be cleaned, if the program that uses it sends multiple commands to delete the same key, or one command to purge it…

Growing the Hash Table

(Last Updated 06/28/2017 … )

## Finding the Multiplicative Inverse within a Modulus

The general concept in RSA Cryptography is, there exists a product of two prime numbers, call them (p) and (q), such that

(T ^ E) ^ D mod (p)(q) = T

where,

T is an original plaintext document,

E is an encryption key,

D is the corresponding decryption key,

E and D are successively-applied exponents, of T,

(p)(q) is the original modulus.

A famous Mathematician named Euler found, that in order for exponent operations on T to be consistent in the modulus (p)(q), the exponents’ product itself must be consistent in the modulus (p-1)(q-1). This latter value is also known as Euler’s Totient Product of (p)(q).

There is a little trick to this form of encryption, which I do not see explained often, but which is important. Since E is also the public key, a relatively small prime number is used in practice, that being (2^16 + 1), or 65537. D could be a 2048 or a 4096-bit exponent. The public key consists of the exponent E and the modulus (p)(q) packaged together.

Thus, when the key-pair is created, (p) and (q) must be known separately, so that D can be computed, and then (p-1)(q-1), which hopefully never touched the hard-drive, can be discarded, after D is saved, along with (p)(q) again, this time forming the private key.

Therefore, D must be the multiplicative inverse of E, in the modulus of (p-1)(q-1). But how does one compute that?

(Corrected 09/27/2018, 15h15 … )