## Generating a Germain Prime efficiently, using Python and an 8-core CPU.

I have spent extensive time, as well as previous blog postings, exploring the subject of how to generate a Germain prime (number), in the Python (3.5) programming language, but taking advantage of parallel computing to some slight degree.

Before I go any further with this subject, I’d like to point out that generally, for production-ready applications, Python is not the best language to use. The main reason is the fact that Python is an interpreted language, even though many modern interpreted languages are compiled into bytecode before being interpreted. This makes a Python script slower by nature, than very well-written C or even C++. But what I aim to do is to use Lego-Blocks to explore this exercise, yet, to use tools which would be used professionally.

The main way I am generating prime numbers is, to start with a pseudo-random, 512-bit number (just as a default, the user can specify different bit-lengths), and then to attempt to divide this number by a list of known, definite primes, that by now only extend to 4096 (exclusively, of course), in an attempt to disprove that number prime. In about 1/8 of all cases, the number survives this test, after which I undertake a more robust, Miller-Rabin approach to try disproving it prime 192 times, probabilistically. If the number has survived these tests, my program assumes it to be prime.

Even though I’ve never been told this, the probability of a non-prime Candidate surviving even a single Miller-Rabin Test, or Witness, is very small, smaller than 1/8. This could be due to the fact that the Witness in question is raised to a high, odd exponent, in its 512-bit modulus etc., after which it would be squared some number of times. Because the first Candidate in the modulus of 4 is 3, that one actually gets squared a subsequent total of zero times. And after each exponentiation, the result could be any number in the modulus, with the possible exception of zero. It needs to become either (1) or (n-1) in the modulus of (n), for the Candidate to survive the test. (:1)

Further, there is no need for the numbers that get used as witnesses, which are pseudo-random, to be of the same, high, cryptographic quality of pseudo-random, as the candidate is, which is being tested.

But there is a sub-category of prime numbers which have recently been of greater interest to me, which is known as the Germain prime numbers, such that the Totient of that main candidate, divided by two, should also be prime. And so, if the density of prime numbers roughly equal to (n) is (1 / log(n)), and if we can assume a uniform distribution of random numbers, then the probability of finding a Germain prime is roughly (1 / (log (n))2), assuming that our code was inefficient enough actually to test all numbers. The efficiency can be boosted by making sure that the random number modulo 4 equals 3.

But one early difficulty I had with this project was, that if I needed to start with a new pseudo-random number for each candidate, on my Linux computers, I’d actually break ‘/dev/urandom’ ! Therefore, the slightly better approach which I needed to take next was, to make my initial random number the pseudo-random one, and then just to keep incrementing it by 4, until the code plodded into a Germain prime.

Even when all this is incorporated into the solution I find that with Python, I need the added advantage of parallel computing. Thus, I next learned about the GIL – The Global Interpreter Lock – and the resulting pitfalls of multi-threaded Python, which is not concurrent. Multi-threading under Python tells the O/S to allocate CPU cores as usual, but then only allows 1 core to be running at any one time! But, even under those constraints, I found that I was benefiting from the fact that my early code was able to test at least 2 candidates for primality simultaneously, those being the actual candidate, as well as its Totient divided by 2. And as soon as either candidate was disproved prime, testing on the other could be stopped quickly. This reduced the necessary number of computations dramatically, to make up for the slowness of multi-threaded Python, and I felt that I was on the right track.

The only hurdle which remained was, how to convert my code into multi-processing code, no longer merely multi-threaded, while keeping the ability for two processes, then, to send each other a shutdown-command, as soon as the present process disproved its number to be prime.

(Updated 6/01/2019, 17h35 … )

## Understanding ‘Meltdown’ and ‘Spectre’ in Layman’s Terms

One of the pieces of news which many people have heard recently, but which few people fully understand, is that in Intel chip-sets in particular, but also to a lesser degree in AMD-chip-sets, and even with some ARM (Android) chip-sets, a vulnerability has been discovered by researchers, which comes in two flavors: ‘Meltdown’ and ‘Spectre’. What do these vulnerabilities do?

Well, modern CPUs have a feature which enables them to execute multiple CPU-instructions concurrently. I learned about how this works, when I was taking a System Hardware course some time ago. What happens is meant to make up for the fact that to execute one CISC-Chip instruction, typically takes up considerably more than 1 clock-cycle. So what a CISC-Chip CPU does, is to start execution on instruction 1, but during the very next clock-cycle, to fetch the opcode belonging to instruction 2 already. Instruction 1 is at that point in the 2nd clock-cycle of its own execution. And one clock-cycle later, Opcode 3 gets fetched by the CPU, while instruction 2 is in the 2nd clock-cycle, and instruction 1 is in the 3rd clock-cycle – if their is any – of each of their execution.

This pushes the CISC-Chip CPUs closer to the ideal goal of executing 1 instruction per clock-cycle, even though that ideal is never fully reached. But, because CPU-instructions contain branches, where a condition is tested first, and where, in a roundabout way, if the non-default outcome of this test happens to be true, the program ‘branches off’, to another part within the same program, according to the true logic of the CPU-instructions. The behavior of the CPU under those conditions has also been made more-concurrent, than a first-glance appraisal of the logic might permit.

When a modern CISC-Chip CPU reaches a branching instruction in the program, it will continue to fetch opcodes, and to execute the instructions which immediately follow the conditional test, according to the default assumption of what the outcome of the conditional test is likely to be. But if the test brings about a non-default logical result, which will cause the program to continue in some completely different part within its code, the work which has already been done on the partially-executed instructions is to be discarded, in a way that is not supposed to affect the logical outcome, because program flow will continue at the new address within its code. At that moment, the execution of code no longer benefits from concurrency.

This concurrent execution, of the instructions that immediately follow a conditional test, is called “Speculative Execution”.

The problem is, that Complex Instruction-Set CPUs, are in fact extremely complex in their logic, as well as the fact that their logic has been burned as such, into the transistors – into the hardware – of the CPU itself, and even the highly-skilled Engineers who design CPUs, are not perfect. So we’ve been astounded by how reliably and faithfully actual, physical CPUs execute their intricate logic, apparently without error. But now, for the first time in a long time, an error has been discovered, and it seems to take place across a wide span of CPU-types.

This error has to do with the fact that modern CPUs are multi-featured, and that in addition to having concurrent execution, they also possess Protected Memory, as well as Virtual Memory. Apparently, cleverly-crafted code can exploit Speculative Execution, together with how Virtual Memory works, in order in fact to bypass the feature which is known as Protected Memory.

It is not the assumption of modern computers, that even when a program has been made to run on your computer – or your smart-phone – It would just be allowed ‘to do anything’. Instead, Protected Memory is a feature that blocks user-space programs from accessing memory that does not belong to them. It’s part of the security framework actually built-in to the hardware, that makes up the CPU.

More importantly, user-space programs are never supposed to be able to access kernel memory.

(Updated 01/11/2018 : )