Generating a Germain Prime efficiently, using Python and an 8-core CPU.

I have spent extensive time, as well as several previous blog postings, exploring the subject of how to generate a Germain prime in the Python (3.5) programming language, while taking advantage of parallel computing to some slight degree.

Before I go any further with this subject, I’d like to point out that generally, for production-ready applications, Python is not the best language to use. The main reason is the fact that Python is an interpreted language, even though many modern interpreted languages are compiled into bytecode before being interpreted. This makes a Python script slower by nature than very well-written C, or even C++. But what I aim to do is to use Lego-Blocks to explore this exercise, while still using tools which would be used professionally.

The main way I am generating prime numbers is to start with a pseudo-random, 512-bit number (just as a default; the user can specify a different bit-length), and then to attempt to divide this number by a list of known, definite primes, which currently extends up to (but excludes) 4096, in an attempt to disprove the number prime. In about 1/8 of all cases, the number survives this test, after which I undertake the more robust, probabilistic Miller-Rabin approach, trying to disprove it prime 192 times. If the number has survived all these tests, my program assumes it to be prime.
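The first stage of that test can be sketched as follows. This is my own minimal reconstruction of the idea, not the original script, and the function names are mine:

```python
def small_primes_below(limit=4096):
    # Sieve of Eratosthenes over [2, limit)
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(sieve) if flag]

SMALL_PRIMES = small_primes_below()

def survives_trial_division(n):
    # Attempt to disprove n prime, by dividing by every known prime below 4096.
    # Assumes n itself is much larger than 4096.
    return all(n % p != 0 for p in SMALL_PRIMES)
```

A candidate which survives this cheap filter would then be handed to the Miller-Rabin stage.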

Although I was never taught this explicitly, the probability of a non-prime Candidate surviving even a single Miller-Rabin Test, or Witness, is very small – in fact, a standard result states that at most 1/4 of the possible witnesses can fail to expose an odd composite, and in practice the fraction is usually far smaller. Part of the reason could be the fact that the Witness in question is raised to a high, odd exponent, in its 512-bit modulus etc., after which it may be squared some number of times. Because my candidates are congruent to 3 in the modulus of 4, (n – 1) contains only a single factor of 2, so that the subsequent number of squarings is actually zero. And after each exponentiation, the result could be any number in the modulus, with the possible exception of zero. It needs to become either (1) or (n – 1) in the modulus of (n), for the Candidate to survive the test. (:1)

Further, the pseudo-random numbers that get used as witnesses do not need to be of the same high, cryptographic quality of pseudo-random as the candidate which is being tested.

But there is a sub-category of prime numbers which has recently been of greater interest to me, connected with the Germain primes: candidates (n) such that the Totient of the main candidate, divided by two – i.e., ((n – 1) / 2) – is also prime. And so, if the density of prime numbers near (n) is roughly (1 / log(n)), and if we can assume a uniform distribution of random numbers, then the probability of finding such a prime is roughly (1 / log(n))², assuming that our code were inefficient enough actually to test all numbers. The efficiency can be boosted by making sure that the random number modulo 4 equals 3, since otherwise ((n – 1) / 2) would be even, and therefore certainly composite.
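As a rough sanity check of that estimate – my own back-of-envelope calculation, not something from the original script – the expected number of raw candidates can be computed directly:

```python
import math

bits = 512
ln_n = bits * math.log(2)        # ln(2**512), about 354.9
p_prime = 1.0 / ln_n             # density of primes near n = 2**512
p_germain = p_prime ** 2         # n and (n - 1) / 2 both prime, if independent

# On the order of 126,000 raw candidates would need testing on average;
# restricting to n congruent to 3 (mod 4) discards candidates that
# could never qualify, without changing the order of magnitude.
expected_tries = 1.0 / p_germain
print(round(expected_tries))
```

The point of the estimate is simply that cheap per-candidate filtering matters enormously at these counts.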

But one early difficulty I had with this project was that, if I started with a new pseudo-random number for each candidate, on my Linux computers, I’d actually break ‘/dev/urandom’ ! Therefore, the slightly better approach which I needed to take next was to make only my initial number the pseudo-random one, and then just to keep incrementing it by 4, until the code plodded into a Germain prime.
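That candidate stream can be sketched as a generator. The `secrets` module here is my stand-in for whatever entropy source the original script used; the rest follows the scheme just described:

```python
import secrets

def germain_candidates(bits=512):
    # One cryptographic-quality starting point, then cheap increments of 4,
    # so that the system's entropy source is only consulted once.
    n = secrets.randbits(bits) | (1 << (bits - 1))  # force the full bit-length
    n += (3 - n) % 4                                # force n = 3 (mod 4)
    while True:
        yield n
        n += 4
```

Every number yielded keeps the property (n mod 4 == 3), so that ((n – 1) / 2) at least stays odd.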

Even when all this is incorporated into the solution, I find that with Python, I need the added advantage of parallel computing. Thus, I next learned about the GIL – the Global Interpreter Lock – and the resulting pitfall of multi-threaded Python, which is that it is not truly concurrent. Multi-threading under Python tells the O/S to schedule threads as usual, but then only allows one thread to be executing Python bytecode at any one time! But, even under those constraints, I found that I was benefiting from the fact that my early code was able to test two candidates for primality simultaneously: the actual candidate, as well as its Totient divided by 2. And as soon as either candidate was disproved prime, testing on the other could be stopped quickly. This reduced the necessary number of computations dramatically, making up for the slowness of multi-threaded Python, and I felt that I was on the right track.

The only hurdle which remained was how to convert my code into multi-processing code, no longer merely multi-threaded, while keeping the ability of the two processes to send each other a shutdown-command, as soon as either one disproved its number prime.

(Updated 9/19/2020, 17h55 … )

(As of 5/25/2019 : )

What I finally found worked was, to use shared memory.

It was really just a bit of a myth for me to propagate, that processes have, as an inherent disadvantage compared to threads, the inability to share each other’s memory. It is possible for them to do so; usually, in low-level code, it’s just much more difficult to implement. And so, I was very happy to learn that Python 3’s ‘multiprocessing’ module possesses the classes ‘Value’ and ‘Array’, which make the sharing of values and arrays straightforward, because the implementation details have been taken care of inside the Python interpreter.

The only step which remained to be solved was how to take care of the semantics of making such shared memory accessible to the worker processes, from the main process, without trying to pass it in to the ‘starmap_async()’ function as an argument. The reason that would not work is the fact that, when this function is used to feed a batch of processes – not threads – it serializes the arguments, which is also called ‘pickling’ them, at which point, of course, they no longer refer to a shared region of memory addresses.

And so to solve that last hurdle, what worked was to declare my shared object as a global variable, and then to access it both in the main process and in the worker processes, from the same source-file.

The following is my result:




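Since the listing itself is not reproduced in this text, the following is a minimal sketch of the pattern just described – a module-level shared object, inherited by forked worker processes rather than pickled – with my own names and a stand-in primality test. It relies on Linux-style fork() process creation:

```python
import multiprocessing as mp

# Declared at module level, so that worker processes forked from this file
# inherit the very same shared-memory region, rather than a pickled copy.
stop_flag = mp.Value('i', 0)

def worker(m, label):
    # Stand-in primality work: trial division, polling the shared flag.
    for p in range(3, 1000, 2):
        if stop_flag.value:
            return (label, None)       # the sibling already disproved its number
        if m % p == 0 and m != p:
            stop_flag.value = 1        # signal the other worker to stop
            return (label, False)
    return (label, True)

if __name__ == '__main__':
    n = 2 ** 89 - 1                    # a known (Mersenne) prime, as a demo
    with mp.Pool(2) as pool:
        result = pool.starmap_async(worker, [(n, 0), ((n - 1) // 2, 1)])
        print(result.get())
```

Because ((n – 1) / 2) here is divisible by 3, one worker fails fast and flags the other to stop, which is the whole point of the shared object.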
If the reader happens to have a system with multiple real cores, which also incorporates more than one CPU socket, then the code above will not work without modification. The reason for that is the fact that, even when multiple processes and not threads are spawned on such a motherboard, processes that have been scheduled by the O/S to run on separate physical CPUs still cannot – as far as I can tell – share memory in this simple way. In all the examples I’m aware of, each socket has its own set of RAM modules… It’s like having two computers on one motherboard.

Now, there are some naive attempts I could make, to adapt the code to such situations. But, attempts which I can think of would still have two pitfalls:

  1. If an exact attempt is made to divide the number of real cores by the number of sockets, then the result that I can cobble together will only work under Linux. And I don’t believe in software monopolies;
  2. Even if the number of real cores has simply been halved, I still cannot be sure that the O/S will schedule the resulting processes to run on one physical CPU (just because their number would equal the number of real cores on one physical CPU).

The reader is invited to make such modifications himself. If I were to make such a modification, I’d try to use CPU affinity, but would also prefer to test that feature. But, I do not possess a motherboard with two physical sockets like that…

If the unmodified code were run on such a board, yes, I’d expect the Python interpreter to throw errors over incorrectly-referenced objects. But I’d also expect this to happen in each of the worker processes, not in the main process. So the user could receive a lengthy error message, potentially 16 times.

One of the things which I try to avoid doing myself is to allocate shared memory, and then to have the program crash. My reason for disliking this situation is the fact that the shared memory segment may not only form a resource leak in user-space, but may cause a much more expensive leak of kernel resources, each time the script is run.

And the allocation is automatic, in the form of the global variable.


Okay, okay, okay, I give in. You’ve convinced me. For people who, like me, are Linux-based, but who, unlike me, have a motherboard with more than 1 physical CPU socket, I’ve written a version of the Python script which has correct syntax, whose logic strikes me as sound, but which I could not test – as explained above:





(Update 5/26/2019, 21h50 : )


One of the questions which I’ve mused about before was, whether the Miller-Rabin Test provides an ultimate proof of primality, and, whether it will always be stronger than the Fermat Test.

According to the Fermat Test, certain known numbers will seem to be prime for every witness chosen, even though they are not prime. Those famous numbers are called ‘Carmichael Numbers’, a particularly stubborn species of ‘Pseudo-Primes’.

The Fermat Test itself is simply based on the concepts that, if the modulus is a prime number (n), then the Totient which governs the exponents is (n – 1), and that to raise any number other than zero to the power of zero will produce (1). Well, in the modulus of (n – 1), the exponent (n – 1) is equivalent to zero. Hence, the Fermat Test is really attempting to test whether (n – 1) is actually the Totient of (n). (:2)
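In Python, the whole test reduces to one modular exponentiation with the three-argument form of `pow()`:

```python
def fermat_test(n, a):
    # a**(n - 1) must be congruent to 1 (mod n), whenever n is prime
    # and a is not a multiple of n.
    return pow(a, n - 1, n) == 1

# A Carmichael number fools this test for every witness coprime to it:
assert fermat_test(561, 2)      # 561 = 3 * 11 * 17, yet it passes
assert not fermat_test(9, 2)    # ordinary composites usually fail
```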

I have observed myself that, if the candidate is indeed prime, and if the Totient’s factorization contains only a single factor of (2), then to raise the witnesses to the power of ((n – 1) / 2) will produce the result (1) half the time, on average.

If the other half of the witnesses produce (n – 1) instead, this corresponds to (-1) in the modulus of (n), which, when squared, will simply reproduce the result of the Fermat Test. But the Miller-Rabin Test skips this last squaring, because it would only lead to a predictable result. At maximum, Miller-Rabin will square the intermediate result (s – 1) times, where (s) is the exponent of (2) in the factorization of (n – 1). (:3)
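One round of the test, as just described – factor (n – 1) into (2^s · d) with (d) odd, raise the witness to the odd exponent, then square at most (s – 1) times – can be written as follows (my own formulation):

```python
def miller_rabin_round(n, a):
    # Write n - 1 = 2**s * d, with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)                 # the high, odd exponent
    if x == 1 or x == n - 1:
        return True                  # this witness failed to disprove n prime
    for _ in range(s - 1):           # at most s - 1 squarings; the final
        x = (x * x) % n              # squaring is skipped as predictable
        if x == n - 1:
            return True
    return False                     # n is definitely composite
```

Notably, the Carmichael number 561 sails through the Fermat Test with witness 2, but is caught by this routine.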

I suppose that one behaviour of the Carmichael Numbers could be, to produce a result other than (-1), immediately before producing (1). The sheer statistical probability of failing to discover this with 192 applications of the Miller-Rabin Test, is (1 / 2^192). So, the Miller-Rabin Test will generally be stronger than the Fermat Test.



(Update 5/28/2019, 15h15 : )


Given what the Fermat Test consists of, an important question to ask next would be, ‘Why do so-called Carmichael Numbers and their relatives even exist?’ And the answer lies in the fact that the following two statements don’t have the same meaning:


p → q
q → p


Each of these implications is the converse of the other, and even if one is true, it does not necessarily follow that the other will be true.

In a similar vein, I could be sure that:


∃ a > 1 ∋
n > a + 1  ∧
a² mod n = 1 →
'n Is not prime.'


But what I really need to know is whether the converse of this implication is true. And even though the logical converse has not, to the best of my knowledge, been proved, the failure to find such a witness many times – or, every time – is taken to be equivalent to proving that none exists. ‘If he can’t be proven guilty, then he must be innocent.’ Even though that sort of inference has already proved false in the case of the Fermat Test. (:4)

In fact, some trivial thought experiments can reveal cases in which the converse of the implication which I wrote above is actually false. The prime-number candidate (n) could be (10), which is known not to be prime. But the only cases in which (a² mod n == 1) would be the witnesses (a == 1) and (a == 9 … a² == 81 … mod 10 == 1). (a == 9) corresponds to (a == n – 1), which was also an example allowed to be true, for primes.

The fact should just be acknowledged once more, that the Miller-Rabin Test makes the additional, critical stipulation that the result of such a squaring must become Algebraically equivalent to the Fermat Test, such that (a^9 mod 10) would need to equal (1) for all (1 < a < 10), which already fails because (9^9 == 387420489), and (387420489 mod 10 == 9). Besides, (10) being an even number means that it would fail the Miller-Rabin Test over the trivial fact that the power of (2), present in the factorization of (n – 1), would fatally equal zero! Because the only even prime number possible is (2), one would more normally choose candidates (n) that are odd, for which reason (n – 1) is guaranteed to be even, and in that case, there will be at least one implicit squaring within the Fermat Test.



(Update 5/29/2019, 13h45 : )

I suppose that one unconscious conclusion which the reader could come to, is that the converse of the implication which I suggested above can only be disproved, if the candidate chosen is even. So let’s redo that, with (n == 9), which is also known not to be prime:


2² = 4  … mod 9 = 4
3² = 9  … mod 9 = 0
4² = 16 … mod 9 = 7
5² = 25 … mod 9 = 7
6² = 36 … mod 9 = 0
7² = 49 … mod 9 = 4
8² = 64 … mod 9 = 1,  8 = (9 – 1)
    (Because, 7 · 9 = 63)


Thus, this behaviour can happen just as easily if the chosen candidate is odd. Then, according to that, it would remain plausible that 9 is prime, even though we know better.

However, the calculation above never obtains (8). The Miller-Rabin Test would only conclude that (9) is prime, if this was to happen. And this requirement hinges on the fact that the success of the Fermat Test, must follow when (n-1) itself is squared.
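The hand calculation above can be replayed mechanically; this is my own checking snippet:

```python
n = 9
squares = {a: (a * a) % n for a in range(2, n)}
# squares == {2: 4, 3: 0, 4: 7, 5: 7, 6: 0, 7: 4, 8: 1}
# No witness's square is 8, i.e. (n - 1), so the Miller-Rabin requirement
# that some squaring must yield (n - 1) can never be met for 9.
assert (n - 1) not in squares.values()
print(squares)
```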


(Update 5/30/2019, 10h05 : )

My real point is that even very small numbers, such as 2^(-256), may not equal zero. And so a sensible question to ask might be, for larger numbers (n), what the actual probability is, of the converse of the statement I made above being false. This probability is related to the probability that any number (n) would be a perfect square, minus (1). More precisely, the probability should be similar, to a number (n) not being a perfect square, (n) consecutive times, to represent the multiples of the modulus. And so a very approximate equation that comes to mind would be:


Casual inspection reminds us, that if we are given a number less than (1) even by some small amount, and if we raise that to some very high exponent, we might obtain a very small fraction nonetheless.

Yet, as I’ve tried to remind the reader, two things would actually need to go wrong simultaneously, for the Miller-Rabin Test to indicate that a non-prime number is prime. In addition, the number would already need to be a Fermat Pseudo-Prime. And I do not know whether there is a straightforward way to compute what the probability of that is, given a similarly-sized number to (n). If that probability can be computed, I’d multiply it by the probability I estimated above, to arrive at the probability of ‘a Miller-Rabin Pseudo-Prime’.

However, the possibility still exists, that these two misfortunes just don’t take place for the same value of (n), for reasons we don’t know.

(Update 6/01/2019, 17h35 : )

I have by now done my Math thoroughly, and computed what the probability is of squaring every number of such a modulus, except for (1) and (n-1), and never obtaining the result (1). Below is my work-sheet:

Link to a Letter-Sized PDF

Link to a PDF Adapted to be Read On A Phone

Link to an EPUB3 File with MathML



(Update 5/28/2019, 14h00 : )


I suppose that there’s another issue which my readers might have with the suppositions of this posting. If the premise was accepted that ((n – 1) mod n) was roughly equivalent to (-1), the next question might follow: ‘How can any number exist, which when squared, results in (-1)? This is usually only possible with complex numbers, where such a number needs to be (±i).’

And the simplest answer to this question would be that this situation does not only exist with complex numbers, but also in modular arithmetic, whose circular numbers have important discrepancies from how integers would otherwise work. If square roots of (-1) did not exist in the modulus, their nonexistence would also interfere with the eventual result of (1 mod n), and it would then cause the Fermat Test always to fail.


(Update 6/01/2019, 0h20 : )


The thought occurs to me that so-called Carmichael Numbers may exist for a different reason than I had previously thought. When used as a modulus, in which some members are squared several times, they could simply map an unusually high number of members to (1). If this is the case, then the Miller-Rabin Test will just be particularly proficient at outing these numbers.

My earlier ideas were based on the assumption that it was the odd factor in (n – 1) that would somehow have been responsible for their misbehaviour.



(Update 9/19/2020, 13h30: )

The fact has come to my attention – because I was running my algorithm on a different architecture recently – that the original version of this Python Script has a vulnerability. This has to do with calling the function:




I was given some advance warning that the ‘psutil’ package can be quirky, when asked in this way, ‘How many real cores does the CPU have, as opposed to how many logical cores?’ Many CPUs only offer ‘multiple logical cores’, all belonging to ‘one real core’. Instead of giving a usable result in such cases, such as (1), what the function call may return is ‘None’, in which case a numerical comparison with an integer simply causes an error message at run-time.

I also know that Python packages exist which are not weak in this way. However, those other packages are further add-ons under Linux, not part of a standard install, and so I avoided using them.
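In the meantime, a defensive pattern – my own sketch, not the updated files themselves – is to treat both an absent package and a ‘None’ readout the same way:

```python
try:
    import psutil
    physical = psutil.cpu_count(logical=False)   # may legitimately be None
except ImportError:
    physical = None                              # psutil not installed at all

# Fall back to 2 worker processes when the physical-core count is unknown,
# since even 2 processes on 1 core still pay off (see below).
workers = physical if physical is not None else 2
```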

What I have now done is to update my code, no longer to generate an error message in such cases, in the following two files:



(Update 9/19/2020, 17h10: )

I should admit that an earlier version of these Python scripts, also contained ‘an error in form’, approximately as follows:


if __name__ == '__main__':
    stop_threads = Array('i', range(256))



Literally, that code first expands ‘range(256)‘ into a sequence of integers (from ‘zero’ to ‘255’ inclusively), and then uses that sequence as an initializer: its length determines the array’s size, and its values become the array’s initial contents. Because the code runs, the semantic roughness is not immediately obvious, but what I actually intended was an array of 256 flags starting at zero, not an array preloaded with the values 0 through 255. This ‘error’ was actually pointed out to me recently by ‘thonny’, which is a Debian 10 / Buster Python IDE, that I now have running on a tablet.


The way I rewrote the code now is like so:


if __name__ == '__main__':
    stop_threads = Array('i', 256)



In a more direct fashion, this code just creates an array – to be shared between processes – with 256 elements, all initialized to zero, and to be indexed by a single integer that forms part of the contiguous range. This matches how the flags are actually used, and avoids the pointless construction and copying of a 256-element list.
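The difference between the two constructor calls is directly observable, on a smaller scale:

```python
from multiprocessing import Array

preloaded = Array('i', range(4))   # initializer: size 4, contents 0, 1, 2, 3
zeroed = Array('i', 4)             # size only: 4 elements, all zero

print(preloaded[:])   # [0, 1, 2, 3]
print(zeroed[:])      # [0, 0, 0, 0]
```

Both forms produce a shared, integer-indexed array; only the initial contents differ.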



Admittedly, the practical impact of the previous form was limited, for two reasons:

  1. The expansion of ‘range(256)’, and the copying-in of its values, only take place once, when the array is allocated, And
  2. Access to the array’s elements afterwards is identical in both versions: a single, integer-indexed look-up into the shared buffer.

However, the reality remains that the clearer form states directly what was meant, and that is reason enough to prefer it.



(Update 9/19/2020, 17h55: )

The recent updates to this posting imply that the multiprocessing version of the script – the one which does not assume that the physical CPU cores reside in more than one socket – runs fine on the tablet. Yet, the same tablet is only loosely specified to have 8 cores, with no explicit mention that I can find, of how many CPUs it has, totalling 8 cores. And so, given the fact that I simply ran the script and obtained correct results, an obvious question arises as to why this works at all. And I would guess that it works because “UserLAnd”, the Android app that provides Linux on that tablet, does one of two things to the Linux Guest System:

  1. UserLAnd may only allow the Guest System to take up a single CPU core, but in a way that gives a corrupted readout within certain multi-processing scripts like mine, Or
  2. UserLAnd may allow the Guest System to take up more than 1 core, but always belonging to the more-powerful, frequency-scaling CPU, never, belonging to the low-power, constant-frequency CPU…


One feat which has been possible since the early years of Computing is to run several processes on only one CPU, and even on one CPU core. Doing so simply slows down the processes in question. But, because finding a Germain Prime requires that two matching candidates be tested for primality, and requires that both candidates pass, there is still an optimization to be gained from running at least 2 processes. With great probability, one of the two candidates in the set will be proven not-prime, while the other could be prime, and could consume the CPU for the full duration of its testing. The way I wrote the code, as soon as 1 candidate of the set is proven not-prime, testing on its counterpart is quickly stopped, and a new set of two candidates is generated.

That was the whole purpose of sharing a ‘multiprocessing.Array’ object.

Thus, even to be running 2 processes on 1 core, will end up speeding the eventual recognition of 1 Germain Prime. Therefore, I have also made that the default behaviour, if:


psutil.cpu_count(logical=False) is None




