I once ran into laypeople who understood what a modulus is (the output of a series of computations never equals or exceeds it), and who understood what the exponent function is, but who were incredulous when I told them that it is possible to compute the result of raising a 2048-bit number to a 2048-bit exponent, given that the result only needs to fit inside a 2048-bit modulus.

I believe that the most common way of doing this rests on the assumption that two 2048-bit numbers can be multiplied, and the result reduced back to a 2048-bit modulus. By extension, a 2048-bit number can also be squared, and the result reduced modulo the same 2048-bit modulus…
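As a minimal sketch of that assumption, the following Python shows that the intermediate product of two numbers below the modulus never needs more than twice the modulus's bit length, and that reducing it brings it back under the modulus. The names `BITS` and `mul_mod`, and the randomly generated modulus, are illustrative assumptions and not part of any particular library.

```python
import secrets

BITS = 2048  # assumed operand width, matching the text


def mul_mod(a, b, modulus):
    """Multiply two residues and reduce the product modulo `modulus`."""
    product = a * b  # needs at most 2 * BITS bits when a, b < modulus
    return product % modulus


# Hypothetical 2048-bit modulus with its top bit set (illustration only).
modulus = secrets.randbits(BITS) | (1 << (BITS - 1)) | 1
a = secrets.randbelow(modulus)
b = secrets.randbelow(modulus)

widest_intermediate = (a * b).bit_length()
reduced = mul_mod(a, b, modulus)
squared = mul_mod(a, a, modulus)
```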

Well, to achieve the exponent function, one needs a base register, an accumulator register, and the exponent. The accumulator is initialized to the value (1).

If the exponent has 2048 bits, then the following steps can be repeated 2048 times:

1. Square the value in the accumulator.
2. Left-Shift the Most-Significant Bit of the Exponent out, into a bit-register that can be examined.
3. If that bit-register is equal to (1) and not (0), multiply the base-register into the accumulator an extra time.

Because the Most-Significant Bit of the Exponent is shifted out first, its being (1) means that the Base is multiplied into the Accumulator earliest, so that this factor of the Base is then squared by every remaining iteration, which is the highest number of squarings. A factor multiplied in at a later iteration is squared fewer times. The Base is thus effectively raised to the powers 1, 2, 4, 8, 16, 32, … 2^2047, and the bits of the Exponent select which of these powers are combined in the product.
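The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `mod_pow_msb_first` and the optional `bits` parameter are hypothetical, Python's built-in integers stand in for the 2048-bit registers, and the exponent bit is read with a shift and mask rather than being physically shifted out of a register.

```python
def mod_pow_msb_first(base, exponent, modulus, bits=None):
    """Square-and-multiply, consuming the exponent from its most-significant bit."""
    if bits is None:
        bits = exponent.bit_length()  # or a fixed width such as 2048
    base %= modulus
    acc = 1 % modulus  # accumulator initialized to 1 (0 only if modulus == 1)
    for i in range(bits - 1, -1, -1):
        acc = (acc * acc) % modulus        # square the accumulator
        bit = (exponent >> i) & 1          # examine the next most-significant bit
        if bit == 1:
            acc = (acc * base) % modulus   # multiply the base in an extra time
    return acc


result = mod_pow_msb_first(3, 13, 1000)  # 3**13 = 1594323, so result == 323
```

Passing `bits=2048` reproduces the fixed 2048 iterations described in the text; the leading zero bits then merely square an accumulator that still holds 1.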

(Update 09/10/2018, 7h45 : )

I am well aware of the fact that when Step 1 above is executed for the first time, it has no effect, because the accumulator still holds (1). In fact, depending on how many most-significant zeroes the exponent has, this could even repeat itself. One way in which this ‘waste’ of CPU cycles can be removed is by changing Step 1 above to read:

*If the value in the accumulator is Greater Than 1*, square it.

But the problem with such a supposed ‘optimization’, when performed with a 2048-bit exponent, would be that the value in the accumulator needs to be compared with (1) 2048 times and, as long as the accumulator also has 2048 bits, depending on how it’s being stored, each comparison may need to go through the entire accumulator. If we can assume that only a few most-significant bits in the exponent are in fact zeroes, such an optimization might actually slow the loop down.
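To make the trade-off concrete, here is a small sketch of the modified loop that also counts how many comparisons and squarings it performs. The function name and the two counters are illustrative assumptions, and the sketch assumes a modulus greater than 1.

```python
def mod_pow_skip_trivial_squares(base, exponent, modulus, bits):
    """MSB-first exponentiation that only squares when the accumulator exceeds 1.

    Returns (result, squarings, comparisons). Assumes modulus > 1.
    """
    base %= modulus
    acc = 1
    squarings = 0
    comparisons = 0
    for i in range(bits - 1, -1, -1):
        comparisons += 1                  # one full-width comparison per iteration
        if acc > 1:
            acc = (acc * acc) % modulus
            squarings += 1
        if (exponent >> i) & 1:
            acc = (acc * base) % modulus
    return acc, squarings, comparisons


# Exponent 13 (binary 1101) held in a 16-bit field: 12 leading zeroes.
outcome = mod_pow_skip_trivial_squares(3, 13, 1000, bits=16)  # (323, 3, 16)
```

In this example 13 squarings are avoided, but 16 comparisons are paid for them. When the exponent has few leading zeroes, almost nothing is saved while every iteration still pays for a comparison.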

(Updated 09/15/2018, 12h50 : )

If the reader needs to *implement* this type of exponentiation on top of an arbitrary-precision integer library, on the assumption that the exponent may have an arbitrary length (i.e., be 17 bits long as easily as 2048 bits long), then the first thing to take into account is whether the library makes the least-significant bits the first word linked to, such that a linked list leads to the progressively more-significant bits indirectly. This could be seen as a kind of equivalent to Little-Endian representation, except that it would exist with linked lists instead of with contiguous bytes that form a fixed-length field.

If this is the case, the organization of an integer as a linked list is not optimal for exponentiation. In such a case, primitive operations on the linked list would be helpful, which break it down into a least-significant word first, followed by the more-significant words. This operation would be Mathematically equivalent to dividing by ( 2^(Word-Size) ) and additionally forming the remainder. However, to make this operation speedy, it is best implemented as a C or a C++ subroutine, even if the arbitrary-precision integer library exposes purely Mathematical operations as an API.
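The Mathematical equivalence can be illustrated with a short Python sketch that peels off words using only division with remainder. It is not the fast native subroutine recommended above, just a demonstration of what that subroutine would compute; the name `split_words` and the default word size of 32 are assumptions.

```python
def split_words(n, word_size=32):
    """Return the words of a non-negative integer, least-significant word first."""
    radix = 1 << word_size           # 2^(Word-Size)
    words = []
    while n > 0:
        n, word = divmod(n, radix)   # quotient = remaining words, remainder = lowest word
        words.append(word)
    return words or [0]


words = split_words((1 << 32) + 2)   # [2, 1]
```

Iterating over `reversed(words)` then yields the most-significant word first, which is the order the exponentiation needs.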

If instead the application programmer is only able to access purely Mathematical operations from such a library, then another approach would be first to count how many bits long the exponent is, let’s say by halving it repeatedly until we get zero. While doing so, it’s also possible to construct a number, the bits of which are simply the bits of the exponent reversed. The bit count needs to be kept, because any least-significant zeroes of the exponent become most-significant zeroes of the reversed number, and would otherwise be lost.
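A minimal sketch of this bit-reversal approach, restricted to doubling, halving with remainder, multiplication, and reduction, might look as follows. The function names are hypothetical.

```python
def reverse_bits(exponent):
    """Return (reversed_exponent, bit_count) using only halving and doubling."""
    count = 0
    reversed_e = 0
    while exponent > 0:
        exponent, bit = divmod(exponent, 2)   # halve, keeping the dropped bit
        reversed_e = reversed_e * 2 + bit     # append that bit to the reversed number
        count += 1
    return reversed_e, count


def mod_pow_via_reversal(base, exponent, modulus):
    """MSB-first square-and-multiply that reads bits from the reversed exponent."""
    reversed_e, count = reverse_bits(exponent)
    base %= modulus
    acc = 1 % modulus
    for _ in range(count):                    # count preserves the lost leading zeroes
        acc = (acc * acc) % modulus
        reversed_e, bit = divmod(reversed_e, 2)
        if bit:
            acc = (acc * base) % modulus
    return acc


pair = reverse_bits(6)                        # 6 is 110 in binary, so pair == (3, 3)
result = mod_pow_via_reversal(3, 13, 1000)    # 323
```

The least-significant bit of the reversed number is the most-significant bit of the original exponent, so halving the reversed number delivers the bits in the order the loop requires.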

If the library exposes functions that determine the quotient and remainder of dividing by ( 2^(Word-Size) ), then the application programmer can do so first, and devise a subroutine which only exponentiates a base by ( E < 2^(Word-Size) ). In that case, exponentiation by an arbitrary-length exponent can be composed out of numerous operations that perform this smaller exponentiation ‘by chunks’. And again, *the exponentiation would start with the most-significant chunk*.
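One way to sketch exponentiation ‘by chunks’ is shown below. The names `pow_small` and `mod_pow_chunked` are hypothetical, and one design choice here is my own assumption rather than something the text prescribes: between chunks, the accumulator is raised to the power 2^(Word-Size) by squaring it Word-Size times, since that power itself does not fit the ( E < 2^(Word-Size) ) restriction of the small subroutine.

```python
def pow_small(base, e, modulus, word_size):
    """Exponentiate by a single chunk e < 2**word_size, most-significant bit first."""
    acc = 1 % modulus
    for i in range(word_size - 1, -1, -1):
        acc = (acc * acc) % modulus
        if (e >> i) & 1:
            acc = (acc * base) % modulus
    return acc


def mod_pow_chunked(base, exponent, modulus, word_size=32):
    """Compose a long exponentiation out of per-chunk exponentiations."""
    radix = 1 << word_size
    chunks = []                               # least-significant chunk first
    while exponent > 0:
        exponent, chunk = divmod(exponent, radix)
        chunks.append(chunk)

    base %= modulus
    acc = 1 % modulus
    for chunk in reversed(chunks):            # start with the most-significant chunk
        for _ in range(word_size):            # acc = acc ** (2 ** word_size)
            acc = (acc * acc) % modulus
        acc = (acc * pow_small(base, chunk, modulus, word_size)) % modulus
    return acc


result = mod_pow_chunked(3, 13, 1000, word_size=2)   # chunks 11 and 01, result 323
```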

If we can’t do *this*, then the least-significant bit of the resulting, derived integer can also be tested repeatedly, as that integer is halved. But there would be a considerable performance penalty in doing that, if the API operation to divide a 2048-bit number by 2 has not also been optimized to arrive solely at a remainder without traversing the entire number, i.e. *in less than linear time*.

Dirk