How to compute the sine function on a CPU with no FPU.

There exists a maxim in the publishing world, which is, ‘Publish or Perish.’ I guess it’s a good thing I’m not a publisher, then. In any case, it’s been a while since I posted anything, so I decided to share with the community some wisdom that existed in the early days of computing – and when I say that, I really do mean ‘back in the early days’. This is something that might have been used on minicomputers, or on the computers in certain special applications, before PCs as such existed.

A standard capability which should exist is the ability to compute a decently accurate sine function. And one of the lamest reasons could be the fact that audio has been encoded with a peak amplitude, but that a decoder, or a speech-synthesis chip, might only need to play back a sine wave that has that encoded peak amplitude. However, it’s not always a given that a ‘CPU’ (“Central Processing Unit”) actually possesses an ‘FPU’ (a “Floating-Point Unit”). In such situations, programmers back in the day devised a trick.

It’s already known that a table of precomputed sine values, numbering maybe 256, can be made part of a program, but that, if all a program did was look up sine values from such a table once, ridiculously poor accuracy would result. But it was also known that, as long as the interval of 1 sine wave runs from (zero) to (two-times-pi), the derivative of the sine function is the cosine function. So the trick, really, was to make not one lookup into the table, but at least two: one to fetch an approximate sine value, and the next to fetch an approximate cosine value, the latter being the derivative of the sine at the same point. The fractional part of the parameter, between table entries, could then be multiplied by this derivative, and the result added to the sine value, thus yielding a closer approximation to the real sine value. (:3)

But, a question which readers might have about this next could be, ‘Why does Dirk not just look up two adjacent sine values, subtract to get the delta, and then multiply the fractional part by this delta?’ And the answer is, ‘Because one can apply not only the first derivative, but also the second derivative, by squaring the fractional part and halving it (:1), before multiplying the result from that by the negative of the sine function!’ One obtains a section of a parabola, and results from a 256-element table that are close to 16 bits accurate!
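To make the trick concrete, what follows is a minimal C sketch of the idea. To be clear, it is not the code from my binaries folder (linked below): the function names, the ±32767 table amplitude, and the 16-bit fractional bookkeeping are simplifications chosen purely for illustration. It can be built with something like ‘gcc -std=gnu11 sketch.c -lm’:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define TWO_PI_Q16 411774   /* 2 * Pi * 65536, truncated (see below) */

static int32_t sine_table[256];   /* one full cycle, amplitude +/-32767 */

static void build_table(void)
{
    for (int i = 0; i < 256; i++)
        sine_table[i] = (int32_t)floor(sin(2.0 * M_PI * i / 256.0)
                                       * 32767.0 + 0.5);
}

/* angle: a 16-bit value whose modulus of 65536 spans 1 full cycle.
   The top 8 bits select a table entry; the low 8 bits are the
   fractional position between entries.                            */
static int32_t int_sine(uint16_t angle)
{
    int32_t i = angle >> 8;
    int32_t f = angle & 0xFF;

    int32_t s = sine_table[i];                 /* ~ sin(theta)               */
    int32_t c = sine_table[(i + 64) & 0xFF];   /* ~ cos(theta), 1/4 cycle on */

    /* Convert the fraction of a cycle (f / 65536) to radians, keeping
       16 fractional bits; this is the chain-rule factor of 2*Pi.      */
    int32_t d = (f * TWO_PI_Q16) >> 16;

    /* sin(theta + d) ~= sin + d*cos - (d*d/2)*sin.  (Right-shifting
       signed values assumes an arithmetic shift, as with GCC.)       */
    int32_t r = s + ((d * c) >> 16);
    r -= (((d * d) >> 17) * s) >> 16;          /* >>17 realigns AND halves */

    return r;
}

int main(void)
{
    build_table();
    /* Example: angle 12345 / 65536 of a cycle, about 67.8 degrees. */
    printf("%d  vs  %f\n", int_sine(12345),
           sin(2.0 * M_PI * 12345.0 / 65536.0) * 32767.0);
    return 0;
}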

The source code can be found in my binaries folder, which is:

https://dirkmittler.homeip.net/binaries/

And, in that folder, the compressed files of interest would be ‘IntSine.tar.gz’ and ‘IntSine.zip’. They are written in C. The variance that I get, from established values, in (16-bit) integer units squared, is (0.580644) (:2), down from (0.811416) in an earlier version. Any variance lower than (1.0) should be considered ‘good’, since (±1) is actually the smallest possible per-result error.


(Updated 12/04/2020, 11h50… )

(As of 12/01/2020, 19h10: )

Hint: In order to link both source files under Linux, the directive ‘-lm’ needs to be given, i.e., appended to the gcc command line. One source file legitimately uses the true, double-precision library function in order to build the table of integer constants, while the other only uses it as a comparison, to score the accuracy of the integer arithmetic – which itself does not require the Math library.

Further, when compiling on a platform as old as Debian 8 / Jessie, the flag ‘-std=gnu11’ must be used.



(Update 12/02/2020, 5h55: )

Given that one thing which I did was to multiply 2 times Pi by 65536, in order to arrive at an integer constant which the fractional part between two table entries was to be multiplied by, this could also raise questions from the reader. Because my format really is to take the 8 LSBs of a 16-bit integer value as the fractional position between two table entries, in such a way that the entire 16-bit modulus defines 1 cycle of the sine wave, it follows that the first derivative is actually 2 times Pi times the cosine function, etc…

I can know that 411774 is a 19-bit integer in two ways:

  • I know that 2 times Pi does not exceed 7, so that only 3 extra bits are being added to what would otherwise be a 16-bit fraction, and
  • While 2 to the power of 20 can be estimated as being approximately 1 million, what this really means is that most 20-bit values don’t actually reach 1 million, spanning only from approximately 500,000 to 1,000,000. Since my integer falls within the range from roughly 250,000 to 500,000, it’s a 19-bit number – as the quick machine check below also confirms.
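For what it’s worth, the same fact can be checked by machine (assuming GCC or Clang, whose ‘__builtin_clz()’ counts a value’s leading zero bits):

#include <stdio.h>

int main(void)
{
    /* The bit-length of 411774 is 32 minus its leading zeros. */
    printf("%d\n", 32 - __builtin_clz(411774));   /* prints 19 */
    return 0;
}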

Further, while I like the format of giving certain numbers a 16-LSB fractional part, I found that I could reduce round-off errors slightly if I made the fractional part of certain other numbers 17 bits wide. I apologize if this makes my code less readable. But what this means is that, theoretically, if I were to square such a value, it would have a 34-bit fractional part, even though it’s only a signed, 32-bit integer! I suppose it’s fortunate, then, that the maximum starting value (in the corresponding part of the program) was only a positive 12-bit number. But this means that I need to right-shift the value by 18 bits, to arrive at a representation which, again, has a 16-bit fractional field.

I was right-shifting the value by 19 bits, in an earlier version of the code, because I additionally needed to halve it – not because doing so stems directly from the format of the number.
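The shift bookkeeping just described can be demonstrated in isolation. The following is only an illustration of mine, with a made-up Q17 value that respects the positive 12-bit bound mentioned above; it is not an excerpt from the actual programs:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t t  = 3217;       /* ~0.02454, with a 17-bit fractional field */
    int32_t t2 = t * t;      /* fractional fields add: 17 + 17 = 34 bits */
    int32_t q  = t2 >> 18;   /* realigned to a 16-bit fractional field   */
    int32_t h  = t2 >> 19;   /* realigned AND halved, i.e. (t * t) / 2   */
    printf("t^2   as 16-bit fraction: %d\n", q);   /* prints 39 */
    printf("t^2/2 as 16-bit fraction: %d\n", h);   /* prints 19 */
    return 0;
}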



(Update 12/02/2020, 6h15: )

An additional question which could be asked would be whether there would be any benefit to also computing the third derivative: cubing the fractional component and dividing by six. (Since the squared component has already been halved, the division still owing would be by three.)

But even before dividing by three, an observation speaks against doing so. If the fractional component by itself, within a 16-bit fraction, only has 11 bits max, and if the fractional component squared, again within a 16-bit fraction, only has 6 bits max, then their product will have 17 bits max. After that has been divided by three, it will have no more than 16 bits. What this means is that, after being right-shifted one more time, to align it with a 16-bit fractional field, the result will be zero. Doing so cannot offer any improvement within a 16-bit fractional format.
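In symbols, with each unit being (1 / 65536), the bound works out as follows:

\[
x \cdot x^{2} \;\le\; \frac{2^{11}}{2^{16}} \cdot \frac{2^{6}}{2^{16}} \;=\; \frac{2^{17}}{2^{32}}, \qquad \frac{1}{3} \cdot \frac{2^{17}}{2^{32}} \;<\; \frac{2^{16}}{2^{32}} \;=\; 2^{-16}
\]

That is, less than one unit in a 16-bit fractional field, which truncates to zero.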


(Update 12/02/2020, 8h20: )

1:)

A question which some readers may not know the answer to would be of the form: ‘The second derivative of a function can be multiplied by a small difference in the parameter, squared; why does it need to be multiplied by that small difference squared, and halved?’ And the answer is as follows:

(x) times a constant is assumed to give the (first) derivative of a certain value. And the reason this is so, is that the integral of a constant over (x) is that constant times (x). What this means is that the value itself needs to be the integral of (x) times that constant, in order for the constant also to be the multiplier of the second integral of (1). Well, the integral of (x) is (½ x²) – therefore, the (½) term. (x) was also the fractional part between two table entries.

(Edit 12/03/2020, 21h05…

If the cubic term were also to be expanded, from the third derivative, it would follow as (1/6 x³)…

(…End Of Edit, 12/03/2020, 21h05. )
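Putting the terms together, what is being computed is just the familiar Taylor expansion about the table entry’s angle, where (θ) is that angle and (δ) is the fractional part after it has been multiplied by two-times-Pi:

\[
\sin(\theta + \delta) \;\approx\; \sin\theta \;+\; \delta\cos\theta \;-\; \frac{\delta^{2}}{2}\sin\theta \;-\; \frac{\delta^{3}}{6}\cos\theta
\]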


 

(Update 12/02/2020, 14h10: )

2:)

The way in which I computed the variances assumed, first of all, that integers were to be assessed in which each unit corresponded to:

1 / 65536

Where 65536 is also 2¹⁶. The exact calculation used was:

(mean of x²) − (mean of x)²

The problem when reporting variances in this way, especially since the mean is supposed to be zero, is that the information is incomplete unless the mean is also reported. And so, oddly, the mean equals (-0.111511).

I don’t know the explanation for this, and chalk it up to, ‘an odd conspiracy of round-off errors’.

This was calculated over all valid input values that are 16-bit integers, where the modulus of 65536 completes 1 full cycle.
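For readers who want to reproduce such a score, the following sketch shows the shape of the calculation, reusing the ‘int_sine()’ and ‘build_table()’ from the sketch near the top of this post (paste it into the same file, in place of that sketch’s ‘main()’). The exact scoring code in my archive differs:

int main(void)
{
    build_table();
    double sum = 0.0, sum_sq = 0.0;
    for (uint32_t a = 0; a < 65536; a++) {
        double ref = sin(2.0 * M_PI * a / 65536.0) * 32767.0;
        double x   = (double)int_sine((uint16_t)a) - ref;
        sum    += x;
        sum_sq += x * x;
    }
    double mean = sum / 65536.0;
    printf("mean     = %f\n", mean);
    printf("variance = %f\n", sum_sq / 65536.0 - mean * mean);
    return 0;
}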



(Update 12/03/2020, 12h55: )

There’s another observation which can be made about my integer arithmetic, which has to do with the rule from Calculus 1 that is known as the “chain rule”. What that rule implies, in this case, is the following: according to pure Math, the sine function of (x) completes 1 cycle over the domain [0 .. 2*Pi). If the parameter is to be multiplied by (two-times-Pi) – so as to complete 1 cycle over the interval [0 .. 1.0) – then the first derivative of that sine function will be the cosine function, times two-times-Pi. Additionally, the derivative of the cosine function, given the same assumption, will be the negative of the sine function, times two-times-Pi. Thus, to compute the second derivative, I need to end up multiplying (x) by two-times-Pi twice.

What I did instead was to multiply (x) by two-times-Pi once, both before using it to expand the first derivative, and before squaring it, to expand the second derivative. What this accomplishes with fewer CPU operations is to multiply by two-times-Pi twice, when expanding the second derivative.
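In the sketch near the top of this post, that factoring is what the following lines (repeated here, with fuller comments) accomplish:

int32_t d = (f * TWO_PI_Q16) >> 16;   /* x is multiplied by 2*Pi once...    */
int32_t r = s + ((d * c) >> 16);      /* ...the 1st-derivative term uses d  */
r -= (((d * d) >> 17) * s) >> 16;     /* ...and d*d supplies (2*Pi)^2 with
                                         no further multiply by the constant */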


 

(Update 12/03/2020, 19h05: )

By tweaking the code a little further, I was able to bring the variance down to (0.411991), and the DC offset down to (-0.077118).

 

There exists another platform-specific implementation detail which my programs depended on, and about which I received a surprise several months ago, when I was writing some simple ‘Qt5’ / GUI-building exercises. In C++, under the Debian / GNU architectures, the following two function-calls differ in two ways, not one:

 


int(-1.5);
floor(-1.5);

 

One important difference is that the ‘int()’ call is really a type-conversion to an integer, which C programs should never invoke in the above (functional) form. The other, more important difference is the fact that these two calls return numerically different results. ‘floor(-1.5)’ does exactly what I expected it to do, which is to return (-2.0). But to my surprise back then, ‘int(-1.5)’ actually returned (-1)! In other words, when converting a floating-point number to type ‘int’, the absolute value is rounded down, ‘towards zero’, and if the value was negative, the negative sign is then put back in front of it.

The way I corrected for this in my Qt5 exercise was to avoid casting directly from a floating-point number to an integer, instead using the ‘floor()’ function whenever possible.

But a previous version of my two programs depended on the cast. Correct programming practice is first to compute the ‘floor()’ function of (the floating-point number + 0.5), and only then to assign the result to an integer – the second of which operations should just ‘extract the integer part, when the fractional part is zero.’

The present version of my two programs uses the (correct) ‘floor()’ function, as described.
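The difference, and the correction, can be demonstrated with a few lines of C (again, only an illustration of mine, to be linked with ‘-lm’):

#include <stdio.h>
#include <math.h>

int main(void)
{
    printf("(int)(-1.5) = %d\n", (int)(-1.5));   /* -1: truncation toward zero */
    printf("floor(-1.5) = %f\n", floor(-1.5));   /* -2.0                       */

    /* Rounding to the nearest integer, as described above: add 0.5,
       take floor(), and only then convert the result to int.        */
    double v = -1.5;
    printf("rounded     = %d\n", (int)floor(v + 0.5));   /* -1 */
    return 0;
}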



(Update 12/04/2020, 11h50: )

3:)

On the subject of Computing History: when I was reading about the fact that one of my predecessors was using such a trick, his trick differed from mine in two key ways:

  • He was only shooting for 8 bits of precision, not 16, and
  • His lookup table only had 128 entries, not 256.

And both those facts had a common reason: although programmable chips existed as early as the late 1970s, and on into the 1980s, those programmable chips had far less ROM than even embedded CPUs have today. They simply did not have enough ROM to store a constant array of 256 16-bit integers, let alone 32-bit integers, as my suggested algorithm does.

I vaguely seem to recall that programmable chips with approximately 1KB of ROM in total existed at that time, but their programmers also had to use that ROM for the actual program – including everything else that a programmable chip was expected to do, as part of an appliance.

Now, my exercise could simply be modified so that the peak amplitude in the sine-value table only reaches ±2¹⁴, let’s say, just so that the statement can be made that all its constants are 16-bit constants. But if one did that, one would also no longer achieve ~16-bit accuracy in the results, as the output amplitude would also just scale down.

Dirk

 
