Secondary Polishing

When the project is undertaken to write programs, that would be sub-components to Computer Algebra Systems, but that produce floating-point numerical outputs, then an unwanted side effect of how those work is, they can be output in place of integers (whole numbers), but may differ from those integers by some very small fractional amount. Thus, instead of outputting (1) exactly, such a program might output:


The problem is that such output can be visually misleading, and confusing because a Human user wants to know that the answer to a problem was (1). And so a possible step in the refinement of such programs is “Secondary Polishing”, which does not change the actual computations, but which makes the output ‘look nicer’.

I recently completed a project that approximates the roots of arbitrary polynomials, and also looked in to the need for secondary polishing. There was one specific situation in which this was not required: The root’s real or imaginary component could have an absolute of (1/10) or greater. In this case, the simple fact that I had set the precision of the printed output to (14), but that the roots found are more precise than to be within (10^-14), at least after the actual, primary polishing, that affects computed values, together with the way the standard output functions work in C++, will cause the example above to be output as a single-digit (1), even though what was stored internally might be different from that, by less than (10^-14). But a special case exists within the norms of C++, if the absolute of the numerical term to be output is less than (1/10).

(Updated 2/11/2019, 19h35 … )

Scientists have been using a notation to write a wide range of numbers for a long time, which is called ‘Scientific Notation’, and what it means is that a digit is to be printed, followed by a decimal point and more digits where appropriate, and then the fact needs to be denoted, that this term should be multiplied by (10) raised to some positive or negative exponent. Floating-point numbers in Computing follow the same principle, only with exponents of (2) instead of exponents of (10).

But, ever since the early days of Computing, computers were not able to typeset the Mathematical statement, that a number should be multiplied by an exponent of (10). Instead, since the beginning of Computing, an expression could be output, which looks like this:


That ‘E’ has no defined meaning in Mathematics. But in Computing it means, that the numerical value to the left of it should be multiplied by (10), raised to whatever number is to the right of the ‘E’, as the exponent. This convention has been followed to the present day. (:1) In this case, the printed result would have the order of magnitude of (10^-40), so that the decimal point would need to be moved 40 places further to the left, to result in a number the way some people might normally read it.

Even though the order of magnitude is so tiny, C++ will casually print such numbers, with 16 digits of precision, or, if only (14) was set, with 14 digits of precision. Thus, the way I had my program written, it would still have output the number above as:


(With the last digit rounded up.) Because it was never the intention of my search algorithm to be that precise, I needed to set all the numeric values that are to be printed to zero, if their absolute was less than ‘epsilon’, which I had set to (10^12) actually, which defines the actual precision of my search algorithm, before the primary polishing, which in fact does affect computation.

And this was my secondary polishing.


(Update 2/11/2019, 19h35 : )


There is one slight way, in which the floating-point number I used in the example above, would not have been output ‘in the early days of computing’. Early computers that I can remember, actually had fewer bits defining their floating-point numbers, and in some cases even had a bus-width with multiples of 3 bits. Today’s norm, is multiples of 4 bits, that are sometimes called ‘nibbles’ or half-bytes.

And so the maximum number which a Helwett-Packard 2000 could print, before emitting an overflow message, was in the vicinity of:


This was roughly similar to today’s single-precision, 32-bit format.

Today, 64-bit ‘double’ floating-point numbers, have largely replaced even the 32-bit format.



Print Friendly, PDF & Email

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>