64-bit FORTH

Before I describe a 64-bit FORTH version, I need to explain something about the more-established, general 32-bit version of this low-level language. The 32-bit FORTH had an accepted method of storing 64-bit, so-called ‘double-width’ numbers, in two positions on the stack, with the most-significant word ‘on top’, and the less-significant word in second position from the ‘top’. Correspondingly, ‘normal’ 32-bit FORTH possesses special operators that can either perform full, double-width arithmetic, which treats two consecutive stack-positions as defining a single number, or mixed-width arithmetic, in which two single-width numbers can lead to a double-width product, or by which a double-width number can be divided by a single-width, to arrive at a single-width quotient, and optionally, also to arrive at a single-width modulus / remainder.

This is a fashion in which 32-bit CPUs have generally been able to perform 64-bit arithmetic, partially. And if the reader is not familiar with how this is accessible under FORTH, I can suggest This External Article as a source of reference.

But, if the reader has installed the 64-bit GNU FORTH, which is also just called ‘gforth’ under Linux, then I should call to his attention, that now, each stack-position is capable of holding a 64-bit number, and that all the operators on those numbers are possible, which would otherwise be available for 32-bit numbers, with no special naming.

The following is a small text-session-clip, that illustrates how this works:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
1 cells . 8  ok
hex  ok
$0123456789ABCDEF $100000000 /mod .s <2> 89ABCDEF 1234567  ok
$20 lshift + .s <1> 123456789ABCDEF  ok
dup  ok
m* .s <2> -235A1DF76F0D5ADF 14B66DC33F6AC  ok
d. 14B66DC33F6ACDCA5E20890F2A521  ok
bye 
dirk@Klystron:~$ 


 

So the ‘/mod’, the ‘lshift’ and the ‘+’ operators are spelled exactly as they would have been for 32-bit FORTH, but operate on potential 64-bit numbers. ‘gforth’ still preserves the double-width operators with the special naming, but in this case, double-width actually means 128-bit ! Its implementation of the standard Fetch operator, which is still named ‘@’, now fetches a 64-bit value from RAM. And I have already documented this slight incompatibility in This Earlier Posting.

If we can assume that our source-code is to be compiled on 64-bit FORTH, we can just perform 64-bit operations on single stack-positions, at will.

It should also be noted that in FORTH comments and stack-traces, the topmost stack-position is written on the right-hand side of the textual list. The apparent negative number above, in the second stack position from the logical top, after ‘ m* .s ‘ , is the result of the most-significant bit of that word being a (1) and not a (0). By convention, in signed integers, this will trigger that a negative number is meant, using two’s complement. And this is still the case, in hexadecimal. But, because this word is the less-significant of the two listed, its most-significant bit will no longer be the most-significant, after it has been combined with the other word, thereby again forming a positive number when printed out as a single 128-bit, signed, integer.

(Edit 07/31/2017 : )

One fact which I have blatantly ignored in my own coding, was that the way in which I chose to separate a single numeric value into two bit-fields – through a modulus-division – is not the most efficient in terms of how many CPU-clock-cycles it consumes. A preferable way to do the same thing, is by using ‘rshift’, and then masking.

The reason for this is the fact that when a CPU is instructed to left-shift or right-shift a binary register-content, doing so takes up about 1 clock-cycle per bit-position shifted. What people may not realize, is that although addition and subtraction can easily be performed in one step by logic-circuits, multiplication and division may not be, assuming a generic, general-purpose CPU. To multiply two 64-bit numbers, actually means to perform 64 additions optionally, each depending on the value of one bit. And to divide a 64-bit number by another, actually means to perform 64 subtractions optionally, each depending on the outcome of a comparison. Maybe for 32-bit or 16-bit registers, we don’t care. But by the time we’re using 64-bit numbers, it penalizes our CPU-load twice as much.

Continue reading 64-bit FORTH

A Caveat in Using Infix Notation With FORTH

The default way in which the language FORTH works, is with Postfix Notation. Thus, if we wanted to compute the sum of 48 and 16 we would write

 


48 16 +

 

But there is a way to tease the language into using infix-notation for certain purposes. The following function will read text from the line, from which the FORTH interpreter is reading code, until a word of text has been completed, will convert this text into a number, and will push that line of text onto the stack:

 


: infix-number 0. bl word count >number 2drop drop ;

 

After this subroutine has been compiled, it can be used in the compilation of another subroutine, like so:

 


: infix-sum infix-number + ;

 

And after those two functions have been compiled, the following command will work from the top-level interpreter (of FORTH):

 


48 infix-sum 16

 

This will equally result in the sum of the two values being put on the stack, namely 64 in this example.

The problems in using this will begin, as soon as a programmer wants to use this form, in FORTH that’s supposed to be compiled. In error, a programmer could next compile a function which uses ‘infix-sum’, like so:

 


: output-16-higher infix-sum 16 ;

 

Assuming that the following line of code, given from the interpreter, will leave the value 64 on the stack again:

 


48 output-16-higher -?-

 

The reason this will fail, has to do with the fact that the FORTH-word ‘WORD‘ , expects to see a stream of text being input somewhere, when the subroutine it has been compiled into, is being executed. This needs to come from a text-stream which is being interpreted. FORTH consists of:

  1. A Text-Interpreter,
  2. A Compiler,
  3. A Bytecode-Interpreter.

When we delimit a set of words which have already been compiled, with a ‘:‘ and a ‘;‘ , we are assuming by default that each of the words used, is not to be executed, but that only its Bytecode-equivalent is to be inserted as part of the word which we are defining, which immediately follows the ‘:‘ . Thus, our example from above defines ‘output-16-higher‘ , without executing ‘infix-sum 16‘ .

Now, we might want to remedy this problem by putting

 


: infix-sum infix-number + ; immediate

 

And then indeed, as soon as the compiler encounters the word ‘infix-sum‘, it will execute that word, even during the compilation of ‘output-16-higher‘ . But then we’d run into another problem. ‘infix-sum‘ expects to receive a word off the stack. Naturally it will do so when it is executed. Does anybody know the value on the stack, while ‘output-16-higher‘ is being compiled, not executed? We want the value on the stack, when ‘output-16-higher‘ is executed.

What this dilemma also means, is that even though I now think that I’d want my own, future execution-command to be:

 


S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 0 4 * type

 

To say, Use Buffer (0), I might actually want to use instead:

 


0 S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type

 

To say the same thing, and To forget about using infix. Infix might be a nicety for data-input, but is the enemy, of wanting to compile our data input.

Dirk

 

About Encoding And Decoding Base-64 In FORTH

In This Previous Posting, I wrote that I had written some source-code in the language FORTH, that decodes standard Base-64 into a binary array of data, in output sizes that are multiples of 36 Bytes. For my own purposes, there might be no need to output Base-64, because I can use command-line utilities to prepare Base-64 strings, and then only use those as a means to enter the data, and embed it into future, hypothetical source code.

But the purposes of other, hypothetical software-developers have not been met with this exercise, because those people may need to be able to output Base-64, which means they’d need a matching encoder.

Unfortunately, the language does not lend itself to that easily, if a standard Base-64 radix is being implied, because 6-bit output-numerals would need to be bit-aligned, and trying to align fields of bits in FORTH is difficult.

(Edit 07/25/2017 : )

One subject which I have investigated more completely now, is the fact that the numeral-to-text conversion utilities built-in to FORTH, seem to continue to produce output, even if a Base of 64 has been set. In theory, the FORTH developers could have adopted a custom radix, in order to be able to state, that their binary-to-FB64 conversion is computed faster, than standard Base-64 could be. But OTOH, the characters output, could just become garbage, by the time 24-bit numerals are to be streamed:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok
bye 
dirk@Klystron:~$ 


 

My conclusion is, that This pseudo- Base-64 streaming remains usable, even when 24-bit numerals are given.

This conclusion reverses a negative, tentative conclusion, which I had only given yesterday.

I have by now coded both the encoder and decoder for standard Base-64, which I’ve named ‘b64-stream’ and ‘b64-parse’ respectively, but as well the encoder and decoder for the pseudo- Base-64, which I call ‘fb64-stream’ and ‘fb64-parse’. At this point, Base-64 has been implemented in a way software-experts would consider complete, with a full non-standard version of Base-64. This is what the code ultimately does:

 


dirk@Klystron:~$ cd ~/Programs
dirk@Klystron:~/Programs$ gforth fb64-parse-6.fs
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
S" generate1234TERRIBLE.,?-" 4 / b64-stream b64-parse 0 4 * type generate1234TERRIBLE.,?- ok
S" generate1234TERRIBLE.,?-" 4 / fb64-stream fb64-parse 0 4 * type generate1234TERRIBLE.,?- ok
bye 
dirk@Klystron:~/Programs$ 


 

My custom-semantics assume that on the stack, a binary array exists, with a numeric value placed on top of it, which warns each encoder, how many 32-bit words each array holds. OTOH, the input to each decoder expects a standard, full string, which the corresponding encoder outputs, and which also exist as two items on the stack each time, where the top numeral states how many characters long the string is, as per standard FORTH.

And below is the source-code (Updated 08/02/2017 : )

Continue reading About Encoding And Decoding Base-64 In FORTH

Completed Project: FORTH Base-64 Decoder

One of the subjects which I had written about earlier, was the fact that FORTH – the programming language – seems to output text fine, in a peculiar form of Base-64 that must have been quick to encode, and which I assume takes very little CPU power. This Base-64 is not compatible with the Base-64 used in email, and elsewhere where MIME-Types are used. The MIME-Base-64 is the standard one, not the FORTH approach.

This observation next led me to the question of whether FORTH can in fact decode its own, native, Base-64 – i.e. take the text and parse it into numbers again. And surprisingly the answer was No !

So a following question which has gripped me, was how easy or hard it might be to write a standard Base-64 Decoder / Parser, in FORTH.

I have just completed the project, and am willing to share all my results with the reader.

First of all, the Linux shell is an ideal place to test various encodings, because we have command-lines such as ‘base64′ as well as ‘tr’, and we can execute our other text-based programs, for the purpose of taking a picture of the text, which will show reproducible results when we were successful. And so the first text-clipping I have here, is an assortment of experiments I did in the terminal-window. It confirms the weird FORTH mapping, and also gives the standard Base-64 mapping of a snippet of text which I chose, that having been ‘generate1234TERRIBLE.,?‘ , and then uses ‘tr’ to show what that looks like when output by FORTH. Although, I have learned that standard Base-64 reverses its groups of 3 binary bytes, that each lead to 4 Base-64 characters, compared to the usual endianess of Linux computers and others. Therefore, if FORTH was made not to print out a single numeral, but rather a stream, its mapping might be different again from what is shown, due to this endianess.

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok
bye 
dirk@Klystron:~$ 

Base64:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

fb64:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv

tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

echo 'generate1234TERRIBLE.,?' | base64
echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64
Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K
dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'
PmLePN9XT6KhCZCkL4LIK[\2J4KeB3sA
dirk@Klystron:~$ 

S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type


 

I did not write an Encoder, i.e. a binary-to-Base-64 module. My main question was, if I wanted to embed binary data as prepared literal strings in FORTH, could I use a FORTH word defined in FORTH? The answer is Yes.

Also, I did not implement the ‘=’ symbol, which is used to denote that the Byte-alignment is not in sync with the character-alignment. Every 4 characters correspond to 3 Bytes, and if a random number of Bytes is assumed, there is a 2/3 chance that the last Byte or Bytes in the sequence, will not be a multiple of 3, and thus not conform to a multiple of 4 characters. Normally, this is why some Base-64 streams end in 1 or 2 ‘=’ symbols, to denote how many virtual Bytes were set to zero to allow the encoding to take place, but which are not to be decoded.

In fact, I made the rather blunt assumption that my FORTH code would assume that an integer number of work-units is to be completed, each of which is to take up 3 * 32 bits of binary, corresponding to 16 Base-64 characters. Incomplete work units will not be parsed by this program. Hence, if it was to be used to embed literal binary data, then those literals would certainly need to be prepared by software.

(Edit 07/22/2017 : The reason fw my Base-256 text-snippet only has 23 visible characters, where 24 might seem in order, thus completing 2 work units, was my knowledge that my software ends its strings with NULL characters – which FORTH actually does not do. If this was all there is to it, then the 4th-to-last Base-64 character should be an ‘A’, since the Bytes are read in in Big-Endian form, the way a standard stream is encoded. But what the Linux-based ‘echo’ command next does, is recognize the fact that this NULL characters ends the string and ignore it. And then, ‘echo’ appends a newline-character in its place. )

Below is the source-code, on which further source-code could also be based, to output the text:

Continue reading Completed Project: FORTH Base-64 Decoder