The default way in which the language FORTH works, is with Postfix Notation. Thus, if we wanted to compute the sum of 48 and 16 we would write
48 16 +
But there is a way to tease the language into using infix-notation for certain purposes. The following function will read text from the line, from which the FORTH interpreter is reading code, until a word of text has been completed, will convert this text into a number, and will push that line of text onto the stack:
: infix-number 0. bl word count >number 2drop drop ;
After this subroutine has been compiled, it can be used in the compilation of another subroutine, like so:
: infix-sum infix-number + ;
And after those two functions have been compiled, the following command will work from the top-level interpreter (of FORTH):
48 infix-sum 16
This will equally result in the sum of the two values being put on the stack, namely 64 in this example.
The problems in using this will begin, as soon as a programmer wants to use this form, in FORTH that’s supposed to be compiled. In error, a programmer could next compile a function which uses ‘infix-sum’, like so:
: output-16-higher infix-sum 16 ;
Assuming that the following line of code, given from the interpreter, will leave the value 64 on the stack again:
48 output-16-higher -?-
The reason this will fail, has to do with the fact that the FORTH-word ‘
WORD‘ , expects to see a stream of text being input somewhere, when the subroutine it has been compiled into, is being executed. This needs to come from a text-stream which is being interpreted. FORTH consists of:
- A Text-Interpreter,
- A Compiler,
- A Bytecode-Interpreter.
When we delimit a set of words which have already been compiled, with a ‘
:‘ and a ‘
;‘ , we are assuming by default that each of the words used, is not to be executed, but that only its Bytecode-equivalent is to be inserted as part of the word which we are defining, which immediately follows the ‘
:‘ . Thus, our example from above defines ‘
output-16-higher‘ , without executing ‘
infix-sum 16‘ .
Now, we might want to remedy this problem by putting
: infix-sum infix-number + ; immediate
And then indeed, as soon as the compiler encounters the word ‘
infix-sum‘, it will execute that word, even during the compilation of ‘
output-16-higher‘ . But then we’d run into another problem. ‘
infix-sum‘ expects to receive a word off the stack. Naturally it will do so when it is executed. Does anybody know the value on the stack, while ‘
output-16-higher‘ is being compiled, not executed? We want the value on the stack, when ‘
output-16-higher‘ is executed.
What this dilemma also means, is that even though I now think that I’d want my own, future execution-command to be:
S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 0 4 * type
To say, Use Buffer (0), I might actually want to use instead:
0 S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type
To say the same thing, and To forget about using infix. Infix might be a nicety for data-input, but is the enemy, of wanting to compile our data input.
In This Previous Posting, I wrote that I had written some source-code in the language FORTH, that decodes standard Base-64 into a binary array of data, in output sizes that are multiples of 36 Bytes. For my own purposes, there might be no need to output Base-64, because I can use command-line utilities to prepare Base-64 strings, and then only use those as a means to enter the data, and embed it into future, hypothetical source code.
But the purposes of other, hypothetical software-developers have not been met with this exercise, because those people may need to be able to output Base-64, which means they’d need a matching encoder.
Unfortunately, the language does not lend itself to that easily, if a standard Base-64 radix is being implied, because 6-bit output-numerals would need to be bit-aligned, and trying to align fields of bits in FORTH is difficult.
(Edit 07/25/2017 : )
One subject which I have investigated more completely now, is the fact that the numeral-to-text conversion utilities built-in to FORTH, seem to continue to produce output, even if a Base of 64 has been set. In theory, the FORTH developers could have adopted a custom radix, in order to be able to state, that their binary-to-FB64 conversion is computed faster, than standard Base-64 could be. But OTOH, the characters output, could just become garbage, by the time 24-bit numerals are to be streamed:
dirk@Klystron:~$ gforth Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit : list-forth-b64 64 base ! &255 &0 do i . space loop ; ok list-forth-b64 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e f g h i j k l m n o p q r s t u v 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G 1H 1I 1J 1K 1L 1M 1N 1O 1P 1Q 1R 1S 1T 1U 1V 1W 1X 1Y 1Z 1 1^ 1_ 1` 1a 1b 1c 1d 1e 1f 1g 1h 1i 1j 1k 1l 1m 1n 1o 1p 1q 1r 1s 1t 1u 1v 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 2G 2H 2I 2J 2K 2L 2M 2N 2O 2P 2Q 2R 2S 2T 2U 2V 2W 2X 2Y 2Z 2 2^ 2_ 2` 2a 2b 2c 2d 2e 2f 2g 2h 2i 2j 2k 2l 2m 2n 2o 2p 2q 2r 2s 2t 2u 2v 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 3G 3H 3I 3J 3K 3L 3M 3N 3O 3P 3Q 3R 3S 3T 3U 3V 3W 3X 3Y 3Z 3 3^ 3_ 3` 3a 3b 3c 3d 3e 3f 3g 3h 3i 3j 3k 3l 3m 3n 3o 3p 3q 3r 3s 3t 3u 3v ok bye dirk@Klystron:~$
My conclusion is, that This pseudo- Base-64 streaming remains usable, even when 24-bit numerals are given.
This conclusion reverses a negative, tentative conclusion, which I had only given yesterday.
I have by now coded both the encoder and decoder for standard Base-64, which I’ve named ‘b64-stream’ and ‘b64-parse’ respectively, but as well the encoder and decoder for the pseudo- Base-64, which I call ‘fb64-stream’ and ‘fb64-parse’. At this point, Base-64 has been implemented in a way software-experts would consider complete, with a full non-standard version of Base-64. This is what the code ultimately does:
dirk@Klystron:~$ cd ~/Programs dirk@Klystron:~/Programs$ gforth fb64-parse-6.fs Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit S" generate1234TERRIBLE.,?-" 4 / b64-stream b64-parse 0 4 * type generate1234TERRIBLE.,?- ok S" generate1234TERRIBLE.,?-" 4 / fb64-stream fb64-parse 0 4 * type generate1234TERRIBLE.,?- ok bye dirk@Klystron:~/Programs$
My custom-semantics assume that on the stack, a binary array exists, with a numeric value placed on top of it, which warns each encoder, how many 32-bit words each array holds. OTOH, the input to each decoder expects a standard, full string, which the corresponding encoder outputs, and which also exist as two items on the stack each time, where the top numeral states how many characters long the string is, as per standard FORTH.
And below is the source-code (Updated 07/26/2017 : )
One of the subjects which I had written about earlier, was the fact that FORTH – the programming language – seems to output text fine, in a peculiar form of Base-64 that must have been quick to encode, and which I assume takes very little CPU power. This Base-64 is not compatible with the Base-64 used in email, and elsewhere where MIME-Types are used. The MIME-Base-64 is the standard one, not the FORTH approach.
This observation next led me to the question of whether FORTH can in fact decode its own, native, Base-64 – i.e. take the text and parse it into numbers again. And surprisingly the answer was No !
So a following question which has gripped me, was how easy or hard it might be to write a standard Base-64 Decoder / Parser, in FORTH.
I have just completed the project, and am willing to share all my results with the reader.
First of all, the Linux shell is an ideal place to test various encodings, because we have command-lines such as ‘base64′ as well as ‘tr’, and we can execute our other text-based programs, for the purpose of taking a picture of the text, which will show reproducible results when we were successful. And so the first text-clipping I have here, is an assortment of experiments I did in the terminal-window. It confirms the weird FORTH mapping, and also gives the standard Base-64 mapping of a snippet of text which I chose, that having been ‘
generate1234TERRIBLE.,?‘ , and then uses ‘tr’ to show what that looks like when output by FORTH. Although, I have learned that standard Base-64 reverses its groups of 3 binary bytes, that each lead to 4 Base-64 characters, compared to the usual endianess of Linux computers and others. Therefore, if FORTH was made not to print out a single numeral, but rather a stream, its mapping might be different again from what is shown, due to this endianess.
dirk@Klystron:~$ gforth Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit : list-forth-b64 64 base ! &255 &0 do i . space loop ; ok list-forth-b64 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e f g h i j k l m n o p q r s t u v 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G 1H 1I 1J 1K 1L 1M 1N 1O 1P 1Q 1R 1S 1T 1U 1V 1W 1X 1Y 1Z 1 1^ 1_ 1` 1a 1b 1c 1d 1e 1f 1g 1h 1i 1j 1k 1l 1m 1n 1o 1p 1q 1r 1s 1t 1u 1v 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 2G 2H 2I 2J 2K 2L 2M 2N 2O 2P 2Q 2R 2S 2T 2U 2V 2W 2X 2Y 2Z 2 2^ 2_ 2` 2a 2b 2c 2d 2e 2f 2g 2h 2i 2j 2k 2l 2m 2n 2o 2p 2q 2r 2s 2t 2u 2v 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 3G 3H 3I 3J 3K 3L 3M 3N 3O 3P 3Q 3R 3S 3T 3U 3V 3W 3X 3Y 3Z 3 3^ 3_ 3` 3a 3b 3c 3d 3e 3f 3g 3h 3i 3j 3k 3l 3m 3n 3o 3p 3q 3r 3s 3t 3u 3v ok bye dirk@Klystron:~$ Base64: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ fb64: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuv tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuv' echo 'generate1234TERRIBLE.,?' | base64 echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuv' dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuv' PmLePN9XT6KhCZCkL4LIK[\2J4KeB3sA dirk@Klystron:~$ S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type
I did not write an Encoder, i.e. a binary-to-Base-64 module. My main question was, if I wanted to embed binary data as prepared literal strings in FORTH, could I use a FORTH word defined in FORTH? The answer is Yes.
Also, I did not implement the ‘=’ symbol, which is used to denote that the Byte-alignment is not in sync with the character-alignment. Every 4 characters correspond to 3 Bytes, and if a random number of Bytes is assumed, there is a 2/3 chance that the last Byte or Bytes in the sequence, will not be a multiple of 3, and thus not conform to a multiple of 4 characters. Normally, this is why some Base-64 streams end in 1 or 2 ‘=’ symbols, to denote how many virtual Bytes were set to zero to allow the encoding to take place, but which are not to be decoded.
In fact, I made the rather blunt assumption that my FORTH code would assume that an integer number of work-units is to be completed, each of which is to take up 3 * 32 bits of binary, corresponding to 16 Base-64 characters. Incomplete work units will not be parsed by this program. Hence, if it was to be used to embed literal binary data, then those literals would certainly need to be prepared by software.
(Edit 07/22/2017 : The reason fw my Base-256 text-snippet only has 23 visible characters, where 24 might seem in order, thus completing 2 work units, was my knowledge that my software ends its strings with NULL characters – which FORTH actually does not do. If this was all there is to it, then the 4th-to-last Base-64 character should be an ‘A’, since the Bytes are read in in Big-Endian form, the way a standard stream is encoded. But what the Linux-based ‘echo’ command next does, is recognize the fact that this NULL characters ends the string and ignore it. And then, ‘echo’ appends a newline-character in its place. )
Below is the source-code, on which further source-code could also be based, to output the text: