One of the subjects which I had written about earlier, was the fact that FORTH – the programming language – seems to output text fine, in a peculiar form of Base-64 that must have been quick to encode, and which I assume takes very little CPU power. This Base-64 is not compatible with the Base-64 used in email, and elsewhere where MIME-Types are used. The MIME-Base-64 is the standard one, not the FORTH approach.
This observation next led me to the question of whether FORTH can in fact decode its own, native, Base-64 – i.e. take the text and parse it into numbers again. And surprisingly the answer was No !
So a following question which has gripped me, was how easy or hard it might be to write a standard Base-64 Decoder / Parser, in FORTH.
I have just completed the project, and am willing to share all my results with the reader.
First of all, the Linux shell is an ideal place to test various encodings, because we have command-lines such as ‘base64′ as well as ‘tr’, and we can execute our other text-based programs, for the purpose of taking a picture of the text, which will show reproducible results when we were successful. And so the first text-clipping I have here, is an assortment of experiments I did in the terminal-window. It confirms the weird FORTH mapping, and also gives the standard Base-64 mapping of a snippet of text which I chose, that having been ‘
generate1234TERRIBLE.,?‘ , and then uses ‘tr’ to show what that looks like when output by FORTH. Although, I have learned that standard Base-64 reverses its groups of 3 binary bytes, that each lead to 4 Base-64 characters, compared to the usual endianess of Linux computers and others. Therefore, if FORTH was made not to print out a single numeral, but rather a stream, its mapping might be different again from what is shown, due to this endianess.
dirk@Klystron:~$ gforth Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit : list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ; ok list-forth-b64 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G 1H 1I 1J 1K 1L 1M 1N 1O 1P 1Q 1R 1S 1T 1U 1V 1W 1X 1Y 1Z 1[ 1\ 1] 1^ 1_ 1` 1a 1b 1c 1d 1e 1f 1g 1h 1i 1j 1k 1l 1m 1n 1o 1p 1q 1r 1s 1t 1u 1v 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 2G 2H 2I 2J 2K 2L 2M 2N 2O 2P 2Q 2R 2S 2T 2U 2V 2W 2X 2Y 2Z 2[ 2\ 2] 2^ 2_ 2` 2a 2b 2c 2d 2e 2f 2g 2h 2i 2j 2k 2l 2m 2n 2o 2p 2q 2r 2s 2t 2u 2v 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 3G 3H 3I 3J 3K 3L 3M 3N 3O 3P 3Q 3R 3S 3T 3U 3V 3W 3X 3Y 3Z 3[ 3\ 3] 3^ 3_ 3` 3a 3b 3c 3d 3e 3f 3g 3h 3i 3j 3k 3l 3m 3n 3o 3p 3q 3r 3s 3t 3u 3v ok bye dirk@Klystron:~$ Base64: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ fb64: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv' echo 'generate1234TERRIBLE.,?' | base64 echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv' dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv' PmLePN9XT6KhCZCkL4LIK[\2J4KeB3sA dirk@Klystron:~$ S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type
I did not write an Encoder, i.e. a binary-to-Base-64 module. My main question was, if I wanted to embed binary data as prepared literal strings in FORTH, could I use a FORTH word defined in FORTH? The answer is Yes.
Also, I did not implement the ‘=’ symbol, which is used to denote that the Byte-alignment is not in sync with the character-alignment. Every 4 characters correspond to 3 Bytes, and if a random number of Bytes is assumed, there is a 2/3 chance that the last Byte or Bytes in the sequence, will not be a multiple of 3, and thus not conform to a multiple of 4 characters. Normally, this is why some Base-64 streams end in 1 or 2 ‘=’ symbols, to denote how many virtual Bytes were set to zero to allow the encoding to take place, but which are not to be decoded.
In fact, I made the rather blunt assumption that my FORTH code would assume that an integer number of work-units is to be completed, each of which is to take up 3 * 32 bits of binary, corresponding to 16 Base-64 characters. Incomplete work units will not be parsed by this program. Hence, if it was to be used to embed literal binary data, then those literals would certainly need to be prepared by software.
(Edit 07/22/2017 : The reason fw my Base-256 text-snippet only has 23 visible characters, where 24 might seem in order, thus completing 2 work units, was my knowledge that my software ends its strings with NULL characters – which FORTH actually does not do. If this was all there is to it, then the 4th-to-last Base-64 character should be an ‘A’, since the Bytes are read in in Big-Endian form, the way a standard stream is encoded. But what the Linux-based ‘echo’ command next does, is recognize the fact that this NULL characters ends the string and ignore it. And then, ‘echo’ appends a newline-character in its place. )
Below is the source-code, on which further source-code could also be based, to output the text:
: fb64-array create 12 * allot does> swap 4 * + ; immediate create fb64-ref-array 'A , 'B , 'C , 'D , 'E , 'F , 'G , 'H , 'I , 'J , 'K , 'L , 'M , 'N , 'O , 'P , 'Q , 'R , 'S , 'T , 'U , 'V , 'W , 'X , 'Y , 'Z , 'a , 'b , 'c , 'd , 'e , 'f , 'g , 'h , 'i , 'j , 'k , 'l , 'm , 'n , 'o , 'p , 'q , 'r , 's , 't , 'u , 'v , 'w , 'x , 'y , 'z , '0 , '1 , '2 , '3 , '4 , '5 , '6 , '7 , '8 , '9 , '+ , '/ , 3 fb64-array output1 variable fb64-sum : test-ref-array 64 0 do fb64-ref-array i cells + c@ emit loop ; : fb64-field ( string -- 32-bit-number ) 0 fb64-sum ! dup 0 do over i + c@ ( string in-char ) fb64-sum @ 64 * fb64-sum ! 64 0 do dup ( string in-char in-char ) fb64-ref-array i cells + c@ ( string in-char in-char test-char ) = if fb64-sum @ i + fb64-sum ! then loop drop loop drop drop fb64-sum @ $00FF0000 and 16 rshift fb64-sum @ $0000FF00 and + fb64-sum @ $000000FF and 16 lshift + \ fb64-sum @ ; : fb64-parse ( string -- array size ) dup 16 / ( string job-size ) rot rot ( job-size string ) 2 pick 0 do 2dup ( size string string ) drop 4 ( size string string/4 ) fb64-field ( size string number ) rot rot ( size subv1 string ) swap 4 + swap ( size subv1 +string ) 2dup ( size subv1 +string +string ) drop 4 ( size subv1 +string +string/4 ) fb64-field ( size subv1 +string number ) $100 /mod ( size subv1 +string mod q ) swap 24 lshift 4 roll ( size +string q mod subv1 ) + i 3 * output1 ! ( size +string q ) rot rot ( size subv2 +string ) swap 4 + swap ( size subv2 ++string ) 2dup drop 4 ( size subv2 ++string ++string/4 ) fb64-field ( size subv2 ++string number ) $10000 /mod ( size subv2 +string mod q ) swap 16 lshift 4 roll ( size ++string q mod-aug subv2 ) + i 3 * 1+ output1 ! ( size ++string q ) rot rot ( size subv3 ++string ) swap 4 + swap ( size subv3 +++string ) 2dup drop 4 ( size subv3 +++string +++string/4 ) fb64-field ( size subv3 +++string number ) 8 lshift 3 roll ( size +++string number subv3 ) + i 3 * 2 + output1 ! ( size +++string ) swap 4 + swap ( size ++++string ) loop ( job-size garbage garbage ) drop drop 0 output1 swap 3 * ( array word-size ) ;
Now, I suppose that somebody might want to say, Show me that final piece, a FORTH terminal session which reproduced the phrase. So here it is:
dirk@Klystron:~$ cd ~/Programs dirk@Klystron:~/Programs$ gforth fb64-parse-2.fs Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type generate1234TERRIBLE.,? ok bye dirk@Klystron:~/Programs$
(Edit : )
One fact which I was not yet satisfied with, when I wrote the posting above, was that my code defines a custom structure globally – which is fine – but that I am constrained to making the declarations of instances of that structure global as well. What this could mean, is that if I wanted to embed more than one piece of binary data in my code, they would all potentially be sharing the buffer I happened to name ‘output1′. This is not fine !
So to improve on this initial design-flaw – the software was still in its earliest development-stage – I decided to create a single buffer-object named ‘fb64-output’, but to make that a 2-dimensional array instead of a 1-dimensional array. So now I get to specify not only how many work units each buffer should have, but also how many parallel instances, of that many work units should be preallocated.
Then, I can use infix notation, even though the language is stack-based, postfix FORTH, when I make my invocations to ‘fb64-parse’, to tell this subroutine which buffer to use, starting from number (0). In this case I started out specifying a default limit of 3 work units – which implies 36 bytes each – and 2 buffers, numbered (0, 1).
I don’t really feel the need to show the reader the improved source code, especially since I may easily improve it again on a whim. But below I have another text-clipping, of how this newer method is used:
dirk@Klystron:~/Programs$ gforth fb64-parse-3.fs Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license' Type `bye' to exit S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 0 4 * type generate1234TERRIBLE.,? ok S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 1 4 * type generate1234TERRIBLE.,? ok bye dirk@Klystron:~/Programs$