Completed Project: FORTH Base-64 Decoder

One of the subjects which I had written about earlier, was the fact that FORTH – the programming language – seems to output text fine, in a peculiar form of Base-64 that must have been quick to encode, and which I assume takes very little CPU power. This Base-64 is not compatible with the Base-64 used in email, and elsewhere where MIME-Types are used. The MIME-Base-64 is the standard one, not the FORTH approach.

This observation next led me to the question of whether FORTH can in fact decode its own, native, Base-64 – i.e. take the text and parse it into numbers again. And surprisingly the answer was No !

So a following question which has gripped me, was how easy or hard it might be to write a standard Base-64 Decoder / Parser, in FORTH.

I have just completed the project, and am willing to share all my results with the reader.

First of all, the Linux shell is an ideal place to test various encodings, because we have command-lines such as ‘base64′ as well as ‘tr’, and we can execute our other text-based programs, for the purpose of taking a picture of the text, which will show reproducible results when we were successful. And so the first text-clipping I have here, is an assortment of experiments I did in the terminal-window. It confirms the weird FORTH mapping, and also gives the standard Base-64 mapping of a snippet of text which I chose, that having been ‘generate1234TERRIBLE.,?‘ , and then uses ‘tr’ to show what that looks like when output by FORTH. Although, I have learned that standard Base-64 reverses its groups of 3 binary bytes, that each lead to 4 Base-64 characters, compared to the usual endianess of Linux computers and others. Therefore, if FORTH was made not to print out a single numeral, but rather a stream, its mapping might be different again from what is shown, due to this endianess.


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok



tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

echo 'generate1234TERRIBLE.,?' | base64
echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64
dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type


I did not write an Encoder, i.e. a binary-to-Base-64 module. My main question was, if I wanted to embed binary data as prepared literal strings in FORTH, could I use a FORTH word defined in FORTH? The answer is Yes.

Also, I did not implement the ‘=’ symbol, which is used to denote that the Byte-alignment is not in sync with the character-alignment. Every 4 characters correspond to 3 Bytes, and if a random number of Bytes is assumed, there is a 2/3 chance that the last Byte or Bytes in the sequence, will not be a multiple of 3, and thus not conform to a multiple of 4 characters. Normally, this is why some Base-64 streams end in 1 or 2 ‘=’ symbols, to denote how many virtual Bytes were set to zero to allow the encoding to take place, but which are not to be decoded.

In fact, I made the rather blunt assumption that my FORTH code would assume that an integer number of work-units is to be completed, each of which is to take up 3 * 32 bits of binary, corresponding to 16 Base-64 characters. Incomplete work units will not be parsed by this program. Hence, if it was to be used to embed literal binary data, then those literals would certainly need to be prepared by software.

(Edit 07/22/2017 : The reason fw my Base-256 text-snippet only has 23 visible characters, where 24 might seem in order, thus completing 2 work units, was my knowledge that my software ends its strings with NULL characters – which FORTH actually does not do. If this was all there is to it, then the 4th-to-last Base-64 character should be an ‘A’, since the Bytes are read in in Big-Endian form, the way a standard stream is encoded. But what the Linux-based ‘echo’ command next does, is recognize the fact that this NULL characters ends the string and ignore it. And then, ‘echo’ appends a newline-character in its place. )

Below is the source-code, on which further source-code could also be based, to output the text:

Continue reading Completed Project: FORTH Base-64 Decoder