About Encoding And Decoding Base-64 In FORTH

In This Previous Posting, I wrote that I had written some source-code in the language FORTH, that decodes standard Base-64 into a binary array of data, in output sizes that are multiples of 36 Bytes. For my own purposes, there might be no need to output Base-64, because I can use command-line utilities to prepare Base-64 strings, and then only use those as a means to enter the data, and embed it into future, hypothetical source code.

But the purposes of other, hypothetical software-developers have not been met with this exercise, because those people may need to be able to output Base-64, which means they’d need a matching encoder.

Unfortunately, the language does not lend itself to that easily, if a standard Base-64 radix is being implied, because 6-bit output-numerals would need to be bit-aligned, and trying to align fields of bits in FORTH is difficult.

(Edit 07/25/2017 : )

One subject which I have investigated more completely now, is the fact that the numeral-to-text conversion utilities built-in to FORTH, seem to continue to produce output, even if a Base of 64 has been set. In theory, the FORTH developers could have adopted a custom radix, in order to be able to state, that their binary-to-FB64 conversion is computed faster, than standard Base-64 could be. But OTOH, the characters output, could just become garbage, by the time 24-bit numerals are to be streamed:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok
bye 
dirk@Klystron:~$ 


 

My conclusion is, that This pseudo- Base-64 streaming remains usable, even when 24-bit numerals are given.

This conclusion reverses a negative, tentative conclusion, which I had only given yesterday.

I have by now coded both the encoder and decoder for standard Base-64, which I’ve named ‘b64-stream’ and ‘b64-parse’ respectively, but as well the encoder and decoder for the pseudo- Base-64, which I call ‘fb64-stream’ and ‘fb64-parse’. At this point, Base-64 has been implemented in a way software-experts would consider complete, with a full non-standard version of Base-64. This is what the code ultimately does:

 


dirk@Klystron:~$ cd ~/Programs
dirk@Klystron:~/Programs$ gforth fb64-parse-6.fs
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
S" generate1234TERRIBLE.,?-" 4 / b64-stream b64-parse 0 4 * type generate1234TERRIBLE.,?- ok
S" generate1234TERRIBLE.,?-" 4 / fb64-stream fb64-parse 0 4 * type generate1234TERRIBLE.,?- ok
bye 
dirk@Klystron:~/Programs$ 


 

My custom-semantics assume that on the stack, a binary array exists, with a numeric value placed on top of it, which warns each encoder, how many 32-bit words each array holds. OTOH, the input to each decoder expects a standard, full string, which the corresponding encoder outputs, and which also exist as two items on the stack each time, where the top numeral states how many characters long the string is, as per standard FORTH.

And below is the source-code (Updated 08/02/2017 : )

Continue reading About Encoding And Decoding Base-64 In FORTH

Completed Project: FORTH Base-64 Decoder

One of the subjects which I had written about earlier, was the fact that FORTH – the programming language – seems to output text fine, in a peculiar form of Base-64 that must have been quick to encode, and which I assume takes very little CPU power. This Base-64 is not compatible with the Base-64 used in email, and elsewhere where MIME-Types are used. The MIME-Base-64 is the standard one, not the FORTH approach.

This observation next led me to the question of whether FORTH can in fact decode its own, native, Base-64 – i.e. take the text and parse it into numbers again. And surprisingly the answer was No !

So a following question which has gripped me, was how easy or hard it might be to write a standard Base-64 Decoder / Parser, in FORTH.

I have just completed the project, and am willing to share all my results with the reader.

First of all, the Linux shell is an ideal place to test various encodings, because we have command-lines such as ‘base64′ as well as ‘tr’, and we can execute our other text-based programs, for the purpose of taking a picture of the text, which will show reproducible results when we were successful. And so the first text-clipping I have here, is an assortment of experiments I did in the terminal-window. It confirms the weird FORTH mapping, and also gives the standard Base-64 mapping of a snippet of text which I chose, that having been ‘generate1234TERRIBLE.,?‘ , and then uses ‘tr’ to show what that looks like when output by FORTH. Although, I have learned that standard Base-64 reverses its groups of 3 binary bytes, that each lead to 4 Base-64 characters, compared to the usual endianess of Linux computers and others. Therefore, if FORTH was made not to print out a single numeral, but rather a stream, its mapping might be different again from what is shown, due to this endianess.

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok
bye 
dirk@Klystron:~$ 

Base64:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

fb64:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv

tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

echo 'generate1234TERRIBLE.,?' | base64
echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'

dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64
Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K
dirk@Klystron:~$ echo 'generate1234TERRIBLE.,?' | base64 | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/' '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuv'
PmLePN9XT6KhCZCkL4LIK[\2J4KeB3sA
dirk@Klystron:~$ 

S" Z2VuZXJhdGUxMjM0VEVSUklCTEUuLD8K" fb64-parse 4 * type


 

I did not write an Encoder, i.e. a binary-to-Base-64 module. My main question was, if I wanted to embed binary data as prepared literal strings in FORTH, could I use a FORTH word defined in FORTH? The answer is Yes.

Also, I did not implement the ‘=’ symbol, which is used to denote that the Byte-alignment is not in sync with the character-alignment. Every 4 characters correspond to 3 Bytes, and if a random number of Bytes is assumed, there is a 2/3 chance that the last Byte or Bytes in the sequence, will not be a multiple of 3, and thus not conform to a multiple of 4 characters. Normally, this is why some Base-64 streams end in 1 or 2 ‘=’ symbols, to denote how many virtual Bytes were set to zero to allow the encoding to take place, but which are not to be decoded.

In fact, I made the rather blunt assumption that my FORTH code would assume that an integer number of work-units is to be completed, each of which is to take up 3 * 32 bits of binary, corresponding to 16 Base-64 characters. Incomplete work units will not be parsed by this program. Hence, if it was to be used to embed literal binary data, then those literals would certainly need to be prepared by software.

(Edit 07/22/2017 : The reason fw my Base-256 text-snippet only has 23 visible characters, where 24 might seem in order, thus completing 2 work units, was my knowledge that my software ends its strings with NULL characters – which FORTH actually does not do. If this was all there is to it, then the 4th-to-last Base-64 character should be an ‘A’, since the Bytes are read in in Big-Endian form, the way a standard stream is encoded. But what the Linux-based ‘echo’ command next does, is recognize the fact that this NULL characters ends the string and ignore it. And then, ‘echo’ appends a newline-character in its place. )

Below is the source-code, on which further source-code could also be based, to output the text:

Continue reading Completed Project: FORTH Base-64 Decoder

I first learned about FORTH as a child.

(Last Updated 07/22/2017 , 18h20 ; See Below.)

When I was a child, I used to subscribe to a magazine called “Byte Magazine”. In it, every few months a different programming language was featured, each for several magazine issues, and each a language that had markedly different semantics. Among those languages were:

  • Pascal
  • C
  • LISP
  • FORTH
  • Prolog
  • Smalltalk
  • etc..

I do not really recall everything I read about some of the languages, such as about Pascal or Smalltalk. But certain languages stood out, among those LISP, FORTH and Prolog, and later in life, I studied C++  – which did not exist yet in those earlier years –  in depth.

What I also learned, was that the way in which Byte ‘taught’ those languages was sometimes flawed, and I eventually saw no urgent need to hold on to the old magazines, say as souvenirs.

The sum-total of what I really learned from that magazine, about FORTH specifically, can be summarized in these two Web-articles:

https://users.ece.cmu.edu/~koopman/forth/hopl.html

http://www.forth.org/svfig/Len/definwds.htm

What I noticed yesterday, was that when I install ‘GNU-ish Forth’ on one of my 64-bit Linux systems, the cells of data become 64-bit cells and not 32-bit cells, and the following code-snippet demonstrates that:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
1 cells . 8  ok
bye 
dirk@Klystron:~$ 


 

The FORTH-word ‘cells‘ multiplies its input by the (fixed) number of bytes each cell of data occupies, which is useful for computing byte-aligned addresses. The FORTH-word ‘.‘ prints the top object on the stack, and consumes it.

This 64-bit and GNU support can actually stand in the way of using FORTH, as the language will sometimes be embedded into devices that only have very rudimentary CPUs. When embedded, the language has as main advantage, being able to carry out complex tasks with very compact code, and to do so very fast. But, in order really to be Linux-compatible, some of its low-level words have been redefined, to take into account the fact that busy-wait loops are not an acceptable form of I/O-control, and to take memory-management into account. When embedded, a 32-bit CPU is assumed at max that has no protected-memory features. Here, busy-wait loops are the way to go…

But of course, my interest in Forth has been awakened briefly, because I did read that it gets used in Bitcoin Script. One of the facts about a scripting language, that will generally make it different from mainstream programming languages, is that single instructions – in this case FORTH-words – can be made to stand for complex operations, that they would not stand for when the syntax is used in the programming language, which a scripting language was derived from. This goes beyond what an SDK does. I wanted to add a few words on that last detail.

In order to work as Bitcoin Script, FORTH needs to have single words, that also do complex things, such as:

  • Decrypt the top item on the stack, using the second item as the public key, with the intention of generation a 256-bit Transaction-Hash.
  • Hash the top item on the stack, with the intention of generating a public address, on the assumption that the input was a public key…

All the above words require a much-wider cell, or register-width, than 64-bit. And I can think of 3 hypothetical reasons, for which this would work:

  1. Each word to be used in Bitcoin Script could have been defined in FORTH itself, as subroutines and data-structures,
  2. Each word could exist as highly-optimized machine-language, written in Assembler,
  3. The server on which these transactions are processed, could be treating the FORTH-like words as a means of ‘Markup’, to denote what the transaction is supposed to do, but without actually executing them.

And yet, while running on servers with powerful (presumably 64-bit) CPUs, Bitcoin supports 4-byte integers. ( :1 )

In any case, it is not allowed for a subroutine-definition, or a data-structure definition, or a loop, to appear in a Transaction Script. This is only logical.

And as it stands, my interest in this subject are academic in nature, which makes me a very different animal, from people who might be deeply interested in Bitcoin mining, or other Bitcoin / money-related pursuits. To me, a hash-code is an accidental thing a computer is capable of generating as output, and of reusing as input. There exist numerous hashing algorithms, and the computer happens to be able to compute any of them.

To a person who is serious about mining, specialized hardware with hundreds of physical cores, has been optimized to execute one specific hashing-algorithm, hundreds of millions of times per second if not billions, as the main purpose for that device.

I find this twist in the evolution of Computing, ‘Stranger Than Strange’.

Dirk

(Edit 07/18/2017 : )

If the reader truly wanted to embed FORTH, then the only way to go, would be to download the C source-code for the so-called ‘kernel’ (the bytecode interpreter), and cross-compile that for the embedded CPU. In principle, any portable version will do, but the following seems suitable:

Continue reading I first learned about FORTH as a child.