## About Encoding And Decoding Base-64 In FORTH

In This Previous Posting, I wrote that I had written some source-code in the language FORTH, that decodes standard Base-64 into a binary array of data, in output sizes that are multiples of 36 Bytes. For my own purposes, there might be no need to output Base-64, because I can use command-line utilities to prepare Base-64 strings, and then only use those as a means to enter the data, and embed it into future, hypothetical source code.

But the purposes of other, hypothetical software-developers have not been met with this exercise, because those people may need to be able to output Base-64, which means they’d need a matching encoder.

Unfortunately, the language does not lend itself to that easily, if a standard Base-64 radix is being implied, because 6-bit output-numerals would need to be bit-aligned, and trying to align fields of bits in FORTH is difficult.

(Edit 07/25/2017 : )

One subject which I have investigated more completely now, is the fact that the numeral-to-text conversion utilities built-in to FORTH, seem to continue to produce output, even if a Base of 64 has been set. In theory, the FORTH developers could have adopted a custom radix, in order to be able to state, that their binary-to-FB64 conversion is computed faster, than standard Base-64 could be. But OTOH, the characters output, could just become garbage, by the time 24-bit numerals are to be streamed:


dirk@Klystron:~$gforth Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc. Gforth comes with ABSOLUTELY NO WARRANTY; for details type license' Type bye' to exit : list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ; ok list-forth-b64 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _  a b c d e f g h i j k l m n o p q r s t u v 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G 1H 1I 1J 1K 1L 1M 1N 1O 1P 1Q 1R 1S 1T 1U 1V 1W 1X 1Y 1Z 1[ 1\ 1] 1^ 1_ 1 1a 1b 1c 1d 1e 1f 1g 1h 1i 1j 1k 1l 1m 1n 1o 1p 1q 1r 1s 1t 1u 1v 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 2G 2H 2I 2J 2K 2L 2M 2N 2O 2P 2Q 2R 2S 2T 2U 2V 2W 2X 2Y 2Z 2[ 2\ 2] 2^ 2_ 2 2a 2b 2c 2d 2e 2f 2g 2h 2i 2j 2k 2l 2m 2n 2o 2p 2q 2r 2s 2t 2u 2v 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 3G 3H 3I 3J 3K 3L 3M 3N 3O 3P 3Q 3R 3S 3T 3U 3V 3W 3X 3Y 3Z 3[ 3\ 3] 3^ 3_ 3 3a 3b 3c 3d 3e 3f 3g 3h 3i 3j 3k 3l 3m 3n 3o 3p 3q 3r 3s 3t 3u 3v ok bye dirk@Klystron:~$




My conclusion is, that This pseudo- Base-64 streaming remains usable, even when 24-bit numerals are given.

This conclusion reverses a negative, tentative conclusion, which I had only given yesterday.

I have by now coded both the encoder and decoder for standard Base-64, which I’ve named ‘b64-stream’ and ‘b64-parse’ respectively, but as well the encoder and decoder for the pseudo- Base-64, which I call ‘fb64-stream’ and ‘fb64-parse’. At this point, Base-64 has been implemented in a way software-experts would consider complete, with a full non-standard version of Base-64. This is what the code ultimately does:


dirk@Klystron:~$cd ~/Programs dirk@Klystron:~/Programs$ gforth fb64-parse-6.fs
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type license'
Type bye' to exit
S" generate1234TERRIBLE.,?-" 4 / b64-stream b64-parse 0 4 * type generate1234TERRIBLE.,?- ok
S" generate1234TERRIBLE.,?-" 4 / fb64-stream fb64-parse 0 4 * type generate1234TERRIBLE.,?- ok
bye
dirk@Klystron:~/Programs\$




My custom-semantics assume that on the stack, a binary array exists, with a numeric value placed on top of it, which warns each encoder, how many 32-bit words each array holds. OTOH, the input to each decoder expects a standard, full string, which the corresponding encoder outputs, and which also exist as two items on the stack each time, where the top numeral states how many characters long the string is, as per standard FORTH.

And below is the source-code (Updated 08/02/2017 : )

A historic concept about LISP still holds true, that this language stores structures which consist of Lists and Atoms. List are represented in binary as “Dotted Pairs”, with a CAR and a CDR register. The CAR holds the first element of the List, while the CDR holds the tail of the List. It is uncommon but possible, for the CDR also to hold an Atom, while more correctly, a value of NIL in the CDR denotes the end of a List. The CDR function of NIL is defined to be NIL.

( Posting Revised on 09/25/2016 )

( German Translation Revised on 09/25/2016 )

The way in which Atoms are represented in binary, is much more implementation-dependent. In theory, an Atom might only consist of a CAR field. But in technical practice, it is possible for all Atoms to be stored at even-field addresses, and to be associated with CDR-like fields, that would actually be CIR-fields, that are named “Property Lists“. One interesting fact to note, is that while historic versions of LISP made extensive use of these Property Lists, let us say in order to associate the Print-Name of an Atom back from the actual Atom, according to a certain article, “Common LISP” makes very little or no use of explicit Property Lists.

From my own experience, I question this article. For example, the mere fact that the Atom ‘list‘, as referring to the Function, needs to be written #'list , in order to be passed in as an argument, suggests that indeed, its Function-code is stored elsewhere, from its value as an Atom…

While all LISP Versions can process Property Lists defined by the programmer, this posting refers to ones, that were associated with Atoms ‘behind the scenes’. According to the article above, in modern versions of Common LISP, the latter type can no longer be derived from an Atom, as its “Disembodied Property List”.


GCL (GNU Common Lisp)  2.6.12 CLtL1    Oct 28 2014 10:02:30
Modifications of this banner must retain notice of a compatible license
Dedicated to the memory of W. Schelter

Use (help) to get some basic information on how to use GCL.
Temporary directory for compiler files set to /tmp/

>list

Error: UNBOUND-VARIABLE :NAME LIST
Signalled by EVAL.
UNBOUND-VARIABLE :NAME LIST

Broken at EVAL.  Type :H for Help.
>>1

Top level.
>(get 'list)

Error: PROGRAM-ERROR "GET [or a callee] requires more than one argument."
Signalled by GET.
PROGRAM-ERROR "GET [or a callee] requires more than one argument."

Broken at GET.  Type :H for Help.
>>1

Top level.
>(get 'list 'name)

NIL

>(get 'list 'type)

NIL

>(defvar *my-list* '(foo blatz bar))

*MY-LIST*

>(get '*my-list* 'name)

NIL

>(get '*my-list* 'type)

NIL

>(get '*my-list* 'value)

NIL

>*my-list*

(FOO BLATZ BAR)

>(symbol-plist 'list)

(COMPILER::INLINE-ALWAYS
(((T T T T T T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T T) T 1 COMPILER::LIST-INLINE)
((T T T T) T 1 COMPILER::LIST-INLINE)
((T T T) T 1 COMPILER::LIST-INLINE)
((T T) T 1 COMPILER::LIST-INLINE) ((T) T 1 "make_cons(#0,Cnil)")
(NIL T 0 "Cnil"))
COMPILER::RETURN-TYPE T COMPILER::ARG-TYPES (*) COMPILER::LFUN
"Llist" SYSTEM::TYPE-PREDICATE LISTP)

>(symbol-plist '*my-list*)

NIL

>(quit)




What follows is an area in the implementations of LISP, which documentation tends to dance around. What documentation writes about extensively, is how Variables exist in LISP, and how different types of Variables can be used – within the text-syntax of LISP itself. It is important to note, that being able to map Variable-Names to values, is more important than the reverse. I.e., code can name Variables and thus Atoms, such that data structures result, and the Atoms that result within the data structures can be bound to new values rapidly, as LISP is being interpreted. The values of the resulting computations need to be output. But the need is not as strict, for Atoms to be reprinted by their names rapidly.

Further, many types of values associated formally with Variables, can in fact be represented either entirely within one Field, or as Lists again, such that Strings, arbitrary-length Integers etc., can all be printed out easily, starting from Atoms.

Variables are not said to be typed, but rather the objects they point to are. And LISP is said to be strongly typed, in that operations are always forbidden, when attempted on illegal data-types. A “Type-Error” will be generated reliably at run-time, for example, if an attempt is made to evaluate (+ 5 "David") , because "David" is not a number.

It would seem clear, that the fastest test which a LISP Interpreter needs to perform on a value, is whether it references a List or an Atom. And when an attempt is made to Evaluate an Atom, the next question which the interpreter needs to answer, is what data-type is stored in the Atom, to determine whether the instructed operation is in fact legal.

1. Since Lists consist of Dotted Pairs, addresses that point to them are bound to point to an even-numbered Address Field Address, if the CDR-field address is derived by incrementing. Originally, the CDR-field was found by decrementing, which gave rise to its name: “Complement Decrement Register”. So then, a pointer to a List would always have pointed to an odd-numbered Address Field, either way because it always pointed to the CAR. Pointing to an alternately-numbered one – to where a CDR would occur in a List – could signal to the Interpreter, that an Atom is being referenced. While this approach only requires that one bit be examined, to reveal whether the current address points to a List or an Atom, the main drawback would be, that whether a value is either, derives only from the Cell which is pointing to it. Some confusion could start, in which certain address fields point to an object as a List, but in which others point to the same objects, potentially, as an Atom. At that point, the binary image might have gotten corrupted, in a way that can easily happen.
But then this would also imply, that every allocated block is an MMAP Object. This might make more sense on a 64-bit machine, where 48-bit look-up tables are impossible. Instead, the size of allocation blocks could be increased to ~1MB, to reduce the number of MMAP Symbols.