(Last Updated 07/22/2017 , 18h20 ; See Below.)
When I was a child, I used to subscribe to a magazine called “Byte Magazine”. In it, every few months a different programming language was featured, each for several magazine issues, and each a language that had markedly different semantics. Among those languages were:
I do not really recall everything I read about some of the languages, such as about Pascal or Smalltalk. But certain languages stood out, among those LISP, FORTH and Prolog, and later in life, I studied C++ – which did not exist yet in those earlier years – in depth.
What I also learned, was that the way in which Byte ‘taught’ those languages was sometimes flawed, and I eventually saw no urgent need to hold on to the old magazines, say as souvenirs.
The sum-total of what I really learned from that magazine, about FORTH specifically, can be summarized in these two Web-articles:
What I noticed yesterday, was that when I install ‘GNU-ish Forth’ on one of my 64-bit Linux systems, the cells of data become 64-bit cells and not 32-bit cells, and the following code-snippet demonstrates that:
dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
1 cells . 8  ok
bye
dirk@Klystron:~$
The FORTH-word ‘cells’ multiplies its input by the (fixed) number of bytes each cell of data occupies, which is useful for computing byte-aligned addresses. The FORTH-word ‘.’ prints the top object on the stack, and consumes it.
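As a rough illustration of what those two words do, here is a minimal sketch in Python rather than in FORTH. The function name `run` and its `cell_bytes` parameter are hypothetical, chosen only for this example; this is not how Gforth is implemented.

```python
def run(source: str, cell_bytes: int = 8) -> list:
    """Evaluate a whitespace-separated list of words.

    Only three kinds of word are understood (an assumption of this sketch):
    integer literals, 'cells' and '.'. Returns the values that '.' printed.
    """
    stack, printed = [], []
    for word in source.split():
        if word == "cells":
            # Multiply the top of the stack by the fixed cell size in bytes.
            stack.append(stack.pop() * cell_bytes)
        elif word == ".":
            # Print the top of the stack, consuming it.
            printed.append(stack.pop())
        else:
            # Anything else is treated as an integer literal and pushed.
            stack.append(int(word))
    return printed


print(run("1 cells ."))     # 8-byte cells, as on the 64-bit system: [8]
print(run("1 cells .", 4))  # 4-byte cells, as on a 32-bit system: [4]
```

The only thing that changes between the two calls is the cell size, which is exactly what the terminal session above reveals about the 64-bit build.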
This 64-bit and GNU support can actually stand in the way of using FORTH, because the language is sometimes embedded into devices that only have very rudimentary CPUs. When embedded, its main advantage is the ability to carry out complex tasks with very compact code, and to do so very fast. But, in order to be truly Linux-compatible, some of its low-level words have been redefined, to take into account the fact that busy-wait loops are not an acceptable form of I/O-control, and to take memory-management into account. When embedded, at most a 32-bit CPU is assumed, with no protected-memory features. Here, busy-wait loops are the way to go…
But of course, my interest in Forth was briefly reawakened, because I read that it gets used in Bitcoin Script. One fact that generally sets a scripting language apart from the mainstream programming language it was derived from, is that single instructions – in this case FORTH-words – can be made to stand for complex operations, which they would not stand for in the parent language. This goes beyond what an SDK does. I wanted to add a few words on that last detail.
In order to work as Bitcoin Script, FORTH needs to have single words, that also do complex things, such as:
- Decrypt the top item on the stack, using the second item as the public key, with the intention of generating a 256-bit Transaction-Hash.
- Hash the top item on the stack, with the intention of generating a public address, on the assumption that the input was a public key…
All the above words require a much-wider cell, or register-width, than 64-bit. And I can think of 3 hypothetical reasons, for which this would work:
- Each word to be used in Bitcoin Script could have been defined in FORTH itself, as subroutines and data-structures,
- Each word could exist as highly-optimized machine-language, written in Assembler,
- The server on which these transactions are processed, could be treating the FORTH-like words as a means of ‘Markup’, to denote what the transaction is supposed to do, but without actually executing them.
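The second possibility can be sketched as follows, in Python standing in for the host language. This is only an illustration under stated assumptions: the stack holds whole byte-strings rather than fixed-width cells, and the word names `SHA256` and `EQUAL`, the `WORDS` table and `run_script` are hypothetical names for this example, not taken from any Bitcoin implementation.

```python
import hashlib


def op_sha256(stack):
    # Replace the top item with its 32-byte SHA-256 digest.
    stack.append(hashlib.sha256(stack.pop()).digest())


def op_equal(stack):
    # Consume two items and push whether they were identical.
    b, a = stack.pop(), stack.pop()
    stack.append(a == b)


# Each word is a native routine of the host, not a definition in the script.
WORDS = {"SHA256": op_sha256, "EQUAL": op_equal}


def run_script(script, stack=None):
    """Run a straight-line script: bytes are pushed, strings are words."""
    stack = [] if stack is None else stack
    for item in script:
        if isinstance(item, bytes):
            stack.append(item)
        else:
            WORDS[item](stack)
    return stack


expected = hashlib.sha256(b"hello").digest()
print(run_script([b"hello", "SHA256", expected, "EQUAL"]))  # [True]
```

Because each stack item is an arbitrary-length byte-string here, a 256-bit hash is a single item, and the width of the CPU's registers never enters into it. The script itself is one pass from left to right, with no definitions and no loops.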
And yet, while running on servers with powerful (presumably 64-bit) CPUs, Bitcoin supports 4-byte integers. ( :1 )
In any case, it is not allowed for a subroutine-definition, or a data-structure definition, or a loop, to appear in a Transaction Script. This is only logical.
And as it stands, my interest in this subject is academic in nature, which makes me a very different animal, from people who might be deeply interested in Bitcoin mining, or other Bitcoin / money-related pursuits. To me, a hash-code is an accidental thing a computer is capable of generating as output, and of reusing as input. There exist numerous hashing algorithms, and the computer happens to be able to compute any of them.
To a person who is serious about mining, specialized hardware with hundreds of physical cores, has been optimized to execute one specific hashing-algorithm, hundreds of millions of times per second if not billions, as the main purpose for that device.
I find this twist in the evolution of Computing, ‘Stranger Than Strange’.
(Edit 07/18/2017 : )
If the reader truly wanted to embed FORTH, then the only way to go, would be to download the C source-code for the so-called ‘kernel’ (the bytecode interpreter), and cross-compile that for the embedded CPU. In principle, any portable version will do, but the following seems suitable:
Now, there is a Debian-packaged version of this, which has numerous extensions, that the embedded version would not have. That Debian-packaged version is only available in 32-bit form – needless to say. A 64-bit version would be out-of-context here.
If the reader is wondering why I would not recommend trying to port ‘gforth’ to an embedded CPU, I would say that simple embedded environments may not even have Hard-Drive support. Even though users may expect Hard-Drive or SSD support, the type of rudimentary CPU I’ve been referring to, only has ROM, and some small amount of RAM as its hardware.
What makes ‘gforth’ GNU-compatible, is the fact that it will read its configuration files from standard directories, arranged according to how this version of Linux arranges them. This implies, that Hard-Drive support must be enabled.
Simple embedded CPUs that I think of, operate vending machines, not Hard-Drives.
Now, the authors of ‘pforth’ seem to be hinting, that embedded CPUs are also possible, that allow music synthesizers to work. And, music synths will nowadays also possess Hard-Drives, on which their users can store music clips.
From the perspective in which I first heard of FORTH, such a music synthesizer would have been deemed Science-Fiction already. If this more-recently allows a version of FORTH to have been written, with Asynchronous Execution, with HD-support etc., then these extensions should be regarded as luxuries. Porting FORTH to a specific embedded environment is likely to require, that many of them be dropped. For example, if the embedded CPU has No Floating-Point Unit, then this would just be one more extension to drop from the FORTH implementation…
1: ) I make this observation to try to understand, whether the Bitcoin Script could in fact be executed as an example of FORTH, the way the cited reference suggests. And in order to understand this question, we need to separate more-accurately, what the Bitcoin Script is actually expected to evaluate, and how much of the process consists of setting up the context for the script to run in:
- The ‘Transaction ID’ consists of a 256-bit ‘Transaction Hash’, which is therefore also a ’32-byte’ code, or a ‘4-cell’ code if the cells are 8 bytes wide, plus a 4-Byte ‘Block Index’, which states a single integer. This 4-Byte integer is of some concern, because on a 64-bit, FORTH-capable box, it would only have half the width of a cell. Yet, there already exist reasons I cited in my posting on SegWit, to think that only the 32-Byte value is (re)produced by the actual script. The 4-Byte Block Index would be used by other software running on the server, to fetch a transaction and feed it to the script.
- The 4-Byte ‘Sequence Number’ may equally lie outside the context of the script to compute with. If it did, then this could be another plausible reason why the “malleability bug” did exist. This bug should successfully be remedied, when and if each output Address has its own script. But as the cited video in my other posting stated, this only came along, with “Hash-To-Script (Type 3) Transactions”.
- Mind you, the possibility should generally exist for the ‘calling context’ to convert a 4-byte number – such as the Sequence Number – into an 8-byte number with the same value, before feeding it to the script. But in that case we should see script somewhere, that accepts this additional parameter. Only the parsing of actual blocks and their transactions – of data structures – would be fettered.
- Finally, while I did state that the Address is a hash of the public key, it is not only that. The Address also contains a 32-bit check-sum, which is the first 4 bytes of a double SHA-256 hash. Hence, to protect against possible typos, with a success-rate of roughly 1 − 2^−32, a check-sum is appended to the hash, and an error-message will prevent a naive wallet-user from sending an amount to a mistyped address. This check-sum would also need to be recomputed, IF a FORTH-like script was to try computing a matching, complete Address. OR, the contextual code could drop it from the value entered in the transaction. In any case, This Article Suggests, that ‘while numeric values must be stored within at least single-width cells’, which according to me may be 8-byte cells on a 64-bit implementation, the language contains ‘features that allow the numeric values to be converted into ASCII-sequences’ (i.e. into a textual representation), ‘as creatively as the programmer would like’. And therefore, the task could also be left up to FORTH, to generate an encoded Address…
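The check-sum step can be made concrete with a short sketch, again in Python. It assumes the Base58Check convention, in which the check-sum is the first 4 bytes of a double SHA-256 over the version byte plus the hash, and the result is written out as text in a 58-character alphabet. The function names are hypothetical, and the 20-byte hash is taken as a given input rather than derived from a public key.

```python
import hashlib

# Base58 alphabet: digits and letters without 0, O, I and l.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def checksum(payload: bytes) -> bytes:
    # First 4 bytes of SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58check_encode(version: bytes, body: bytes) -> str:
    raw = version + body + checksum(version + body)
    n = int.from_bytes(raw, "big")
    out = ""
    while n > 0:
        n, r = divmod(n, 58)
        out = B58[r] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(raw) - len(raw.lstrip(b"\0"))
    return "1" * pad + out


def b58check_decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)  # ValueError on a character outside B58
    pad = len(text) - len(text.lstrip("1"))
    raw = b"\0" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    payload, check = raw[:-4], raw[-4:]
    if checksum(payload) != check:
        raise ValueError("check-sum mismatch: address was probably mistyped")
    return payload  # version byte followed by the hash


address = b58check_encode(b"\x00", bytes(20))  # placeholder all-zero hash
assert b58check_decode(address) == b"\x00" + bytes(20)
```

Whichever side performs this step, the script or the contextual code around it, the work consists of hashing byte-strings and converting a large number to text, not of arithmetic on single cells.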
(Reverted 07/22/2017 , 18h20 : )
According to my analysis, it seems possible, that Bitcoin servers have actually been executing the FORTH-Scripts.
But, an important detail which must be true, in order for this to be so, is that some of the FORTH-words must either have been written in Assembler, or compiled from C with optimization.
And what this means, is that the implementations of the FORTH words must not have been in FORTH itself.