I first learned about FORTH as a child.

(Last Updated 07/22/2017 , 18h20 ; See Below.)

When I was a child, I used to subscribe to a magazine called “Byte Magazine”. In it, every few months a different programming language was featured, each for several magazine issues, and each a language that had markedly different semantics. Among those languages were:

  • Pascal
  • C
  • LISP
  • FORTH
  • Prolog
  • Smalltalk
  • etc..

I do not really recall everything I read about some of the languages, such as about Pascal or Smalltalk. But certain languages stood out, among those LISP, FORTH and Prolog, and later in life, I studied C++  – which did not exist yet in those earlier years –  in depth.

What I also learned, was that the way in which Byte ‘taught’ those languages was sometimes flawed, and I eventually saw no urgent need to hold on to the old magazines, say as souvenirs.

The sum-total of what I really learned from that magazine, about FORTH specifically, can be summarized in these two Web-articles:

https://users.ece.cmu.edu/~koopman/forth/hopl.html

http://www.forth.org/svfig/Len/definwds.htm

What I noticed yesterday, was that when I install ‘GNU-ish Forth’ on one of my 64-bit Linux systems, the cells of data become 64-bit cells and not 32-bit cells, and the following code-snippet demonstrates that:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
1 cells . 8  ok
bye 
dirk@Klystron:~$ 


 

The FORTH-word ‘cells‘ multiplies its input by the (fixed) number of bytes each cell of data occupies, which is useful for computing byte-aligned addresses. The FORTH-word ‘.‘ prints the top object on the stack, and consumes it.

This 64-bit and GNU support can actually stand in the way of using FORTH, as the language will sometimes be embedded into devices that only have very rudimentary CPUs. When embedded, the language has as main advantage, being able to carry out complex tasks with very compact code, and to do so very fast. But, in order really to be Linux-compatible, some of its low-level words have been redefined, to take into account the fact that busy-wait loops are not an acceptable form of I/O-control, and to take memory-management into account. When embedded, a 32-bit CPU is assumed at max that has no protected-memory features. Here, busy-wait loops are the way to go…

But of course, my interest in Forth has been awakened briefly, because I did read that it gets used in Bitcoin Script. One of the facts about a scripting language, that will generally make it different from mainstream programming languages, is that single instructions – in this case FORTH-words – can be made to stand for complex operations, that they would not stand for when the syntax is used in the programming language, which a scripting language was derived from. This goes beyond what an SDK does. I wanted to add a few words on that last detail.

In order to work as Bitcoin Script, FORTH needs to have single words, that also do complex things, such as:

  • Decrypt the top item on the stack, using the second item as the public key, with the intention of generation a 256-bit Transaction-Hash.
  • Hash the top item on the stack, with the intention of generating a public address, on the assumption that the input was a public key…

All the above words require a much-wider cell, or register-width, than 64-bit. And I can think of 3 hypothetical reasons, for which this would work:

  1. Each word to be used in Bitcoin Script could have been defined in FORTH itself, as subroutines and data-structures,
  2. Each word could exist as highly-optimized machine-language, written in Assembler,
  3. The server on which these transactions are processed, could be treating the FORTH-like words as a means of ‘Markup’, to denote what the transaction is supposed to do, but without actually executing them.

And yet, while running on servers with powerful (presumably 64-bit) CPUs, Bitcoin supports 4-byte integers. ( :1 )

In any case, it is not allowed for a subroutine-definition, or a data-structure definition, or a loop, to appear in a Transaction Script. This is only logical.

And as it stands, my interest in this subject are academic in nature, which makes me a very different animal, from people who might be deeply interested in Bitcoin mining, or other Bitcoin / money-related pursuits. To me, a hash-code is an accidental thing a computer is capable of generating as output, and of reusing as input. There exist numerous hashing algorithms, and the computer happens to be able to compute any of them.

To a person who is serious about mining, specialized hardware with hundreds of physical cores, has been optimized to execute one specific hashing-algorithm, hundreds of millions of times per second if not billions, as the main purpose for that device.

I find this twist in the evolution of Computing, ‘Stranger Than Strange’.

Dirk

(Edit 07/18/2017 : )

If the reader truly wanted to embed FORTH, then the only way to go, would be to download the C source-code for the so-called ‘kernel’ (the bytecode interpreter), and cross-compile that for the embedded CPU. In principle, any portable version will do, but the following seems suitable:

Continue reading I first learned about FORTH as a child.

SegWit versus Bitcoin Unlimited

In the operation of Bitcoin, there is a crisis brewing. Bitcoin mining pools are restricted by a feedback-control loop which they defined, in the days of the inception of Bitcoin, only to be able to “mine” 1 Block every 10 minutes, which is inserted into the block-chain when that has been achieved, and which acts as a slate, onto which all Bitcoin transactions must be recorded. A block currently occupies 1MB of data, and the size of individual transactions to be recorded in it, can vary greatly, but averages 500 Bytes for the simplest types of transactions.

What this means, is that 1 Block can typically hold 2000 transactions, but will only be mined every 600 seconds, on the average, so that a global rate of available transactions of 3 /second would seem to follow. Because the world that uses Bitcoin is presently trying to conduct transactions at a faster pace – because in some parts of the world, Bitcoin is a currency – there have sometimes been escalations in the transaction fees, which are charged according to how many KB of one block the transaction consumes. Recently, the transaction fees were around 2 milliBits/kB, or 0.002 BTC/kB. That counts as extremely expensive, since 1 BTC is approximately worth CAD 2000 right now, meaning that the transaction fee could be around CAD 1.- . Many items and services people might want to buy, would not cost much more than the CAD 1.- of my country’s real currency.

Bitcoin operators recognize that a problem exists, and there exist two camps who are in conflict right now, over what to do about it:

  1. Bitcoin Unlimited
  2. Bitcoin Core

Out of these two groups, Bitcoin Core represents the status quo.

What Bitcoin Unlimited is proposing, sounds straightforward: Increase the block-size, and allow for it to be increased again, as the demand for transactions increases in the future. But there are some problems with what Bitcoin Unlimited is proposing. Changing the format of the blocks, and/or the protocol, would essentially create two types of currencies:

  • The legacy currency, which might be referred to as BTC -units, and
  • A new form of Bitcoins, which might be referred to as XBT -units.

If one mining pool was unilaterally to adopt a new Bitcoin block-format, this would be called a ‘Hard Fork’. In the event of a hard fork, people who held Bitcoin-amounts on the legacy block-chain, would see that that legacy block-chain has been imported into both newly-forming block-chains. Wallet-holders would therefore see, that they have a double-dipping; they’d have their present amounts both as BTC and as XBT. But, if they were to mix their Bitcoins into a new block, using a type of transaction that has more than one input, those Bitcoin-amounts would become invalid on the other, newly-differing chain.

There would be a bigger problem in the event of a hard fork, which is known as the risk of a Replay Attack. Because the legacy chain was not designed with the possibility of two future continuations in mind, after a hard fork, if a wallet-holder was to pay for a service in, say, XBT, then the payee – or somebody else – would be able to use the credentials the wallet-holder used in his own transactions, which worked for XBT, and could replay those transactions – actually signing the same inputs as before surreptitiously – but this time, pay out the amounts to another public address, as a BTC -amount. So the payer would see both his XBT and his BTC coins disappear, while only having authorized one of the two transactions himself, out of the legacy-chain.

As an alternative to a hard fork, Bitcoin Core is proposing a short-term solution named “SegWit”. This is a proposed solution, that will cause fewer bytes of data to be written to the block, per transaction, and which it is hoped, will allow the increased number of transactions to be served, that the world is asking for right this instant.

This is a video, which describes in storybook-form, what SegWit proposes. Essentially, a Bitcoin transaction possesses a list of inputs, and a list of outputs. No public address really owns the transaction, but the transaction proposes to use unspent outputs, made to the person who holds the private key, for the public key the outputs were made to – that constitute his Bitcoin balance – as an input or as inputs, and to state somebody else’s public address as the recipient of the outputs.

What Bitcoin experts know, is that the inputs-list of a transaction takes up much more space, on average, than the outputs list does. The reason for this, is the fact that in order to specify an input, Bitcoin is so restrictive, that one specific output must be identified, and not just one public address, to which the output was received. This means, that the complete Transaction ID must be stated in the inputs-list, as well as the sequence-number, the position of the output received, in the outputs-list, of the specified transaction, so that the current wallet-holder can use it as an input. This is expensive in the number of bytes used, because in addition to that, the public key of the recipient must be embedded in the inputs-list, as well as the signature with which the holder of the public address authorized the received output be unlocked.

  1. Transaction ID : 32 + 4 Bytes
  2. Sequence Number : 4 Bytes
  3. Public Key : 33 or 65 Bytes
  4. Signature : ~72 Bytes
  5. Additional Bytes used for script-instructions: (?)

But there is an added twist to how Bitcoin works. We might expect that this information is simply entered by software, into fields that belong to a transaction with a specific syntax. But as it stands, only the fields (1) and (2) above exist statically in a transaction. The rest is embedded as literal constants, in a very scaled-down forth-like language called “Script“.

The way script works in general, is that the piece of script in the inputs-list of the recipient, is executed before the piece of script in the outputs-list of the sender. The script of the output defines the conditions that need to be met, in order for the recipient – in another transaction than the transaction that states this script as an output – to be able to claim the amount. In order to satisfy those conditions, the script of the recipient – in his inputs-list – leaves a number of objects on a stack, which the script in the outputs-list of the sender can use as input.

According to the way Bitcoin has traditionally worked then, was that the input defined by the recipient, would place his public-key, and his signature onto the stack, in expectation that this is what the output of the sender requires. The input does this in the form of an embedded public key, and an embedded signature, in its actual script, which were placed there by software running normally on computers, and not running through script.

But the output script of the sender would then expect these data to be on the stack in a predefined order, and would verify that two conditions were met:

  1. When hashed, the public-key must equal the public address of the intended recipient (Those are not really the same thing), ( :1 )
  2. The signature must be decrypted by the same public key, to equal a hash of the transaction being claimed as input.

In order for the outputs-script to verify this, it must also receive the Transaction ID of the transaction it belongs to on the stack, but this is not the responsibility of the inputs-script of the recipient to place there.

What SegWit proposes, is that the inputs-script should neither need to state the public key of the recipient anymore, nor his actual signature, thus making the transaction shorter. Thereby, something needs to signal, that this public key should be taken ‘from the side’, from the “Extended Block”, but fed to an established output script.

While this is all very fascinating, it reveals two very serious problems of its own:

Continue reading SegWit versus Bitcoin Unlimited