A historic concept about LISP still holds true, that this language stores structures which consist of Lists and Atoms. List are represented in binary as “Dotted Pairs”, with a
CAR and a
CDR register. The
CAR holds the first element of the List, while the
CDR holds the tail of the List. It is uncommon but possible, for the
CDR also to hold an Atom, while more correctly, a value of
NIL in the
CDR denotes the end of a List. The
CDR function of
NIL is defined to be
( Posting Revised on 09/25/2016 )
( German Translation Revised on 09/25/2016 )
The way in which Atoms are represented in binary, is much more implementation-dependent. In theory, an Atom might only consist of a
CAR field. But in technical practice, it is possible for all Atoms to be stored at even-field addresses, and to be associated with
CDR-like fields, that would actually be
CIR-fields, that are named “Property Lists“. One interesting fact to note, is that while historic versions of LISP made extensive use of these Property Lists, let us say in order to associate the Print-Name of an Atom back from the actual Atom, according to a certain article, “Common LISP” makes very little or no use of explicit Property Lists.
From my own experience, I question this article. For example, the mere fact that the Atom ‘
list‘, as referring to the Function, needs to be written
#'list , in order to be passed in as an argument, suggests that indeed, its Function-code is stored elsewhere, from its value as an Atom…
While all LISP Versions can process Property Lists defined by the programmer, this posting refers to ones, that were associated with Atoms ‘behind the scenes’. According to the article above, in modern versions of Common LISP, the latter type can no longer be derived from an Atom, as its “Disembodied Property List”.
GCL (GNU Common Lisp) 2.6.12 CLtL1 Oct 28 2014 10:02:30 Source License: LGPL(gcl,gmp), GPL(unexec,bfd,xgcl) Binary License: GPL due to GPL'ed components: (XGCL READLINE UNEXEC) Modifications of this banner must retain notice of a compatible license Dedicated to the memory of W. Schelter Use (help) to get some basic information on how to use GCL. Temporary directory for compiler files set to /tmp/ >list Error: UNBOUND-VARIABLE :NAME LIST Fast links are on: do (si::use-fast-links nil) for debugging Signalled by EVAL. UNBOUND-VARIABLE :NAME LIST Broken at EVAL. Type :H for Help. 1 Return to top level. >>1 Top level. >(get 'list) Error: PROGRAM-ERROR "GET [or a callee] requires more than one argument." Fast links are on: do (si::use-fast-links nil) for debugging Signalled by GET. PROGRAM-ERROR "GET [or a callee] requires more than one argument." Broken at GET. Type :H for Help. 1 Return to top level. >>1 Top level. >(get 'list 'name) NIL >(get 'list 'type) NIL >(defvar *my-list* '(foo blatz bar)) *MY-LIST* >(get '*my-list* 'name) NIL >(get '*my-list* 'type) NIL >(get '*my-list* 'value) NIL >*my-list* (FOO BLATZ BAR) >(symbol-plist 'list) (COMPILER::INLINE-ALWAYS (((T T T T T T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T T) T 1 COMPILER::LIST-INLINE) ((T T T T) T 1 COMPILER::LIST-INLINE) ((T T T) T 1 COMPILER::LIST-INLINE) ((T T) T 1 COMPILER::LIST-INLINE) ((T) T 1 "make_cons(#0,Cnil)") (NIL T 0 "Cnil")) COMPILER::RETURN-TYPE T COMPILER::ARG-TYPES (*) COMPILER::LFUN "Llist" SYSTEM::TYPE-PREDICATE LISTP) >(symbol-plist '*my-list*) NIL >(quit)
What follows is an area in the implementations of LISP, which documentation tends to dance around. What documentation writes about extensively, is how Variables exist in LISP, and how different types of Variables can be used – within the text-syntax of LISP itself. It is important to note, that being able to map Variable-Names to values, is more important than the reverse. I.e., code can name Variables and thus Atoms, such that data structures result, and the Atoms that result within the data structures can be bound to new values rapidly, as LISP is being interpreted. The values of the resulting computations need to be output. But the need is not as strict, for Atoms to be reprinted by their names rapidly.
Further, many types of values associated formally with Variables, can in fact be represented either entirely within one Field, or as Lists again, such that Strings, arbitrary-length Integers etc., can all be printed out easily, starting from Atoms.
Variables are not said to be typed, but rather the objects they point to are. And LISP is said to be strongly typed, in that operations are always forbidden, when attempted on illegal data-types. A “Type-Error” will be generated reliably at run-time, for example, if an attempt is made to evaluate
(+ 5 "David") , because
"David" is not a number.
It would seem clear, that the fastest test which a LISP Interpreter needs to perform on a value, is whether it references a List or an Atom. And when an attempt is made to Evaluate an Atom, the next question which the interpreter needs to answer, is what data-type is stored in the Atom, to determine whether the instructed operation is in fact legal.
From what I recall, there are two ways to go about this in principle:
- Since Lists consist of Dotted Pairs, addresses that point to them are bound to point to an even-numbered Address Field Address, if the
CDR-field address is derived by incrementing. Originally, the
CDR-field was found by decrementing, which gave rise to its name: “Complement Decrement Register”. So then, a pointer to a List would always have pointed to an odd-numbered Address Field, either way because it always pointed to the
CAR. Pointing to an alternately-numbered one – to where a
CDRwould occur in a List – could signal to the Interpreter, that an Atom is being referenced. While this approach only requires that one bit be examined, to reveal whether the current address points to a List or an Atom, the main drawback would be, that whether a value is either, derives only from the Cell which is pointing to it. Some confusion could start, in which certain address fields point to an object as a List, but in which others point to the same objects, potentially, as an Atom. At that point, the binary image might have gotten corrupted, in a way that can easily happen.
- Blocks or pages of nodes can be mapped, either as containing Lists or Atoms. Thus, the determination can again be based on their addresses, but without ever introducing addresses, which are technically illegal according to CPU-logic. On a 32-bit machine, this method might work better, because 256K-blocks could be assumed, which means that the 16 Most Significant Bits of the address could be taken directly as a sub-register, could be Logically Right-Shifted 2 bits, could be used to perform a single 14-bit look-up in an array that holds 16K, 1-Byte entries, and the result could be a zero or non-zero. As long as this operation would potentially need to be performed for every address processed, the fact that the look-up table spans 16KB eclipses the L1 cache somewhat. But, each look-up only hits one line of the cache, meaning that performance will only be reduced, if that line of the cache is also being used for something else, or if the block to look up suddenly changes – to map to a different line of the cache.
- For each memory page that holds either Dotted Pairs or Atoms, the lowest address-range that would be needed to store one Dotted Pair, might be conceived never to store one, instead storing meta-data about the page. And then the first piece of meta-data stored, could state whether the page holds Lists or Atoms.
To find this information about an address, would then require the address be masked with ‘…1111000000000000′, after which the value stored at that location reveals the required information. The main drawback is that while this needs to be found about every address processed, it would still slow down the Interpreter. And, it isn’t compatible with with the way memory cache is implemented, generally hitting the same line in the cache every time.
I could expect that an even better solution can exist, which takes advantage of modern, Virtual Memory: Even Virtual Address Blocks can contain Dotted Pairs, while odd Virtual Address Blocks can contain Atoms. The beauty of this approach would be the fact, that again a single bit can be tested to recognize this. But after that, the same address can be dereferenced as-is, to reveal the eventual value of an Atom.
But then this would also imply, that every allocated block is an
MMAP Object. This might make more sense on a 64-bit machine, where 48-bit look-up tables are impossible. Instead, the size of allocation blocks could be increased to ~1MB, to reduce the number of
Yet, increasing the size of allocation blocks dramatically, also reduces the probability with which any one block can be de-allocated again, which can result in memory images that just continue to grow.
The simplest type of Atom, is just called a ‘Symbol’. If the system possesses a Property List, this is initialized to
NIL. But more -interesting types of Atoms also exist, and I read that the data-types within LISP form a hierarchy, such that belonging to one data-type is more a True or False condition, than an exclusive state. The parent of all LISP data-types is in fact
T. A specific data-type can be denoted by a single Atom, which is forward-linked to its parent data-type, by way of its Property List, all the way up to
A Symbol could be any Atom which has as value, either another single Atom, or
NIL. If this principle was combined with the general omission of Property Lists,
- The memory pages that contain Atoms could be made greatly fewer than those which contain Dotted Pairs, and
- The evaluation of Symbols could be greatly accelerated, because then, their values would simply be their (only)
What strikes me as most-sensible, is not always how it is done. But most sensibly to me, what the Atom points to, with its
CAR, could alternatively be a linked list, the first element of which (
CAAR ) would be the data-type-atom, including the type-atom for another List, even if the defined List is
NIL. The actual value might start from the
CDAR field, of the Atom…
BTW: One of the Properties of an Atom which Common LISP does keep track of, was the Package it originated from. And then, before programmers decide that the value should be stored at the (virtual)
CAAR of the Atom, the type at the
CADAR, and the Package at the
CADDAR… It starts to make more sense to use the Property List for that, which starts at the virtual
CDR of the Atom, as described in the link at the top of my posting.
Further, if there are Properties which need to be associated, even in cases where the value is just another Atom – i.e. how I have defined a Symbol – then the only place to store them, would be in this Property List.
And, It is no longer common to request that a function definition be printed, from within a LISP session, just because most functions get compiled into machine code these days. Therefore, to be able to derive the Print-Names of Atoms rapidly is no longer as important, as it once was.
(Edit 09/21/2016 : ) Common LISP possesses an Object-Oriented Extension, called
CLOS, also known as the Common LISP Object System. When some people write about ‘data-types’, they might be assuming OO-based classes.
But in dealing with LISP, it is important to distinguish, between the built-in types, and the classes which the programmer may define himself at will, of which there can be many.
If LISP is being used as such,
(defparameter *my-string* (string "Hello World!"))
The built-in function ‘
string‘ tells LISP that the List which it is about to construct, represents a String, as opposed, let us say, to a List of Characters, which would be defined by
(concatenate 'list "Hello" " " "World!")
*my-string* above is a Lexical Variable, which stores its String in an Atom, but because the data is variable-length, the internal representation will bear some similarity, to a List. Only, this will be a List referenced by the Atom, with a type attached somehow.
It would be a huge waste of resources, to make this association using
CLOS actually needs to be loaded as a Module, before the programmer can use it. It is a collection of 33 Functions, 8 Macros and several actual, additional LISP Types.
Just as ‘
concatenate‘ above recognizes the Symbolic Atom
'list , it also recognizes
'string . Without the programmer having to load
CLOS. Hence, the following code is also a very legitimate alternative to using the ‘
(concatenate 'string "Hello" " " "World!")
It is the reasoning of my posting, that since it is under the control of the LISP Compiler and / or Interpreter, how an Atom hides a structure of Lists, the arrangement of these Lists is a somewhat arbitrary detail, of one specific version of LISP. And my posting intends to show, that several solutions are possible, that all satisfy the needs.