eForth Overview

 

 

Before diving into eForth, I would like to discuss the general principles of the Forth language. The language consists of a collection of words, which reside in the memory of a computer and can be executed by entering their names on the computer keyboard.  A list of words can be compiled, given a new name and made a new word.  In fact, most words in Forth are defined as lists of existing words.  A small set of primitive words is defined in the machine code of the native CPU.  All other words are built from these primitive words and eventually refer to them when executed.

 

Words are similar to procedures and subroutines in other languages.  The difference is that Forth words can be executed interactively when referenced by name, and they can be compiled into lists which can in turn be referenced as new words.  Programming in Forth means defining new and more powerful words as lists of existing words.  This process continues until the final word becomes the solution to an application.

 

Here I will state 'The Forth Law of Computing' without a proof:

 

All computable functions can be constructed by defining new words as lists of words which include a small number of primitive words.

 

This eForth model consists of about 200 words, of which only 31 are primitive words.  Although it is very difficult to prove the above law, I will demonstrate that a complete operating system with many tools, namely the eForth model itself, can be built from this small set of primitive words.  If an operating system can be built this way, it is not difficult to see that any application can be developed the same way.

 

Forth is very similar to machine code.  In a computer, the CPU has a finite set of machine instructions, and all computable functions are implemented as lists of these machine instructions.  High level languages generally replace machine instruction lists by statements, functions, subroutines, and procedures, which can be used to construct procedures and subroutines at higher levels until the last procedure, which is the application.  This also helps demonstrate the validity of the above law.

 

The primitive words must be constructed using native machine code of the host computer.  They are also called low level words or code words.  All other words are constructed as lists of existing words.  They are called high level words or colon words because ":" (colon) is a Forth word which defines or constructs new words to replace lists of existing words.

 

Forth as a computing system has two principal components: a user interface, the Forth language processor, which interprets the commands entered from the keyboard or equivalent devices; and a machine interface, which interprets lists of words recursively until it can issue the machine instructions in the primitive words to the host computer for execution.  The user interface processes commands in text form.  It is often referred to as the text interpreter or the outer interpreter.

 

The machine interface executes words by recursively processing the word lists compiled in colon words until it reaches the primitive words, which are handed to the host computer for execution. It is often called the inner interpreter or the address interpreter, because the word lists are often stored in the dictionary as address lists.
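
The two interpreters described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the names `define`, `execute`, `interpret`, `DOUBLE` and `QUADRUPLE` are hypothetical, a Python function stands in for a primitive word written in machine code, and `+` is treated as a primitive for brevity even though eForth builds it from `UM+`.

```python
# Hypothetical model: the dictionary maps a word name either to a Python
# function (a primitive "code word") or to a list of names (a "colon word").
stack = []

def dup():
    stack.append(stack[-1])

def plus():
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

dictionary = {
    "DUP": dup,   # primitive
    "+": plus,    # primitive in this sketch only
}

def define(name, word_list):
    """Give a list of existing words a new name (the job of ':')."""
    for w in word_list:
        if w not in dictionary:
            raise KeyError(f"undefined word: {w}")
    dictionary[name] = list(word_list)

def execute(name):
    """Machine interface: run a primitive, or recurse through a word list."""
    body = dictionary[name]
    if callable(body):
        body()
    else:
        for w in body:
            execute(w)

def interpret(text):
    """User interface: execute known names, push everything else as a number."""
    for token in text.split():
        if token in dictionary:
            execute(token)
        else:
            stack.append(int(token))

define("DOUBLE", ["DUP", "+"])              # a list of primitives
define("QUADRUPLE", ["DOUBLE", "DOUBLE"])   # a list of colon words
interpret("5 QUADRUPLE")
print(stack)                                # [20]
```

Executing QUADRUPLE descends through DOUBLE until it reaches DUP and `+`, which is the recursion the machine interface performs.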

 

 

Virtual Forth Computer

 

Forth is a computer model which can be implemented on any real CPU with reasonable resources.  This model is often called a virtual Forth computer.  The minimal components of a virtual Forth computer are:

 

1.   A dictionary in memory to hold all the execution procedures.

2.   A return stack to hold return addresses of procedures yet to be executed.

3.   A data stack to hold parameters passing between procedures.

4.   A user area in RAM to hold all the system variables.

5.   A CPU to move data among stacks and memory, and to do ALU operations on parameters stored on the data stack.

 

The eForth model is a detailed specification of a virtual Forth computer which can be implemented on many different CPUs and forces them to behave identically in executing an identical Forth instruction set. It was first implemented on a PC using the Intel 8086 CPU as a guiding model for other implementations.  Here we will describe the behavior of the virtual Forth computer precisely, using 8086 machine code to clarify the specification.

 

The following registers are required for a virtual Forth computer:

 

Forth Register    8086 Register    Function

IP                SI               Interpreter Pointer

SP                SP               Data Stack Pointer

RP                BP               Return Stack Pointer

WP                AX               Word or Work Pointer

UP                (in memory)      User Area Pointer

 

In the dictionary, each procedure (or word in Forth terminology) occupies an area called the code field, which contains executable machine code and data required by the code. There are two types of words used in eForth: code words, whose code fields contain only machine instructions, and colon words, whose code fields contain a call to the list processing subroutine followed by a list of word addresses.  A word address is the code field address of a word in the dictionary.  Four bytes are allocated for the call to the list processor. Word addresses are 2 bytes in length, and are pointers to code fields of words in the dictionary.  The length of a code field varies depending upon the complexity of the word.

 

In the code field of a code word there is a list of machine instructions of the native CPU.  The machine instructions are terminated by a group of instructions, generally specified as a macro instruction named $NEXT.  The function of $NEXT is to fetch the next word address pointed to by the Interpreter Pointer IP, increment IP to point to the next entry in the word list, and jump to the address just fetched.  Since a word address points to a code field containing executable machine instructions, executing a word means jumping directly to the code field pointed to by the word address.  $NEXT thus allows the virtual Forth computer to execute a list of words with very little CPU overhead.  In the 8086 implementation, $NEXT is a macro assembling the two machine instructions shown below.

 

In a colon word, the first four bytes in the code field must be a subroutine call instruction to process the address list following this call instruction.  This address list processing subroutine is named doLIST.  doLIST pushes the contents of IP onto the return stack, copies the address of the first entry in its address list into IP and then executes $NEXT.  $NEXT will then start executing this list of addresses in sequence.

 

The last entry in the address list of a colon word must be EXIT.  EXIT is a code word which undoes what doLIST accomplished.  EXIT pops the top item on the return stack into the IP register. Consequently, IP points to the address following the colon word just executed.  EXIT then invokes $NEXT which continues the processing of the word list, briefly interrupted by the last colon word in this word list.

 

$NEXT   MACRO

        LODSW                       \ load next word into WP (AX)

        JMP     AX                  \ jump directly to the word thru WP

        ENDM                        \ IP (SI) now points to the next word

 

doLIST      ( a -- )                \ Run address list in a colon word.

      XCHG  BP,SP                   \ exchange pointers

      PUSH  SI                      \ push return stack

      XCHG  BP,SP                   \ restore the pointers

      POP   SI                      \ new list address

      $NEXT

 

CODE  EXIT                          \ Terminate a colon definition.

      XCHG  BP,SP                   \ exchange pointers

      POP   SI                      \ pop return stack

      XCHG  BP,SP                   \ restore the pointers

      $NEXT

 

It is interesting to note that in this eForth implementation, $NEXT is a macro, doLIST is a subroutine, and EXIT is actually a Forth code word.  $NEXT, doLIST and EXIT are collectively called the 'inner interpreters' or 'address interpreters' of Forth.  They are the cornerstones of a virtual Forth computer as they control the execution flow of Forth words in the system.
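
The interplay of $NEXT, doLIST and EXIT can be traced with a small Python model. It is a sketch, not the 8086 implementation: list indices stand in for addresses, a Python callable stands in for the machine code in a code field, and the `while` loop in `run` stands in for the $NEXT that ends every code word. The helper names (`code`, `colon`, `run`) and the words DOUBLE, QUAD and MAIN are hypothetical.

```python
memory = []            # dictionary cells: a callable or a word address
data, rstack = [], []  # data stack and return stack
ip = 0                 # interpreter pointer (SI in the listing)
running = True

def next_():
    """$NEXT: fetch the address at IP, advance IP, jump to that code field."""
    global ip
    wp = memory[ip]
    ip += 1
    memory[wp](wp)

def do_list(wp):
    """doLIST: save IP, then point IP at the list following the call."""
    global ip
    rstack.append(ip)
    ip = wp + 1

def exit_(wp):
    """EXIT: restore the IP saved by doLIST."""
    global ip
    ip = rstack.pop()

def halt(wp):
    """Stop the model (plays the role of BYE here)."""
    global running
    running = False

def code(fn):
    """Add a code word and return its code field address."""
    memory.append(fn)
    return len(memory) - 1

def colon(*addresses):
    """Add a colon word: a doLIST 'call' followed by its address list."""
    cfa = len(memory)
    memory.append(do_list)
    memory.extend(addresses)
    return cfa

EXIT = code(exit_)
BYE = code(halt)
DUP = code(lambda wp: data.append(data[-1]))
PLUS = code(lambda wp: data.append(data.pop() + data.pop()))

DOUBLE = colon(DUP, PLUS, EXIT)
QUAD = colon(DOUBLE, DOUBLE, EXIT)
MAIN = colon(QUAD, BYE)

def run(cfa):
    """Enter a colon word, then keep executing $NEXT until halted."""
    global running
    running = True
    memory[cfa](cfa)
    while running:
        next_()

data.append(5)
run(MAIN)
print(data)            # [20]
```

Each entry into DOUBLE or QUAD pushes one saved IP on the return stack and each EXIT pops it again, so nesting depth is limited only by the return stack size.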

 

Based on the above mechanism to execute code words and colon words, a Forth computer can be constructed using a small set of machine dependent code words and a much larger set of colon words.  Tools are provided so that the user can extend the system by adding new words in a truly modular fashion to solve practical problems.

 

There are 190 high level words in eForth, built on the 31 low level primitive words.  The high level word set is required to build the outer interpreter and the associated utility words.  As the outer interpreter itself represents a fairly substantial application, the word set necessary to build the outer interpreter forms a very solid foundation to build most other applications.  However, for any real world application one would not expect that this eForth word set is sufficient.  The beauty of Forth is that in programming an application, the user designs and implements a new word set best tailored to his application.  Forth is an open system, assuming that no operating system can be complete and all-encompassing.  The user has the best understanding of his own needs, and he knows the best way to accomplish his goal.

 

 

Memory Map

 

The most important contribution by von Neumann to computer design was the recognition that a single, uniform memory device can be used to store both program and data, in contrast to the then prevailing architectures in which program and data were stored separately, most often on very different storage media.  It greatly simplified the design of computers and has been the dominant architecture for all the important computer families ever since.

 

Memory space is a concept of paramount importance in computer hardware and assembly programming, but it is often hidden and ignored in conventional high level languages.  High level languages and operating systems hide the addressable memory space from the user in order to protect the operating system, because there are very sensitive areas in the memory space and unintentional alterations to the information stored in these areas would cause the system to malfunction or even to crash.  From the point of view of the operating system and the computer priesthood, these sensitive areas must be protected at all cost, and they are the reserved territory of the systems programmers.  Ordinary applications programmers are allocated only enough space to run their programs safely, for their own good.

 

Forth opens the entire memory space to the user.  The user can freely store data and code into memory and retrieve them from memory.  With this freedom comes the responsibility of handling the memory correctly.

 

Memory used in eForth is separated into the following areas:

 

Area              Address range     Contents

Cold boot         100H-17FH         Cold start and variable initial values

Code dictionary   180H-1344H        Code dictionary growing upward

Free space        1346H-33E4H       Shared by code and name dictionaries

Name/word         33E6H-3BFFH       Name dictionary growing downward

Data stack        3C00H-3E7FH       Growing downward

TIB               3E80H-            Growing upward

Return stack      -3F7FH            Growing downward

User variables    3F80H-3FFFH

 

 

These areas are allocated by assembly constants and can be changed conveniently to suit the target environment.  The following assembly code segment prescribes the memory allocation in a typical eForth system.  The memory map is also illustrated in a schematic drawing for easier visualization.

 

;; Memory allocation

;; 0//code>--//--<name//up>--<sp//tib>--rp//em

EM    EQU   04000H                  ;top of memory

COLDD EQU   00100H                  ;cold start vector

US    EQU   64*CELLL                ;user area size in cells

RTS   EQU   64*CELLL                ;return stack/TIB size

RPP   EQU   EM-8*CELLL              ;start of return stack (RP0)

TIBB  EQU   RPP-RTS                 ;terminal input buffer (TIB)

SPP   EQU   TIBB-8*CELLL            ;start of data stack (SP0)

UPP   EQU   EM-256*CELLL            ;start of user area (UP0)

NAMEE EQU   UPP-8*CELLL             ;name dictionary

CODEE EQU   COLDD+US                ;code dictionary
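
To see what addresses these constants produce, the EQU expressions can be evaluated directly. The sketch below assumes CELLL is 2 bytes, which is the word address size stated earlier but is not defined in this listing. The results follow from the EQU lines alone; where they differ from the addresses in the table above, the table may reflect a different set of constants.

```python
CELLL = 2                   # assumed cell size in bytes

EM    = 0x4000              # top of memory
COLDD = 0x0100              # cold start vector
US    = 64 * CELLL          # user area size
RTS   = 64 * CELLL          # return stack/TIB size
RPP   = EM - 8 * CELLL      # start of return stack (RP0)
TIBB  = RPP - RTS           # terminal input buffer (TIB)
SPP   = TIBB - 8 * CELLL    # start of data stack (SP0)
UPP   = EM - 256 * CELLL    # start of user area (UP0)
NAMEE = UPP - 8 * CELLL     # name dictionary
CODEE = COLDD + US          # code dictionary

# Print the map from low memory to high memory.
for name in ("CODEE", "NAMEE", "UPP", "SPP", "TIBB", "RPP", "EM"):
    print(f"{name:5} = {globals()[name]:04X}H")
```

The printed order (code, name, user area, data stack, TIB, return stack, end of memory) matches the one-line schematic in the comment at the top of the listing.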

 

 

 


eForth Kernel

 

 

 

One of the most important features of eForth is the small machine dependent kernel, which allows it to be ported to other CPUs very conveniently. The words in this kernel are selected on the criterion that they are very difficult if not impossible to synthesize from other primitive words.  From this set of kernel words, all other Forth words have to be built.  The kernel words can be classified as follows:

 

System interface:      BYE, ?RX, TX!, !IO

Inner interpreters:    doLIT, doLIST, next, ?branch, branch, EXECUTE, EXIT

Memory access:         !, @, C!, C@

Return stack:          RP@, RP!, R>, R@, >R

Data stack:            SP@, SP!, DROP, DUP, SWAP, OVER

Logic:                 0<, AND, OR, XOR

Arithmetic:            UM+

 

The virtual Forth computer is based on a two-stack architecture.  The return stack is used to allow a high level word to be executed in the address list of another high level word.  It is very similar to the return stack used for nested subroutine calls in a conventional computer.  Before executing a high level word in an address list, the next address of the list is pushed on the return stack so that the IP register can be used to scan the address list in the called word.  When the called word is executed to completion, the stored address on the return stack is popped back into the IP register and execution of the calling word list can be continued.

 

The data stack is used to pass parameters from one word to another.  Conventional computers use the return stack to do the parameter passing, and it takes a very complicated compiler to figure out which are return addresses and which are parameters.  Forth segregates these two types of information on two separate stacks and thus greatly simplifies the execution and compilation of words.  Passing parameters on the data stack also reduces the syntactical complexity of the Forth language to the minimum and allows words to be strung together into lists with minimum overhead in compilation and interpretation.

 

The kernel words move and process data and addresses among the stacks and the memory.  They encompass the minimal functionality necessary to make a computer behave like a Forth computer.  A complete understanding of these kernel words is vital to the understanding of a virtual Forth computer.  However, it is not difficult to understand the kernel words, because there are only 31 of them.

 

It is my intention to use this eForth model to illustrate the validity of 'the Forth Law of Computing', which states that all computable functions can be constructed by lists of these kernel words and the high level words built from these kernel words.  The eForth model includes a text interpreter which allows the user to type lists of word names and execute them in sequence, a compiler which allows the user to name lists of words and compile new words, and utilities like memory dump, stack dump, and a colon word decompiler.  Thus the eForth system forms a fairly complete software development environment for the user to develop applications.  If such a system can be built from this small set of kernel words, it should be obvious that most practical applications can also be built from it.

 

 


System Interface

 

BYE returns control from eForth back to the operating system.  !IO initializes the serial I/O device in the system so that it can interact with the user through a terminal.  These two words are not needed once the eForth system is up and running, but they are essential to bring the system up in DOS.  ?RX is used to implement ?KEY and KEY, and TX! is used to implement EMIT.  eForth communicates with the user through these words, which support terminal interactions and file download/upload.  Here these words are defined using DOS service calls.  For embedded controllers, ?RX, TX! and !IO must be defined for the specific I/O devices.

 

?RX is a unique design invented by Bill Muench to support serial input.  ?RX provides the functions required of both KEY and ?KEY, which accept input from a terminal.  ?RX inspects the terminal device and returns a character and a true flag if a character has been received and is waiting to be retrieved.  If no character was received, ?RX simply returns a false flag.  With ?RX, both KEY and ?KEY can be defined as high level colon definitions.
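
The way a single polling primitive supports both a non-blocking test and a blocking read can be sketched in Python. This is a model under assumptions: a `deque` named `rx_buffer` stands in for the terminal device, the variable-length stack result `( -- c T | F )` is represented as a tuple, and the function names are hypothetical. In this sketch ?KEY simply forwards to ?RX.

```python
from collections import deque

rx_buffer = deque()   # stand-in for the terminal input device

def q_rx():
    """?RX ( -- c T | F ): a character and True if one is waiting, else False."""
    if rx_buffer:
        return (rx_buffer.popleft(), True)
    return (False,)

def q_key():
    """?KEY: the non-blocking test; here it just forwards to ?RX."""
    return q_rx()

def key():
    """KEY ( -- c ): poll ?KEY until a character arrives, then return it."""
    while True:
        result = q_key()
        if result[-1]:        # the flag is always the top item
            return result[0]

rx_buffer.extend(b"ok")       # two characters arrive at the device
print(q_key())                # (111, True)
print(key())                  # 107
print(q_key())                # (False,)
```

Because the flag is always the last item returned, the caller can test it before deciding whether a character lies beneath it, which is what makes the variable-length result usable.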

 

TX! sends a character on the data stack to the terminal device.  Both ?RX and TX! are coded here as DOS calls.  In embedded applications, they will have to be coded in machine specific code to handle the specific serial I/O device.  !IO initializes the serial I/O device, which is not necessary here because it is taken care of by DOS.  In embedded systems, the I/O device must be initialized by !IO.

 

CODE BYE    ( -- , exit Forth )

      INT   020H                    \ return to DOS

 

CODE  ?RX   ( -- c T | F )          \ Return input character and true,

                                    \ or a false if no input.

      $CODE 3,'?RX',QRX

      XOR   BX,BX                   \ BX=0 setup for false flag

      MOV   DL,0FFH                 \ input command

      MOV   AH,6                    \ MS-DOS Direct Console I/O

      INT   021H

      JZ    QRX3                    \ ?key ready

      OR    AL,AL                   \ AL=0 if extended char

      JNZ   QRX1                    \ ?extended character code

      INT   021H

      MOV   BH,AL                   \ extended code in msb

      JMP   QRX2

QRX1: MOV   BL,AL

QRX2: PUSH  BX                      \ save character

      MOV   BX,-1                   \ true flag

QRX3: PUSH  BX

      $NEXT

 

CODE  TX!   ( c -- )                \ Send character c to output device.

      POP   DX                      \ char in DL

      CMP   DL,0FFH                 \ 0FFH is interpreted as input

      JNZ   TX1                     \ do NOT allow input

      MOV   DL,32                   \ change to blank

TX1:  MOV   AH,6                    \ MS-DOS Direct Console I/O

      INT   021H                    \ display character

      $NEXT

 

CODE  !IO   ( -- )                  \ Initialize the serial I/O devices.

      $NEXT

 


Inner Interpreter

 

In the word list of a colon definition, it is generally assumed that words are execution addresses, which can be executed sequentially by the address interpreter $NEXT.  However, occasionally we do need to compile other types of data in-line with the words.  Special mechanisms must be used to tell the address interpreter to treat these data differently.  All data entries must be preceded by special words which can handle the data properly.  A special word and its associated data form a data structure.  Data structures are extensions of words and can be thought of as building blocks to form lists in colon definitions.

 

$NEXT must be assembled at the end of a code word.  It fetches the next address in the address list pointed to by IP and jumps to that address.  It allows an address list to be scanned and thus executed.  doLIST starts the execution of an address list by saving IP on the return stack and storing the starting address of the address list into IP; $NEXT then starts executing this address list.  EXIT must be compiled as the last entry in an address list.  It terminates the execution of the current address list and returns execution to the address saved on the return stack.

 

EXECUTE takes the execution address from the data stack and executes that word.  This powerful word allows the user to execute any word which is not a part of an address list.

 

doLIT pushes the next word onto the data stack as an integer literal instead of treating it as an address to be executed by $NEXT.  It allows numbers to be compiled as in-line literals, supplying data to the data stack at run time.  doLIT is not used by itself, but rather compiled by LITERAL, which inserts doLIT and its associated integer into the address list under construction.  Anytime you see a number in a colon definition, LITERAL is invoked to compile an integer literal with doLIT.

 

Integer literals are by far the most numerous data structures in colon definitions other than regular words.  Address literals are used to build control structures.  String literals are used to embed text strings in colon definitions.
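
The in-line literal mechanism can be illustrated with a flat address list in Python. This is a simplified sketch: list indices stand in for addresses, each word is a function that receives the list and the current IP and returns the new IP, and `literal` is a hypothetical stand-in for what LITERAL does at compile time.

```python
data = []

def do_lit(code, ip):
    """doLIT: push the cell at IP as a number and step IP over it."""
    data.append(code[ip])
    return ip + 1

def plus(code, ip):
    """An ordinary word: it does not touch IP."""
    data.append(data.pop() + data.pop())
    return ip

def run(code):
    """Scan the list; IP is advanced past a word before the word runs."""
    ip = 0
    while ip < len(code):
        word = code[ip]
        ip = word(code, ip + 1)

def literal(code, n):
    """Compile doLIT followed by the integer, as LITERAL does."""
    code.extend([do_lit, n])

body = []              # the word list for something like  3 4 +
literal(body, 3)
literal(body, 4)
body.append(plus)
run(body)
print(data)            # [7]
```

The cells holding 3 and 4 are never executed: doLIT consumes each one and moves IP past it before the scanning loop looks at the list again.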

 

$NEXT   MACRO

        LODSW                       \ load next word into WP (AX)

        JMP     AX                  \ jump directly to the word thru WP

        ENDM                        \ IP (SI) now points to the next word

 

doLIST      ( a -- )                \ Run address list in a colon word.

      XCHG  BP,SP                   \ exchange pointers

      PUSH  SI                      \ push return stack

      XCHG  BP,SP                   \ restore the pointers

      POP   SI                      \ new list address

      $NEXT

 

CODE  EXIT                          \ Terminate a colon definition.

      XCHG  BP,SP                   \ exchange pointers

      POP   SI                      \ pop return stack

      XCHG  BP,SP                   \ restore the pointers

      $NEXT

 

CODE  EXECUTE     ( ca -- )         \ Execute the word at ca.

      POP   BX

      JMP   BX                      \ jump to the code address

 

CODE  doLIT ( -- w )                \ Push inline literal on data stack.

      LODSW                         \ get the literal compiled in-line

      PUSH  AX                      \ push literal on the stack

      $NEXT                         \ execute next word after literal

 

 


Loops and Branches

 

eForth uses three different types of address literals. 'next', '?branch' and 'branch' are followed not by word addresses but by pointers to locations in a list to be executed next.  These address literals are the building blocks upon which loops and branching structures are constructed.  An address literal is followed by a branch pointer which causes execution to be transferred to that location.  The branch location most often points to a different location in the address list of the same colon word. 

 

CODE  next  ( -- )                  \ Decrement index and exit loop

                                    \ if index is less than 0.

      SUB   WORD PTR [BP],1         \ decrement the index

      JC    NEXT1                   \ ?decrement below 0

      MOV   SI,0[SI]                \ no, continue loop

      $NEXT

NEXT1:ADD   BP,2                    \ yes, pop the index

      ADD   SI,2                    \ exit loop

      $NEXT

 

CODE  ?branch     ( f -- )          \ Branch if flag is zero.

      POP   BX                      \ pop flag

      OR    BX,BX                   \ ?flag=0

      JZ    BRAN1                   \ yes, so branch

      ADD   SI,2                    \ point IP to next cell

      $NEXT

BRAN1:MOV   SI,0[SI]                \ IP:=(IP), jump to new address

      $NEXT

 

CODE  branch      ( -- )            \ Branch to an inline address.

      MOV   SI,0[SI]                \ jump to new address unconditionally

      $NEXT

 

Address literals are used to construct control structures in colon definitions.  'next' is compiled by NEXT.  '?branch' is compiled by IF, WHILE and UNTIL.  'branch' is compiled by AFT, ELSE, REPEAT and AGAIN.  In the colon words to be discussed in the later sections, you will not see these kernel words but the words which construct loops and branches.  For example:

 

            IF ( compiles ?branch and address after THEN ) <true clause>  THEN

            IF ( compiles ?branch and address after ELSE )  <true clause>

                        ELSE ( compiles branch and address after THEN ) <false clause>

                        THEN

            BEGIN ( mark current address ) <loop clause>

                        AGAIN ( compiles branch and address after BEGIN )

            BEGIN ( mark current address ) <loop clause>

                        UNTIL ( compiles ?branch and address after BEGIN )

            BEGIN ( mark current address ) <loop clause>

                        WHILE ( compiles ?branch and address after REPEAT ) <true clause>

                        REPEAT ( compile branch and address after BEGIN )

            FOR  ( set up loop, mark current address ) <loop clause>

                        NEXT ( compile next and address after FOR )

            FOR ( set up loop, mark current address ) <loop clause>

                        AFT ( change marked address to current address, compile branch

                                    and address after THEN ) <skip clause>

                        THEN <loop clause>  NEXT ( compile next and address after AFT )
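
How IF, ELSE and THEN lay down '?branch' and 'branch' with their target addresses can be sketched in Python. The sketch makes several assumptions: list indices stand in for addresses, the compile-time helpers `if_`, `else_` and `then_` are hypothetical names, and a forward branch is compiled with an empty cell that is filled in once the target is known.

```python
data = []

def q_branch(code, ip):
    """?branch: pop the flag; jump to the in-line address if it is zero."""
    return code[ip] if data.pop() == 0 else ip + 1

def branch(code, ip):
    """branch: jump to the in-line address unconditionally."""
    return code[ip]

def do_lit(code, ip):
    data.append(code[ip])
    return ip + 1

def run(code):
    ip = 0
    while ip < len(code):
        ip = code[ip](code, ip + 1)

def if_(code):
    """IF: compile ?branch and leave a hole for the target address."""
    code += [q_branch, None]
    return len(code) - 1

def else_(code, hole):
    """ELSE: compile branch with a new hole, then resolve the IF hole."""
    code += [branch, None]
    code[hole] = len(code)      # false case lands just after ELSE
    return len(code) - 1

def then_(code, hole):
    """THEN: resolve the pending hole to the current address."""
    code[hole] = len(code)

code = []                       # IF 111 ELSE 222 THEN
hole = if_(code)
code += [do_lit, 111]
hole = else_(code, hole)
code += [do_lit, 222]
then_(code, hole)

def choose(flag):
    data.append(flag)
    run(code)
    return data.pop()

print(choose(-1), choose(0))    # 111 222
```

The backward-branching structures (AGAIN, UNTIL, REPEAT, NEXT) are simpler in this respect, since the target marked by BEGIN or FOR is already known when the branch is compiled.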

                       


Memory Access

 

Four memory accessing words are included in the eForth kernel: ! (store), @ (fetch), C! (C-store) and C@ (C-fetch).  ! and @ access memory in cells, whose size depends on the CPU underneath.  eForth assumes that the CPU can access memory in bytes and that all addresses are in units of bytes.   C! and C@ allow the user to access memory in bytes.
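
The difference between cell access and byte access can be shown with a Python model of byte-addressed memory. It assumes the 8086 case: 2-byte cells stored low byte first. The function names are hypothetical, and the address 2000H is chosen arbitrarily.

```python
memory = bytearray(0x10000)    # 64K of byte-addressed memory

def store(w, a):
    """! ( w a -- ): store a 16-bit cell at address a, low byte first."""
    memory[a] = w & 0xFF
    memory[a + 1] = (w >> 8) & 0xFF

def fetch(a):
    """@ ( a -- w ): fetch the 16-bit cell at address a."""
    return memory[a] | (memory[a + 1] << 8)

def c_store(c, b):
    """C! ( c b -- ): store only the low byte of c at address b."""
    memory[b] = c & 0xFF

def c_fetch(b):
    """C@ ( b -- c ): fetch one byte, with the high byte of the result zero."""
    return memory[b]

store(0x1234, 0x2000)
print(hex(c_fetch(0x2000)), hex(c_fetch(0x2001)))   # 0x34 0x12
c_store(0xABFF, 0x2000)        # only FF is stored
print(hex(fetch(0x2000)))      # 0x12ff
```

Nothing in these four functions checks where the address points, which mirrors the kernel words: any location, including the dictionary and the stacks, can be read or overwritten.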

 

The two most important resources in a computer are the CPU and the memory.  There is not much one can do with the CPU, except to use its instruction set to write programs.  However, the real usefulness and intelligence lies with the memory, which holds both the program and the data.  In conventional languages, you humbly request memory to store your data, and the compiler reluctantly allocates it to you.  If you exceed your memory allocation, your program will be ruthlessly terminated.

 

In Forth, you have all the memory and you are allowed to do anything with it.  !, @, C! and C@ place no restrictions on their use.  You can use them to write self-modifying code if you like.  However, you must know exactly what you are doing.

 

It is not a very good idea to change the contents of the dictionary, except in the parameter fields of variables and arrays you defined specifically for data storage.  The space occupied by the stacks should be respected, too.  The user variable area holds vital information for the system to run correctly.  The space between the code dictionary and the name dictionary is not used and you are free to use it to store temporary data.  Be reminded, however, that as you define new words, the dictionaries are extended and may overwrite data you placed there.

 

The moral is: Use @ and C@ freely, but be careful with ! and C!.

 

CODE  !     ( w a -- )              \ Pop the data stack to memory.

      POP   BX                      \ get address from tos

      POP   0[BX]                   \ store data to that address

      $NEXT

 

CODE  @     ( a -- w )              \ Push memory location to data stack.

      POP   BX                      \ get address

      PUSH  0[BX]                   \ fetch data

      $NEXT

 

CODE  C!    ( c b -- )              \ Pop data stack to byte memory.

      POP   BX                      \ get address

      POP   AX                      \ get data in a cell

      MOV   0[BX],AL                \ store one byte

      $NEXT

 

CODE  C@    ( b -- c )              \ Push byte memory content on data stack.

      POP   BX                      \ get address

      XOR   AX,AX                   \ AX=0 zero the hi byte

      MOV   AL,0[BX]                \ get low byte

      PUSH  AX                      \ push on stack

      $NEXT

 

 


Return Stack

 

RP! pops the address on the top of the data stack into the return stack pointer RP and thus initializes the return stack.  RP! is only used to initialize the system and is seldom used in applications.  RP@ pushes the contents of the return stack pointer RP on the data stack.  It is also used very rarely in applications.

 

>R pops a number off the data stack and pushes it on the return stack.  R> does the opposite.  R@ copies the top item on the return stack and pushes it on the data stack.

 

The eForth system uses the return stack for two specific purposes: to save addresses while recursing through an address list, and to store the loop index during a FOR-NEXT loop.  As the addresses piled up on the return stack change dynamically as words are executed, there is very little useful information the user can get from the return stack at run time.  In setting up a loop, FOR compiles >R, which pushes the loop index from the data stack to the return stack.  Inside the FOR-NEXT loop, the running index can be recalled by R@.  NEXT compiles 'next' with an address after FOR.  When 'next' is executed, it decrements the loop index on the top of the return stack.  If the index becomes negative, the loop is terminated; otherwise, 'next' jumps back to the word after FOR.

 

The return stack is used by the virtual Forth computer to save return addresses to be processed later.  It is also a convenient place to store data temporarily.  The return stack can thus be considered as an extension of the data stack.  However, one must be very careful in using the return stack for temporary storage.  The data pushed on the return stack must be popped off before EXIT is executed.  Otherwise, EXIT will get the wrong address to return to, and the system generally will crash.
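
The use of the return stack as a loop counter can be modelled in Python. This is a sketch: two Python lists stand in for the stacks, `for_next` is a hypothetical helper that combines what FOR and 'next' do, and the exit test follows the `SUB`/`JC` pair in the 'next' listing, where the loop ends when decrementing the index would borrow.

```python
data, rstack = [], []

def to_r():
    """>R ( w -- ): move the top of the data stack to the return stack."""
    rstack.append(data.pop())

def r_from():
    """R> ( -- w ): move the top of the return stack to the data stack."""
    data.append(rstack.pop())

def r_fetch():
    """R@ ( -- w ): copy the top of the return stack to the data stack."""
    data.append(rstack[-1])

def for_next(body):
    """Run body as  FOR <body> NEXT  would, taking the count from the stack."""
    to_r()                       # FOR compiles >R
    while True:
        body()
        if rstack[-1] == 0:      # decrementing 0 borrows: leave the loop
            rstack.pop()         # and discard the index
            break
        rstack[-1] -= 1          # otherwise count down and repeat

seen = []

def body():
    r_fetch()                    # recall the running index with R@
    seen.append(data.pop())

data.append(3)
for_next(body)
print(seen)                      # [3, 2, 1, 0]
```

A count of 3 therefore runs the loop body four times, with the index going from 3 down to 0, and the return stack is back to its original depth when the loop ends.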

 

CODE  RP@   ( -- a )                \ Push current RP to data stack.

      PUSH  BP                      \ push the return stack pointer (BP)

      $NEXT                         \ on the data stack

 

CODE  RP!   ( a -- )                \ Set the return stack pointer.

      POP   BP                      \ pop tos into BP

      $NEXT

 

CODE  R>    ( -- w )                \ Pop return stack to data stack.

      PUSH  0[BP]                   \ copy w to data stack

      ADD   BP,2                    \ adjust RP for popping

      $NEXT

 

CODE  R@    ( -- w )                \ Copy top of return stack to data stack.

      PUSH  0[BP]                   \ copy w to data stack

      $NEXT

 

CODE  >R    ( w -- )                \ Push data stack to return stack.

      SUB   BP,2                    \ adjust RP for pushing

      POP   0[BP]                   \ push w to return stack

      $NEXT

 

 


Data Stack

 

The data stack is the centralized location where all numerical data are processed, and where parameters are passed from one word to another.  The stack items have to be arranged properly so that they can be retrieved in the Last-In-First-Out (LIFO) manner.  When stack items are out of order, they can be rearranged by the stack words DUP, SWAP, OVER and DROP.  There are other stack words useful in manipulating stack items, but these four are considered to be the minimum set.
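
The four stack words, and the way further stack words can be composed from them, can be illustrated with a Python list as the data stack (top of stack at the end of the list). The derived words `two_dup` and `nip` are hypothetical examples chosen for illustration, not a statement about which words eForth defines.

```python
stack = []

def drop():
    """DROP ( w -- )"""
    stack.pop()

def dup():
    """DUP ( w -- w w )"""
    stack.append(stack[-1])

def swap():
    """SWAP ( w1 w2 -- w2 w1 )"""
    stack[-1], stack[-2] = stack[-2], stack[-1]

def over():
    """OVER ( w1 w2 -- w1 w2 w1 )"""
    stack.append(stack[-2])

def two_dup():
    """( w1 w2 -- w1 w2 w1 w2 ), composed as OVER OVER."""
    over()
    over()

def nip():
    """( w1 w2 -- w2 ), composed as SWAP DROP."""
    swap()
    drop()

stack.extend([1, 2])
two_dup()
print(stack)     # [1, 2, 1, 2]
nip()
print(stack)     # [1, 2, 2]
```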

 

The data stack is initialized by SP!.  The depth of the data stack can be examined by SP@.  These words, like RP@ and RP!, are only used by the system and very rarely used in applications.  They are necessary in the Forth kernel because you cannot operate a stack-based computer without them.

 

CODE  DROP  ( w -- )                \ Discard top stack item.

      ADD   SP,2                   \ adjust SP to pop

      $NEXT

 

CODE  DUP   ( w -- w w )            \ Duplicate the top stack item.

      MOV   BX,SP                   \ use BX to index the stack

      PUSH  0[BX]

      $NEXT

 

CODE  SWAP  ( w1 w2 -- w2 w1 )      \ Exchange top two stack items.

      POP   BX                      \ get w2

      POP   AX                      \ get w1

      PUSH  BX                      \ push w2

      PUSH  AX                      \ push w1

      $NEXT

 

CODE  OVER  ( w1 w2 -- w1 w2 w1 )   \ Copy second stack item to top.

      MOV   BX,SP                   \ use BX to index the stack

      PUSH  2[BX]                   \ get w1 and push on stack

      $NEXT

 

CODE  SP@   ( -- a )                \ Push the current data stack pointer.

      MOV   BX,SP                   \ use BX to index the stack

      PUSH  BX                      \ push SP back

      $NEXT

 

CODE  SP!   ( a -- )                \ Set the data stack pointer.

      POP   SP                      \ safety

      $NEXT
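
The stack effects of the four basic stack words can be modelled in a few lines of Python.  This is a sketch for reading the stack comments only; the data stack is assumed to be a Python list whose last element is the top item, and the function names are hypothetical.

```python
def dup(s):
    s.append(s[-1])                # ( w -- w w )

def drop(s):
    s.pop()                        # ( w -- )

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]    # ( w1 w2 -- w2 w1 )

def over(s):
    s.append(s[-2])                # ( w1 w2 -- w1 w2 w1 )

stack = [1, 2]
over(stack)     # stack is now [1, 2, 1]
swap(stack)     # stack is now [1, 1, 2]
drop(stack)     # stack is now [1, 1]
dup(stack)      # stack is now [1, 1, 1]
```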

 

 


Logical Words

 

The only primitive word which cares about logic is '?branch'.  It tests the top item on the stack.  If it is zero, ?branch will branch to the address compiled immediately after it.  If it is not zero, ?branch will skip that address and execute the word after it.  Thus we distinguish two classes of numbers, zero for 'false' and non-zero for 'true'.  Numbers used this way are called logic flags, which can be either true or false.  The only primitive word which generates flags is '0<', which tests whether the top item on the data stack is negative.  If it is negative, '0<' will return a -1 for true.  If it is 0 or positive, '0<' will return a 0 for false.

 

The three logic words AND, OR and XOR are bitwise logic operators over the width of a cell.  They can be used to operate on real flags (0 and -1) for logic purposes.  The user must be aware of the difference between real flags and generalized flags (any non-zero number): a bitwise operation on two generalized true flags, such as 1 AND 2, can yield 0, which is false.

 

CODE  0<    ( n -- f )              \ Return true if n is negative.

      POP   AX

      CWD                           \ sign extend AX into DX

      PUSH  DX                      \ push 0 or -1

      $NEXT

 

CODE  AND   ( w w -- w )            \ Bitwise AND.

      POP   BX

      POP   AX

      AND   BX,AX

      PUSH  BX

      $NEXT

 

CODE  OR    ( w w -- w )            \ Bitwise inclusive OR.

      POP   BX

      POP   AX

      OR    BX,AX

      PUSH  BX

      $NEXT

 

CODE  XOR   ( w w -- w )            \ Bitwise exclusive OR.

      POP   BX

      POP   AX

      XOR   BX,AX

      PUSH  BX

      $NEXT
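
The difference between real flags and generalized flags is easy to check numerically.  The following Python sketch assumes a 16-bit cell; zero_less and is_true are hypothetical names standing in for 0< and for the test made by ?branch.

```python
MASK = 0xFFFF
TRUE, FALSE = 0xFFFF, 0            # -1 and 0 in a 16-bit cell

def zero_less(n):
    """0< : return the real true flag when the sign bit of the cell is set."""
    return TRUE if n & 0x8000 else FALSE

def is_true(flag):
    """How ?branch sees a cell: any non-zero value counts as true."""
    return (flag & MASK) != 0

both_real = TRUE & TRUE            # real flags: still true
both_general = 1 & 2               # two "true" numbers AND to 0, i.e. false
```

Bitwise AND behaves as a logical AND only when both operands are real flags; OR happens to be safe for generalized flags as well.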

 

 


Primitive Arithmetic

 

The only primitive arithmetic word in the eForth kernel is UM+.  All other arithmetic words, like +, -, * and / are derived from UM+ as colon definitions.  This design emphasizes portability over performance, because it greatly reduces the effort of moving eForth to CPUs which do not have native multiply and divide instructions.  Once eForth is implemented on a new CPU, the more complicated arithmetic words are the first ones to be optimized to enhance performance.

 

UM+ adds two unsigned numbers on the top of the data stack and returns the sum, with the carry as a separate number on top of the sum.  Handling the carry this way is very inefficient, because most CPUs hold the carry as a bit in the status register, where it can be accessed by many machine instructions.  It is thus more convenient to use the carry in machine code programming.  eForth provides the user a handle on the carry at the high level, making it easier to deal with directly.

 

CODE  UM+   ( w w -- w cy )

\     Add two numbers, return the sum and carry flag.

      XOR   CX,CX                   \ CX=0 initial carry flag

      POP   BX

      POP   AX

      ADD   AX,BX

      RCL   CX,1                    \ get carry

      PUSH  AX                      \ push sum

      PUSH  CX                      \ push carry

      $NEXT
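
The behavior of UM+ can be stated compactly in Python.  This is a model under the assumption of a 16-bit cell, not a translation of the assembly code; um_plus is a hypothetical name.

```python
def um_plus(a, b):
    """Model of UM+ ( w w -- w cy ): 16-bit sum plus the carry as a number."""
    total = (a & 0xFFFF) + (b & 0xFFFF)
    return total & 0xFFFF, total >> 16     # (sum, carry), carry is 0 or 1

small = um_plus(2, 3)           # no carry
wrapped = um_plus(0xFFFF, 1)    # sum wraps to 0, carry is 1
```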

 


High Level Forth Words

 

 

 

Following are the eForth words defined as high level colon definitions.  They are built from the primitive eForth words and other high level eForth words, including data structures and control structures.  Since the eForth source is coded for the Microsoft MASM assembler, the word lists in the colon definitions are constructed as data in MASM, using the DW directive.  This form of representation, though very effective, is very difficult to read.  The original model of eForth as provided by Bill Muench was in the form of a Forth source listing.  This listing is much simpler and easier to read, assuming that the reader has some knowledge of Forth syntax.  This listing is also a very good source from which to learn good Forth coding style.  I therefore think it is better to present the high level Forth colon definitions in this form.  As the 8086 eForth implementation deviates slightly from the original Forth model, I have tried to translate the 8086 implementation faithfully back to the Forth style for our discussion here.

 

The sequence of words is exactly the same as that in the MASM assembly source listing.  The reader is encouraged to read the MASM source listing along with the text in this book.  Reading two descriptions of the same subject often enables better comprehension and understanding.

 

 

Variables and User Variables

 

The term user variable was codified in earlier Forth systems on the mini-computers in which multitasking was an integral part of the Forth operating system.  In a multitasking system, many users share the CPU and other resources in the computing system.  Each user has a private memory area to store essential information about its own task so that the system can leave a task temporarily to serve other users and return to this task to continue the unfinished work.  In a single user environment, the user variables have the same functionality as system variables.

 

In eForth, all variables used by the system are merged together and are implemented uniformly as user variables.  A special memory area in the high memory is allocated for all these variables, and they are all initialized by copying a table of initial values stored in the cold boot area.  A significant benefit of this scheme is that it allows the eForth system to operate in ROM memory naturally.  It is very convenient for embedded system applications which preclude mass storage and file downloading.

 

In an application, the user can choose to implement variables in the form of user variables or regular variables when running in RAM memory.  To run things in ROM, variables must be defined as user variables.  Although eForth in the original model allows only a small number of user variables to be defined in an application, the user area can be enlarged at will by changing a few assembly constants and equates.

 

In eForth only one vocabulary is used.  The name of this vocabulary is FORTH.  When FORTH is executed, the address of the pointer to the top of the dictionary is written into the first cell in the CONTEXT array.  When the text interpreter searches the dictionary for a word, it picks up the pointer in CONTEXT and follows the thread through the name dictionary.  If the name dictionary is exhausted, the text interpreter will pick up the next cell in the CONTEXT array and continue the search.  The first cell in the CONTEXT array containing a 0 stops the search.  Eight cells follow CONTEXT in the array.  Since the last cell must be zero, eForth allows up to 8 context vocabularies to be searched.

 

There are two empty cells in the code field of FORTH.  The first cell stores the pointer to the last name field in the name dictionary.  The second cell must be a 0, which serves to terminate the vocabulary link when many vocabularies are created.  Vocabularies are useful in reducing the number of words the text interpreter must search to locate a word, and in allowing related words to be grouped together as logical modules.  Although eForth itself uses only one vocabulary, the mechanism is provided to define multiple vocabularies in large applications.

 

The CONTEXT array is designed as a vocabulary stack to implement the ONLY-ALSO concept of vocabulary search order first proposed by Bill Ragsdale in the Forth 83 Standard.

 

CURRENT points to a vocabulary thread to which new definitions are to be added. 

 

: doVAR ( -- a ) R> ;

 

VARIABLE UP ( -- a, Pointer to the user area.)

 

: doUSER    ( -- a, Run time routine for user variables.)

      R> @                          \ retrieve user area offset

      UP @ + ;                      \ add to user area base addr

 

: doVOC ( -- ) R> CONTEXT ! ;

 

: FORTH ( -- ) doVOC [ 0 , 0 ,

 


 

eForth provides many functions in vectored form to allow the behavior of these functions to be changed dynamically at run time.  A vectored function stores a code address in a user variable.  @EXECUTE is used to execute the function, given the address of the user variable.  Following is the list of user variables defined in eForth:

 

SP0         ( -- a, pointer to bottom of the data stack.)

RP0         ( -- a, pointer to bottom of the return stack.)

'?KEY       ( -- a, execution vector of ?KEY.  Default to ?rx.)

'EMIT       ( -- a, execution vector of EMIT.  Default to tx!)

'EXPECT     ( -- a, execution vector of EXPECT.  Default to 'accept'.)

'TAP        ( -- a, execution vector of TAP.  Default to kTAP.)

'ECHO       ( -- a, execution vector of ECHO.  Default to tx!.)

'PROMPT     ( -- a, execution vector of PROMPT.  Default to '.ok'.)

BASE        ( -- a, radix base for numeric I/O.  Default to 10.)

tmp         ( -- a, a temporary storage location used in parse and find.)

SPAN        ( -- a, hold character count received by EXPECT.)

>IN         ( -- a, hold the character pointer while parsing input stream.)

#TIB        ( -- a, hold the current count and address of the terminal input buffer.  Terminal Input Buffer uses one cell after #TIB.)

CSP         ( -- a, hold the stack pointer for error checking.)

'EVAL       ( -- a, execution vector of EVAL. Default to EVAL.)

'NUMBER     ( -- a, address of number conversion.  Default to NUMBER?.)

HLD         ( -- a, hold a pointer in building a numeric output string.)

HANDLER     ( -- a, hold the return stack pointer for error handling.)

CONTEXT     ( -- a, an area to specify vocabulary search order.  Default to FORTH.  Vocabulary stack, 8 cells following CONTEXT.)

CURRENT     ( -- a, point to the vocabulary to be extended.  Default to FORTH.

            Vocabulary link uses one cell after CURRENT.)

CP          ( -- a, point to the top of the code dictionary.)

NP          ( -- a, point to the bottom of the name dictionary.)

LAST        ( -- a, point to the last name in the name dictionary.)
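
The idea of vectored execution can be sketched in Python.  The model below is an illustration only: the user area is assumed to be a dictionary keyed by variable name, at_execute is a hypothetical stand-in for @EXECUTE, and the lower-casing routine is an invented replacement for the default output routine.

```python
def at_execute(user_area, name, *args):
    """Model of @EXECUTE: run the word stored in a user variable.
    A vector holding 0 is skipped, as in the eForth definition."""
    word = user_area[name]
    if word:
        return word(*args)

sent = []                                   # characters "transmitted" so far
user_area = {"'EMIT": sent.append, "'ECHO": 0}

at_execute(user_area, "'EMIT", "A")         # default output routine
user_area["'EMIT"] = lambda c: sent.append(c.lower())   # revector at run time
at_execute(user_area, "'EMIT", "B")         # same call, new behavior
at_execute(user_area, "'ECHO", "C")         # empty vector: nothing happens
```

The caller never changes; only the contents of the user variable do.  This is what allows eForth to redirect its I/O at run time.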

 

 

Common Functions

 

This group of Forth words is commonly used in writing Forth applications.  They are coded in high level to enhance the portability of eForth.  In most Forth implementations, they are coded in machine language to increase execution speed.  After an eForth system is ported to a new CPU, this word set should be recoded in assembly to improve the run time performance of the system.

 

?DUP, ROT, 2DROP,  and 2DUP are stack operators supplementing the four classic stack operators DUP, SWAP, OVER and DROP.

 

ROT is unique in that it accesses the third item on the data stack.  All other stack operators can only access one or two stack items.  In Forth programming, it is generally accepted that one should not try to access stack items deeper than the third item.  When you have to access deeper into the data stack, it is a good time to re-evaluate your algorithm.  Most often, you can avoid this situation by factoring your code into smaller parts which do not reach so deep.

 

+, - and D+ are simple extensions of the primitive word UM+.  It is interesting to see how the more commonly used arithmetic operators are derived.  + is UM+ with the carry discarded.  NOT returns the one's complement of a number, and NEGATE returns the two's complement.  Because UM+ preserves the carry, it can be used to form multiple precision operators like D+.  Later we will see how UM+ is used to do multiplication and division.

 

: ?DUP ( w -- w w | 0 ) DUP IF DUP THEN ;

: ROT ( w1 w2 w3 -- w2 w3 w1 ) >R SWAP R> SWAP ;

: 2DROP ( w w  -- ) DROP DROP ;

: 2DUP ( w1 w2 -- w1 w2 w1 w2 ) OVER OVER ;

: + ( w w -- w ) UM+ DROP ;

: NOT ( w -- w ) -1 XOR ;

:  NEGATE ( n -- -n ) NOT 1 + ;

: DNEGATE ( d -- -d ) NOT >R NOT 1 UM+ R> + ;

: D+ ( d d -- d ) >R SWAP >R UM+ R> R> + + ;

: - ( w w -- w ) NEGATE + ;

: ABS ( n -- +n ) DUP 0< IF NEGATE THEN ;
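
The derivations above can be followed step by step in a Python model.  The sketch assumes a 16-bit cell; the function names are hypothetical, and each one mirrors the colon definition quoted in its comment.

```python
MASK = 0xFFFF

def um_plus(a, b):
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> 16          # (sum, carry)

def plus(a, b):
    return um_plus(a, b)[0]                   # : + UM+ DROP ;

def invert(w):
    return (w ^ MASK) & MASK                  # : NOT -1 XOR ;

def negate(n):
    return plus(invert(n), 1)                 # : NEGATE NOT 1 + ;

def minus(a, b):
    return plus(a, negate(b))                 # : - NEGATE + ;

def d_plus(lo1, hi1, lo2, hi2):
    """Double-cell add: the carry out of the low cells feeds the high cells."""
    lo, carry = um_plus(lo1, lo2)
    return lo, plus(carry, plus(hi1, hi2))

minus_two = minus(5, 7)                # 0xFFFE, i.e. -2 in two's complement
carried = d_plus(0xFFFF, 0, 1, 0)      # low cell wraps, high cell becomes 1
```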

 

 


Comparison

 

The primitive comparison words in eForth are ?branch and 0<.  However, ?branch is at such a low level that it cannot be readily used in high level Forth code.  ?branch is compiled into high level Forth words behind the scenes by IF, followed by an address literal.  For all intents and purposes, we can consider IF the equivalent of ?branch.  When IF is encountered, the top item on the data stack is considered a logic flag.  If it is true (non-zero), execution continues until ELSE and then jumps to THEN, or continues straight through to THEN if there is no ELSE clause.  If it is false (zero), execution jumps to the word after ELSE, or to THEN if there is no ELSE clause.

 

The following logic words are constructed using the IF...ELSE...THEN structure with 0< and XOR.  XOR is used as a 'not equal' operator, because if the top two items on the data stack are not equal, the XOR operator will return a non-zero number, which is considered to be 'true'.

 

U< is used to compare two unsigned numbers.  This operator is very important, especially in comparing addresses, as we assume that the addresses are unsigned numbers pointing to unique memory locations.  The arithmetic comparison operator < cannot be used to determine whether one address is higher or lower than the other.  Using < for address comparison has been the sole cause of many failures in the annals of Forth.

 

MAX retains the larger of the top two items on the data stack.  Both numbers are assumed to be signed integers.

 

MIN retains the smaller of the top two items on the data stack.  Both numbers are assumed to be signed integers.

 

WITHIN checks whether the third item on the data stack is within the range as specified by the top two numbers on the data stack.  The range is inclusive as to the lower limit and exclusive to the upper limit.  If the third item is within range, a true flag is returned on the data stack.  Otherwise, a false flag is returned.  All numbers are assumed to be unsigned integers. 

 

: = ( w w -- t ) XOR IF 0 EXIT THEN -1 ;

: U< ( u u -- t ) 2DUP XOR 0< IF SWAP DROP 0< EXIT THEN - 0< ;

:  < ( n n -- t ) 2DUP XOR 0< IF      DROP 0< EXIT THEN - 0< ;

: MAX ( n n -- n ) 2DUP      < IF SWAP THEN DROP ;

: MIN ( n n -- n ) 2DUP SWAP < IF SWAP THEN DROP ;

: WITHIN ( u ul uh -- t ) \ ul <= u < uh

  OVER - >R - R> U< ;
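
The difference between signed and unsigned comparison, and the wrap-around behavior of WITHIN, can be checked with a short Python model.  A 16-bit cell is assumed and the function names are hypothetical; the functions reproduce the results of the Forth words rather than their sign-bit tricks.

```python
MASK = 0xFFFF

def signed(w):
    w &= MASK
    return w - 0x10000 if w & 0x8000 else w

def less(a, b):
    return signed(a) < signed(b)               # model of <

def u_less(a, b):
    return (a & MASK) < (b & MASK)             # model of U<

def within(u, lo, hi):
    """lo <= u < hi, computed as (u - lo) U< (hi - lo) in 16-bit arithmetic."""
    return u_less((u - lo) & MASK, (hi - lo) & MASK)

# Two addresses on either side of the sign-bit boundary:
correct = u_less(0x7FFF, 0x8000)    # True: 0x7FFF is the lower address
wrong = less(0x7FFF, 0x8000)        # False: 0x8000 reads as -32768
```

Because the subtraction wraps around, WITHIN also accepts a range whose upper limit is numerically below its lower limit, for example lo = 127 and hi = 32.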

 

 


Divide

 

This group of words provides a variety of multiplication and division functions.  The most interesting feature of this word set is that they are all based on the primitive UM+ operator in the kernel.  Building this word set in high level has the penalty that all math operations will be slow.  However, since eForth needs these functions only in numeric I/O conversions, the performance of eForth itself is not substantially affected by them.  Nevertheless, if an application requires lots of numeric computations, a few critical words in this word set should be recoded in assembly.  The primary candidates for optimization are UM/MOD and UM*, because all other multiply and divide operators are derived from these two words.

 

UM/MOD and UM* are the most complicated and comprehensive division and multiplication operators.  Once they are coded, all other division and multiplication operators can be derived easily.  It has been a tradition in Forth coding that one solves the most difficult problem first, and all other problems are solved by themselves.

 

UM/MOD divides an unsigned double integer by an unsigned single integer.  It returns the unsigned remainder and unsigned quotient on the data stack.

 

M/MOD divides a signed double integer by a signed single integer.  It returns the signed remainder and signed quotient on the data stack.  The signed division is floored towards negative infinity.

 

/MOD divides a signed single integer by a signed integer.  It returns the signed remainder and quotient.  MOD is similar to /MOD, except that only the signed remainder is returned.  / is also similar to /MOD, except that only the signed quotient is returned.

 

In most advanced microprocessors like the 8086, all these division operations can be performed by the CPU as native machine instructions.  The user can take advantage of these machine instructions by recoding these Forth words in machine code.

 

: UM/MOD ( ud u -- ur uq )

  2DUP U<

  IF NEGATE  15

    FOR >R DUP UM+ >R >R DUP UM+ R> + DUP

        R> R@ SWAP >R UM+  R> OR

      IF >R DROP 1 + R> ELSE DROP THEN R>

    NEXT DROP SWAP EXIT

  THEN DROP 2DROP  -1 DUP ;

 

: M/MOD ( d n -- r q ) \ floored division

  DUP 0<  DUP >R

  IF NEGATE >R DNEGATE R>

  THEN >R DUP 0< IF R@ + THEN R> UM/MOD R>

  IF SWAP NEGATE SWAP THEN ;

 

: /MOD ( n n -- r q ) OVER 0< SWAP M/MOD ;

: MOD ( n n -- r ) /MOD DROP ;

: / ( n n -- q ) /MOD SWAP DROP ;
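
Floored division differs from the truncating division of many CPUs and languages only when the operands have opposite signs.  The Python sketch below shows the difference; it uses ordinary Python integers rather than 16-bit cells, the function names are hypothetical, and the results are given in the ( r q ) order of the stack comments.

```python
def floored_divmod(n, d):
    """Quotient rounded toward negative infinity, as described for M/MOD."""
    q, r = divmod(n, d)            # Python's divmod is itself floored
    return r, q

def truncated_divmod(n, d):
    """Quotient rounded toward zero, shown only for comparison."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return n - q * d, q

floored = floored_divmod(-7, 2)        # remainder 1, quotient -4
truncated = truncated_divmod(-7, 2)    # remainder -1, quotient -3
```

With floored division the remainder always takes the sign of the divisor, which is convenient for wrapping indexes and angles.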

 

 


Multiply

 

UM* is the most complicated multiplication operation.  Once it is coded, all other multiplication words can be derived from it. 

 

UM* multiplies two unsigned single integers and returns the unsigned double integer product on the data stack.  M*  multiplies two signed single integers and returns the signed double integer product on the data stack.  * multiplies two signed single integers and returns the signed single integer product on the data stack.

 

Again, advanced CPU's generally have these multiplication operations as native machine instructions.  The user should take advantage of these resources to enhance the eForth system.

 

Forth is so close to machine language that it generally handles only integer numbers.  There are floating point extensions on many more sophisticated Forth systems, but they are more the exception than the rule.  The reason that Forth has traditionally been an integer language is that integers are handled faster and more efficiently by computers, and most technical problems can be solved satisfactorily using integers only.  A 16-bit integer has a dynamic range of about 96 dB, which is far more than enough for most engineering problems.  The precision of a 16-bit integer representation is limited to one part in 65535, which could be inadequate for small numbers.  However, the precision can be greatly improved by scaling; i.e., multiplying by the ratio of two integers.  For example, pi is approximated to better than 1 part in 10,000,000 by the ratio 355/113, and many other irrational numbers can be represented with similar accuracy by a ratio of two 16-bit integers.

 

The scaling operators */MOD and */ are useful in scaling number n1 by the ratio of n2/n3.  When n2 and n3 are properly chosen, the scaling operation can preserve precision similar to the floating point operations at a much higher speed.  Notice also that in these scaling operations, the intermediate product of n1 and n2 is a double precision integer so that the precision of scaling is maintained.

 

*/MOD multiplies the signed integers n1 and n2, and then divides the double integer product by n3.  In effect, it scales n1 by the ratio n2/n3.  It returns both the remainder and the quotient.  */ is similar to */MOD except that it returns only the quotient.
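
The value of the double precision intermediate product can be seen in a small Python experiment.  The sketch scales 10000 by the ratio 355/113, a well-known rational approximation of pi; the function names are hypothetical, and naive_16bit shows what would happen if the product were truncated to a single 16-bit cell before dividing.

```python
import math

def star_slash(n1, n2, n3):
    """Model of */ : the product n1*n2 is kept in full before dividing."""
    return (n1 * n2) // n3

def naive_16bit(n1, n2, n3):
    """The same scaling with the product truncated to one signed 16-bit cell."""
    product = (n1 * n2) & 0xFFFF
    if product & 0x8000:
        product -= 0x10000
    return product // n3

scaled = star_slash(10000, 355, 113)     # 31415, i.e. 10000 * pi rounded down
broken = naive_16bit(10000, 355, 113)    # 97, because 3550000 overflows a cell
relative_error = abs(355 / 113 - math.pi) / math.pi   # below 1 part in 10**7
```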

 

: UM* ( u u -- ud )

  0 SWAP ( u1 0 u2 ) 15

  FOR DUP UM+ >R >R DUP UM+ R> + R>

    IF >R OVER UM+ R> + THEN

  NEXT ROT DROP ;

 

: * ( n n -- n ) UM* DROP ;

 

: M* ( n n -- d )

  2DUP XOR 0< >R  ABS SWAP ABS UM*  R> IF DNEGATE THEN ;

 

: */MOD ( n n n -- r q ) >R M* R> M/MOD ;

: */ ( n n n -- q ) */MOD SWAP DROP ;
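
UM* is a shift-and-add multiplier: it runs 16 steps, doubling the partial product and adding the multiplicand whenever the bit shifted out of the multiplier is a one.  The Python sketch below follows the same idea under the assumption of 16-bit cells.  It keeps the product in one 32-bit variable for clarity, whereas the Forth code shares one cell between the multiplier and the high half of the product; um_star is a hypothetical name.

```python
def um_star(u1, u2):
    """Model of UM* ( u u -- ud ): returns (low cell, high cell)."""
    product = 0
    for _ in range(16):
        product = (product << 1) & 0xFFFFFFFF      # double the partial product
        if u2 & 0x8000:                            # top bit of the multiplier
            product = (product + u1) & 0xFFFFFFFF  # add the multiplicand
        u2 = (u2 << 1) & 0xFFFF                    # move to the next bit
    return product & 0xFFFF, product >> 16

small = um_star(300, 200)          # 60000 fits in the low cell
large = um_star(0xFFFF, 0xFFFF)    # 0xFFFE0001 needs both cells
```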

 

 


Memory Alignment

 

The most serious problem in porting a system from one computer to another is that different computers have different sizes for their addresses and data.  We generally classify computers as 8, 16, 32, ... , bit machines, because they operate on data of these various sizes.  It is thus difficult to port a single programming model such as eForth to all these computers.  In eForth, a set of memory alignment words makes it easier to port the eForth model to different machines.

 

We assume that the target computer can address its memory in 8 bit chunks (bytes).  The natural width of data best handled by the computer is thus a multiple of bytes.  A unit of such data is a cell.  A 16 bit machine handles data in 2 byte cells, and a 32 bit machine handles data in 4 byte cells.

 

CELL+ increments the memory address by the cell size in bytes, and CELL- decrements the memory address by the cell size.  CELLS multiplies the cell number on the stack by the cell size in bytes.  These words are very useful in converting a cell offset into a byte offset, in order to access integers in a data array.

 

ALIGNED rounds an address on the stack up to the next cell boundary, to help access memory by cells.  An address already on a cell boundary is left unchanged.

 

The blank character (ASCII 32) is special in eForth because it is the most often used character to delimit words in the input stream and the most often used character to format the output strings.  It is used so often that it is advantageous to define a unique word for it.  BL simply returns the number 32 on the data stack.

 

>CHAR is very important in converting a non-printable character to a harmless 'underscore' character (ASCII 95).  As eForth is designed to communicate with a host computer through a serial I/O device, it is important that eForth not emit control characters to the host and cause unexpected behavior on the host computer.  >CHAR thus filters the characters before they are sent out by EMIT.

 

DEPTH returns the number of items currently on the data stack, leaving the count on top of the stack.  PICK takes a number n off the data stack and replaces it with the n'th item on the data stack.  The number n is 0-based; i.e., the top item is number 0, the next item is number 1, etc.  Therefore, 0 PICK is equivalent to DUP, and 1 PICK is equivalent to OVER.

 

: CELL- ( a -- a ) -2 + ;

: CELL+ ( a -- a )  2 + ;

: CELLS ( n -- n )  2 * ;

 

: ALIGNED ( b -- a )

  DUP 0 2 UM/MOD DROP DUP

  IF 2 SWAP - THEN + ;

 

: BL ( -- 32 ) 32 ;

 

: >CHAR ( c -- c )

  $7F AND DUP 127 BL WITHIN IF DROP 95 THEN ;

 

: DEPTH ( -- n ) SP@ SP0 @ SWAP - 2 / ;

 

: PICK ( +n -- w ) 1 + CELLS SP@ + @ ;
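
The effects of ALIGNED and >CHAR can be tabulated with a short Python model.  It assumes the 2-byte cell of the 8086 implementation; the function names are hypothetical, and to_char states the result of the WITHIN test directly instead of reproducing its wrap-around arithmetic.

```python
CELL = 2                                   # bytes per cell on a 16-bit machine

def aligned(addr):
    """Round addr up to the next cell boundary; aligned addresses are kept."""
    remainder = addr % CELL
    return addr + (CELL - remainder if remainder else 0)

def to_char(c):
    """Model of >CHAR: strip the top bit, replace control codes with '_'."""
    c &= 0x7F
    return 95 if (c < 32 or c == 127) else c

even = aligned(0x100)          # 0x100, already on a boundary
odd = aligned(0x101)           # 0x102
bell = to_char(7)              # 95, the underscore
letter = to_char(0x80 | 65)    # 65, 'A' with the eighth bit removed
```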

 

 

Memory Access                                                    

 

Here are three useful memory operators.  +! increments the contents of a memory location by an integer on the stack.  2! and 2@ store and fetch double integers to and from memory.

 

There are three buffer areas used often in the eForth system.  HERE returns the address of the first free location above the code dictionary, where new words are compiled.  PAD returns the address of the text buffer where numbers are constructed and text strings are stored temporarily.  TIB is the terminal input buffer where input text string is held.

 

@EXECUTE is a special word supporting the vectored execution words in eForth.  It takes the word address stored in a memory location and executes the word.  It is used extensively to execute the vectored words in the user area.

 

A memory array is generally specified by a starting address and its length in bytes.  In a string, the first byte is a count byte, specifying the number of bytes in the following string.  This is called a counted string.  String literals in the colon definitions and the name strings in the name dictionary are all represented by counted strings.  Following are special words which handle memory arrays and strings.

 

COUNT converts a string array address to the address-length representation of a counted string.  CMOVE copies a memory array from one location to another.  FILL fills a memory array with the same byte.

 

Arrays and strings are generally specified by the address of the first byte in the array or string, and the byte length.  This specification is, of course, a consequence of the memory being byte addressable.  In a CPU which addresses memory in cells, these words must be defined in terms of an artificial byte space.

  

-TRAILING removes the trailing white space characters from the end of a string.  White space characters include the space and all the non-printable characters below ASCII 32.  This word allows eForth to process text lines in files downloaded from a host computer.  It conveniently eliminates carriage-returns, line-feeds, tabs and spaces at the end of the text lines.

 

PACK$ is an important string handling word used by the text interpreter.  It copies a text string from one location to another.  In the target area, the string is converted to a counted string by adding a count byte before the text of the string.  This word is used to build the name field of a new word at the bottom of the name dictionary.  PACK$ is designed so that it can pack bytes into cells in a cell addressable machine.

 

A cheap way to implement eForth on a cell addressable machine is to equate cell addresses to byte addresses, and to store one byte in a cell.  This scheme is workable, but very inefficient in the memory utilization.  PACK$ is a tool which helps the implementor to bridge the gap.

 

: +! ( n a -- ) SWAP OVER @ + SWAP ! ;

 

: 2! ( d a -- ) SWAP OVER ! CELL+ ! ;

: 2@ ( a -- d ) DUP CELL+ @ SWAP @ ;

 

: COUNT ( b -- b +n ) DUP 1 + SWAP C@ ;

 

: HERE ( -- a ) CP @ ;

: PAD ( -- a ) HERE 80 + ;

: TIB ( -- a ) #TIB CELL+ @ ;

 

: @EXECUTE ( a -- ) @ ?DUP IF EXECUTE THEN ;

 

: CMOVE ( b b u -- )

  FOR AFT >R DUP C@ R@ C! 1 + R> 1 + THEN NEXT 2DROP ;

 

: FILL ( b u c -- )

  SWAP FOR SWAP AFT 2DUP C! 1 + THEN NEXT 2DROP ;

 

: -TRAILING ( b u -- b u )

  FOR AFT BL OVER R@ + C@ <

    IF R> 1 + EXIT THEN THEN

  NEXT 0 ;

 

: PACK$ ( b u a -- a ) \ null fill

  ALIGNED  DUP >R OVER

  DUP 0 2 UM/MOD DROP

  - OVER +  0 SWAP !  2DUP C!  1 + SWAP CMOVE  R> ;
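
The counted string format and the null fill performed by PACK$ can be illustrated with a Python model of byte-addressable memory.  The sketch assumes 2-byte cells and uses a bytearray as memory; pack, count and minus_trailing are hypothetical names for PACK$, COUNT and -TRAILING.

```python
CELL = 2

def pack(mem, text, a):
    """Model of PACK$: store `text` at the aligned address as a counted string."""
    a += (-a) % CELL                                 # ALIGNED
    u = len(text)
    last_cell = a + (u - u % CELL)                   # cell holding the last byte
    mem[last_cell:last_cell + CELL] = bytes(CELL)    # null fill
    mem[a] = u                                       # count byte
    mem[a + 1:a + 1 + u] = text                      # the characters
    return a

def count(mem, a):
    """Model of COUNT: ( b -- b+1 n )."""
    return a + 1, mem[a]

def minus_trailing(mem, b, u):
    """Model of -TRAILING: drop trailing bytes that are not above a space."""
    while u > 0 and mem[b + u - 1] <= 32:
        u -= 1
    return b, u

mem = bytearray(b"\xff" * 16)
name = pack(mem, b"SWAP", 3)          # address 3 is rounded up to 4
start, length = count(mem, name)      # (5, 4)
```

After the call, memory holds the count byte 4, the four characters and one zero pad byte, so the packed string ends on a cell boundary.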

 


Text Interpreter

 

 

 

The text interpreter is also called the outer interpreter in Forth.  It is functionally equivalent to an operating system in a conventional computer.  It accepts commands similar to English entered by a user, and carries out the tasks specified by the commands.  As an operating system, the text interpreter must be complicated, because of all the things it has to do.  However, because Forth employs very simple syntax rules, and has very simple internal structures, the Forth text interpreter is much simpler than conventional operating systems.  It is simple enough that we can discuss it completely in a single chapter, admittedly a long one.

 

Let us summarize what a text interpreter must do:

 

            Accept text input from a terminal

            Parse out commands from input text

            Search dictionary

            Execute commands

            Translate numbers into binary

            Display numbers in text form

            Handle errors gracefully

           

Forth allows us to build and integrate these required functions gradually in modules.  All the modules finally fall into their places in the word QUIT, which is the text interpreter itself.

 

You might want to look up the code of QUIT first and see how the modules fit together.  A good feeling for the big picture will help you in the study of the smaller modules.  Nevertheless, we will doggedly follow the loading order of the source code, and hope that you will not get too lost in the process.

 

 


Numeric Output

 

Forth is interesting in its special capabilities in handling numbers across the man-machine interface.  It recognizes that the machine and the human prefer very different representations of numbers.  The machine prefers the binary representation, but the human prefers the decimal representation in Arabic digits.  However, depending on circumstances, the human may want numbers to be represented in other radices, like hexadecimal, octal, and sometimes binary.

 

Forth solves this problem of internal (machine) versus external (human) number representations by insisting that all numbers are represented in the binary form in the CPU and in memory.  Only when numbers are imported or exported for human consumption are they converted to the external ASCII representation.  The radix of external representation is controlled by the radix value stored in the user variable BASE.

 

Since BASE is a user variable, the user can select any reasonable radix for entering numbers into the computer and formatting numbers to be shown to the user.  By contrast, most programming languages can handle only a small set of radices, like decimal, octal, hexadecimal and binary.

 

DIGIT converts an integer to a digit.  EXTRACT extracts the least significant digit from a number n.  n is divided by the radix in BASE, and the quotient is returned on the stack beneath the digit.

 

The output number string is built below the PAD buffer.  The least significant digit is extracted from the integer on the top of the data stack by dividing it by the current radix in BASE.  The digits thus extracted are added to the output string backwards from PAD towards low memory.  The conversion is terminated when the integer is divided to zero.  The address and length of the number string are made available by #> for outputting.

 

An output number conversion is initiated by <# and terminated by #>.  Between them, # converts one digit at a time, #S converts all the digits, while HOLD and SIGN insert special characters into the string under construction.  This set of words is very versatile and can handle many different output formats.

 

: DIGIT ( u -- c ) 9 OVER < 7 AND + 48 + ;

: EXTRACT ( n base -- n c ) 0 SWAP UM/MOD SWAP DIGIT ;

 

: <# ( -- ) PAD HLD ! ;

 

: HOLD ( c -- ) HLD @ 1 - DUP HLD ! C! ;

 

: # ( u -- u ) BASE @ EXTRACT HOLD ;

 

: #S ( u -- 0 ) BEGIN # DUP WHILE REPEAT ;

 

: SIGN ( n -- ) 0< IF 45 HOLD THEN ;

 

: #> ( w -- b u ) DROP HLD @ PAD OVER - ;

 

: str ( n -- b u ) DUP >R ABS <# #S R> SIGN #> ;

 

: HEX ( -- ) 16 BASE ! ;

: DECIMAL ( -- ) 10 BASE ! ;
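
The backwards construction of the number string can be followed in a Python model.  The sketch uses a Python list in place of the memory below PAD and ordinary Python integers in place of cells; digit and number_string are hypothetical names, the latter playing the role of str.

```python
def digit(u):
    """Model of DIGIT: 0-9 map to '0'-'9', 10 and above continue at 'A'."""
    return chr(u + 48 + (7 if u > 9 else 0))

def number_string(n, base=10):
    """Model of str: build the digits from least to most significant."""
    negative = n < 0
    u = abs(n)
    held = []                        # HOLD adds each character at the front
    while True:
        u, r = divmod(u, base)       # EXTRACT: quotient stays, remainder out
        held.insert(0, digit(r))     # HOLD
        if u == 0:                   # #S stops when the quotient reaches zero
            break
    if negative:
        held.insert(0, "-")          # SIGN
    return "".join(held)

hexadecimal = number_string(255, 16)    # 'FF'
negative = number_string(-42)           # '-42'
```

Because the loop tests the quotient after converting a digit, zero is printed as a single '0' rather than as an empty string.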

 

 


Number Output

 

With the number formatting word set as shown above, one can format numbers for output in any form desired.  The free output format is a number string preceded by a single space.  The fixed column format displays a number right-justified in a column of pre-determined width.  The words ., U., and ? use the free format.  The words .R and U.R use the fixed format.

 

: str ( n -- b u )

      ( Convert a signed integer to a numeric string.)

      DUP >R                        ( save a copy for sign)

      ABS                           ( use absolute of n)

      <# #S                         ( convert all digits)  

      R> SIGN                       ( add sign from n)

      #> ;                          ( return number string addr and length)

 

: HEX ( -- )

      ( Use radix 16 as base for numeric conversions.)

      16 BASE ! ;

 

: DECIMAL   ( -- )

      ( Use radix 10 as base for numeric conversions.)

      10 BASE ! ;

 

: .R  ( n +n -- )

      ( Display an integer in a field of n columns, right justified.)

      >R str                        ( convert n to a number string)

      R> OVER - SPACES              ( print leading spaces)

      TYPE ;                        ( print number in +n column format)

 

: U.R ( u +n -- )

      ( Display an unsigned integer in a field of n columns, right justified.)

      >R                            ( save column number)

      <# #S #> R>                   ( convert unsigned number)

      OVER - SPACES                 ( print leading spaces)

      TYPE ;                        ( print number in +n columns)

 

: U.  ( u -- )

      ( Display an unsigned integer in free format.)

      <# #S #>                      ( convert unsigned number)

      SPACE                         ( print one leading space)

      TYPE ;                        ( print number)

 

: .   ( w -- )

      ( Display an integer in free format, preceded by a space.)

      BASE @ 10 XOR                 ( if not in decimal mode)

      IF U. EXIT THEN               ( print unsigned number)

      str SPACE TYPE ;              ( print signed number if decimal)

 

: ?   ( a -- )

      ( Display the contents in a memory cell.)

      @ . ;                         ( very simple but useful command)

 

 


Numeric Input

 

The Forth text interpreter also handles the number input to the system.  It parses words out of the input stream and tries to execute the words in sequence.  When the text interpreter encounters a word which is not the name of a word in the dictionary, it assumes that the word must be a number and attempts to convert the ASCII string to a number according to the current radix.  When the text interpreter succeeds in converting the string to a number, the number is pushed on the data stack for future use if the text interpreter is in the interpreting mode.  If it is in the compiling mode, the text interpreter will compile the number to the code dictionary as an integer literal so that when the word under construction is later executed, this literal integer will be pushed on the data stack.

 

If the text interpreter fails to convert the word to a number, there is an error condition which will cause the text interpreter to abort, posting an error message to the user, and then wait for the user's next line of commands.

 

Only two words are needed in eForth to handle input of single precision integer numbers. 

 

DIGIT? converts a digit to its numeric value according to the current base, and NUMBER? converts a number string to a single integer.  NUMBER? is vectored through 'NUMBER to convert numbers.

 

NUMBER? converts a string of digits to a single integer.  If the first character is a $ sign, the number is assumed to be in hexadecimal.  Otherwise, the number will be converted using the radix value stored in BASE.  For negative numbers, the first character should be a - sign.  No other characters are allowed in the string.  If a non-digit character is encountered, the address of the string and a false flag are returned.  Successful conversion returns the integer value and a true flag.  If the number is larger than 2**n, where n is the bit width of the single integer, only the modulus to 2**n will be kept.

 

: DIGIT? ( c base -- u t )

  >R 48 - 9 OVER <

  IF 7 - DUP 10 < OR THEN DUP R> U< ;

 

: NUMBER? ( a -- n T | a F )

  BASE @ >R  0 OVER COUNT ( a 0 b n)

  OVER C@ 36 =

  IF HEX SWAP 1 + SWAP 1 - THEN ( a 0 b' n')

  OVER C@ 45 = >R ( a 0 b n)

  SWAP R@ - SWAP R@ + ( a 0 b" n") ?DUP

  IF 1 - ( a 0 b n)

    FOR DUP >R C@ BASE @ DIGIT?

      WHILE SWAP BASE @ * +  R> 1 +

    NEXT DROP R@ ( b ?sign) IF NEGATE THEN SWAP

      ELSE R> R> ( b index) 2DROP ( digit number) 2DROP 0

      THEN DUP

  THEN R> ( n ?sign) 2DROP R> BASE ! ;
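
The conversion rules of DIGIT? and NUMBER? can be modeled in Python as below.  This is a sketch under stated assumptions, not a translation of the Forth code: the function names digit and number are invented here, a 16-bit cell is assumed, and the result is returned as an unsigned bit pattern.

```python
# Model of DIGIT? and NUMBER?.  Assumption: 16-bit cells.
CELL_BITS = 16

def digit(c, base):
    """Model of DIGIT?: value of character c, or None if not a digit."""
    v = ord(c) - 48                  # '0' maps to 0
    if v > 9:
        v -= 7                       # 'A' maps to 10
        if v < 10:                   # characters between '9' and 'A'
            return None
    return v if 0 <= v < base else None

def number(s, base=10):
    """Model of NUMBER?: return (value, True) or (s, False)."""
    text = s
    if text.startswith("$"):         # a leading $ forces hexadecimal
        base, text = 16, text[1:]
    negative = text.startswith("-")  # a leading - marks a negative number
    if negative:
        text = text[1:]
    if not text:
        return s, False              # nothing left to convert
    total = 0
    for c in text:
        v = digit(c, base)
        if v is None:
            return s, False          # non-digit: hand the string back
        total = total * base + v
    if negative:
        total = -total
    return total % (1 << CELL_BITS), True   # keep only the low 16 bits
```

With these definitions, number("$FF") yields 255 with a true flag, and number("12X") returns the string itself with a false flag, which is the case the text interpreter reports as an error.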

 

 


Basic I/O

 

The eForth system assumes that the system will communicate with its environment only through a serial I/O interface.  To support the serial I/O, only three words are needed:

 

?KEY returns a false flag if no character is pending on the receiver. If a character is received, the character and a true flag are returned.  This word is more powerful than that usually defined in most Forth systems because it consolidates the functionality of KEY into ?KEY.  It simplifies the coding of the machine dependent I/O interface.

 

KEY will execute ?KEY continually until a valid character is received and the character is returned.  EMIT sends a character out through the transmit line.

 

?KEY and EMIT are vectored through '?KEY and 'EMIT, so that their function can be changed dynamically at run time.  Normally ?KEY executes ?RX and EMIT executes TX!.  ?RX and TX! are machine dependent kernel words.  Vectoring the I/O words allows the eForth system to change its I/O channels dynamically and still use all the existing tools to handle input and output transactions.
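
The effect of vectoring can be shown with a small Python model.  The class Terminal and its attribute names are invented for this sketch; the two vec_ attributes stand in for the user variables '?KEY and 'EMIT.

```python
from collections import deque

class Terminal:
    """Model of eForth's vectored character I/O (hypothetical names)."""
    def __init__(self):
        self.rx = deque()              # characters waiting on the receiver
        self.tx = []                   # characters sent to the transmitter
        self.vec_qkey = self.q_rx      # '?KEY, normally ?RX
        self.vec_emit = self.tx_store  # 'EMIT, normally TX!

    def q_rx(self):                    # ?RX ( -- c T | F )
        return (self.rx.popleft(), True) if self.rx else (None, False)

    def tx_store(self, c):             # TX! ( c -- )
        self.tx.append(c)

    def qkey(self):                    # ?KEY runs whatever '?KEY holds
        return self.vec_qkey()

    def key(self):                     # KEY polls ?KEY until a character
        while True:                    # arrives (it blocks on empty input)
            c, ok = self.qkey()
            if ok:
                return c

    def emit(self, c):                 # EMIT runs whatever 'EMIT holds
        self.vec_emit(c)
```

Storing a different function in vec_emit, for instance the append method of a log list, redirects every word built on emit without changing any of them.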

 

All I/O words are derived from ?KEY, KEY and EMIT.  The following set defined in eForth is particularly useful in normal programming:

 

SPACE outputs a blank space character.  SPACES outputs n blank space characters.  CR outputs a carriage-return and a line-feed.  PACE outputs an ASCII 11 character to acknowledge lines received during file downloading.

 

NUF? returns a false flag if no character is pending in the input buffer.  After receiving a character, it pauses and waits for another character.  If this second character is CR, it returns a true flag; otherwise, it returns false.  This word is very useful in user-interruptible routines.

 

TYPE outputs n characters from a string in memory.

 

String literals are data structures compiled in colon definitions, in-line with the words.  A string literal must start with a string word which knows how to handle the following string at the run time.  Let us show two examples of the string literals:

 

: xxx    ...   " A compiled string"  ...   ;

: yyy    ...   ." An output string"  ...   ;

 

In xxx, "  is an immediate word which compiles the following string as a string literal preceded by a special word $"|.  When $"| is executed at the run time, it returns the address of this string on the data stack.  In yyy,  ." compiles a string literal preceded by another word ."|, which prints the compiled string to the output device.

 

Both $"| and ."| use the word do$, which retrieves the address of a string stored as the second item on the return stack.  do$ is a bit difficult to understand, because the starting address of the following string is the second item on the return stack, beneath the return address of the word which called do$.  do$ pushes this string address on the data stack so that the string can be accessed.  It then replaces the address on the return stack with the aligned address just past the end of the string, so that the address interpreter will skip over the string literal and continue with the word right after the compiled string, as intended.
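
The return stack manipulation in do$ can be traced with a Python model.  The memory layout here is an assumption made for the sketch: memory is a byte array, a cell is 2 bytes, and the names do_string and count_string are invented.

```python
# Model of do$ working on an explicit return stack.
# Assumption: 2-byte cells and byte-addressed memory.
CELL = 2

def aligned(addr):
    """Model of ALIGNED: round addr up to the next cell boundary."""
    return (addr + CELL - 1) // CELL * CELL

def count_string(memory, a):
    """Read the counted string at address a (count byte, then text)."""
    n = memory[a]
    return memory[a + 1:a + 1 + n].decode("ascii")

def do_string(memory, rstack, dstack):
    """Model of do$ ( -- a ): R> R@ R> COUNT + ALIGNED >R SWAP >R"""
    caller = rstack.pop()            # R>  return address into $"| or ."|
    a = rstack[-1]                   # R@  address of the inline string
    dstack.append(a)                 #     this copy stays on the data stack
    a = rstack.pop()                 # R>
    n = memory[a]                    # COUNT gives a+1 and the count n
    rstack.append(aligned(a + 1 + n))  # + ALIGNED >R  skip the string
    rstack.append(caller)            # SWAP >R  put the caller back
```

After the call, the data stack holds the string address, and the second item on the return stack points to the first cell after the string, which is where the address interpreter resumes.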

 

: ?KEY ( -- c T | F ) '?KEY @EXECUTE ;

: KEY ( -- c ) BEGIN ?KEY UNTIL ;    

: EMIT ( c -- ) 'EMIT @EXECUTE ;

 

: NUF? ( -- f ) ?KEY DUP IF 2DROP KEY 13 = THEN ;

 

:  PACE ( -- ) 11 EMIT ;

: SPACE ( -- ) BL EMIT ;

 

: CHARS ( +n c -- ) \ ???ANS conflict

  SWAP 0 MAX FOR AFT DUP EMIT THEN NEXT DROP ;

 

: SPACES ( +n -- ) BL CHARS ;

 

: TYPE ( b u -- ) FOR AFT DUP C@ EMIT 1 + THEN NEXT DROP ;

 

: CR ( -- ) 13 EMIT 10 EMIT ;

 

: do$ ( -- a )

  R> R@ R> COUNT + ALIGNED >R SWAP >R ;

 

: $"| ( -- a ) do$ ;

 

: ."| ( -- ) do$ COUNT TYPE ; COMPILE-ONLY

 

:  .R ( n +n -- ) >R str      R> OVER - SPACES TYPE ;

: U.R ( u +n -- ) >R <# #S #> R> OVER - SPACES TYPE ;

 

: U. ( u -- ) <# #S #> SPACE TYPE ;

:  . ( n -- ) BASE @ 10 XOR IF U. EXIT THEN str SPACE TYPE ;

 

: ? ( a -- ) @ . ;

 

 


Parsing

 

Parsing is always thought of as a very advanced topic in computer science.  However, because Forth uses very simple syntax rules, parsing is easy.  Forth source code consists of words, which are ASCII strings separated by spaces and other white space characters like tabs, carriage returns, and line feeds.  The text interpreter scans the source code, isolates words and interprets them in sequence.  After a word is parsed out of the input text stream, the text interpreter will 'interpret' it: execute it if it is a word, compile it if the text interpreter is in the compiling mode, and convert it to a number if the word is not a Forth word.

 

PARSE scans the source string in the terminal input buffer, from where >IN points to the end of the buffer, for a word delimited by character c.  It returns the address and length of the word parsed out.  PARSE calls 'parse' to do the detailed work.  PARSE is used to implement many specialized parsing words to perform different source code handling functions.  These words, including .(, (, \, CHAR, TOKEN, and WORD, are described below.

 

'parse'  ( b1 u1 c -- b2 u2 n )  is the elementary command to do text parsing.  From the source string starting at b1 and u1 characters long, parse out the first word delimited by character c.  Return the address b2 and length u2 of the word just parsed out, and the number of characters n consumed from b1, including the delimiter which ends the word.  PARSE adds n to >IN.  Leading delimiters are skipped over when c is a blank.  'parse' is used by PARSE.
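
The behavior of 'parse' and PARSE can be modeled in Python as follows.  In this sketch string indices stand in for addresses, the third result is the count of characters consumed (the amount PARSE adds to >IN), and the helper name next_word is invented.

```python
# Model of 'parse' and PARSE.  String indices stand in for addresses.
BL = " "

def parse(text, start, length, delim):
    """Model of parse ( b1 u1 c -- b2 u2 n ) on text[start:start+length]."""
    end = start + length
    b = start
    if delim == BL:
        # Skip leading blanks; control characters count as blanks too.
        while b < end and text[b] <= BL:
            b += 1
    if b == end:
        return b, 0, 0                   # empty or all delimiters
    if delim == BL:
        is_delim = lambda ch: ch <= BL
    else:
        is_delim = lambda ch: ch == delim
    e = b
    while e < end and not is_delim(text[e]):
        e += 1                           # scan to the closing delimiter
    stop = e + 1 if e < end else e       # step over the delimiter, if any
    return b, e - b, stop - start

def next_word(tib, to_in, delim=BL):
    """Model of PARSE: return the word and the updated >IN."""
    b, u, n = parse(tib, to_in, len(tib) - to_in, delim)
    return tib[b:b + u], to_in + n
```

Calling next_word repeatedly on "  DUP  SWAP", feeding the returned position back in, yields DUP and then SWAP.  Passing ")" as the delimiter models what ( and .( do with a comment or a message.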

 

.(  types the following string till the next ).  It is used to output text to the terminal.  (  ignores the following string till the next ).  It is used to place comments in source text.  \  ignores all characters till end of input buffer.  It is used to insert comment lines in text.

 

CHAR  parses the next word but returns only the first character in this word; it gets an ASCII character from the input stream.  WORD parses out the next word delimited by the ASCII character c.  It copies the word to the top of the code dictionary and returns the address of this counted string.  TOKEN parses the next blank-delimited word from the input buffer and copies the counted string to the top of the name dictionary.  It returns the address of this counted string.

 

: parse ( b u c -- b u delta ; <string> )

  tmp !  OVER >R  DUP \ b u u

  IF 1 -  tmp @ BL =

    IF \ b u' \ 'skip'

      FOR BL OVER C@ - 0< NOT  WHILE 1 +

      NEXT ( b) R> DROP 0 DUP EXIT \ all delim

        THEN  R>

    THEN OVER SWAP \ b' b' u' \ 'scan'

    FOR tmp @ OVER C@ -  tmp @ BL =

      IF 0< THEN WHILE 1 +

    NEXT DUP >R  ELSE R> DROP DUP 1 + >R

                 THEN OVER -  R>  R> - EXIT

  THEN ( b u) OVER R> - ;

 

: PARSE ( c -- b u ; <string> )

  >R  TIB >IN @ +  #TIB @ >IN @ -  R> parse >IN +! ;

 

: .( ( -- ) 41 PARSE TYPE ; IMMEDIATE

: ( ( -- )  41 PARSE 2DROP ; IMMEDIATE

: \ ( -- ) #TIB @ >IN ! ; IMMEDIATE

 

: CHAR ( -- c ) BL PARSE DROP C@ ;

 

: TOKEN ( -- a ; <string> )

  BL PARSE 31 MIN NP @ OVER - CELL- PACK$ ;

 

: WORD ( c -- a ; <string> ) PARSE HERE PACK$ ;

 

 


Dictionary Search

 

In eForth, headers of word definitions are linked into a name dictionary which is separated from the code dictionary.  A header contains three fields: a word field holding the code address of the word, a link field holding the name field address of the previous header and a name field holding the name as a counted string.  The name dictionary is a list linked through the link fields and the name fields. The basic searching function is performed by the word 'find'.  'find' follows the linked list of names to find a name which matches a text string, and returns the address of the executable word and the name field address, if a match is found.

 

eForth allows multiple vocabularies in the name dictionary.  A dictionary can be divided into a number of independently linked sublists through some hashing mechanism.  A sublist is called a vocabulary.  Although eForth itself contains only one vocabulary, it has the provision to build many vocabularies and allows many vocabularies to be searched in a prioritized order.  The CONTEXT array in the user area has 8 cells and allows up to 8 vocabularies to be searched in sequence.  A null entry in the CONTEXT array terminates the vocabulary search.

 

find ( a va -- ca na, a F)  A counted string at a is the name of a word to be looked up in the dictionary. The last name field address of the vocabulary is stored in location va. If the string is found, both the word (code address) and the name field address are returned.  If the string is not the name of a word, the string address and a false flag are returned.

 

To locate a word, one could follow the linked list and compare the names of defined words to the string to be searched.  If the string matches the name of a word in the name dictionary, the word and the address of the name field are returned.  If the string is not a defined word, the search will lead to either a null link or a null name field.  In either case, the search will be terminated and a false flag returned.  The false flag thus indicates that the word searched is not in this vocabulary.

 

'find' runs through the name dictionary very quickly because it first compares the length byte and the first character in the name field as a single cell.  In most cases of mismatch, this comparison fails and the next name can be reached through the link field.  If this first cell matches, SAME? is invoked to compare the rest of the name field, one cell at a time.  Since both the target text string and the name field are null filled to the cell boundary, the comparison can be performed quickly across the entire name field without worrying about the end conditions.

 

NAME?  ( a -- ca na, a F)   Search all the vocabularies in the CONTEXT array for a name at address a.  Return the word and a name address if a matched word is found.  Otherwise, return the string address and a false flag.  The CONTEXT array can hold up to 8 vocabulary links.  However, a 0 which is not a valid vocabulary link in this array will terminate the searching.  Changing the vocabulary links in this array and the order of these links will alter the searching order and hence the searching priority among the vocabularies.
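
The search performed by 'find' and NAME? can be modeled in Python.  This is a simplified sketch: headers are Python objects rather than packed memory fields, a vocabulary is a one-item list holding the latest header, and the names Header, define and name_lookup are invented.

```python
# Model of the linked name dictionary and its search words.
class Header:
    """One name dictionary entry: name, code address, and link."""
    def __init__(self, name, code, link):
        self.name, self.code, self.link = name, code, link

def define(vocab, name, code):
    """Link a new header at the head of a vocabulary."""
    vocab[0] = Header(name, code, vocab[0])

def find(name, vocab):
    """Model of find ( a va -- ca na | a F )."""
    h = vocab[0]
    while h is not None:               # a null link ends the search
        # Cheap test first: the length and first character together,
        # as find does by comparing the first cell of the name field.
        quick = (len(h.name), h.name[:1]) == (len(name), name[:1])
        if quick and h.name == name:   # then the full comparison (SAME?)
            return h.code, h
        h = h.link
    return name, False

def name_lookup(name, context):
    """Model of NAME?: search the vocabularies in CONTEXT order."""
    for vocab in context:
        if vocab is None:              # a null entry ends the search
            break
        code, found = find(name, vocab)
        if found:
            return code, found
    return name, False
```

Because new headers are linked at the head of the list, a later definition of the same name is found first, and a vocabulary placed earlier in the context list takes priority over those after it.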

 

: NAME> ( a -- xt ) CELL- CELL- @ ;

 

: SAME? ( a a u -- a a f \ -0+ )

  FOR AFT OVER R@ CELLS + @

          OVER R@ CELLS + @ -  ?DUP

    IF R> DROP EXIT THEN THEN

  NEXT 0 ;

 

: find ( a va -- xt na | a F )

  SWAP              \ va a

  DUP C@ 2 / tmp !  \ va a  \ get cell count

  DUP @ >R          \ va a  \ count byte & 1st char

  CELL+ SWAP        \ a' va

  BEGIN @ DUP       \ a' na na

    IF DUP @ [ =MASK ] LITERAL AND  R@ XOR \ ignore lexicon bits

      IF CELL+ -1 ELSE CELL+ tmp @ SAME? THEN

    ELSE R> DROP EXIT

    THEN

  WHILE CELL- CELL- \ a' la

  REPEAT R> DROP SWAP DROP CELL-  DUP NAME> SWAP ;

 

: NAME? ( a -- xt na | a F )

  CONTEXT  DUP 2@ XOR IF CELL- THEN >R \ context<>also

  BEGIN R>  CELL+  DUP >R  @  ?DUP

  WHILE find  ?DUP

  UNTIL R> DROP EXIT THEN R> DROP  0 ;

 

 


Terminal

 

The text interpreter interprets source text stored in the terminal input buffer.  To process characters from the input device, we need three special words to deal with backspaces and carriage returns from the input device:

 

kTAP  ( b1 b2 b3 c -- b1 b4 b5 )  processes a character c received from the terminal.  b1 is the starting address of the input buffer, b2 is the end of the input buffer, and b3 is the currently available address in the input buffer.  c is normally stored into b3, which is bumped by 1 and becomes b5.  In this case, b4 is the same as b2.  If c is a carriage return, make b4=b5=b3 to end the input.  If c is a backspace, erase the last character and make b4=b2, b5=b3-1.  TAP  echoes c to the output device, stores c in b3, and bumps b3.

 

^H   processes the back-space character.  Erase the last character and decrement b3.  If b3=b1, do nothing because you cannot back up beyond the beginning of the input buffer.

 

QUERY is the word which accepts text input, up to 80 characters, from the input device and copies the text characters to the terminal input buffer.  It also prepares the terminal input buffer for parsing by setting #TIB to the character count and clearing >IN.

 

EXPECT  accepts u characters to a memory buffer starting at b.  The input is terminated upon receiving a carriage-return.  The number of characters actually received is stored in SPAN.  EXPECT is called by QUERY to put characters into the terminal input buffer.  However, EXPECT is useful by itself because one can use it to place input text anywhere in the memory.  QUERY and EXPECT are the two words most useful in accepting text from the terminal. 

 

'accept'  accepts u1 characters to b.  u2  returned is the actual count of characters received.
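
The line editing done by accept, TAP, kTAP and ^H can be modeled in Python.  This is a sketch of the behavior in the listings that follow, with two assumptions: the key source is any iterable of characters, and running out of keys is treated like a carriage return.

```python
# Model of accept with kTAP line editing (hypothetical Python names).
def accept(keys, limit, echo=None):
    """Model of accept ( b u1 -- b u2 ); returns the text received."""
    out = [] if echo is None else echo  # characters echoed to the terminal
    buf = []                            # the input buffer
    source = iter(keys)
    while len(buf) < limit:             # stop when the buffer is full
        c = next(source, "\r")          # assumption: no more keys acts as CR
        if " " <= c <= "~":             # printable: TAP echoes and stores c
            out.append(c)
            buf.append(c)
        elif c == "\r":                 # carriage return ends the line
            break
        elif c == "\b":                 # backspace: ^H erases one character
            if buf:                     # but never before the buffer start
                buf.pop()
                out.extend("\b \b")     # back up, blank out, back up
        else:                           # other control characters are
            out.append(" ")             # stored and echoed as a blank
            buf.append(" ")
    return "".join(buf)
```

Typing D U X, a backspace, P and a carriage return leaves DUP in the buffer, and a backspace on an empty buffer is ignored.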

 

: ^H ( b b b -- b b b ) \ backspace

  >R OVER R> SWAP OVER XOR

  IF  8 'ECHO @EXECUTE 1 - \ back up and decrement the buffer pointer

     32 'ECHO @EXECUTE \ destructive

      8 'ECHO @EXECUTE \ backspace

  THEN ;

 

: TAP ( bot eot cur c -- bot eot cur )

  DUP 'ECHO @EXECUTE OVER C! 1 + ;

 

: kTAP ( bot eot cur c -- bot eot cur )

  DUP 13 XOR

  IF 8 XOR IF BL TAP ELSE ^H THEN EXIT

  THEN DROP SWAP DROP DUP ;

 

: accept ( b u -- b u )

  OVER + OVER

  BEGIN 2DUP XOR

  WHILE  KEY  DUP BL -  95 U<

    IF TAP ELSE 'TAP @EXECUTE THEN

  REPEAT DROP  OVER - ;

 

: EXPECT ( b u -- ) 'EXPECT @EXECUTE SPAN ! DROP ;

 

: QUERY ( -- )

  TIB 80 'EXPECT @EXECUTE #TIB !  DROP 0 >IN ! ;

 

 

Error Handling

 

This error handling mechanism was first developed by Mitch Bradley in his ForthMacs and then adopted by the ANS Forth Standard.  It is very simple yet very powerful in customizing system responses to many different error conditions.

 

CATCH sets up a local error frame and executes the word referenced by the execution address ca.  It returns a non-zero error code, or a zero if no error occurred.  While the word at ca is executing, any error condition will execute THROW, which pushes an error code on the data stack, restores the return stack to the state before CATCH was executed, and returns to the word which called CATCH.  Since the error frame is saved on the return stack, many layers of safety nets can be nested.

 

CATCH pushes SP and HANDLER on the return stack, saves RP in HANDLER, and then executes the word at ca.  If no error occurs, HANDLER is restored from the return stack, the saved SP is discarded, and a 0 is pushed on the data stack.

 

THROW throws the system back to CATCH so that the error condition can be processed.  CATCH is backtracked by restoring the return stack from the pointer stored in HANDLER and popping the old handler and SP off the error frame on the return stack.
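
The error frame bookkeeping can be followed in a Python model with explicit stacks.  Several things here are assumptions of the sketch: stack pointers are modeled as list lengths, the word to execute is passed as a Python function instead of an address on the data stack, and a Python exception named Unwind stands in for the jump that RP! performs.

```python
# Model of CATCH and THROW with explicit data and return stacks.
class Machine:
    def __init__(self):
        self.data = []        # data stack
        self.ret = []         # return stack
        self.handler = None   # HANDLER: return stack depth of latest frame

class Unwind(Exception):
    """Stands in for the control transfer from THROW back to CATCH."""

def catch(m, word):
    m.ret.append(len(m.data))   # SP@ >R       save the data stack depth
    m.ret.append(m.handler)     # HANDLER @ >R save the previous frame
    m.handler = len(m.ret)      # RP@ HANDLER !
    try:
        word(m)                 # EXECUTE
    except Unwind:
        return                  # throw() already unwound and left err#
    m.handler = m.ret.pop()     # R> HANDLER ! normal return
    m.ret.pop()                 # R> DROP      discard the saved depth
    m.data.append(0)            # 0            no-error flag

def throw(m, err):
    del m.ret[m.handler:]       # HANDLER @ RP!  expose the error frame
    m.handler = m.ret.pop()     # R> HANDLER !   restore the older frame
    depth = m.ret.pop()         # R> ... SP!     restore the data stack
    del m.data[depth:]
    m.data.extend([0] * (depth - len(m.data)))  # consumed cells: undefined
    m.data.append(err)          # err# is left on top
    raise Unwind()
```

A word that pushes junk on both stacks and then throws leaves only the error code above the original data stack contents.  Because each frame saves the previous HANDLER, a catch nested inside another word under catch unwinds only to the innermost frame.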

 

: CATCH ( ca -- err#/0 )

      ( Execute word at ca and set up an error frame for it.)

      SP@ >R            ( save current stack pointer on return stack )

      HANDLER @ >R      ( save the handler pointer on return stack )

      RP@ HANDLER !     ( save the handler frame pointer in HANDLER )

      ( ca ) EXECUTE    ( execute the assigned word over this safety net )

      R> HANDLER !      ( normal return from the executed word )

                        ( restore HANDLER from the return stack )

      R> DROP           ( discard the saved data stack pointer )

      0 ;               ( push a no-error flag on data stack )

 

: THROW ( err# -- err# )

      ( Reset system to current local error frame and update error flag.)

      HANDLER @ RP!     ( expose latest error handler frame on return stack )

      R> HANDLER !      ( restore previously saved error handler frame )

      R> SWAP >R        ( retrieve the data stack pointer saved )

      SP!               ( restore the data stack )

      DROP

      R> ;              ( retrieved err# )

 

 

NULL$ is the address of a string with a zero count.  This address is used by ABORT and abort" to terminate the interpreting of the current command line.  QUIT tests the address reported by CATCH.  If this address is NULL$, the termination is normal and no error message will be issued.  If CATCH reports a different address, QUIT will display the contents of the string at that address.

 

ABORT" is used only within a definition to compile an inline packed string terminated by the " double quote character. At run-time, if the flag is false, execute the sequence of words following the string. Otherwise, the string is displayed on the current output device, and execution is then passed to an error handling routine.

 

You have to study the code in QUIT carefully with this section to get a better understanding of the CATCH-THROW error handling mechanism.

 

: CATCH ( xt -- 0 | err# )

  SP@ >R  HANDLER @ >R  RP@ HANDLER !

  EXECUTE

  R> HANDLER !  R> DROP  0 ;

 

: THROW ( err# -- err# )

  HANDLER @ RP!  R> HANDLER !  R> SWAP >R SP! DROP R> ;

                       

CREATE NULL$ 0 , $," coyote"

 

: ABORT ( -- ) NULL$ THROW ;

 

: abort" ( f -- ) IF do$ THROW THEN do$ DROP ;

 

Let's look at how the CATCH-THROW pair is used.  In QUIT, there is this indefinite loop:

 

BEGIN QUERY [ ' EVAL ] LITERAL CATCH

?DUP UNTIL

 

QUERY gets a line of text and CATCH causes EVAL to interpret the line. CATCH also sets up an error handling frame on the return stack and saves the return stack pointer in the user variable HANDLER.  The error handling frame contains the current data stack pointer and the current contents in HANDLER.  If no error occurs during EVAL, the error frame is popped off the return stack and a false flag is returned on the data stack.  ?DUP UNTIL will loop back to QUERY and the interpretive process will continue.

 

While EVAL interprets the text, any word which detects an error condition that needs attention will execute THROW.  THROW restores the return stack from the pointer stored in HANDLER, making the error handling frame available.  THROW then restores HANDLER from the one saved in the error frame so that the error handling can be nested.  The data stack pointer is also restored from the error frame.  Finally, THROW passes the address of an error message string back to the CATCH which built the error frame.

 

$INTERPRET, ?STACK and abort" pass string addresses to THROW.  The strings contain appropriate error messages to be displayed by the text interpreter.  In QUIT, the words between UNTIL and AGAIN deal with the error conditions and then re-initialize the text interpreter.

           

Here are some of the examples which generate error conditions:

 

: ABORT   NULL$ THROW ;

: abort"   IF do$ THROW THEN do$ DROP ;

: ?STACK   DEPTH 0< IF $" underflow" THROW THEN ;

: $INTERPRET   ... 'NUMBER @EXECUTE  IF EXIT THEN THROW ;

 

 

Text Interpreter

 

The text interpreter in Forth is like the operating system of a computer. It is the primary interface a user goes through to get the computer to do work.  Since Forth uses very simple syntax rules (words are separated by spaces), the text interpreter is also very simple.  It accepts a line of text from the terminal, parses out a word delimited by spaces, locates this word in the dictionary and then executes it. The process is repeated until the source text is exhausted.  Then the text interpreter waits for another line of text and interprets it again.  This cycle repeats until the user is exhausted and turns off the computer.

 

In eForth, the text interpreter is encoded in the word QUIT.  QUIT contains an infinite loop which repeats the QUERY EVAL phrase. QUERY accepts a line of text from the terminal and copies the text into the Terminal Input Buffer (TIB).  EVAL interprets the text one word at a time till the end of the text line.

 

One of the unique features in eForth is its error handling mechanism. While EVAL is interpreting a line of text, many error conditions can arise: a word is not found in the dictionary and it is not a number, a compile-only word is accidentally executed interpretively, or the interpretive process is interrupted by the words ABORT or abort".  Wherever the error occurs, the text interpreter must be made aware of it so that it can recover gracefully from the error condition and continue on about the interpreting business.

 

$INTERPRET executes a word whose string address is on the stack.  If the string is not a word, convert it to a number.

 

[  activates the text interpreter by storing the execution address of $INTERPRET into the variable 'EVAL, which is executed in EVAL while the text interpreter is in the interpretive mode.

 

.OK prints the familiar 'ok' prompt after executing to the end of a line.  'ok' is printed only when the text interpreter is in the interpretive mode.  While compiling, the prompt is suppressed.

 

?STACK checks for stack underflow.  Abort if the stack depth is negative.

 

EVAL is the interpreter loop which parses words from the input stream and invokes whatever is in 'EVAL to handle that word, either execute it with $INTERPRET or compile it with $COMPILE.

 

: $INTERPRET ( a -- )

  NAME?  ?DUP

  IF @ $40 AND

    ABORT" compile ONLY" EXECUTE EXIT

  THEN 'NUMBER @EXECUTE IF EXIT THEN THROW ;

 

: [ ( -- ) doLIT $INTERPRET 'EVAL ! ; IMMEDIATE

 

: .OK ( -- ) doLIT $INTERPRET 'EVAL @ = IF ."  ok" THEN CR ;

 

: ?STACK ( -- ) DEPTH 0< ABORT" underflow" ;

 

: EVAL ( -- )

  BEGIN TOKEN DUP C@

  WHILE 'EVAL @EXECUTE ?STACK

  REPEAT DROP 'PROMPT @EXECUTE ;

 

 

Shell

 

Source code can be downloaded to eForth through the serial input device.  The only precaution we have to take is that during file downloading, characters are not echoed back to the host computer.  However, whenever an error occurs during downloading, it is more useful to resume echoing so that error messages can be displayed on the terminal.  It is also convenient to send special pacing characters to the host to tell the host that a line of source code was received and processed correctly.  The following words configure the eForth I/O vectors to have the proper behavior in normal terminal interaction and also during file downloading:

 

FILE turns off character echoing.  After one line of text is processed correctly, a pacing character ASCII 11 is sent to the host.  If an error occurs, an ESC (ASCII 27) character is sent.  An error will also restore the I/O vectors to the terminal mode.  HAND resumes terminal interaction: it turns on character echoing and sends the normal prompt message after a line is processed correctly.  CONSOLE initializes the serial I/O device for terminal interaction.  ?KEY is vectored to ?RX and EMIT is vectored to TX!.

 

QUIT is the operating system, or a shell, of the eForth system.  It is an infinite loop which eForth never leaves.  It uses QUERY to accept a line of commands from the terminal and then lets EVAL parse out the words and execute them.  After a line is processed, it displays 'ok' and waits for the next line of commands.  When an error occurs during execution, it displays the command which caused the error with an error message.  After the error is reported, it re-initializes the system using PRESET and comes back to receive the next line of commands.
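
The overall shape of QUIT can be sketched in Python.  This model is heavily simplified and rests on assumptions: input lines come from a list instead of QUERY, a Python exception named Abort stands in for THROW, the empty string stands in for NULL$, and the error display is reduced to the message followed by a question mark.

```python
# Simplified model of the QUIT loop (hypothetical Python names).
NULL = ""                      # stands in for NULL$, the zero-count string

class Abort(Exception):
    """Stands in for THROW; carries the message string."""
    def __init__(self, message):
        super().__init__(message)
        self.message = message

def evaluate(line, stack, words):
    """Model of EVAL in interpreting mode."""
    for token in line.split():
        if token in words:
            words[token](stack)          # execute a known word
        else:
            try:
                stack.append(int(token)) # otherwise try a number
            except ValueError:
                raise Abort(token)       # neither: report the token

def quit_loop(lines, words):
    """Model of QUIT: interpret each line, recover from errors."""
    out, stack = [], []
    for line in lines:                   # QUERY supplies one line per pass
        try:
            evaluate(line, stack, words) # CATCH around EVAL
            out.append(" ok")            # the .OK prompt
        except Abort as e:
            if e.message != NULL:        # NULL$ means a quiet abort
                out.append(e.message + " ?")
            stack.clear()                # PRESET empties the data stack
    return out, stack

def plus(stack):
    """A sample word which reports stack underflow."""
    if len(stack) < 2:
        raise Abort("underflow")
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def abort(stack):
    """Model of ABORT: throw NULL$."""
    raise Abort(NULL)
```

Feeding the lines "1 2 +", "FOO", "5 ABORT" and "7" through quit_loop with plus and abort installed produces an ok, an error report for FOO, silence for the ABORT line, and another ok, with only 7 left on the stack.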

 

Because the behavior of EVAL can be changed by storing either $INTERPRET or $COMPILE into 'EVAL, QUIT exhibits the dual nature of a text interpreter and a compiler.

 

: PRESET ( -- ) SP0 @ SP!  TIB #TIB CELL+ ! ;

 

: xio ( a a a -- ) \ reset  'EXPECT 'TAP  'ECHO 'PROMPT

  doLIT accept  'EXPECT 2! 'ECHO 2! ; COMPILE-ONLY

 

: FILE ( -- )

  doLIT PACE  doLIT DROP  doLIT kTAP xio ;

 

: HAND ( -- )

  doLIT .OK   doLIT EMIT  doLIT kTAP  xio ;

 

CREATE I/O  ' ?RX , ' TX! , \ defaults

 

: CONSOLE ( -- ) I/O 2@ '?KEY 2! HAND ;

 

: QUIT ( -- )

  RP0 @ RP!

  BEGIN [COMPILE] [

    BEGIN QUERY doLIT EVAL CATCH ?DUP

    UNTIL 'PROMPT @ SWAP CONSOLE  NULL$ OVER XOR

    IF CR #TIB 2@ TYPE

       CR >IN @ 94 CHARS

       CR COUNT TYPE ."  ? "

    THEN doLIT .OK XOR

    IF $1B EMIT THEN

    PRESET

  AGAIN ;

 


eForth Compiler

 

 

 

After wading through the text interpreter, the Forth compiler will be a piece of cake, because the compiler uses almost all the modules used by the text interpreter.  What the compiler does, over and above the text interpreter, is to build various structures required by the new words we want to add to the existing system.  Here is a list of these structures:

 

            Name headers

            Colon definitions

            Constants

            Variables

            User variables

            Integer literals

            String literals

            Address literals

            Control structures

 

The special concept of immediate words is difficult to grasp at first.  It is required in the compiler because of the need to build different data and control structures in a colon definition.  To understand the Forth compiler fully, you have to be able to differentiate and relate the actions taken during compile time and the actions taken during execution time.  Once these concepts are clear, the whole Forth system will become transparent.

 

This sets the stage for enlightenment to strike.

 

 


Interpreter and Compiler

 

The Forth compiler is the twin brother of the Forth text interpreter. They share many common properties and use lots of common code.  In eForth, the implementation of the compiler clearly reflects this special duality.  Two interesting words, [ and ], cause the text interpreter to switch back and forth between the compiler mode and interpreter mode of operation.

 

Since 'EVAL @EXECUTE is used in EVAL to process a word parsed out of a line of text, the contents of 'EVAL determine the behavior of the text interpreter.  If $INTERPRET is stored in 'EVAL, as [ does, the words are executed or interpreted.  If we invoke ] to store $COMPILE into 'EVAL, the word will not be executed, but added to the top of the code dictionary.  This is exactly the behavior desired by the colon definition compiler in building a list of words in the code field of a new colon definition on the top of the code dictionary.

 

$COMPILE normally adds a word to the code dictionary.  However, there are two exceptions it must handle.  If the word parsed out of the input stream does not exist in the dictionary, the string will be converted to a number.  If the string can be converted to an integer, the integer is then compiled into the code dictionary as an integer  literal. The integer number is compiled into the code dictionary following the word doLIT.  The other exception is that a word found in the dictionary could be an immediate word, which must be executed immediately, not compiled to the code dictionary.  Immediate words are used to compile special structures in colon definitions.
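
The switch between interpreting and compiling can be modeled in Python.  The class Forth and the helper new_forth are invented for this sketch, the attribute eval_vector stands in for 'EVAL, and the compiled word list is a plain Python list in which the string "doLIT" marks an integer literal.

```python
# Model of EVAL vectored through 'EVAL (hypothetical Python names).
class Forth:
    def __init__(self):
        self.stack = []
        self.words = {}                    # name -> (function, immediate?)
        self.code = []                     # word list under construction
        self.eval_vector = self.interpret  # model of 'EVAL

    def interpret(self, token):            # model of $INTERPRET
        if token in self.words:
            self.words[token][0](self)     # execute the word
        else:
            self.stack.append(int(token))  # ValueError models the THROW

    def compile(self, token):              # model of $COMPILE
        if token in self.words:
            fn, immediate = self.words[token]
            if immediate:
                fn(self)                   # immediate words run at once
            else:
                self.code.append(fn)       # others join the word list
        else:
            self.code += ["doLIT", int(token)]  # integer literal

    def eval(self, line):                  # model of EVAL
        for token in line.split():
            self.eval_vector(token)

def dup(f):
    f.stack.append(f.stack[-1])

def left_bracket(f):                       # [ switches to interpreting
    f.eval_vector = f.interpret

def right_bracket(f):                      # ] switches to compiling
    f.eval_vector = f.compile

def new_forth():
    f = Forth()
    f.words["DUP"] = (dup, False)
    f.words["["] = (left_bracket, True)    # [ must be immediate
    f.words["]"] = (right_bracket, False)
    return f
```

Evaluating the line "5 ] DUP 7 [ 9" pushes 5 and 9 on the stack, while DUP and the literal 7 go into the word list.  If [ were not immediate, it would be compiled like DUP and the system could never leave the compiling mode.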

 

: [ ( -- )

      [ ' $INTERPRET ] LITERAL

      'EVAL !                       ( vector EVAL to $INTERPRET )

      ; IMMEDIATE                   ( enter into text interpreter mode )

 

: ] ( -- )

      [ ' $COMPILE ] LITERAL

      'EVAL !                       ( vector EVAL to $COMPILE )

      ;                

 

 


Primitive Compiler Words

 

Here is a group of words which help the compiler build new words in the code dictionary.

 

'  (tick) parses the next word in the input stream and searches the dictionary for it.  It returns the execution address of the word if successful.  Otherwise, it displays an error message.

 

ALLOT allocates  n bytes of memory on the top of the code dictionary.  Once allocated, the compiler will not touch the memory locations.

 

,  (comma) adds the execution address of a word on the top of the data stack to the code dictionary, and thus compiles a word to the growing word list of the word currently under construction.

 

COMPILE is used in a colon definition.  It causes the next word after COMPILE to be added to the top of the code dictionary.  It therefore forces the compilation of a word at the run time. 

 

[COMPILE] acts similarly, except that it compiles the next word immediately.  It causes the following word to be compiled, even if the following word is an immediate word which would otherwise be executed.

 

LITERAL compiles an integer literal to the current colon definition under construction.  The integer literal is taken from the data stack, and is preceded by the word doLIT.  When this colon definition is executed, doLIT will extract the integer from the word list and push it back on the data stack.  LITERAL compiles an address literal if the compiled integer happens to be an execution address of a word.  The address will be pushed on the data stack at the run time by doLIT.
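
How doLIT picks its value out of the word list can be shown with a tiny Python model of the address interpreter.  The layout is an assumption of the sketch: the word list is a Python list, compiled words are Python functions, and the interpreter state is kept in a dictionary.

```python
# Model of LITERAL and doLIT in a threaded word list.
def do_lit(vm):
    """Model of doLIT: push the cell which follows it, then skip it."""
    vm["stack"].append(vm["code"][vm["ip"]])
    vm["ip"] += 1

def literal(code, stack):
    """Model of LITERAL ( w -- ): COMPILE doLIT ,"""
    code.append(do_lit)          # compile doLIT
    code.append(stack.pop())     # then the value taken from the stack

def dup(vm):
    vm["stack"].append(vm["stack"][-1])

def run(code):
    """A minimal address interpreter for the word list."""
    vm = {"code": code, "ip": 0, "stack": []}
    while vm["ip"] < len(code):
        word = code[vm["ip"]]
        vm["ip"] += 1            # ip now points at the next cell
        word(vm)                 # doLIT may advance ip once more
    return vm["stack"]
```

Compiling the value 42 with literal and then appending dup gives a word list which leaves two copies of 42 when run.  Compiling the function dup itself with literal gives an address literal: at run time dup is pushed on the stack instead of being executed.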

 

$," compiles a string literal.  The string is taken from the input stream and is terminated by the double quote character.  $," only copies the counted string to the code dictionary.  A word which makes use of the counted string at the run time must be compiled before the string.  It is used by ." and $".

 

RECURSE is an interesting word which allows eForth to compile recursive definitions.  In a recursive definition, the execution address of the word under construction is compiled into its own word list.  This is not allowed normally because the name field of the current word under construction is not yet linked to the current vocabulary and it cannot be referenced inside its own colon definition.  RECURSE fetches the name field address of the current word from LAST, converts it to the execution address with NAME>, and compiles that address, thus enabling the word to be referenced inside its own definition.  Recursive words are not used in everyday programming.  RECURSE is defined here in eForth merely as a teaser to whet your appetite.  It is not used in eForth.
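
As a small illustration of RECURSE, here is a sketch of a recursive factorial.  The name FACT is hypothetical and is not part of the eForth model, and the result is limited by the 16-bit cell size of the 8086 implementation.

```forth
\ FACT is a hypothetical example word, not part of eForth.
: FACT ( n -- n! )
  DUP 2 < IF DROP 1 EXIT THEN   \ 0! and 1! are both 1
  DUP 1 - RECURSE * ;           \ n * (n-1)!

5 FACT .                        \ displays 120
```

RECURSE is needed here because FACT is not linked into the vocabulary until ; executes OVERT, so the name FACT cannot be found while its own body is being compiled.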

 

: ' ( -- xt ) TOKEN NAME? IF EXIT THEN THROW ;

 

: ALLOT ( n -- ) CP +! ;

 

: , ( w -- ) HERE DUP CELL+ CP ! ! ; \ ???ALIGNED

 

: [COMPILE] ( -- ; <string> ) ' , ; IMMEDIATE

 

: COMPILE ( -- ) R> DUP @ , CELL+ >R ;

 

: LITERAL ( w -- ) COMPILE doLIT , ; IMMEDIATE

 

: $," ( -- ) 34 WORD COUNT + ALIGNED CP ! ;

 

: RECURSE ( -- ) LAST @ NAME> , ; IMMEDIATE
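
The following sketch shows how ', the comma, and LITERAL might be used together from the keyboard.  SQUARE, OPS, APPLY and SQ-XT are hypothetical names chosen for the example; CREATE, CELLS, NEGATE and @EXECUTE are assumed to behave as in the eForth model, and the radix is assumed to be decimal.

```forth
: SQUARE ( n -- n*n ) DUP * ;

\ Build a two-cell table of execution addresses in the code dictionary.
CREATE OPS  ' SQUARE ,  ' NEGATE ,

\ Pick entry i from the table and execute it.
: APPLY ( n i -- n' ) CELLS OPS + @EXECUTE ;

7 0 APPLY .         \ displays 49
7 1 APPLY .         \ displays -7

\ Compute an execution address while compiling and keep it as a literal.
: SQ-XT ( -- xt ) [ ' SQUARE ] LITERAL ;
7 SQ-XT EXECUTE .   \ displays 49
```

The table form is convenient when the word to run is chosen by an index at run time; the literal form fixes the choice when the definition is compiled.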

 

 

Structures

 

A set of immediate words is defined in eForth to help build control structures in colon definitions.  The control structures used in eForth are the following:

 

Conditional branch    IF ... THEN
                      IF ... ELSE ... THEN

Finite loop           FOR ... NEXT
                      FOR ... AFT ... THEN ... NEXT

Infinite loop         BEGIN ... AGAIN

Indefinite loop       BEGIN ... UNTIL
                      BEGIN ... WHILE ... REPEAT

 

This set of words is more powerful than the ones in the figForth model because they permit multiple exits from a loop.  Many examples are provided in the source code of eForth, like NUMBER?, parse, find and >NAME.
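
The loop structures can be tried from the keyboard with a few short words.  This is only a sketch; COUNTDOWN, STARS and HALVES are hypothetical names.  In the eForth model, FOR ... NEXT runs the loop body one more time than the count given, with the index counting down to 0 on the return stack, and AFT causes the words between AFT and THEN to be skipped on the first pass.

```forth
: COUNTDOWN ( n -- ) FOR R@ . NEXT ;          \ 3 COUNTDOWN displays 3 2 1 0
: STARS ( n -- ) FOR AFT 42 EMIT THEN NEXT ;  \ 3 STARS displays ***
: HALVES ( n -- )                             \ 8 HALVES displays 8 4 2 1
  BEGIN DUP WHILE DUP . 2 / REPEAT DROP ;
```

The AFT form is the usual choice when a word must do its work exactly n times, including zero times.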

 

A control structure contains one or more address literals, which cause execution to branch out of the normal sequence.  The control structure words are immediate words which compile the address literals and resolve the branch addresses.

 

One should note that BEGIN and THEN do not compile any code.  They execute during compilation to set up and to resolve the branch addresses in the address literals.  IF, ELSE, WHILE, UNTIL, and AGAIN do compile address literals with branching words.  Here are many excellent examples of the usage of COMPILE and [COMPILE], and they are worthy of careful study.

 

Character strings are very important devices for the program to communicate with the user.  Error messages, appropriate warnings and suggestions must be displayed to help the user use the system in a friendly way.  Character strings are compiled in the colon definitions as string literals.  Each string literal consists of a string word which will use the compiled string to do things, and a counted string.  The first byte in a counted string is the length of the string.  Thus a string may have 0 to 255 characters in it.  A string is always null-filled to the cell boundary.

 

ABORT" compiles an error message.  This error message is displayed when the top item on the stack is non-zero.  The rest of the words in the definition are skipped and eForth re-enters the interpreter loop.  This is the universal response to an error condition.  More sophisticated programmers can use the CATCH-THROW mechanism to customize the responses to special error conditions.

 

." compiles a character string which will be printed when the word containing it is executed at run time.  This is the best way to present messages to the user. 

 

$" compiles a character string.  When it is executed, only the address of the string is left on the data stack.  The programmer will use this address to access the string and individual characters in the string as a string array.
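
A short sketch of the three string words in use.  GREET, MSG and CHECK are hypothetical names, and the radix is assumed to be decimal.

```forth
: GREET ( -- ) CR ." Hello from eForth" ;      \ prints the message when executed
: MSG ( -- a ) $" counted string" ;            \ leaves the address of the counted string
: CHECK ( n -- n ) DUP 0< ABORT" negative" ;   \ aborts if n is negative

GREET
MSG COUNT TYPE      \ COUNT turns the address into address and length for TYPE
5 CHECK .           \ displays 5
-1 CHECK            \ displays the error message and returns to the interpreter
```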

 

: <MARK ( -- a ) HERE ;

: <RESOLVE ( a -- ) , ;

: >MARK ( -- A ) HERE 0 , ;

: >RESOLVE ( A -- ) <MARK SWAP ! ;

 

: FOR ( -- a ) COMPILE >R <MARK ; IMMEDIATE

: BEGIN ( -- a ) <MARK ; IMMEDIATE

: NEXT ( a -- ) COMPILE next <RESOLVE ; IMMEDIATE

: UNTIL ( a -- ) COMPILE ?branch <RESOLVE ; IMMEDIATE

: AGAIN ( a -- ) COMPILE  branch <RESOLVE ; IMMEDIATE

: IF ( -- A )   COMPILE ?branch >MARK ; IMMEDIATE

: AHEAD ( -- A ) COMPILE branch >MARK ; IMMEDIATE

: REPEAT ( A a -- ) [COMPILE] AGAIN >RESOLVE ; IMMEDIATE

: THEN ( A -- ) >RESOLVE ; IMMEDIATE

: AFT ( a -- a A ) DROP [COMPILE] AHEAD [COMPILE] BEGIN SWAP ; IMMEDIATE

: ELSE ( A -- A )  [COMPILE] AHEAD SWAP [COMPILE] THEN ; IMMEDIATE

: WHEN ( a A -- a A a ) [COMPILE] IF OVER ; IMMEDIATE

: WHILE ( a -- A a )    [COMPILE] IF SWAP ; IMMEDIATE

 

: ABORT" ( -- ; <string> ) COMPILE abort" $," ; IMMEDIATE

 

: $" ( -- ; <string> ) COMPILE $"| $," ; IMMEDIATE

: ." ( -- ; <string> ) COMPILE ."| $," ; IMMEDIATE
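
To see how these words cooperate, it may help to trace by hand what a simple conditional compiles.  The layout below is a sketch written as comments: A1 and A2 are symbolic addresses, SIGN is a hypothetical word, and the actual cell addresses depend on the system.

```forth
: SIGN ( n -- ) 0< IF ." minus" ELSE ." plus or zero" THEN ;

\ Word list compiled for SIGN, following the CALL doLIST instruction:
\
\       0<
\       ?branch  A1          \ IF    compiles ?branch and a 0 placeholder (>MARK)
\       ."|  "minus"         \ ."    compiles ."| followed by the counted string
\       branch   A2          \ ELSE  compiles branch and a new placeholder, then
\                            \       resolves the IF placeholder to A1
\  A1:  ."|  "plus or zero"
\  A2:  EXIT                 \ THEN  compiles nothing; it stores A2 into the
\                            \       placeholder left by ELSE
```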

 

 


Compiler

 

We have discussed how the compiler compiles words and structures into the code field of a colon definition in the code dictionary.  To build a new definition, we also have to build its header in the name dictionary.  A header has a word pointer field, a link field, and a name field.  Here are the tools to build these fields.

 

?UNIQUE displays a warning message when the name of a new word duplicates a word already in the dictionary.  eForth does not mind your reusing the same name for different words.  However, giving many words the same name is a potential cause of problems in maintaining software projects.  It is to be avoided if possible and ?UNIQUE reminds you of it.

 

$,n builds a new entry in the name dictionary using the name already moved to the bottom of the name dictionary by PACK$.  It fills the word pointer field with the address of the top of the code dictionary where the new code is to be built, and links the link field to the current vocabulary.  A new word can now be built in the code dictionary.

 

$COMPILE builds the body of a new colon definition.  A complete colon definition also requires a header in the name dictionary, and its code field must start with a CALL doLIST instruction.  This extra work is performed by :.  Colon definitions are the most prevalent type of words in eForth.  In addition, eForth has a few other defining words which create other types of new definitions in the dictionary.

 

OVERT links a new definition to the current vocabulary and thus makes it available for dictionary searches.

 

; terminates a colon definition.  It compiles an EXIT to the end of the word list, links this new word to the current vocabulary, and then reactivates the interpreter.

 

]  turns the interpreter into a compiler.

 

:   creates a new header and starts a new colon word.  It takes the following string in the input stream to be the name of the new colon definition, and builds a new header with this name in the name dictionary.  It then compiles a CALL doLIST instruction at the beginning of the code field in the code dictionary.  Now the code dictionary is ready to accept a word list.  ] is then invoked to turn the text interpreter into a compiler, which will compile the following words in the input stream to a word list in the code dictionary.  The new colon definition is terminated by ;, which compiles an EXIT to terminate the word list, and executes [ to turn the compiler back to the text interpreter.

 

call,  compiles the CALL doLIST instruction as the first thing in the code field of a colon definition.

 

IMMEDIATE sets the immediate lexicon bit in the name field of the new definition just compiled.  When the compiler encounters a word with this bit set, it will not compile this word into the word list under construction, but execute the word immediately.  This bit allows structure words to build special structures in the colon definitions, and to process special conditions when the compiler is running.
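
As a sketch of what : and ; leave behind, consider a one-line definition.  The picture is symbolic and written as comments; SQUARE is a hypothetical word, and the fields are those named in the header description above.

```forth
: SQUARE ( n -- n*n ) DUP * ;

\ Name dictionary entry built by $,n (symbolic):
\   word pointer field   address of the code field of SQUARE
\   link field           name field address of the previously defined word
\   name field           count byte 6, then the characters SQUARE
\
\ Code dictionary entry built by call, and $COMPILE (symbolic):
\   CALL doLIST   DUP   *   EXIT
```

The entry becomes visible to dictionary searches only when ; executes OVERT.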

 

: ?UNIQUE ( a -- a )

  DUP NAME? IF ."  reDef " OVER COUNT TYPE THEN DROP ;

 

: $,n ( a -- )

  DUP C@

  IF ?UNIQUE

    ( na) DUP LAST ! \ for OVERT

    ( na) HERE ALIGNED SWAP

    ( cp na) CELL-

    ( cp la) CURRENT @ @

    ( cp la na') OVER !

    ( cp la) CELL- DUP NP ! ( ptr) ! EXIT

  THEN $" name" THROW ;

 

.( FORTH Compiler )

 

: $COMPILE ( a -- )

  NAME? ?DUP

  IF @ $80 AND

    IF EXECUTE ELSE , THEN EXIT

  THEN 'NUMBER @EXECUTE

  IF [COMPILE] LITERAL EXIT

  THEN THROW ;

 

: OVERT ( -- ) LAST @ CURRENT @  ! ;

 

: ; ( -- )

  COMPILE EXIT [COMPILE] [ OVERT ; IMMEDIATE

 

: ] ( -- ) doLIT $COMPILE 'EVAL ! ;

 

: call, ( xt -- ) \ DTC 8086 relative call

  $E890 , HERE CELL+ - , ;

 

: : ( -- ; <string> ) TOKEN $,n doLIT doLIST  call, ] ;

 

: IMMEDIATE ( -- ) $80 LAST @ @ OR LAST @ ! ;
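
The effect of the immediate bit can be observed with a small experiment.  NOW, TEST and TEST2 are hypothetical names used only for this sketch.

```forth
: NOW ( -- ) ."  compiling " ; IMMEDIATE

: TEST ( -- ) NOW ."  running " ;   \ NOW executes here, while TEST is being compiled
TEST                                \ displays only: running

: TEST2 ( -- ) [COMPILE] NOW ;      \ [COMPILE] forces NOW into the word list
TEST2                               \ displays: compiling
```

Because NOW is immediate, $COMPILE executes it instead of compiling it, so nothing from NOW ends up in the word list of TEST.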

 

 


Defining Words

 

Defining words are molds which can be used to define many words which share the same run time execution behavior.  In eForth, we have : , USER, CREATE, and VARIABLE.

 

USER creates a new user variable.  The user variable contains a user area offset, which is added to the beginning address of the user area to return the address of the user variable in the user area.  CREATE creates a new array without allocating memory.  Memory is allocated using ALLOT.  VARIABLE creates a new variable, initialized to 0.

 

eForth does not use CONSTANT, because an integer literal is more economical than a constant.  One can always use a variable for a constant.

 

: USER ( n -- ; <string> )

  TOKEN $,n OVERT

  doLIT doLIST call, COMPILE doUSER , ;

 

: CREATE ( -- ; <string> )

  TOKEN $,n OVERT

  doLIT doLIST call, COMPILE doVAR ;

 

: VARIABLE ( -- ; <string> ) CREATE 0 , ;
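
A brief sketch of the defining words in use.  COUNTER, BUMP, BUF and ANSWER are hypothetical names; the last line shows one way to stand in for the missing CONSTANT with an integer literal inside a colon definition.

```forth
VARIABLE COUNTER                  \ one cell, initialized to 0
: BUMP ( -- ) 1 COUNTER +! ;      \ add 1 to the variable
BUMP BUMP COUNTER @ .             \ displays 2

CREATE BUF 16 ALLOT               \ a 16-byte array; BUF returns its address
65 BUF C!  BUF C@ EMIT            \ stores and displays the character A

: ANSWER ( -- n ) 42 ;            \ a literal in a colon word serves as a constant
```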

 


Utilities

 

 

eForth is a very small system and only a very small set of tools is provided in the system.  Nevertheless, this set of tools is powerful enough to help the user debug new words he adds to the system.  They are also very interesting programming examples of how to use the words in eForth to build applications. 

 

Generally, the tools present the information stored in different parts of the memory in the appropriate format to let the user inspect the results as he executes words in the eForth system and words he defined himself.  The tools include memory dump, stack dump, dictionary dump, and a colon definition decompiler.

 

 

Memory Dump

 

DUMP dumps u bytes starting at address b to the terminal.  It dumps 16 bytes to a line.  A line begins with the address of the first byte, followed by 16 bytes shown in hex, 3 columns per byte.  At the end of a line are the 16 bytes shown as characters.  The character display is generated by _TYPE, which replaces non-printable characters with underscores.  Typing a key on the keyboard halts the display.  Another CR terminates the display.  Any other key resumes the display.

 

dm+  displays u bytes from b1 in one line.  It leaves the address b1+u on the stack for the next dm+ command to use.

 

_TYPE  is similar to TYPE.  It displays u characters starting from b.  Non-printable characters are replaced by underscores.

 

: _TYPE ( b u -- )

  FOR AFT DUP C@ >CHAR EMIT 1 + THEN NEXT DROP ;

 

: dm+ ( b u -- b )

  OVER 4 U.R SPACE FOR AFT DUP C@ 3 U.R 1 + THEN NEXT ;

 

: DUMP ( b u -- )

  BASE @ >R HEX  16 /

  FOR CR 16 2DUP dm+ ROT ROT 2 SPACES _TYPE NUF? NOT WHILE

  NEXT ELSE R> DROP THEN DROP  R> BASE ! ;
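
DUMP can be tried on a small buffer filled with a known character.  This is a sketch; BUF is a hypothetical name, and FILL is assumed to take ( b u c -- ) as in the eForth model.

```forth
CREATE BUF 32 ALLOT       \ reserve 32 bytes in the code dictionary
BUF 32 65 FILL            \ fill them with 65, the letter A
BUF 32 DUMP               \ each line: address, 16 bytes in hex, 16 characters
```

Each byte of the buffer should be displayed as 41, since DUMP switches to HEX while it runs and restores the saved radix afterwards.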

 

 


Stack Tools

 

The data stack is the workplace of the Forth computer.  It is where words receive their parameters and also where they leave their results.  In debugging a newly defined word which uses stack items and which leaves items on the stack, the best way to check its function is to inspect the data stack.  The number output words may be used for this purpose, but they are destructive.  You print out the number from the stack and it is gone.  To inspect the data stack non-destructively, a special utility word .S is provided in most Forth systems.  It is also implemented in eForth.

 

.S  dumps the contents of the data stack on the screen in free format.  The bottom of the stack is aligned to the left margin.  The top item is shown towards the right and is followed by the characters '<sp'.  .S does not change the data stack, so it can be used to inspect the data stack non-destructively at any time.

 

One important discipline in learning Forth is to learn how to use the data stack effectively.  All words must consume their input parameters on the stack and leave only their intended results on the stack.  Sloppy usage of the data stack is often the cause of bugs which are very difficult to detect later as unexpected items left on the stack could result in unpredictable behavior.  .S should be used liberally during Forth programming and debugging to ensure that the correct data are left on the data stack.

 

.S is useful in checking the stack interactively during the programming and debugging.  It is not appropriate for checking the data stack at the run time.  For run time stack checking, eForth provides !CSP and ?CSP.  They are not used in the eForth system itself, but are very useful for the user in developing serious applications.

 

To do run time stack checking, at some point the program should execute !CSP to mark the depth of the data stack at that point.  Later, the program would execute ?CSP to see if the stack depth was changed.  Normally, the stack depth should be the same at these two points.  If the stack depth is changed, ?CSP would abort the execution.

 

One application of stack checking is to ensure compiler security.  Normally, compiling a colon definition does not change the depth of the data stack, if all the structure building immediate words in a colon definition are properly paired.  If they are not paired, like IF without a THEN, FOR without a NEXT, BEGIN without an AGAIN or REPEAT, etc., the data stack will not be balanced and ?CSP is very useful in catching these compilation errors.  This stack check is a very simple but powerful tool to check the compiler.  !CSP and ?CSP are the words to monitor the stack depth.

 

!CSP  stores the current data stack pointer into the user variable CSP.  The stack pointer saved will be used by ?CSP for error checking.

 

?CSP compares the current stack pointer with that saved in CSP.  If they are different, it aborts and displays the error message 'stack depth'.

 

: .S ( -- ) CR DEPTH FOR AFT R@ PICK . THEN NEXT ."  <sp" ;

: .BASE ( -- ) BASE @ DECIMAL DUP . BASE  ! ;

: .FREE ( -- ) CP 2@ - U. ;

 

: !CSP ( -- ) SP@ CSP ! ;

: ?CSP ( -- ) SP@ CSP @ XOR ABORT" stack depth" ;
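
The stack tools can be exercised as follows.  This is a sketch only; BALANCED and LEAKY are hypothetical names, and the first line assumes the data stack was empty and the radix decimal.

```forth
1 2 3 .S                  \ displays 1 2 3 <sp
DROP DROP DROP

: BALANCED ( -- ) !CSP  1 2 + DROP  ?CSP ;   \ stack depth unchanged, no error
: LEAKY    ( -- ) !CSP  1 2         ?CSP ;   \ two extra items, ?CSP aborts

BALANCED                  \ returns quietly
LEAKY                     \ displays the message: stack depth
```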

 

 


Dictionary Dump

 

The Forth dictionary contains all the words defined in the system, ready for execution and compilation.  WORDS allows you to examine the dictionary and to look for the correct names of words in case you are not sure of their spellings.  WORDS follows the vocabulary thread in the user variable CONTEXT and displays the name of each entry in the name dictionary.  The vocabulary thread can be traced easily because the link field in the header of a word points to the name field of the previous word.  The link field of the next word is one cell below its name field. 

 

WORDS  displays all the names in the context vocabulary.  The order of words is reversed from the compiled order.  The last defined word is shown first.

 

.ID displays the name of a word, given the word's name field address.  It also replaces non-printable characters in a name by underscores.

 

Since the name fields are linked into a list in the name dictionary, it is fairly easy to locate a word by searching its name in the name dictionary.  However, finding the name of a word from the execution address of the word is more difficult, because the execution addresses of words are not organized in any systematic way.

 

It is necessary to find the name of a word from its execution address, if we wanted to decompile the contents of a word list in the code dictionary.  This reversed search is accomplished by the word >NAME.

 

>NAME  finds the name field address of a word from the execution address of the word.  If the word does not exist in the CURRENT vocabulary, it returns a false flag.  It is the mirror image of the word NAME>, which returns the execution address of a word from its name address.  Since the execution address of a word is stored in the word field, two cells below the name, NAME> is trivial.  >NAME is more complicated because the entire name dictionary must be searched to locate the word.   >NAME only  searches the CURRENT vocabulary.

 

Bill Muench and I spent much of our spare time in July, 1991 building and polishing the eForth Model and the first implementation on 8086/MS-DOS.  One evening he called me and told me about this smallest and greatest Forth decompiler, only three lines of source code.  I was very skeptical because I knew how to build a Forth decompiler.  If a Forth colon definition contains only a simple list of execution addresses, it is a trivial task to decompile it.  However, there are many different data and control structures in a colon definition.  To deal with all these structures, it seemed logically impossible to have a three-line decompiler.

 

I told Bill that I had to see it to believe it.  The next time we met, he read the source code in assembly and I entered it into the eForth model.  The decompiler had 24 words and worked the first time after we reassembled the source code. 

 

SEE searches the dictionary for the next word in the input stream and returns its code field address.  Then it scans the list of execution addresses (words) in the colon definition.  If the word fetched out of the list matches the execution address of a word in the name dictionary, the name will be displayed by the command '.ID'.  If the word does not match any execution address in the name dictionary,  it must be part of a structure and it is displayed by 'U.'.  This way, the decompiler ignores all the data structures and control structures in the colon definition, and only displays valid words in the word list. 

 

: >NAME ( xt -- na | F )

  CURRENT

  BEGIN CELL+ @ ?DUP WHILE 2DUP

    BEGIN @ DUP WHILE 2DUP NAME> XOR

    WHILE CELL-

    REPEAT      THEN SWAP DROP ?DUP

  UNTIL SWAP DROP SWAP DROP EXIT THEN DROP 0 ;

 

: .ID ( a -- )

  ?DUP IF COUNT $01F AND _TYPE EXIT THEN ." {noName}" ;

 

: SEE ( -- ; <string> )

  ' CR CELL+

  BEGIN CELL+ DUP @ DUP IF >NAME THEN ?DUP

    IF SPACE .ID ELSE DUP @ U. THEN NUF?

  UNTIL DROP ;

 

: WORDS ( -- )

  CR  CONTEXT @

  BEGIN @ ?DUP

  WHILE DUP SPACE .ID CELL-  NUF?

  UNTIL DROP THEN ;
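
A short session with the dictionary tools might look like this.  It is a sketch; SQUARE is a hypothetical word, and the remarks on SEE follow from the listing above, which loops until NUF? reports a terminating key.

```forth
: SQUARE ( n -- n*n ) DUP * ;

WORDS                     \ SQUARE is listed first; press a key to pause
' DUP >NAME .ID           \ execution address back to name: displays DUP
SEE SQUARE                \ displays DUP * EXIT, then goes on decoding
                          \ whatever cells follow until stopped from the keyboard
```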

 

 


Startup

 

Since we expect eForth to evolve as experience is accumulated with usage, and as it has to track the ANS Forth Standard under development, version control becomes an important issue.  To assure compatibility at different stages of development, the user can always inquire the version number of the eForth he is running.  With the version number, corrective actions can be taken to put an overlay on the system to force it to be compatible with another eForth of a different version.

 

VER returns the version number of this eForth system.  The version number contains two bytes: the most significant byte is the major revision number, and the least significant byte is the minor release number.

 

'hi' is the default start-up routine in eForth.  It initializes the serial I/O device and then displays a sign-on message.  This is where the user can customize his application.  From here one can initialize the system to start the customized application.

 

Because all the system variables in eForth are implemented as user variables and the name dictionary is separated from the code dictionary, the eForth dictionary is eminently ROMmable and most suitable for embedded applications.  To be useful as a generic model for many different processors and applications, a flexible mechanism is designed to help boot eForth up in different environments.  Before falling into the QUIT loop, the COLD routine executes a boot routine whose code address is stored in 'BOOT.  This code address can be vectored to an application routine which defines the proper behavior of the system.

 

After the computer is turned on, it executes some native machine code to set up the CPU hardware so that it emulates a virtual Forth computer.  Then it jumps to COLD to initialize the eForth system.  It finally jumps to QUIT which is the operating system in eForth.  COLD and QUIT are the topmost layers of an eForth system.

 

'BOOT  is a variable vectored to 'hi'.

 

COLD is a high level word executed upon power-up.  Its most important functions are to initialize the user area, execute the boot-up routine vectored through 'BOOT, and then fall into the text interpreter loop QUIT.

 

: VER ( -- u ) $101 ;

 

: hi ( -- )

  !IO BASE @ HEX \ initialize IO device & sign on

  CR ." eFORTH V" VER <# # # 46 HOLD # #> TYPE

  BASE ! CR ; \ restore the radix saved by BASE @

 

: EMPTY ( -- )

  FORTH CONTEXT @ DUP CURRENT 2!  6 CP 3 MOVE OVERT ;

 

CREATE 'BOOT  ' hi , \ application vector

 

: COLD ( -- )

  BEGIN

    U0 UP 74 CMOVE

    PRESET  'BOOT @EXECUTE

    FORTH CONTEXT @ DUP CURRENT 2! OVERT

    QUIT

  AGAIN ;
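
The boot vector can be tried from the keyboard along these lines.  This is a sketch under two assumptions: APP is a hypothetical application word, and 'BOOT lies in writable memory, as it does in the DOS implementation.

```forth
: APP ( -- )
  hi                          \ keep the standard sign-on
  ." application ready" CR ;

' APP 'BOOT !                 \ vector the boot routine to APP
'BOOT @EXECUTE                \ the step COLD performs at start-up: runs APP
```

Since COLD reloads the user variables, including CP, NP and LAST, from their default values, a word added from the keyboard is not retained in the dictionary across a cold start; a lasting change would presumably be made by rebuilding the system with the new vector.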

 

 

ColdBoot from DOS

 

DOS starts executing the object code at 100H.  The eForth Model is configured for a DOS machine.  It can be modified to jump to any memory location from where the CPU boots up.  What we have to do here is to set up the 8086 CPU so that it will emulate the virtual Forth computer as we discussed before.  All the pertinent registers have to be initialized properly.  Since eForth is very small and fits comfortably in a single 64 Kbyte code segment, we will use only one segment for code, data, as well as the two stacks.  Therefore, both the DS and SS segment registers are initialized to be the same as the CS register.  Then, the data stack pointer SP and the return stack pointer RP (BP in 8086) are initialized.  To prevent eForth from being forced back into DOS accidentally, the Control-C interrupt is made benign by vectoring it to a simple IRET instruction.

 

Now we are ready to start the Forth computer.  Simply jumping to COLD will do it.  COLD is coded as a colon word, containing a list of words.  This word list does more initialization in high level, including initializing the user area, and setting up the terminal input buffer.  At the end, COLD executes QUIT, the text interpreter, which contains an infinite loop to receive commands from a user and executes them repeatedly.

 

The user area contains vital information for Forth to perform its functions.  It contains important pointers specifying memory areas for various activities, such as the data stack, the return stack, the terminal input buffer, where the code dictionary and the name dictionary end, and the execution addresses of many vectored words like KEY, EMIT, ECHO, EXPECT,  NUMBER, etc.

 

The user area must be located in RAM, because the information contained in it is continuously updated when eForth is running.  The default values are stored in the code segment starting at UZERO and covering an area of 74 bytes.  This area is copied to the user area in RAM before starting the eForth computer.  The sequence of data in UZERO must match exactly the sequence of user variables.
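
Once the system is running, the values copied from UZERO can be inspected through the user variables themselves.  The lines below are a sketch; they assume the user variable names SP0, RP0, CP, NP and 'EMIT that appear in the comments of the UZERO table.

```forth
HEX
SP0 @ U.   RP0 @ U.     \ initial data and return stack pointers
CP  @ U.   NP  @ U.     \ current ends of the code and name dictionaries
'EMIT @ U.              \ execution address currently vectored for EMIT
DECIMAL
```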

 

;; Main entry points and COLD start data

MAIN  SEGMENT

ASSUME  CS:MAIN,DS:MAIN,ES:MAIN,SS:MAIN

ORG   COLDD                         ;beginning of cold boot

ORIG: MOV   AX,CS

      MOV   DS,AX                   ;DS is same as CS

      CLI                           ;disable interrupts, old 808x CPU bug

      MOV   SS,AX                   ;SS is same as CS

      MOV   SP,SPP                  ;initialize SP

      STI                           ;enable interrupts

      MOV   BP,RPP                  ;initialize RP

      MOV   AL,023H                 ;interrupt 23H

      MOV   DX,OFFSET CTRLC

      MOV   AH,025H                 ;MS-DOS set interrupt vector

      INT   021H

      CLD                           ;direction flag, increment

      JMP   COLD                    ;to high level cold start

CTRLC:IRET                          ;control C interrupt routine

 

; COLD start moves the following to USER variables.

; MUST BE IN SAME ORDER AS USER VARIABLES.

$ALIGN                              ;align to cell boundary

UZERO:      DW    4 DUP (0)         ;reserved

      DW    SPP                     ;SP0

      DW    RPP                     ;RP0

      DW    QRX                     ;'?KEY

      DW    TXSTO                   ;'EMIT

      DW    ACCEP                   ;'EXPECT

      DW    KTAP                    ;'TAP

      DW    TXSTO                   ;'ECHO

      DW    DOTOK                   ;'PROMPT

      DW    BASEE                   ;BASE

      DW    0                       ;tmp

      DW    0                       ;SPAN

      DW    0                       ;>IN

      DW    0                       ;#TIB

      DW    TIBB                    ;TIB

      DW    0                       ;CSP

      DW    INTER                   ;'EVAL

      DW    NUMBQ                   ;'NUMBER

      DW    0                       ;HLD

      DW    0                       ;HANDLER

      DW    0                       ;CONTEXT pointer

      DW    VOCSS DUP (0)           ;vocabulary stack

      DW    0                       ;CURRENT pointer

      DW    0                       ;vocabulary link pointer

      DW    CTOP                    ;CP

      DW    NTOP                    ;NP

      DW    LASTN                   ;LAST

ULAST:

 

 

 


Some Final Thoughts

 

 

 

Congratulations if you have reached this point for the first time.  As you can see, we have traversed a complete Forth system from the beginning to the end, and it is not as difficult as you might have thought before you began.  But think again about what we have accomplished.  It is a complete operating system with an integrated interpreter and an integrated compiler all together.  If you look in the memory, the whole system is less than 7 Kbytes.  What else can you do with 7 Kbytes these days?

 

Forth is like Zen.  It is simple, it is accessible, and it can be understood in its entirety without devoting your whole life to it.

 

Is this the end?  Not really.  There are many important topics in Forth which we have chosen to ignore in this simple model.  They include multitasking, virtual memory, interrupt control, programming style, source code management, and yes, metacompilation.  However, these topics can be considered advanced applications of Forth.  Once the fundamental principles in Forth are understood, these topics can be subjects for further investigation at your leisure.

 

Forth is not an end in itself.  It is only a tool, as useful as the user intends it to be.  The most important thing is how the user can use it to solve his problems and build useful applications.  What eForth gives you is the understanding of this tool.  It is up to you to make use of it.