Toy interpreter for a stack-based FORTH implementation
This was originally based on a class project for my Data Structures course. I rewrote the entire parser and interpreter from scratch. For the project we were given a reverse-Polish notation calculator program with a working parser, and asked to implement the operators described below (RANDOM
, INPUT
, and the ability to define functions were not included in the class project specification, I came up with those features on my own).
The entire language is implemented in postfix notation and utilizes a stack to execute statements. For example, the statement 5 4 +
will place 5 and 4 on the parameter stack, then execute the addition operation when the +
is encountered, adding 5 and 4 together and placing 9 on the stack.
The parser reads every token in the code and determines what type of token they are. Each token is added to a list of tokens, known as the token buffer. The interpreter reads each token from this list and either adds it to the parameter stack or performs a specific operation.
Any time a string literal, integer, or unknown token is read from the token buffer, it will be pushed onto the parameter stack. Any time a keyword or operator is encountered, the operation associated with that keyword/operator will be executed. This will typically involve popping tokens from the parameter stack, performing an operation on them, and pushing the results of the operation onto the stack. Keywords, operators, variables, and functions are stored in a symbol table. Keyword and operator symbols will have a function pointer which gets called when the token for that symbol is read by the interpreter.
Here is an example hello world program in this language:
."Hello, World!" . CR
Any characters in between ."
and "
are part of a string literal, and get pushed onto the stack as a single token.
The .
operator is the print operation. It prints the token on the top of the stack.
The CR
operator represents a carriage return, and prints a newline.
There are even more operators:
+
, -
, *
, /
, and %
are the same arithmetic operators as in C++, only they use postfix notation.
Operations where the order of the operands matters might be a bit confusing when using postfix notation, but you just put the operator between them. So x y /
, x y %
, and x y -
in postfix are equivalent to x / y
, x % y
, x - y
in the standard infix notation.
NEG
will multiply the item on top of the stack by -1.
.
will print the item on top of the parameter stack.
CR
will print a new line.
SP
will print a space.
EMIT
will print an int as a char using its ASCII value.
DUP
will duplicate the item on top of the stack.
DROP
will discard the item on top of the stack.
SWAP
will swap the positions of the top two items on the stack.
ROT
will move the third item to the top of the stack and push the top two items down.
SET
will initialize a variable using the item on top of the stack as an identifier and the next item as its value.
@
is the "at" or "fetch" operator and is used to access a variable. It will place the value of the variable on the top of the stack.
!
is the "store" operator, and is used to set a variable's value to the item on the top of the stack.
ALLOT
is used to allocate an array of ints. 10 arr ALLOT
will create an array called 'arr' of size 10. Using ALLOT
on an array that already exists will reallocate the array with its original elements (will truncate elements if the new size is smaller).
#@
returns the value at an array's index. arr 3 #@
places the value at index 3 of array 'arr' onto the stack.
#!
stores a value into an array index. 12 arr 4 #!
stores the value 12 at index 4 of array 'arr'.
<
, <=
, ==
, !=
, >=
, and >
all work the same as in C++, only using postfix notation.
Boolean values in this language work just like in C/C++, with 0 being false and anything else being true.
AND
is the same as "&&" in C++.
OR
is the same as "||" in C++.
NOT
is the same as "!" in C++.
IFTHEN
begins an if statement. Uses the value on top of the stack to determine whether to execute the if or else statement.
ELSE
begins the else statement to be executed if the value checked by IFTHEN
is false.
ENDIF
ends an if statement. all IFTHEN
tokens must have a corresponding ENDIF
.
DO
begins a loop. Loops are executed one or more times.
UNTIL
is the ending conditional for a loop. If the value on the top of the stack is true, the loop will end.
DEFINE
defines the token on the top of the stack as an identifier for a function.
END
ends the function definition.
Note that "functions" in this language are actually more like macros than functions, as they simply splice any tokens within the function body into the token buffer. However, argument passing can be simulated with some clever stack operations.
OPEN
will open a text file for reading using the item on top of the stack as the file name.
READ
will read a single character from the open file and place its ASCII value onto the stack.
DUMP
will print the entire contents of the parameter stack, and is used for debugging a program.
RANDOM
will push a random integer on to the stack. Uses the C++ rand() function.
INPUT
will get user input using std::cin and push it onto the stack.
//
represents a comment, just as in C++.
."
begins a string literal and "
ends the string literal.
Here is an example program that prints the numbers 0 to 10:
0 i SET //initialize i to 0
DO //begin loop
i @ . CR //prints i
i @ 1 + i ! //increments i
i @ 10 > UNTIL //until i > 10
Here is the same program that uses the stack instead of a variable to hold the value:
0 DO //begin loop with value of 0
DUP . CR //print the value on top of the stack
1 + //increment the item on top of the stack
DUP 10 > UNTIL //until the value is greater than 10
DROP //discard the loop value