Skip to content

setl4/setl4

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SETL4 - An Implementation of SETL Written in SPITBOL

What is SPITBOL?

SNOBOL is a general purpose programming language with special emphasis processing strings and text that was developed at Bell Labs in the 1960's by a team led by Ralph Griswold. The final version was called SPITBOL4.

Macro SPITBOL, or SPITBOL, is a very efficient implementation of SNOBOL4. Created by Robert B. K. Dewar and Ken Belcher in 1969, SPITBOL/360 was written in IBM/360 assembly language, in a style Dewar termed "aggressive assembly." That is, using every trick in the book to write the fastest, most compact, code possible.

For example, SNOBOL4 requires maintaining a count of the number of statements executed and the ability to stop execution when a specified number of statements have been executed. SPITBOL/360 does this by computing an unnormalized floating point constant such that successive increments of it result in floating point overflow when the limit is reached.

Dewar and Belcher also wrote Realia COBOL, a COBOL compiler for Intel/x86 written in COBOL. It produced very efficient code, better code than, for example, that produced by IBM's product COBOL compiler.

Dewar ported SPITBOL/360 to several machines. Dewar joined the faculty of the Courant Institute of Mathematical Sciences (CIMS) of New York University (NYU) as a Professor in the early 1970's. In 1973, while working with Anthony P. "Tony" McCann of Leeds University, Dewar created Minimal, a portable assembly language. Dewar and McCann then rewrote SPITBOL in Minimal, resulting in Macro SPITBOL, or SPITBOL. The implementation has proved very stable, with only minor changes in the last several decades.

Macro SPITBOL is remarkably small. The code consists of about 28,000 lines: 2000 lines of comments defining Minimal, 5000 lines of constant/data declarations, and 21,000 lines of code. Every line of code has a comment. The executable for x86 Linux is less than 150 kilobytes.

SPITBOL was implemented for several machines and operating systems by a small team that included Steve Duff, Mark Emmer, Bob Goldberg, and Dave Shields: ICL 1900, Univac, CDC 6600, IBM PC, Apple Macintosh, SUN Solaris Sparc, Microsoft (DOS/NT/Windows), Intel x86_64 (Unix/Linux), and x86_64 Apple iOS. Special credit is due Mark Emmer, who led the project from the mid 1980's to 2009, when Dave Shields became the maintainer of SPITBOL.

SETL4 requires SPITBOL V4.0 (or later), the binary of which can be found in the file `./bin/setl4'.

What is SETL?

SETL (SET Language) is a programming language with finite sets as the fundamental data type. It was created by Jacob T. "Jack" Schwartz of the Courant Institute of Mathematical Sciences (CIMS) of New York University (NYU).

Jack founded the SETL project in 1970. It was later funded by two five-year grants from the Office of Naval Research.

SETL was used to write NYU Ada/Ed, the first validated Ada compiler. Devloped on the DEC Vax 11/780, Ada/Ed was ported to the IBM PC (DOS) by a team led by Dave Shields.

Why the name SETL4?

The SETL Project produced three implementations of SETL. The first was written by Dave Shields in BALM, a language developed by Prof. Malcom Harrison of CIMS, the second by Henry S. "Hank" Warren in PL/I while on leave from IBM, and the third by Robert B. K. Dewar and Art Grand in LITTLE, a low-level implementation language developed at CIMS.

SETL4 is the fourth implementation of SETL produced during the course of the SETL Project. Dave Shields wrote the first implementation. This was followed by two more implementations. This is the fourth implementation written by a member of the SETL project, hence the name SETL4.

Brief Introduction to SETL4

SETL4 extends SPITBOL by adding the datatype set to represent finite sets.

A set is a collection of distinct elements: for example, {a,b,c} is a set, as is {a,b,c,a}; however, {a,b,c,a} has only three elements, and is equal to {a,b,c}.

For example, the folowing SETL4 expression determines if the integer P is prime:

       !exists(|'int 2 P' @ 'multiple(P,this)')

where multiple(a,b) is true if and only if a is a multiple of b.

Sets in SETL4

SETL4 provides several kinds of sets.

Sets of kind 'set' are a collection of distinct members.

Sets of kind 'map' consist of a series or ordered pairs, called entries. The first element of an entry is the key, and the second is the value. No two entries in the map can have the same key but different values.

Sets of kind 'iterator' consist of integers of the form

   low, low + step, ... high

or

   high, high - step ... low

If only low is specified, then the iteator is interpreted as 'iterator 1 low'. For example, 'iterator 100' is the set 1, 2, ... 100. and 'iterator 5 25 5' is the set {5 10 15 20 25}; where iteration over the iterator will go in order 5, 10 ... 25. 'iterator 25 5 -5' is the same set; but iteration will go in the order 25,20 ... 5.

If high is less than low, then a negative step is undersood, even if it is given as a positive value. For example, both iter 25 5 -5 and `iter 25 5 5' go in order "25, 20 ... 1"

Iterators support efficient iteration, but they provide only the operations of iteration and membership; you cannot add or remove an element from an iterator.

Sets of kind 'integer' are intended or sets of integers containing many elements. See below for more details.

A sequence is a map defined on a set of consecutive positive integers. For example 'sequence 5' is a map defined on 1, 2, ... 5.

Sequences are represented in the same way as a map. They differ from a map only in that iteration over a sequence returns the values of the sequence, not the instances of entry that are returned when iterating over a map.

By convention, the kinds are writtten 'int', 'iter', 'map', 'seq', and 'set'.

Sets of integers

SETL4 sets of kind integer provide an efficient implementation of a set with many non-negative integers, such as a set with more than a million integers.

The set is represented by a table of strings, each with setl4.config.int characters, so that the first block represents the integers `0 .. setl4_config.int -1 and so forth. Addition to the set is done by setting the appropriate character in a block to '+'. New blocks are allocated only when necesary.

The SPITBOL function BREAK is used to find the next element in the set when iterating over the set.

Ordered Pairs

In set theory, sets are not ordered. The set {a,b} is the same as (equal to) the set {b,a}.

Although sets are not ordered, sets can be used to define an ordered pair:

data('pair(first,second)')

by defining pair(a,b) to be

_{a,{a,b}}_

It is easy to prove that, given this definition, pair(a,b) is equal to pair(c,d) if and only if a is equal to c and b is equal to d.

Maps

A map defined on a finite set is a set of ordered pairs, or entries, that define a relation between the first element of an entry, its key, and the second element of the entry, the value of the map for key.

data('entry(key,value)')

A set of ordered pairs is a map if and only if there do not exist two entries entry(a,b) and entry(c,d) such that a is equal to c and b is not equal to d. For example, the map defined by the set {[one,1],[2,two],[one,1]} is valid, but one cannot define a map based on the set {[one,1],[2,two],[one,3]}.

However, SETL4 does allow the use of a set of pairs that would not be allowed in stanard set theory to define a map. If two of more pairs have the same first value, the the last such pair encountered defines the value to be used. For exampe, the set {entry(one,1),entry(2,two),entry(one,3)} produces the same map as {entry(one,3),entry(2,two)}.

Though you can use _entry to give a key and its associated value, the preferred convention is separate a key and its value with a colon. allows you to write "key:value,"

       new('map' one:1 two:2 three:3')

instead of

       new('map' entry('one',1) ' ' entry('two',2) ' ' entry('three',3) )

Though in set theory, maps are defined in terms of sets, in the SETL4 implementation, sets are represented are maps, by mapping each element to itself. For example, the set {a,b,c} is kept as the map: {a:a,b:b,c:c}.

SETL4 supports adding entries to a set of kind map by using the key and value in the entry to define the value of the map for key(entry) to be value(entry).

The SPITBOL datatype TABLE provides the machinery to implement maps. SPITBOL provides no way to determine the number of entries in a table other than converting the map to an array. This is the only way in SPITBOL to iterate over a SPITBOL table.

For this reason, maps are implemented using two tables: index and key. Each new entry added to a map is given an integer id, starting with one. The field index maps the id's to the associated pairs. The entries in index allow efficient iteration over the

The table key maps keys in the map to the corresponding entry in the map. The entries in key permit efficient retrieval of the map's value for a given key.

SETL4 supports stacks by implementing them as a sequence. The fuction push() pushes a value on to the stack; the function pop removes the top value.

Loops, Iterators

Loops in SPITBOL have the form of a test followed by the loop body, with the body ending with a branch back to the test.

For example, here is a loop that prints the first one hundred integers:

test
       i = i + 1
       le(i,100)                       :f(done)
       output = i                      :(test)
done

This can also be written as:

test
       output = le(i = i + 1,100) i    :s(test)

Loops in SETL4 are constructed using the loop and next statements.

A loop has the form:

       loop(set,thisname)

   set.next
       var = next(set)     :f(set.done)
       out(this)           :(set.next)
   set.done

The first operand specifies of loop gives the set or range of values of the iteration. The optional second argument names a variable that is updated as part of successful call to next. The name this is used if the second argument is not given. The loop statement above can be written as loop(set) or also loop(set,.this) or _loop(set,'this').

Loops can have a specific variable associated with a loop.

       loop(set,.this.s)
       ...
   set.next
       next(set)           :f(set.done)
       out(this.s)         :(set.next)
   set.done

The loop statement creates an instance of iterator to control the iteration. The next statement advances the iteration by producing the next element or failing if no more elements remain.

See the functions prime and primes defined below to see loop and next in action. See also the code for exists and forall.

Showing Values

Use the function show() to display the value of a SETL4 or SPITBOL object

Sets are listed with the values enclosed in '{}'. Sequences of ten or fewer elements are listed with the values enclosed in '()'. Sequences of more than ten elements are listed with the index and value enclosed in '[]'). Pairs and entries are listed with the component values enclosed in '()'.

Additional Operations Provided as an Artifact of the Implementation

The use of the tables set.index and set.key to implement a set makes it possible to provide some operations not available in in standard set theory. For example, it is possible to retrieve elements using the function get. Get(s,i) retrieves the i-th element in the set.

The function sorter can be used to determine the order in which the elements of a set are accessed during an iteration, or to 'sort' the tables implementing the set. For example, sorter(s,'+v') sorts a map in increasing order of the values of the map. sorter(s,'-v') sorts the values in descendng order, and so forth. For example, here is the code to find the ten members with the largest values in a map:

       sorter(map,'-v')
       largest = new('set')
       iter = new('iter 1 10')
       loop(set)

   largest.add

       add(largest,next(set))               :s(largest.add)

Sorter is used by the show function to display the values of a set or map in a standard order. For example, if s is a set with the elements 'a', 'b' and 'c', then show(s) yields set 'a' 'b' 'c' }, not set 'b' 'c' 'a' }.

SETL4 Operator Symbols

SETL4 provides the following operator symbols:

#s returns the size of a string or set s x ~ s tests if x is member of set s s @ e is equivalent to filter(s,e) =s returns the value of the iteration variable (usually 'this') of a loop.

For example, member(x,s) can be written x ~ s, set.size(x) can be written #s, and filter(s,e) can be written s @ e.

Sample data

Programs need data. Since SPITBOL is designed to analyze and manipulate text, SETL4 provides a library consisting of several works chosen from the world's greatest literature, including the works of William Shakespeare, a dictionary, and the text of the King James version of the Bible. Except for the dictionary, all texts were obtained using Project Gutenberg.

As an extension of SPITBOL, SETL4 brings the raw power and speed of SPITBOL to the task of working with text. Additional functions are provided, including a lexical scanner, or tokenizer, to assist in performing deep, detailed analysis of textual structure.

Several of these functions have as their purpose the translation of text into sets, maps, and sequences. For example, the tokenizer returns a sequence consisting of the tokens in a line of text.

In order to be able to use the program ./examples/align.stl to align the goto fields in the source, use the variable char(58) where the colon character enclosed in quotes would normally be used.

SETL4 Functions

SETL4 provides the following functions:

  • add(set,elem) Adds element to set.
  • and(a,b) Tests if both operands are true.
  • append(str,w,ch) Appends ch (or space if ch is null) to str, then w
  • arbitrary(n) Returns arbitary (randomly chosen) integer in 1 .. n.
  • arbitrary(set) Returns arbitary (randomly chosen) element of set.
  • assert(expr) Tests that expr is true, ends execution otherwise.
  • binary(n) Returns string with value of n in binary number.
  • checkout(filename) Returns ('checks out') set or map defined by a text file from the libary
  • compare(a,b) Compares two integers or strings, returning -1 (less), 0 (equal), or +1 (greater).
  • compose(a,b) Returns composition of two maps or tables.
  • defined(map,key) Tests if a map is defined for a specified key.
  • difference(a,b) Returns set of members of set a not in the set b.
  • domain(map) Returns set of elements in the domain of a map.
  • equal(a,b) Tests if two SETL4 objects are equal.
  • error(s) Write out string s and end execution.
  • even(n) Tests if n is even.
  • exists(set,expr) Tests if an expression is true for at least one element in a set.
  • factorial(n) Returns n! = n * (n -1) * ... * 1.
  • false(e) Tests if argument is false.
  • filter(set,expr) Returns set of elements in set for which expr is true.
  • forall(set,expr) Tests if an expression is true for every element of a set.
  • frequency(s) Returns frequency of values in sequence, map or string s.
  • get(map,key) Gets the value of map map for key for a map.
  • int(n) Returns n if n is integer or is string in exponential form
  • integers(n) Returns set of integers 1 ... n.
  • intersection(a,b) Returns set of elements common to two sets.
  • join(a,b) Joins several sequences/strings into one sequence/string.
  • loop(set,this) Set up iteration over set using next. this names variable updated during iteration.
  • multiple(n,m) Tests if n is a multiple of m.
  • odd(n) Tests if n is odd.
  • out(text1,text2) Outputs text1, then text2 enclosed in '[]' if text2 not null.
  • pack(seq) Packs sequence of strings into single string.
  • powerset(set) Return the powerset of set, the set of all the subsets of set.
  • product(seq) Returns (Cartesian) product of a sequence of sets or strings.
  • member(set,elem) Tests if elem is a member of set.
  • new(str) Returns a new set specified by str.
  • next(set) Returns next element in iteration defined by prevous loop, fails if no more elements.
  • not(expr) Tests if operand is false.
  • number(s) Returns integer defined by s.
  • or(a,b) Tests if either operand is true.
  • pop(stack) Pop the top of a stack and return its value.
  • prime(n) Tests if n is prime.
  • primes(n) Returns set of primes less than n.
  • push(stack,value) Push value onto a stack.
  • put(map,key,val) Defines the value of a map key.
  • quicksort(seq) Use Hoare's quicksort algorithm to sort a sequence.
  • random(n) Returns random integer if n is integer, elsel random element of set or map.
  • random.seed() Sets random number seed to initialize random.
  • range(map) Returns set of elements in the range of a map.
  • reader(filename,expr) Returns sequence of lines in file filename, using expr (if given).
  • reader(str,delim) Like reader(filename..) but reads lines from a string with lines.
  • set.size(s) Returns number of elements in set set.
  • #set Same as set.size(set)
  • show(v) Show value of v.
  • show.boolean(e) Shows value of b as boolean.
  • show.eval(expr) Evaluates expr, returns 'success' if true, else 'false'
  • show.text(v) Same as show(), but strings are not enclosd in quotes.
  • sorter(set,type) Sorts a set, map or string according to type.
  • slice(str,first,last) Like SPITBOL substr to work for sequences and tuples.
  • split(s) Returns sequence of whitespace-separated words in s.
  • square.root(n) Returns integer square root on n
  • subset(a,b) Tests if the set b is a subset of set a
  • this(set) Returns current iteration value for set.
  • tokens(s) Returns sequence of the tokens in s.
  • tolower(s) Returns s with upper case letters replaced by lower case equivalent.
  • toupper(s) Returns s with lower case letters replaced by upper case equivalent.
  • top(s) Returns top element of sequence viewed as stack.
  • traceon() Turns on tracing.
  • traceoff() Turns off tracing
  • true(e) Tests if operand is true.
  • union(a,b) Returns set consisting of all the members in two sets.
  • unpack(str) Unpacks string into sequence of its characters.
  • visit(set,expr) Visits each element of set set and evaluates expr for that element.
  • words(s,w) Returns sequence of words (defined by w) in s.
  • writer(lines) Writes sequence lines to standard output.

About

SETL4 brings the raw power and speed of SPITBOL to non-numeric computation using set-theoretic constructs.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published