Skip to content

Latest commit

 

History

History
620 lines (482 loc) · 20.3 KB

asm.md

File metadata and controls

620 lines (482 loc) · 20.3 KB

uFork assembly language

This document describes a textual assembly language for uFork.

An example

; A fibonnacci service behavior.

; It expects a message containing a customer and a fixnum. The nth
; fibonnacci number is calculated and sent to the customer.

.import
    std: "./std.asm"

beh:                    ; () <- (cust n)
    msg 2               ; n
    dup 1               ; n n
    push 2              ; n n 2
    cmp lt              ; n n<2
    if std.cust_send    ; n

    msg 1               ; n cust
    push k              ; n cust k
    new -1              ; n k.cust

    pick 2              ; n k n
    push 1              ; n k n 1
    alu sub             ; n k n-1
    pick 2              ; n k n-1 k
    push beh            ; n k n-1 k beh
    new 0               ; n k n-1 k fib.()
    send 2              ; n k

    roll 2              ; k n
    push 2              ; k n 2
    alu sub             ; k n-2
    roll 2              ; n-2 k
    push beh            ; n-2 k beh
    new 0               ; n-2 k fib.()
    send 2              ; --
    ref std.commit

k:                      ; cust <- m
    state 0             ; cust
    msg 0               ; cust m
    push k2             ; cust m k2
    beh 2               ; k2.(cust m)
    ref std.commit

k2:                     ; (cust m) <- n
    state 2             ; m
    msg 0               ; m n
    alu add             ; m+n
    state 1             ; m+n cust
    ref std.send_msg

.export
    beh

Overview

In many ways, the language is similar to traditional assembly languages:

  • At the top level it has directives (such as .import), labels and statements.
  • Each statement consists of an operator followed by its operands.
  • Linear execution is expressed by writing instruction statements one after another.
  • A semicolon (;) marks the remainder of the line as a comment. Blank lines and comments may be injected between any two lines.

In some ways, however, it is quite different:

  • Code is always within a module. Imports and exports are declared explicitly. There is potential for modules to be fetched from the filesystem or via the network.
  • Statements always produce a value. Often the value is an instruction, but it can also be a literal, number, or data structure. Values are immutable. Data structures can contain instructions.
  • By convention, end-of-line comments often show a picture of the stack after an instruction executes, to clarify the effect. The top of the stack is the right-most element.

Statements

Statements begin with an operator (such as alu), followed by zero or more operands. An operand is either an enumeration (such as sub) or an expression (such as 2 or std.commit). Statements are indented by at least one space.

Each statement may be preceeded by one or more labels (and the first statement must be). Labels are not indented. Labels within a module must have unique names.

Expressions

Expressions appear only in operand position.

There are five literal values:

  • #? (undefined)
  • #nil (empty list)
  • #unit (inert result)
  • #t (true)
  • #f (false)

Fixnums are signed integers that fit within a single machine word, for example 7 or -1000. An explicit radix may be provided, such as 16#F0a1. Character literals are recognized inside single-quotes, such as 'A', 'z', or '\n'.

All values have a type, queryable with the typeq instruction. The following types are currently supported:

  • #fixnum_t
  • #type_t
  • #pair_t
  • #dict_t
  • #instr_t
  • #actor_t

Values can also be referenced by name. In singular form, name expressions refer to a labelled statement within the current module. This was used in the introductory example to push the instruction labelled beh onto the stack:

push beh

Names must start with a letter. Names may contain letters and numbers. Groups of letters and numbers may be separated by underscore (_) or hyphen (-). The following are all valid names:

  • send_0
  • CONT_ID
  • isNil
  • take-2nd

Exotic names may be enclosed in double-quotes ("). They may contain any non-control character except double-quote. These names are usually used by code-generators.

There is also a compound form of name, which refers to an imported value. A period (.) separates the module name from the import name:

if std.cust_send

Ref

The ref statement takes an expression as its operand and produces the value of that expression.

Operator First operand
ref expression

ref allows expressions to be labelled, for example

my_alias:
    ref my_import.an_inconveniently_long_name

magic_number:
    ref 42

It also provides an alternative way to continue to a named instruction. The following are equivalent:

explicit_k:
    typeq #fixnum_t my_continuation

implicit_k:
    typeq #fixnum_t
    ref my_continuation

Instructions

The following table just describes the syntax and semantics of instruction statements. The Input depicts the stack before the operation. The Output depicts the stack after the operation. The top of the stack is the right-most element.

Input Instruction Output Description
push value value push literal value on stack
vₙv₁ dup n vₙv₁ vₙv₁ duplicate top n items on stack
vₙv₁ drop n remove n items from stack
vₙv₁ pick n vₙv₁ vₙ copy item n to top of stack
vₙv₁ pick -n v₁ vₙv₁ copy top of stack before item n
vₙv₁ roll n vₙ₋₁v₁ vₙ roll item n to top of stack
vₙv₁ roll -n v₁ vₙv₂ roll top of stack to item n
n alu not ~n bitwise not n
n m alu and n&m bitwise n and m
n m alu or n|m bitwise n or m
n m alu xor n^m bitwise n exclusive-or m
n m alu add n+m sum of n and m
n m alu sub n-m difference of n and m
n m alu mul n*m product of n and m
v typeq T bool #t if v has type T, otherwise #f
u eq v bool #t if u == v, otherwise #f
u v cmp eq bool #t if u == v, otherwise #f
u v cmp ne bool #t if u != v, otherwise #f
n m cmp lt bool #t if n < m, otherwise #f
n m cmp le bool #t if n <= m, otherwise #f
n m cmp ge bool #t if n >= m, otherwise #f
n m cmp gt bool #t if n > m, otherwise #f
bool if T [F] if bool is not falsy*, continue T (else F)
bool if_not F [T] if bool is falsy*, continue F (else T)
k jump continue at k
tail head pair n pair create pair from head and tail (n times)
vₙv₁ pair -1 (v₁vₙ) capture stack items as a single pair list
pair part n tail head split pair into head and tail (n times)
(v₁vₙ) part -1 vₙv₁ spread pair list items onto stack
(v₁vₙ . tailₙ) nth n vₙ copy item n from a pair list
(v₁vₙ . tailₙ) nth -n tailₙ copy tail n from a pair list
dict key dict has bool #t if dict has a binding for key, otherwise #f
dict key dict get value the first value bound to key in dict, or #?
dict key value dict add dict' add a binding from key to value in dict
dict key value dict set dict' replace or add a binding from key to value in dict
dict key dict del dict' remove first binding for key in dict
deque new deque an empty deque
deque deque empty bool #t if deque is empty, otherwise #f
deque value deque push deque' insert value as the first element of deque
deque deque pop deque' value remove the first value from deque, or #?
deque value deque put deque' insert value as the last element of deque
deque deque pull deque' value remove the last value from deque, or #?
deque deque len n count elements in the deque
T quad 1 quad create quad [T, #?, #?, #?]
X T quad 2 quad create quad [T, X, #?, #?]
Y X T quad 3 quad create quad [T, X, Y, #?]
Z Y X T quad 4 quad create quad [T, X, Y, Z]
quad quad -1 T extract 1 quad field
quad quad -2 X T extract 2 quad fields
quad quad -3 Y X T extract 3 quad fields
quad quad -4 Z Y X T extract 4 quad fields
msg 0 msg copy event message to stack
msg n msgₙ copy message item n to stack
msg -n tailₙ copy message tail n to stack
state 0 state copy actor state to stack
state n stateₙ copy state item n to stack
state -n tailₙ copy state tail n to stack
my self actor push actor address on stack
my beh beh push actor behavior on stack
my state vₙv₁ spread actor state onto stack
mₙm₁ actor send n send (m₁mₙ) to actor
msg actor send -1 send msg to actor
sponsor mₙm₁ actor signal n send (m₁mₙ) to actor using sponsor
sponsor msg actor signal -1 send msg to actor using sponsor
vₙv₁ beh new n actor create an actor with code beh and data (v₁vₙ)
state beh new -1 actor create an actor with code beh and data state
(beh . state) new -2 actor create an actor with code beh and data state
[_, _, _, beh] new -3 actor create an actor with code beh and data [_, _, _, beh]
vₙv₁ beh beh n replace code with beh and data with (v₁vₙ)
state beh beh -1 replace code with beh and data with state
(beh . state) beh -2 replace code with beh and data with state
[_, _, _, beh] beh -3 replace code with beh and data with [_, _, _, beh]
reason end abort abort actor transaction with reason
end stop stop current continuation (thread)
end commit commit actor transaction
sponsor new sponsor create a new empty sponsor
sponsor n sponsor memory sponsor transfer n memory quota to sponsor
sponsor n sponsor events sponsor transfer n events quota to sponsor
sponsor n sponsor cycles sponsor transfer n cycles quota to sponsor
sponsor sponsor reclaim sponsor reclaim all quotas from sponsor
sponsor control sponsor start run sponsor under control
sponsor sponsor stop reclaim all quotas and remove sponsor
actual assert expect assert actual == expect, otherwise fail!
debug debugger breakpoint

* For conditionals (if and if_not) the values #f, #?, #nil, and 0 are considered "falsy".

Every instruction (except end) takes a continuation as its final operand. In the following example, the msg instruction continues to the end instruction.

beh:
    msg 1 done
done:
    end commit

But it is not necessary to label every instruction, because the last continuation operand defaults to the next instruction when omitted. The same two instructions could more easily be written

beh:
    msg 1
    end commit

The std.asm module contains (among others) the following definition:

commit:
    end commit

This allows the previous example to be written as:

.import
    std: "./std.asm"

beh:
    msg 1
    ref std.commit

In practice, most instructions are implicitly continued at the subsequent instruction.

Pair-List Indexing

Instructions like msg, state, and nth have an immediate index argument (n) to succinctly designate parts of a pair-list.

  • Positive n designates elements of the list, starting at 1
  • Negative n designates list tails, starting at -1
  • Zero designates the whole list/value
  0              -1              -2              -3
---->[head,tail]---->[head,tail]---->[head,tail]---->...
    +1 |            +2 |            +3 |
       V               V               V

...or more compactly...

0-->[1,-1]-->[2,-2]-->[3,-3]--> ...
     |        |        |
     V        V        V

If the index is out-of-bounds, the result is #? (undefined).

Data

The pair_t and dict_t statements construct data structures from their operands. The type_t statement makes custom types. The quad_1, quad_2, quad_3, and quad_4 statements construct quads, taking a type as the T operand.

Instruction First operand Second operand Third operand Fourth operand
pair_t head [tail]
dict_t key value [next]
type_t arity
quad_1 [T]
quad_2 T [X]
quad_3 T X [Y]
quad_4 T X Y [Z]

Like with instructions, the final operand defaults to the next statement when omitted. In the following example, a is a list of three elements, 0, 1, and 2.

a:
    pair_t 0
    pair_t 1
    pair_t 2
    ref #nil

Pairs do not have to construct lists. Here, b is a pair that simply holds the values true and false.

b:
    pair_t #t #f

An identical pair could be constructed using quad_3 instead.

b:
    quad_3 #pair_t #t #f

Here, c is a dictionary containing entries 0: false and true: 1.

c:
    dict_t 0 #f
    dict_t #t 1
    ref #nil

Quads can also be constructed with custom types.

ternary:
    type_t 3

custom_quad:
    quad_4 ternary 1 2 3

Grammar

The grammar is written in McKeeman form.

Each source unit consists of a single module. A module may begin with comments or blank lines, and must end with a newline character.

module
    maybe_newlines import_declaration definitions export_declaration

The import_declaration, definitions and export_declaration are all optional.

import_declaration
    ""
    ".import" newlines imports

imports
    import
    import imports

import
    indent name ':' spaces string newlines

definitions
    ""
    definition definitions

definition
    labels statements

labels
    label
    label labels

label
    name ':' newlines

statements
    statement
    statement statements

statement
    indent operator operands newlines

operator
    azs
    azs '_' operator

azs
    az
    az azs

az
    'a' . 'z'

operands
    ""
    spaces value operands

value
    literal
    fixnum
    type
    ref

literal
    undef
    nil
    unit
    true
    false

undef
    "#?"

nil
    "#nil"

unit
    "#unit"

true
    "#t"

false
    "#f"

fixnum
    '-' one_to_nine
    '-' one_to_nine digits
    base_ten
    base_ten '#' base_any
    single_quote character_literal single_quote

base_ten
    digit
    one_to_nine digits

digits
    digit
    digit digits

digit
    '0'
    one_to_nine

one_to_nine
    '1' . '9'

base_any
    alphameric
    alphameric base_any

character_literal
    "\b"
    "\t"
    "\n"
    "\r"
    "\'"
    "\\"
    character - single_quote - '\'

single_quote
    '0027'

type
    "#literal_t"
    "#fixnum_t"
    "#type_t"
    "#pair_t"
    "#dict_t"
    "#instr_t"
    "#actor_t"

ref
    name
    name '.' name

Unicode names are not currently supported due to the security concerns described at http://cweb.github.io/unicode-security-guide/, but if support were to be added it would build on the XID_Start and XID_Continue classes mentioned in https://unicode.org/reports/tr31/.

name
    alpha trailing
    string

trailing
    ""
    '_' alphameric trailing
    '-' alphameric trailing
    alphameric trailing

alphameric
    alpha
    digit

alpha
    'a' . 'z'
    'A' . 'Z'

The module must declare at least one export.

export_declaration
    ""
    ".export" newlines exports

exports
    export
    export exports

export
    indent name newlines

A line may end in a comment. A line may be followed by any number of blank lines or comment lines.

maybe_newlines
    ""
    newlines

newlines
    newline
    newline newlines

newline
    crlf
    maybe_spaces ';' maybe_comment crlf

crlf
    '000D' '000A'
    '000A'
    '000D'

maybe_spaces
    ""
    spaces

maybe_comment
    ""
    comment

comment
    nonspace
    character comment

string
    '"' nonquotes '"'

nonquotes
    ""
    nonquote nonquotes

nonquote
    character - '"'

nonspace
    character - ' '

character
    '0020' . '007E'
    '00A0' . '10FFFF'

indent
    spaces

spaces
    space
    space spaces

space
    '0020'