Skip to content

Latest commit

 

History

History
402 lines (364 loc) · 22.5 KB

vm.md

File metadata and controls

402 lines (364 loc) · 22.5 KB

Virtual Machine Semantics

The uFork virtual machine is designed to support machine-level actors. All instructions execute within the context of an actor handling a message-event. There is no support for procedure/function call/return. Instead actors are used to implement procedure/function abstractions. There is no support for address arithmetic or load/store of arbitrary memory. Mutation is always local to an actor's private state. Immutable values are passed between actors via message-events. External events (such as "interrupts") are turned into message-events.

Representation

The quad-cell is the primary internal data-structure in uFork. It consists of four integers (current compile options are 16, 32, and 64 bits).

t x y z
type/proc head/car tail/cdr link/next

The integers in each field carry a type tag in their 2 most-significant bits (MSBs). The 1st MSB is {0=indirect-reference, 1=direct-value}. The 2nd MSB is {0=transparent, 1=opaque}. The resulting type-heirarchy looks like this:

                    any-value
                   0 /     \ 1
               indirect   direct (fixnum)
              0 /    \ 1
(quad) transparent  opaque (ocap)

Direct values (fixnums) are stored in 2's-complement representation, where the 2nd MSB is the sign bit of the integer value.

Indirect values (references) desigate quad-cells (with fields {t, x, y, z}).

Opaque values (object-capabilities) cannot be dereferenced except by the virtual-processor (to implement actor primitive operations).

Transparent values designate quad that may be written as well as read.

Data Structures

Quad-cells are used to encode most of the important data-structures in uFork.

Structure Description
{t:Event_T, x:target, y:msg, z:next} actor event queue entry
{t:IP, x:SP, y:EP, z:next} continuation queue entry
{t:Instr_T, x:opcode, y:data, z:next} machine instruction (typical)
{t:Symbol_T, x:hash, y:string, z:value} immutable symbolic-name
{t:Pair_T, x:head, y:tail} pair-lists of user data (cons)
{t:Pair_T, x:item, y:rest} stack entry holding item
{t:Actor_T, x:beh, y:sp, z:?} idle actor
{t:Actor_T, x:beh', y:sp', z:events} busy actor, intially {z:()}
{t:Fexpr_T, x:actor, y:?, z:?} interpreter (cust ast env)
{t:Fexpr_T, x:self, y:msgs, z:beh} meta-actor transaction
{t:Dict_T, x:key, y:value, z:next } dictionary binding entry
{t:Free_T, z:next} cell in the free-list

Instructions

The uFork instruction execution engine implements a linked-stack machine, however the stack is only used for local state in a computation. The input for each instruction is taken from the stack and the output is placed back onto the stack. Instructions all have a t field containing Instr_T type marker. The operation code is carried in the x field of the instruction. Many instructions also have an immediate value, usually carried in the y field of the instruction. For the typical case of a instruction with a single continuation, the "next instruction" is carried in the z field of the instruction.

Input Instruction Output Description
v {x:VM_typeq, y:T, z:K} bool TRUE if v has type T, otherwise FALSE
T {x:VM_cell, y:1, z:K} cell create cell {t:T}
T X {x:VM_cell, y:2, z:K} cell create cell {t:T, x:X}
T X Y {x:VM_cell, y:3, z:K} cell create cell {t:T, x:X, y:Y}
T X Y Z {x:VM_cell, y:4, z:K} cell create cell {t:T, x:X, y:Y, z:Z}
cell {x:VM_get, y:T, z:K} t get t from cell
cell {x:VM_get, y:X, z:K} x get x from cell
cell {x:VM_get, y:Y, z:K} y get y from cell
cell {x:VM_get, y:Z, z:K} z get z from cell
cell T {x:VM_set, y:T, z:K} cell' set t to T in cell
cell X {x:VM_set, y:X, z:K} cell' set x to X in cell
cell Y {x:VM_set, y:Y, z:K} cell' set y to Y in cell
cell Z {x:VM_set, y:Z, z:K} cell' set z to Z in cell
... tail head {x:VM_pair, y:n, z:K} pair create {t:Pair_T, x:head, y:tail} (n times)
pair {x:VM_part, y:n, z:K} ... tail head split pair into head and tail (n times)
pair {x:VM_nth, y:n, z:K} itemn extract item n from a pair list
pair {x:VM_nth, y:-n, z:K} tailn extract tail n from a pair list
{x:VM_push, y:value, z:K} value push literal value on stack
vn ... v1 {x:VM_depth, z:K} vn ... v1 n count items on stack
vn ... v1 {x:VM_drop, y:n, z:K} remove n items from stack
vn ... v1 {x:VM_pick, y:n, z:K} vn ... v1 vn copy item n to top of stack
vn ... v1 {x:VM_dup, y:n, z:K} vn ... v1 vn ... v1 duplicate top n items on stack
vn ... v1 {x:VM_roll, y:n, z:K} vn-1 ... v1 vn roll item n to top of stack
vn ... v1 {x:VM_roll, y:-n, z:K} v1 vn ... v2 roll top of stack to item n
n {x:VM_alu, y:NOT, z:K} ~n bitwise not n
n m {x:VM_alu, y:AND, z:K} n&m bitwise n and m
n m {x:VM_alu, y:OR, z:K} n|m bitwise n or m
n m {x:VM_alu, y:XOR, z:K} n^m bitwise n exclusive-or m
n m {x:VM_alu, y:ADD, z:K} n+m sum of n and m
n m {x:VM_alu, y:SUB, z:K} n-m difference of n and m
n m {x:VM_alu, y:MUL, z:K} n*m product of n and m
m {x:VM_eq, y:n, z:K} bool TRUE if n == m, otherwise FALSE
n m {x:VM_cmp, y:EQ, z:K} bool TRUE if n == m, otherwise FALSE
n m {x:VM_cmp, y:GE, z:K} bool TRUE if n >= m, otherwise FALSE
n m {x:VM_cmp, y:GT, z:K} bool TRUE if n > m, otherwise FALSE
n m {x:VM_cmp, y:LT, z:K} bool TRUE if n < m, otherwise FALSE
n m {x:VM_cmp, y:LE, z:K} bool TRUE if n <= m, otherwise FALSE
n m {x:VM_cmp, y:NE, z:K} bool TRUE if n != m, otherwise FALSE
n c {x:VM_cmp, y:CLS, z:K} bool TRUE if n in c, otherwise FALSE
bool {x:VM_if, y:T, z:F} continue F if "falsy", otherwise continue T
{x:VM_msg, y:0, z:K} msg copy event message to stack
{x:VM_msg, y:n, z:K} msgn copy message item n to stack
{x:VM_msg, y:-n, z:K} tailn copy message tail n to stack
{x:VM_my, y:SELF, z:K} actor push current actor address on stack
{x:VM_my, y:BEH, z:K} beh push current actor behavior on stack
{x:VM_my, y:STATE, z:K} v1 ... vn push current actor state on stack
msg actor {x:VM_send, y:0, z:K} send msg to actor
mn ... m1 actor {x:VM_send, y:n, z:K} send (m1 ... mn) to actor
beh {x:VM_new, y:0, z:K} actor create new actor with behavior beh
v1 ... vn beh {x:VM_new, y:n, z:K} actor create new actor with (v1 ... vn . beh)
beh {x:VM_beh, y:0, z:K} replace behavior with beh
v1 ... vn beh {x:VM_beh, y:n, z:K} replace behavior with (v1 ... vn . beh)
reason {x:VM_end, y:ABORT} abort actor transaction with reason
{x:VM_end, y:STOP} stop current continuation (thread)
{x:VM_end, y:COMMIT} commit actor transaction
{x:VM_end, y:RELEASE} commit transaction and free actor
chars {x:VM_cvt, y:LST_NUM, z:K} fixnum convert chars to fixnum
chars {x:VM_cvt, y:LST_SYM, z:K} symbol convert chars to symbol
char {x:VM_putc, z:K} write char to console
{x:VM_getc, z:K} char read char from console
value {x:VM_debug, y:tag, z:K} debug_print tag: value to console

Object Graph

The diagram below shows a typical graph of quad-cells representing the contents of the e_queue (event queue) and the k_queue (continuation queue). These two queues, plus the global symbol table, and the interrupt-handling actors, form the root-set of objects for garbage-collection.

e_queue: [head,tail]--------------------------+
          |                                   V
          +-->[Event,to,msg,next]---> ... -->[Event,to,msg,NIL]
                     |   |
                     |   +--> actor message content
                     V
                    [Actor,beh,sp,?]
                            |  |
                            |  +--> actor state (initial SP)
                            |
                            +--> actor behavior (initial IP)

k_queue: [head,tail]--------------------+
          |                             V
          +-->[ip,sp,ep,kp]---> ... -->[ip,sp,ep,NIL]
               |  |  |
               |  |  +-->[Event,to,msg,NIL]
               |  |             |   |
               |  |             |   +--> ...
               |  |             V
               |  |            [Actor,beh',sp',events]---> ... -->[Event,to,msg,NIL]
               |  V
               | [Pair,car,cdr,?]
               |        |   |
               |        |   +--> ... -->[Pair,car,NIL,?]
               |        V
               |       item
               V
              [Op,EQ,0,k]
                       |
                       +--> [Op,IF,t,f]
                                   | |
                                   | +--> ...
                                   V
                                   ...

Pair-List Indexing

Instructions like VM_nth and VM_msg have an immediate index argument (n) to succinctly designate parts of a pair-list.

  • Positive n designates elements of the list, starting at 1.
  • Negative n designates list tails, starting at -1.
  • Zero designates the whole list/value (for messages).
  0            -1            -2            -3
---->[car,cdr]---->[car,cdr]---->[car,cdr]---->...
    +1 |          +2 |          +3 |
       V             V             V

...or more compactly...

0-->[1,-1]-->[2,-2]-->[3,-3]--> ...
     |        |        |
     V        V        V

If the index is out-of-bounds, the result is ? (undefined).

Character Classes

The {x:VM_cmp, y:CLS} instruction produces TRUE if a character value is part of the specified class, or FALSE otherwise. The class is defined as a union of the named classes:

  • CTL Control Character
  • DGT Decimal Digit
  • UPR Uppercase Letter
  • LWR Lowercase Letter
  • DLM Delimiter
  • SYM Name Symbol
  • HEX Hexadecimal Digit
  • WSP Whitespace

The tables below define which characters are included in each class.

ch dec hex CTL DGT UPR LWR DLM SYM HEX WSP
^@ 0 00 x
^A 1 01 x
^B 2 02 x
^C 3 03 x
^D 4 04 x
^E 5 05 x
^F 6 06 x
^G 7 07 x
^H 8 08 x
^I 9 09 x x
^J 10 0a x x
^K 11 0b x x
^L 12 0c x x
^M 13 0d x x
^N 14 0e x
^O 15 0f x
^P 16 10 x
^Q 17 11 x
^R 18 12 x
^S 19 13 x
^T 20 14 x
^U 21 15 x
^V 22 16 x
^W 23 17 x
^X 24 18 x
^Y 25 19 x
^Z 26 1a x
^[ 27 1b x
^\ 28 1c x
^] 29 1d x
^^ 30 1e x
^_ 31 1f x
ch dec hex CTL DGT UPR LWR DLM SYM HEX WSP
32 20 x
! 33 21 x
" 34 22 x
# 35 23 x
$ 36 24 x
% 37 25 x
& 38 26 x
' 39 27 x
( 40 28 x
) 41 29 x
* 42 2a x
+ 43 2b x
, 44 2c x
- 45 2d x
. 46 2e x
/ 47 2f x
0 48 30 x x
1 49 31 x x
2 50 32 x x
3 51 33 x x
4 52 34 x x
5 53 35 x x
6 54 36 x x
7 55 37 x x
8 56 38 x x
9 57 39 x x
: 58 3a x
; 59 3b x
< 60 3c x
= 61 3d x
> 62 3e x
? 63 3f x
ch dec hex CTL DGT UPR LWR DLM SYM HEX WSP
@ 64 40 x
A 65 41 x x
B 66 42 x x
C 67 43 x x
D 68 44 x x
E 69 45 x x
F 70 46 x x
G 71 47 x
H 72 48 x
I 73 49 x
J 74 4a x
K 75 4b x
L 76 4c x
M 77 4d x
N 78 4e x
O 79 4f x
P 80 50 x
Q 81 51 x
R 82 52 x
S 83 53 x
T 84 54 x
U 85 55 x
V 86 56 x
W 87 57 x
X 88 58 x
Y 89 59 x
Z 90 5a x
[ 91 5b x
\ 92 5c x
] 93 5d x
^ 94 5e x
_ 95 5f x
ch dec hex CTL DGT UPR LWR DLM SYM HEX WSP
` 96 60 x
a 97 61 x x
b 98 62 x x
c 99 63 x x
d 100 64 x x
e 101 65 x x
f 102 66 x x
g 103 67 x
h 104 68 x
i 105 69 x
j 106 6a x
k 107 6b x
l 108 6c x
m 109 6d x
n 110 6e x
o 111 6f x
p 112 70 x
q 113 71 x
r 114 72 x
s 115 73 x
t 116 74 x
u 117 75 x
v 118 76 x
w 119 77 x
x 120 78 x
y 121 79 x
z 122 7a x
{ 123 7b x
| 124 7c x
} 125 7d x
~ 126 7e x
^? 127 7f x

Common Code Structures

Many instruction streams end with a common suffix. These immutable continuation sequences are available to be shared by many behaviors.

K_CALL:     [MSG,+0,k]---+
                         |
                         |
RESEND:     [MSG,+0,k]   |    RV_ZERO:    [PUSH,+0,k]-----+
                    |    |                                |
                    v    |                                |
            [SELF,?,k]---+    RV_NIL:     [PUSH,NIL,k]----+
                         |                                |
                         |                                |
RV_SELF:    [SELF,?,k]   |    RV_UNDEF:   [PUSH,UNDEF,k]--+
                    |    |                                |
                    v    |                                |
CUST_SEND:  [MSG,+1,k]<--|--------------------------------+
                    |    |
                    v    |
SEND_0:     [SEND,0,k]<--+    RELEASE_0:  [SEND,0,k]
                    |                             |
                    v                             v
COMMIT:     [END,+1,?]        RELEASE:    [END,+2,?]