Skip to content
This repository has been archived by the owner on Feb 3, 2020. It is now read-only.

[Idea] Brainfuck Assembly #73

Open
sunjay opened this issue Jul 16, 2017 · 2 comments
Open

[Idea] Brainfuck Assembly #73

sunjay opened this issue Jul 16, 2017 · 2 comments
Assignees

Comments

@sunjay
Copy link
Owner

sunjay commented Jul 16, 2017

This brain assembly language can be used as an IR for this compiler. It has a simple bytes-only memory representation and is in a sense much "lower level" than brain.

Notes

Recursion (more than tail call) can be implemented if we implement a stack in bf. Each call operator would go to a new level of the stack (dynamically) instead of trying to determine that stuff statically.

https://cs.stackexchange.com/questions/991/are-there-minimum-criteria-for-a-programming-language-being-turing-complete


  • Be able to sourcemap the output code to the input code (at various levels of the stack + with function calls un-inlined)
// All memory included single bytes are represented as an array of bytes
// Single byte declarations are just for convenience (syntactic sugar)
// `mem @a` and `mem @a[1]` are equivalent
// `mem @b[56]` declares 56 bytes stored contiguously in memory
// Memory allocated with mem is in scope in the current block (denoted within curly braces)
// and is dropped (i.e. zeroed) at the end of that block
// Drops may be optimized away in certain cases

// reverse polish notation calculator (simple -- just addition of two numbers)
// input: 6 7 +
// output: 13
fn main {
    // Syntactic sugar for allocating memory for the size of the string and incrementing each byte
    // to that value -- this is still a static sized array
    // The write instruction will output each character one at a time, maybe with some optimizations
    // if the compiler can determine that this is just outputting a string literal
    write "Enter an expression\n: (e.g. 6 7 +)"

    mem @error

    mem @a[4]
    call u32::read @a @error
    if @error {
        write "First part of expression must be a number\n"
        // ret instruction is like wrapping the remainder of the function in an `else`
        // if used outside of if/else/while, results in the remaining code becoming unreachable
        // The reason we have this instruction is because it is just too inconvenient to keep
        // nesting if/elses deeper and deeper
        // The `break` instruction in a loop is similar to ret except that it only wraps the
        // remaining body of the loop in an else
        // `ret` and `break` can be used in else too
        ret
    }
    
    //TODO: Skip whitespace somehow (requires buffered input)
    mem @b[4]
    call u32::read @b @error
    if @error {
        write "Second part of expression must be a number\n"
        ret
    }
    
    mem @op
    call u32::read @op @error
    if @error {
        write "Final part of expression must be an operator\n"
        ret
    }
    
    // Valid expression, now let's compute it
    mem @out[4]
    
    mem @add
    call u8::eq @add @op b'+'
    if @add {
        call u32::add @out @a @b
    }
    else {
        write "Operators other than `+` are not currently supported!\n"
        ret
    }

    // Print the number
    // NOTE: We can't just use the `write` instruction since that will simply print the character
    // associated with the given byte value. We want the printed number to look like a number.
    call u32::print @out
}

// Modules: in functions, the functions declared in the current module are available
// Modules from the module above can be called via super::foo
// Modules from other parts of the module hierarchy must be imported using:
//     use ::std::u32; // now you can use u32::read, etc.
// The :: prefix means search for a package with that name
mod u32 {
    // Reads a 32-bit unsigned integer with no checks for overflow
    // Always reads at least one character
    // @error is set to 1 if no digits could be read
    //
    // @out must be zero
    fn read @out[4] @error {
        mem @char
        read @char

        mem @is_digit
        // Using byte literals like this is a convenience method for looking up the ASCII value of that character
        call u8::in_range @is_digit b'0' @char b'9'
        
        // Check for no digits
        // The condition byte for if and while blocks MUST be a SINGLE byte
        // Both enter their blocks if the value of the byte is non-zero
        if @is_digit {
            while @is_digit {
                // Add the currently read digit to the output
                
                // in-place subtraction (-=) to get just the actual numerical value of the read digit
                call u8::isub @char b'0'
                // Shift the output one decimal place to the right
                // i.e. performs 10 -> 100 on the output so we can add the newest digit
                call u32::imul @out 10
                // @char is now the last digit of @out
                call u32::iadd @out @char
                // This is the area of this code where we could add checks for overflow (i.e. if the counter loops around)
                
                // Read the next char and check if it is a digit
                read @char
                call u8::in_range @is_digit b'0' @char b'9'
            }
        }
        else {
            call bool::true @error
        }
    }

    // Returns true if a <= x <= b
    fn in_range @out @a[4] @x[4] @b[4] {
        // This function demonstrates how to do a short-circuiting AND operation
        // The second call doesn't execute if the first one returns false
        
        // lower bound
        call ge @out @a @x
        
        if @out {
            call le @out @x @b
        }
    }

    // u32::le computes: a <= b
    fn le @out @a[4] @b[4] {
        //TODO: Store boolean result in @out

        // This special instruction prints a message and then panics
        unimplemented
    }

    // u32::ge computes: a >= b
    fn ge @out @a[4] @b[4] {
        call le @out @b @a
        // or something more efficient could be used...
    }
}

OLD NOTES

Add Span to label so the hash is unique

No whitespace (other than newlines) is significant.

// Syntax:
// @x = address x
// 'y: = label y
// @x[n] = perform pointer arithmetic (n must be a constant)
// b'c' = byte constant given the ascii value of the given character (sugar for writing the actual number)

'main:
    // allocate 4 bytes
    mem @a[4]// could also use b'A', etc.
    ‎incr @a[0] 65 // A
    ‎incr @a[1] 66 // B
    ‎incr @a[2] 67 // C
    ‎incr @a[3] 68 // D// default size is 1 byte
    ‎mem @b
    ‎// move the value of the memory at address @a to address @b// leaves @a as zero// both the source and the target of this operation must be the same sizemove @a[2] -> @b
    ‎// marks memory as available for use but does not zero it until necessary// @b can be longer be used
    ‎drop @b
    ‎// move is more efficient and requires no temporary cell, but copy is useful if you want to keep the memory around// both the source and the target of this operation must be the same size
    ‎mem @c[4]
    ‎copy @a -> @c
    ‎// using a constant like this is syntactic sugar for incrementing its value into a temporary memory allocation and then dropping the memory// call cannot be used in a way that causes a cycle (e.g. recursion) because of inlining (see loop for a way to do loops) - maximum recursion depth is used to avoid infinite compilation
    ‎call foo b'x' @c
    ‎// this ret is required
    ‎ret

// Function blocks can have zero or more arguments
// If no size is given, the default is 1 byte
// All allocated memory inside a function block is dropped at the end
fn foo x y[4] {
    write x
    ‎write y[3]// returns to the instruction after where this was called// without this, there's a chance that we would keep going to the next instruction// the only way that couldn't happen is if the next thing was a function block (in which case you would get a compilation error)
    ‎ret
}

Other instructions:

    // Tests if @a is non zero and jumps to the second argument
    if @a ('foo 33 @a) 'bar

    // Continues to loop while the value at the given memory address is non-zeroloop @a 'body 'done

    // Reads n bytes from the input into @a where n is the declared size of @a
    read @a

    // Decrement is the mirror operation to incr
    ‎decr @a b'0'

    // Set all the bytes at the given address to zero
    ‎zero @a

Drop implementation:
Cells are allocated as each mem call is reached. When drop is called, the cells are marked "dropped", but not zeroed. Cells are not dropped when ret is called. Memory management must be done manually. If the compiler can prove the cells have been zeroed (e.g. if the entire contents were moved), it marks them as "zeroed". When looking for the next cells to allocate, the allocator looks for the nearest cell to the current position that is marked "dropped". If the cells are not marked "zeroed", instructions are inserted to zero those cells before doing anything to that memory. If no dropped cells are available or if it would be closer to just use some memory at the beginning or end of the tape, that memory is preferred.

Todo: determine if manual memory management is actually useful or if we should just have blocks.

@sunjay sunjay self-assigned this Jul 16, 2017
@sunjay
Copy link
Owner Author

sunjay commented Oct 12, 2017

No whitespace (other than newlines) is significant.

// Syntax:
// @x = address x
// 'y: = label y
// @x[n] = perform pointer arithmetic (n must be a constant)
// b'c' = byte constant given the ascii value of the given character (sugar for writing the actual number)

'main:
    // allocate 4 bytes
    mem @a[4]// could also use b'A', etc.
    ‎incr @a[0] 65 // A
    ‎incr @a[1] 66 // B
    ‎incr @a[2] 67 // C
    ‎incr @a[3] 68 // D// default size is 1 byte
    ‎mem @b
    ‎// move the value of the memory at address @a to address @b// leaves @a as zero// both the source and the target of this operation must be the same sizemove @a[2] -> @b
    ‎// marks memory as available for use but does not zero it until necessary// @b can be longer be used
    ‎drop @b
    ‎// move is more efficient and requires no temporary cell, but copy is useful if you want to keep the memory around// both the source and the target of this operation must be the same size
    ‎mem @c[4]
    ‎copy @a -> @c
    ‎// using a constant like this is syntactic sugar for incrementing its value into a temporary memory allocation and then dropping the memory// call cannot be used in a way that causes a cycle (e.g. recursion) because of inlining (see loop for a way to do loops) - maximum recursion depth is used to avoid infinite compilation
    ‎call foo b'x' @c
    ‎// this ret is required
    ‎ret

// Function blocks can have zero or more arguments
// If no size is given, the default is 1 byte
// All allocated memory inside a function block is dropped at the end
fn foo x y[4] {
    write x
    ‎write y[3]// returns to the instruction after where this was called// without this, there's a chance that we would keep going to the next instruction// the only way that couldn't happen is if the next thing was a function block (in which case you would get a compilation error)
    ‎ret
}

Other instructions:

    // Tests if @a is non zero and jumps to the second argument
    if @a ('foo 33 @a) 'bar

    // Continues to loop while the value at the given memory address is non-zeroloop @a 'body

    // Reads n bytes from the input into @a where n is the declared size of @a
    read @a

    // Decrement is the mirror operation to incr
    ‎decr @a b'0'

    // Set all the bytes at the given address to zero
    ‎zero @a

Drop implementation:
Cells are allocated as each mem call is reached. When drop is called, the cells are marked "dropped", but not zeroed. Cells are not dropped when ret is called. Memory management must be done manually. If the compiler can prove the cells have been zeroed (e.g. if the entire contents were moved), it marks them as "zeroed". When looking for the next cells to allocate, the allocator looks for the nearest cell to the current position that is marked "dropped". If the cells are not marked "zeroed", instructions are inserted to zero those cells before doing anything to that memory. If no dropped cells are available or if it would be closer to just use some memory at the beginning or end of the tape, that memory is preferred.

Todo: determine if manual memory management is actually useful or if we should just have blocks.

Add macros for doing things like generating the code to print any string

@sunjay
Copy link
Owner Author

sunjay commented Nov 1, 2017

Old notes from Jul 16:

The operations that we have defined as our IR before code generation can be written out as their own little low-level language.

// The only types are memory blocks denoted [u8; non-negative size] and u8 (a single-byte)
// u8 means "byte" (8-bits) and the size is the size of the memory block in bytes

// Every variable must have a type
let q: [u8; 4];

// can be uninitialized (memory defaults to zero)
let a: [u8; 3];
// can be initialized to a literal
let b: [u8; 3] = [111, 244, 0];
// can be initialized to a single, repeating value
// size can be inferred
let c: [u8; _] = [23; 100];

// variables must be explicitly copied
//WRONG: a = b;
//RIGHT:
copy b -> a;

// move instead of copying (saves instructions)
let d: [u8; 3];
move b -> d;

// We are currently in the top most block known as the "root" block

// blocks can be arbitrarily nested
{
    // variables are only declared in their block and blocks below
    let q: [u8; 1]; // only available until the end of this block
    {
        let r: [u8; 3]; // this memory will be made available again at the end of this block
        copy a -> r;
    }
    let s: [u8; 100];
}

// Set some memory to zero
zero a;
// read into a memory block
read a;
// write memory to output
write a;

// indexing is supported
write a[1..];

// incrementing and decrementing
a[3] += 33;
a[3] -= 37;

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant