Rusty Monkey

Monkey is a programming language from the (amazing) book Writing An Interpreter In Go, by Thorsten Ball. Like many have done before, this is my attempt at implementing the interpreter in Rust.

I'm mostly doing this for fun, but also to learn more about the inner workings of programming languages. If you share my interest, you should consider buying the book, and following along yourself!

Quick Overview of Monkey

Variables and data types

Integers and integer expressions:

let favourite_prime = 2;
let number = 3 * 7 + (17 - 13) / 2;

Boolean values:

let likes_banana = true;
let something = !false;

Strings and string concatenation:

let species = "Golden " + "Lion " + "Tamarin";
let escape = "escape sequences: \"\n\t";

Nil value:

let nothing = nil;
(!nil) == true // nil is falsy

Arrays:

let empty = [];
let array = [3, "monkey", false];
puts(array[0]);

Associative arrays, known as "Hashes"*:

let hash = #{
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "order": "Primates"
};
puts(hash["kingdom"]);

Control flow:

if bananas >= 3 {
    is_monkey_happy = true;
} else { // Optional else clause
    find_bananas();
}

Higher order functions and closures:

let neg = fn(x) {
    return -x;
};
let sqr = fn(x) {
    return x * x;
};
let compose = fn(f, g) {
    fn(x) { f(g(x)) } // Implicit return!
};
compose(neg, sqr)(7);

Checkout the official website for more information!

Additions and modifications

As I developed the interpreter, I felt like making some changes to the language. You may not like all of these changes, but I feel like most of them are pretty uncontroversial. I also plan on further extending the language, so this list is subject to change.

Added more operators, like >=, <=, % and ^.
Changed null keyword to nil. I just think it looks nicer.
No top level return statements.
Optional parentheses around if condition. Originally, they were required.
New Hash syntax. This was necessary to avoid ambiguity in the parser*, but I also think it looks good!
```
let hash = #{
    "entry": "something",
    "banana": true
};

let empty = #{};
```

Block expressions. These work like block statements, but are allowed in expression contexts:

let a = {
    let b = 30;
    let c = b * (b - 1) * (b - 2);
    c * c
};

Else-if chains.
Semicolons are now required in most cases, but there are some exceptions*.
Many new built-ins, like type, and import.

What's up with the new hash syntax?

Originally, I wanted to generalise block statements so you can just drop them anywhere to create a new scope. Like this:

let a = 3;
{
    let a = 2;
    puts(a); // prints "2"
}
puts(a); // prints "3"

But this posed a problem. If the parser encountered a "{" token in a statement position, it would just presume that it's a block statement. So, if you wanted to implicitly return a hash, you just couldn't -- the parser would think it's a block statement when it sees the "{", and would return a parser error on the first ":".

let make_hash = fn() {
    { "a": 1, "b": false } // Parser error!
};

One way to fix this was to make the parser first try to parse it as a block statement, and if that doesn't work, backtrack and parse it as a hash. But this would be a lot of work, and would still leave some ambiguity⁽¹⁾. The issue is that the syntax is fundamentally ambiguous, so there was no way to solve this without changing the syntax. So in summary, that's why I decided to introduce the new hash syntax, #{}.

let hash = #{
    "a": 1,
    "b": false
};

Side note: this also had the benefit of freeing up the "{" syntax in expression positions, meaning I could add block expressions. These look a lot like block statements:

let a = {
    let b = 30;
    let c = b * (b - 1) * (b - 2);
    c * c
};

Problems with semicolons

Say you have this function:

let foo = fn(a) {
    let b = [1, 2, a]
    [a]
}

This seems pretty harmless, right? Let's try to run foo(3).

At line 3, column 7:
    Runtime error: index out of bounds: 3

Wait, what? What's going on there? As it turns out, it has to do with semicolons. Instead of parsing the function body as two statements (let b = [1, 2, a] and [a]), like we would expect, because there is no semicolon after the first line, the parser thinks that the [ on the second line indicates an indexing expression. The parser sees the function body as something like this:

let b = ([1, 2, a])[a]

Of course, if a is 3, this will return an index out of bounds error. Any time where a statement doesn't end in a semicolon, and is followed by an expression statement⁽²⁾, this kind of thing can happen. Usually it will result in a parser error, but sometimes it can fail in more unexpected ways, like the earlier example.

let foo = fn() {
    let bar = fn() {
        // ...
    }
    (bar() + 1) // Runtime error: identifier not found: 'bar'
}

I found that the best way to mitigate these issues was to cut back on the whole "optional semicolons" thing. So I made semicolons required after every statement, with some exceptions. Notably, semicolons are optional after expression statements, when:

It's an expression statement with an "if" expression, a function literal, or a block expression. Basically, expressions that end in "}".
```
let a = 3;
{
    let a = 2;
    puts(a);
} // No need for a semicolon here
puts(a);
```

It is the last statement in a block.

let foo = fn(x) {
    let a = x ^ 3;
    a % 4 // No need for a semicolon here
}

It is the last statement in the program. This was added to make working with the REPL a bit less of a pain.
```
monkey » puts("hi")
```

Semicolons after "let" and "return" statements are always required.

I think this is a nice compromise⁽³⁾. Making semicolons always necessary would make programming a pain, so these exceptions are certainly welcome. On the other hand, making them completely optional would make seemingly normal code break in confusing ways, so I had to add at least some restrictions.

(1): For instance, what about {}? Is that an empty hash or an empty block?

(2): In particular, expressions whose prefix token can also be in infix position would be problematic. So, "[" can be the start of an array, but can also be an indexing operation if in infix position. "(" can be the start of a grouped expression or a call expression. And "-" can be prefix negation, or infix subtraction.

(3): By the way, this is still just a compromise, there still is some room for error. The following snippet, for example, still returns a parser error.

let safe_div = fn(a, b) {
    if b == 0 {
        return nil;
    }
    [a / b, a % b]
};

Name		Name	Last commit message	Last commit date
Latest commit History 161 Commits
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

.gitignore

.gitignore

Cargo.lock

Cargo.lock

Cargo.toml

Cargo.toml

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Rusty Monkey

Quick Overview of Monkey

Variables and data types

Control flow:

Higher order functions and closures:

Additions and modifications

What's up with the new hash syntax?

Problems with semicolons

About

Releases

Packages

Languages

License

bpandreotti/rusty-monkey

Folders and files

Latest commit

History

Repository files navigation

Rusty Monkey

Quick Overview of Monkey

Variables and data types

Control flow:

Higher order functions and closures:

Additions and modifications

What's up with the new hash syntax?

Problems with semicolons

About

Topics

Resources

License

Stars

Watchers

Forks

Languages