Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project Proposal: Lox Programming Language Compiler and Bytecode Virtual Machine #409

Open
Arthur-Chang016 opened this issue Oct 27, 2023 · 3 comments
Labels
proposal course project proposals

Comments

@Arthur-Chang016
Copy link

Arthur-Chang016 commented Oct 27, 2023

What will you do?
Inspired by Crafting Interpreters
I will be building a full-functioned Lox programming language compiler and a bytecode interpreter/ virtual machine in Rust language from scratch. Lox is a language that combines the syntax of Python and C with dynamic typing. And it has some extensions of object-oriented features and first-class functions.

As a new Rust user, one of the main difficulties of this project would be to correctly program Rust in the canonical way and pay best efforts to pass Rust's borrow checker.

How will you do it?

  • Lox Compiler
    This part will be built as fast and as easy as possible. Instead of using Bison- or Flex-like tools, the scanner and parser will be implemented from scratch. Pratt Parser Algorithm will be adopted in the front-end. Moreover, I won’t build the AST (abstract syntax tree). Instead, the bytecode will be generated immediately during parsing.

  • Bytecode Interpreter
    The language (ISA) for the interpreter is a stack machine that does data movements by mainly push and pop operations. This is not adopted in modern, high-performance architectures, but it can be very friendly for the front-end compiler development with quite acceptable performance. Furthermore, a Mark-Sweep Garbage Collection will be implemented as the memory management mechanism.

How will you empirically measure success?
There are some official test cases. The most important metric of language implementation is to make it correct. I will write a test harness that uses these tests as input. I expect it will pass the tests that cover features of this language I implemented.

If time permits, I could find the hotspot of bytecode simulation and try to optimize it by either algorithm improvement or hacking Rust by using unsafe block.

I will do the performance comparison with both jlox and clox from official implementation. The former is AST tree-walking interpreter implemented by Java, and the latter is bytecode interpreter implemented by C. It could be especially interesting to compare the performance between C's and Rust's implementation (mine).

Team members:

Possible Future Works
Build a compiler that compiles bytecode into native machine language (Arm64 for my computer) that can either be stack-machine-like style that doesn't need to take care of the register or register machine style that we should perform register allocation. The later one would be a huge project.

The above compiler can be used for the JIT compilation and we can also do the performance measurement on bytecode interpretation, stack style asm, or register style asm.

Or, build a profiler to measure performance between different backend implementation.

@Arthur-Chang016 Arthur-Chang016 added the proposal course project proposals label Oct 27, 2023
@Arthur-Chang016 Arthur-Chang016 changed the title Project [NUMBER] Proposal: [TITLE] Lox Programming Language Compiler and Bytecode Virtual Machine Project Proposal: Lox Programming Language Compiler and Bytecode Virtual Machine Oct 27, 2023
@sampsyo
Copy link
Owner

sampsyo commented Oct 30, 2023

Sounds good overall!

Can you please expand on what you have written under "empirically measure success"? You have mentioned some test cases, which is great! What exactly will you do, and what outcomes do you expect? Please make a list. For example, it could include "I will write a test harness that uses these test files as input, and I expect 100% of the tests in this directory to pass." Or you could pick a subset of the tests.

And even if you don't implement any optimizations, I think you should measure performance and compare it against some sort of baseline (which baseline? you pick).

Smaller comments:

Pratt Parser Algorithm will be adopted in the front-end that is more likely to be used in industry.

Do you have a specific justification for this (or perhaps a specific baseline that Pratt is "more adoptable" than)? If I were to guess, I would have supposed that a simple recursive descent parser would be the most common in industry. :)

I won’t build the AST (abstract syntax tree). Instead, the bytecode will be generated immediately after parsing.

Do you mean "during parsing" (i.e., the parser will directly output bytecode)? Or is there some other intermediate data structure (which might be referred to as the "parse tree")?

I could find the hotspot of bytecode simulation and try to optimize it by either algorithm improvement or hacking Rust by using unsafe block.

I strongly suspect that there will be plenty of low-hanging fruit for optimization that does not require unsafe.

@Arthur-Chang016
Copy link
Author

Thanks for your detailed response. I added the performance test in the How will you empirically measure success?.

Do you have a specific justification for this

Sorry I don't have justification for this. I just read from this book that talks about it's more likely to implement front-end with Pratt Parser than generative tools like Bison or Flex.

@sampsyo
Copy link
Owner

sampsyo commented Nov 1, 2023

Sounds good; thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal course project proposals
Projects
None yet
Development

No branches or pull requests

2 participants