Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: RISC-V Binary Translation to WebAssembly (WASM) #394

Open
evanmwilliams opened this issue Oct 19, 2023 · 1 comment
Open

Proposal: RISC-V Binary Translation to WebAssembly (WASM) #394

evanmwilliams opened this issue Oct 19, 2023 · 1 comment
Labels
proposal course project proposals

Comments

@evanmwilliams
Copy link
Contributor

evanmwilliams commented Oct 19, 2023

What will you do?

We will design and implement a binary translator that converts RISC-V binaries into WebAssembly (WASM) binaries. This will enable RISC-V software to be executed in any environment with a WASM runtime, including web browsers, servers, and certain embedded systems.

How will you do it?

First, we'll need to familiarize ourselves with the RISC-V and WASM specifications. The three of us are quite familiar with RISC-V already, but we are less familiar with WASM. We'll pick a suitable subset of RISC-V instructions that we can then translate into WASM. The initial subset will just ensure we can get out a proof of concept. We'll extend the subset as time allows to be more ambitious.

We'll need to develop a parser for RISC-V binaries. We've written interpreters for RISC-V before in other courses, so this should not be too involved. It might take a bit more work to design good data structures to store RISC-V instructions and their operands. Then, we'll create a mapping schema to transform each RISC-V instruction (or a series of instructions) into its corresponding Wasm representation. Some mappings may be direct, while others might require innovative solutions due to architectural differences.

As a more lofty objective, we'll also try to resolve some of the differences between the memory models of the two ISAs. We'll need to emulate RISC-V's memory model within Wasm's linear memory. To deal with system calls, we'll need to implement shims or stubs to emulate or handle RISC-V system calls within the Wasm environment. Our mapped instructions can then be used to produce WASM binaries.

If time permits, we will explore optimizations to improve the performance or reduce the size of the translated WASM binaries. Some of these optimizations can be more classical compilers techniques like dead code elimination or loop unrolling. Other more tailored optimizations might include profiling for hotspot optimization and tailoring optimizations for specific WASM runtimes.

As a sidenote, we're planning on doing all of this in Rust. Two of us are pretty experienced with Rust already, and the third is eager to learn. Seems fitting, given some of the conversations we've had in class recently!

How will you empirically measure success?

First we'll ensure functional correctness by developing a suite of RISC-V binaries that can be easily verified. Then, we'll do the translation to WASM and verify that the outputs are the same. This is the baseline of correctness. If we implement a more expansive subset of RISC-V, we might be able to test on actual benchmark suites that have been developed. The other way we can empirically measure success is by observing the performance. We expect that the WASM binaries are reasonably performant, for some definition of performant. This might show itself in the size of the generated code or the speed of the executed binary given a certain set of constraints. In our final report, we'll do a deep dive into at least one realistic program that is actually useful to translate into WASM.

Team members:
@emwangs @he-andy @evanmwilliams

@evanmwilliams evanmwilliams added the proposal course project proposals label Oct 19, 2023
@sampsyo
Copy link
Owner

sampsyo commented Oct 23, 2023

Sounds like fun, and also quite a lot of work! I'm glad there are 3 people on your team. 😃

One very big challenge you are going to encounter is that WebAssembly uses structured control flow. That is, it has constructs like if and while but not arbitrary branch/jump instructions. Of course, RISC-V binaries have arbitrary (possibly non-reducible) CFGs. This is a huge headache for many compilers that target WebAssembly.

There are a few ways to deal with this limitation, and you'll need to pick one. You can pick something simple and inefficient, such as a level of indirection through a big wrapper while. Or you could use the famous "Relooper" algorithm that appears in many wasm-related tools, or you can use this somewhat refined version by Ramsey published in 2022.

Other nits:

  • For system calls, you'll want to use WASI rather than inventing your own system interface.
  • Matching memory models, to be somewhat blunt, seems too hard… especially given that shared-memory threads in wasm are still experimental. I would leave this out; you have plenty of other interesting stuff to do!

Can you please answer these important questions in the next ~week, before you get started on the project? Even provisional answers are OK, but it is important to have answers to these up front:

  • What benchmark suite will you use? (It's probably a good idea, for this project, to use something off the shelf instead of writing your own programs—purely because there are so many options available (i.e., any C/whatever program will do).)
  • How will you measure the speed of the original RISC-V binaries?
  • How will you measure the speed of the converted WebAssembly binaries? (Which wasm engine will you use?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal course project proposals
Projects
None yet
Development

No branches or pull requests

2 participants