Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(compiler)!: Custom Grain object files #2104

Open
wants to merge 7 commits into
base: oscar/digest
Choose a base branch
from

Conversation

ospencer
Copy link
Member

@ospencer ospencer commented May 4, 2024

Overview

This PR replaces Grain's intermediate .gr.wasm files with Grain object files. This is for a variety of reasons, including:

  • Debuggability. Our wasm-based linker is unable to reconstruct debugging/sourcemap information from wasm binaries. This approach once again generates proper sourcemaps. There is additional work to be done to make the sourcemaps truly usable, but this is a major step in the right direction.
  • Optimizations. This change will allow us to do "whole program" optimizations to better assist Binaryen with its optimizations.
  • Performance. Going from our IR to Binaryen IR and serializing is the bulk of compilation time. Since we no longer are serializing/deserializing many files, there is a performance boost. For a hello world program, cold compile time is down 10% and warm compile time is down 25%. When running in JS, cold compile time is down 15% and warm compile time is down 21%. This also gets subsecond warm builds in JS on my machine.

Technical details

Grain object files use the .gro extension. As no wasm files are specific to Grain, the default output filename from the compiler uses the .wasm extension rather than .gr.wasm.

Object files follow this layout:

magic version len version mash_program offset cmi mash_program
0xF0 0x9F 0x8C 0xBE 5 0.6.3 1024 <cmi> <mash_program>

mash_program does include the module's cmi, but this format allows typechecking to unmarshal just the type information.

We use OCaml's Marshal module as our binary format (and for Whole Grain, we can use our Marshal module). js_of_ocaml can't marshal floats into a format that can safely be read back by native code. I wanted object files to be the same between native and js, so we store floats as int64s in the mashtree to guarantee consistent behavior.

Reviewing this PR

The code delta for this PR is fairly small, but snapshot tests are now sexprs of mashtrees. As such, we have all new snapshots. I'd recommend clicking the first commit to review the code diff, or using the GitHub UI to hide changes to snapshots.

Closes #1903

@@ -377,6 +377,6 @@ describe("basic functionality", ({test, testSkip}) => {
~config_fn=smallestFileConfig,
"smallest_grain_program",
"",
4769,
6424,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this go up?

Is it because of names changing and debug info?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to leave a comment about this; thanks for pointing it out. The old linker had a concept of hard/soft dependencies. If a file was a dependency but no values/functions from that module were actually uses (just types or an unused import), then the linker would compile in type information but not link in the actual files.

At first thought this seems like good behavior, but if this is done, side effects from importing those modules don't happen, which is bad. For example, a normal Grain program that doesn't use anything in Pervasives doesn't link in Pervasives, which means the setupExceptions side effects don't happen, and throwing exceptions in those programs prints GrainError instead of the actual error. In practice, users don't run into this because most programs use something from Pervasives.

This new linker links in all dependencies, which is more correct. As I mentioned, in practice this doesn't make a real difference for most users. Side note: the major reason the original implementation did this was because the compiler wasn't great at removing unused code from the resulting binary and this resulted in large module sizes. The compiler has gotten much better and this isn't really an issue anymore.

I also discovered a few more configurations producing smaller programs while working on this, as small as 25 bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay then this probably closes #1903 then.

@ospencer ospencer changed the base branch from main to oscar/digest May 4, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants