Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wat2wasm segfaults on .wat file with many nested if statements #2377

Open
womeier opened this issue Jan 30, 2024 · 9 comments
Open

wat2wasm segfaults on .wat file with many nested if statements #2377

womeier opened this issue Jan 30, 2024 · 9 comments

Comments

@womeier
Copy link

womeier commented Jan 30, 2024

Running wat2wasm on a rather big file with many nested if-statements segfaults.

Version: wabt 1.0.32-1 (debian bookworm)
System: Debian (with enough RAM + swap)

Steps to reproduce:

wasm2wat CertiCoq.Benchmarks.tests.color.wasm > color.wat          // color.wat is ~20GB
wat2wasm color.wat -o color.wam                                    // Segmention fault
wasm-tools parse color.wat -o color.wasm                           // works

segfault:

(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
    at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585       if (tokens_.empty()) {
(gdb)

CertiCoq.Benchmarks.tests.color.zip

@keithw
Copy link
Member

keithw commented Jan 30, 2024

It depends... on the size of the .wat, do you get a more concise output from binaryen's wasm-dis or wasm-tools print? If so... we can dive down to compare.

In terms of memory consumption of wat2wasm, wasm-tools parse is probably going to be better (and also better than binaryen's wasm-as) since I think it tries not to assemble an in-memory IR of the whole module.

@womeier
Copy link
Author

womeier commented Jan 31, 2024

Thanks for the prompt response, wasm-tools parse worked (with more swap, takes quite some time).

But wat2wasm still segfaults when run on the 20GB .wat file, I edited the issue accordingly.

FYI: both wasm-tools parse and wat2wasm require around 20GB of RAM.

@womeier womeier changed the title wasm2wat generates rather big .wat file wat2wasm segfaults on 20GB .wat file Jan 31, 2024
@keithw
Copy link
Member

keithw commented Jan 31, 2024

Cool, what's the segfault? (E.g. if you make a debug build and run it in gdb, where does it segfault?)

@sbc100
Copy link
Member

sbc100 commented Jan 31, 2024

What is the reason for wanting to support such large files? Done't Wasm runtimes have internal limits that make running such files impossible? Perhaps we should instead just impose a limit on the size of any file we process? (see https://webassembly.github.io/spec/core/appendix/implementation.html#binary-format).

@womeier
Copy link
Author

womeier commented Feb 1, 2024

Here is the segfault, hope it helps:

(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
    at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585       if (tokens_.empty()) {
(gdb)

@sbc100 We extended the CertiCoq compiler with a Wasm backend.
One of the benchmarks of CertiCoq runs a graph coloring algorithm, the generated binary includes the representation of a graph in a functional language, compiled to Wasm.
Didn't have issues running the other (a little smaller) benchmarks, haven't tried this one yet.

And in case you're wandering, the roundtrip with wasm2wat and wat2wasm is not strictly necessary, we'll get rid of that soon.

@sbc100
Copy link
Member

sbc100 commented Feb 1, 2024

What wasm engine are are you able to run 20GB program on (or even 1GB program for that matter)? I would have assumed that such a program would be exceed the internal limits of any engine.

@womeier
Copy link
Author

womeier commented Feb 1, 2024

It's just the color.wat file that's so big (for some reason), the color.wasm file is just ~2MB, I ran the color.wasm file in both Node.js and wasmtime, both went fine.
(Node.js takes > 1 min to instantiate the module, but no crash :) )

@womeier womeier changed the title wat2wasm segfaults on 20GB .wat file wat2wasm segfaults on .wat file with many nested if statements Mar 24, 2024
@womeier
Copy link
Author

womeier commented Mar 24, 2024

I think I understand now, why the color.wat file was so big, it has a quite large chain of nested if-statements, like so:

...
some_check
if
  return
else
   ...
   some_check   
   if
     return
   else
      ... ~20,000x this nesting
   end
end

Presumably, this adds quite a lot of whitespace. (gzip reduces the 16GB wat file to ~25MB)

we now generate the semantically equivalent programs, where the else branch is moved to after the if-statement, like so:

some_check
if
  return
end
...
some_check
if
  return
end
...

This change reduced the color.wasm from 1.8 MB to 1.7 MB, but the color.wat from 16GB to 14MB

Also, we generate code in SSA form, so one function has ~20k locals before we run wasm-opt --coalesce-locals :)

@rossberg
Copy link
Member

rossberg commented Mar 24, 2024

some_check
if
  return
end

You should be able to shorten that if return end even more to just br_if i, where i is the label of the function body – a br out of a function body is equivalent to return.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants