Comparison with wasm2c + some ideas #1

Open
vshymanskyy opened this issue Dec 22, 2021 · 30 comments

@vshymanskyy
Contributor

vshymanskyy commented Dec 22, 2021

Just performed a quick test to understand how it compares to wasm2c.
For this test I took a ~25 MB wasm (WASI) file, clang.wasm

  • w2c2 finishes in 5.1s, RAM usage: 193 MB. Output: 1.3 GiB file (34.8 MB zipped)
  • wasm2c finishes in 1m23s, RAM usage: 1432 MB. Output: 301 MiB file (22 MB zipped)

Clearly, speed and RAM usage are really good 🎉 🚀
Wondering if file size can be reduced 🧑‍🔬

This also brings another question. C compilers are not very efficient when working with huge C files, so monsters like clang.wasm will take ages to compile. This is mentioned in my wasm2native tool: https://github.com/vshymanskyy/wasm2native#todo
One of the possible solutions is to split output into smaller chunks (which also enables parallel build).

@vshymanskyy
Contributor Author

vshymanskyy commented Dec 22, 2021

Features comparison.

w2c2 is not yet handling some of the important wasm proposals:

  • Sign-extension operations
  • Non-trapping Float-to-int Conversions
  • Multi-value

These are implemented in wasm2c and covered by opam-1.1.1 spec tests.

Other important proposals include:

  • Bulk memory operations
  • Exception handling
  • Reference types
  • Multiple memories

These (AFAIK at the time of writing) are not available in wasm2c. Spec tests for these features are available in main branch.

P.S. I think I can add w2c2 as an alternative translator for wasm2native.

@vshymanskyy
Contributor Author

vshymanskyy commented Dec 22, 2021

Coremark 1.0 results

Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz, single-thread

GCC 10.3.0, optimization level: -O3

  • Native: 33305.578684 (direct execution, without wasm stage)
  • w2c2: 27469.603167
  • wasm2c: 27458.936027

Clang 12.0.0, optimization level: -O3

  • Native: 28793.550245
  • w2c2: 26085.192936
  • wasm2c: 26125.000733
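
To put these scores in relative terms, the sketch below divides a translated score by the native score for the same compiler. The helper name `relative_score` is made up for illustration; the constants are the scores listed above. Both translators come out at roughly 82% of native with GCC and roughly 91% of native with Clang.

```c
#include <assert.h>

/* Hypothetical helper: fraction of native performance reached by a
 * translated build (1.0 would mean "as fast as native"). */
static double relative_score(double translated, double native)
{
    assert(native > 0.0);
    return translated / native;
}

/* Coremark scores reported above (GCC 10.3.0 -O3 and Clang 12.0.0 -O3). */
static const double gcc_native   = 33305.578684;
static const double gcc_w2c2     = 27469.603167;
static const double gcc_wasm2c   = 27458.936027;
static const double clang_native = 28793.550245;
static const double clang_w2c2   = 26085.192936;
static const double clang_wasm2c = 26125.000733;
```

For example, `relative_score(gcc_w2c2, gcc_native)` gives the w2c2 fraction for the GCC build.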

@turbolent
Owner

turbolent commented Dec 23, 2021

Thank you for this analysis Volodymyr!

These are great points for the README. I will describe in more detail which features are supported and which are not, as well as the trade-offs and goals of the project itself.

Performance

I had no idea clang was itself available as a WebAssembly binary, amazing! This is a great stress-test.

My main goal is porting new software to older machines, which have clock speeds of only tens of MHz and little memory, ideally compiling on the host itself (no cross-compiling). This is why I looked into streaming compilation (no IR), but also took a few shortcuts.

For now I have focused on getting the test suite to pass and some small programs to compile and run. So far I have not focused on any performance optimizations, other than employing a streaming-compilation approach. It is great to see that even large programs like clang can be translated and performance is already good.

I think there are multiple opportunities to improve the performance and resource footprint further, by improving memory allocation, improving speed by parallelizing compilation, and especially for reducing size by removing the unnecessarily generated code:

  • Always remove parameter names from function prototypes, they are not needed
  • Behind a "compact" flag:
    • Remove spaces for indentation
    • Remove braces for blocks
    • Remove newlines for statements of a single op (e.g. jump op is if-statement + goto statement)
    • Remove unnecessary labels. Streaming compilation makes this harder
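
To illustrate the list above, here is a hand-written sketch of the same function in regular and in compact form. The function names, the body, and the variable and label names are invented for this example and are not actual w2c2 output.

```c
#include <stdint.h>

/* Prototypes without parameter names: valid C, and shorter. */
uint32_t f1_regular(uint32_t, uint32_t);
uint32_t f1_compact(uint32_t, uint32_t);

/* Regular style: indentation, braces, one statement per line. */
uint32_t f1_regular(uint32_t l0, uint32_t l1)
{
    uint32_t si0;
    si0 = l0;
    if (si0 == 0U) {
        goto L0;
    }
    l1 = l1 + l0;
L0:
    return l1;
}

/* Compact style: no indentation, no braces around the single-statement
 * block, and the conditional jump (if + goto) kept on one line. */
uint32_t f1_compact(uint32_t l0, uint32_t l1)
{
uint32_t si0;
si0=l0;
if(si0==0U)goto L0;
l1=l1+l0;
L0:
return l1;
}
```

Both versions compile to the same behavior; only the number of bytes in the generated source differs.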

Great point about compiling large C files: This is one of the reasons I started this project. SwiftWasm generates very large WebAssembly binaries, which take long to compile to C and then very long to compile to a native binary.

I am currently working on parallel compilation, where the compiler is writing function implementations into several smaller files that can be compiled independently, and also performing this code generation concurrently. This will hopefully make compiling the resulting C code more manageable and also speed up the translation process from WebAssembly to C.
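
As a minimal sketch of the splitting step, the function below divides a number of function implementations into contiguous ranges, one per output file. The names `FileRange` and `file_range` are hypothetical, and this is only one possible way to split the output, not necessarily how w2c2 does it.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [first, last) of function indices written to one file. */
typedef struct {
    size_t first;
    size_t last;
} FileRange;

/* Split function_count functions into file_count contiguous ranges whose
 * sizes differ by at most one, and return the range for file_index. */
static FileRange file_range(size_t function_count, size_t file_count, size_t file_index)
{
    FileRange range;
    size_t base, extra;

    assert(file_count > 0 && file_index < file_count);
    base = function_count / file_count;   /* every file gets this many */
    extra = function_count % file_count;  /* the first `extra` files get one more */

    range.first = file_index * base + (file_index < extra ? file_index : extra);
    range.last = range.first + base + (file_index < extra ? 1 : 0);
    return range;
}
```

Each range can then be handed to a separate worker, and the resulting files can be compiled independently.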

Features

For now I have focused on supporting the core specification 1.0 as outlined in https://www.w3.org/TR/wasm-core-1/.

I have not looked into newly approved features, like the ones you listed, mostly because I was not sure if they are needed for compiling applications and libraries written in C, C++, Rust, Swift, etc. Most compilers seem to target the "MVP" specification.

As for the concrete proposals you listed:

  • Sign-extension operations and non-trapping float-to-int conversions: I think this should be "just" implementing the new opcodes
  • Multi-value: I had a quick look, it seems like it is going to complicate the compiler quite a bit
  • Bulk memory operations, exception handling, reference types, multiple memories: These features all seem to add new opcodes, new types, and extend the language quite a lot. They are going to be quite a bit of work to implement
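
To make the first point concrete, here is a sketch of what C helpers for a few of the new opcodes might look like. The function names are chosen for this example and are not taken from w2c2 or wasm2c; the semantics follow the proposals (sign-extend the low bits; NaN converts to 0 and out-of-range values saturate).

```c
#include <stdint.h>

/* i32.extend8_s: sign-extend the low 8 bits of an i32.
 * Written with masks so it does not rely on implementation-defined casts. */
static uint32_t i32_extend8_s(uint32_t x)
{
    return (x & 0x80U) ? (x | 0xFFFFFF00U) : (x & 0xFFU);
}

/* i32.extend16_s: sign-extend the low 16 bits of an i32. */
static uint32_t i32_extend16_s(uint32_t x)
{
    return (x & 0x8000U) ? (x | 0xFFFF0000U) : (x & 0xFFFFU);
}

/* i32.trunc_sat_f32_s: like i32.trunc_f32_s, but never traps. */
static int32_t i32_trunc_sat_f32_s(float x)
{
    if (x != x) return 0;                        /* NaN becomes 0 */
    if (x >= 2147483648.0f) return INT32_MAX;    /* too large: saturate */
    if (x <= -2147483648.0f) return INT32_MIN;   /* too small: saturate */
    return (int32_t)x;                           /* in range: truncate toward zero */
}
```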

Do you know if there are certain compilers targeting WebAssembly that leverage or even require these features?

Coremark results

Performance should be very similar to wasm2c, as the generated code is pretty much the same.
I have not looked into opportunities to generate more efficient code.

wasm2native integration

This looks like a great project! I was looking for a wasi implementation for wasm2c, awesome work!

Currently the API that w2c2 provides is slightly different than wasm2c, and it produces differently mangled names (mostly they are shorter, e.g. don't include type information). I think this difference could be worked around or even removed.

@kripken

kripken commented Dec 23, 2021

Some thoughts (great project btw!):

> Sign-extension [..] non-trapping float-to-int [..] Multi-value [..] Bulk memory operations, exception handling, reference types, multiple memories [..] Do you know if there are certain compilers targeting WebAssembly that leverage or even require these features?

Several of those features are implemented in LLVM and are optional in the Emscripten and Rust toolchains for example. But they are not required, so projects using them could just be recompiled for wasm 1.0.

Exceptions are maybe the hardest of those. You can compile C++ exceptions and longjmp with wasm 1.0 today, but then you have to use the Emscripten EH model, which requires supporting some extra imports (emscripten's wasm2c layer emits them) and is probably slower than wasm EH.

Overall, my guess is that compiling wasm 1.0 to C is "good enough" for most things in the C/C++/Rust/Zig/etc. world today. Exceptions maybe raise the question of compiling to C++ instead, and wasm GC maybe suggests compiling to a GC language, but maybe those features would be out of scope of this project anyhow?

> Coremark results: Performance should be very similar to wasm2c, as the generated code is pretty much the same. I have not looked into opportunities to generate more efficient code.

I would guess nothing is needed in w2c2 or wasm2c for performance since the C compiler does the hard work anyhow. The big question is whether there are things that wasm->C translation can optimize that a C compiler can't, but it's hard for me to think of anything...

@turbolent
Owner

@kripken Thank you for answering my questions, that makes a lot of sense!

> Exceptions maybe raise the question of compiling to C++ instead, and wasm GC maybe suggests compiling to a GC language, but maybe those features would be out of scope of this project anyhow?

I have no immediate plans for generating code in other languages, but even though the C code generation (in c.c) currently writes C directly, it would be possible to abstract it away, e.g. by calling code-generating functions for different target languages.
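
A rough sketch of what such an abstraction could look like: the translator calls through a table of function pointers, and each target language supplies its own table. The `Backend` struct, its members, and the variable naming scheme are all hypothetical and do not reflect the current structure of c.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical backend interface: one callback per construct to emit.
 * Each callback writes into out and returns the number of characters. */
typedef struct {
    int (*emit_local_set)(char *out, size_t size, unsigned local, unsigned slot);
    int (*emit_return)(char *out, size_t size, unsigned slot);
} Backend;

/* C backend: assign a stack slot to a local variable. */
static int c_emit_local_set(char *out, size_t size, unsigned local, unsigned slot)
{
    return snprintf(out, size, "l%u = si%u;", local, slot);
}

/* C backend: return the value held in a stack slot. */
static int c_emit_return(char *out, size_t size, unsigned slot)
{
    return snprintf(out, size, "return si%u;", slot);
}

/* The translator would only ever call through a table like this one. */
static const Backend c_backend = { c_emit_local_set, c_emit_return };
```

A backend for another language would provide a second table with the same shape, leaving the opcode-decoding loop unchanged.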

> [...] the C compiler does the hard work anyhow. The big question is whether there are things that wasm->C translation can optimize that a C compiler can't, but it's hard for me to think of anything..

I mostly focused on streaming code generation and assumed that tools like binaryen's wasm-opt can take care of optimizations and w2c2 does not need to reinvent the wheel here. It is likely that such pre-C optimizations are actually useful, as older C compilers likely do not have as advanced optimizations as modern compilers, though I have not tested this yet.

@turbolent
Owner

@vshymanskyy I've added support for parallel compilation in #2. Going to work on reducing the output size now

@turbolent
Owner

With #3 merged, the output size is now reasonable

@vshymanskyy
Contributor Author

Great news. #2 along with #3 bring clang compilation time from 20 minutes to ~3 minutes (gcc, 12 threads, -O3).
I was not able to link it yet, will look into this later.

Overall, this is a huge improvement 🎉 🎉 🎉

@turbolent
Owner

@vshymanskyy That's great to hear! Linking clang.wasm will require implementing quite a few more WASI functions (e.g. file I/O, FS operations), so far I had only implemented enough to get coremark.wasm to run.

I saw you started work on a WASI implementation for wasm2c in wasm2native, great work! It would be nice to leverage this existing work and not have to re-implement the WASI spec specifically for w2c2 again. First step would be to make the API of the generated code of w2c2 match that of wasm2c. I started looking into the differences and noticed that wasm2c includes the parameter types and return types in the mangled names of the imports and exports – is this really necessary?

I also wonder what the minimum requirements for uvwasi and especially libuv are (C standard, endianness, etc.)

@turbolent
Owner

WASI implementation improvements are work in progress in #4

@cjihrig

cjihrig commented Dec 28, 2021

> I also wonder what the minimum requirements for uvwasi and especially libuv are (C standard, endianness, etc.)

Both target C89 and run on little and big endian machines.

@vshymanskyy
Contributor Author

vshymanskyy commented Dec 29, 2021

I was able to add w2c2 as an alternative translator for wasm2native.
Re-mapping of symbols is easy:

    #define Z_fd_prestat_getZ_iii               fdX5FprestatX5Fget
    #define Z_fd_prestat_dir_nameZ_iiii         fdX5FprestatX5FdirX5Fname
    #define Z_environ_sizes_getZ_iii            environX5FsizesX5Fget
    #define Z_environ_getZ_iii                  environX5Fget
    ....

Along with customized IMPORT_IMPL* definitions.
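
A small sketch of how one such mapping behaves: an implementation written under the wasm2c-style name ends up, after preprocessing, as the w2c2-style symbol that the generated code refers to. The function body is a stand-in rather than a real WASI implementation, and the signature and the name `generated_caller` are assumed for illustration.

```c
#include <stdint.h>

/* Map the wasm2c-style mangled name onto the w2c2-style one. */
#define Z_environ_sizes_getZ_iii environX5FsizesX5Fget

/* Implementation written against the wasm2c-style name. After macro
 * expansion this defines the symbol environX5FsizesX5Fget. */
uint32_t Z_environ_sizes_getZ_iii(uint32_t count_ptr, uint32_t size_ptr)
{
    (void)count_ptr;
    (void)size_ptr;
    return 0; /* 0 is the WASI "success" errno */
}

/* Stand-in for generated code, which uses the w2c2-style name directly. */
uint32_t generated_caller(void)
{
    return environX5FsizesX5Fget(0, 0);
}
```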

@turbolent When generating multiple files with the -j flag, I'm getting lots of "multiple definitions" errors for e_X5Fstart and e_memory (defined in decls.h, which is then included in each C file). Fixed it by making them extern and defining them in my main.c.
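
The shape of that fix, as a single-file sketch: the shared header only declares the symbols with extern, and exactly one translation unit defines them. The `wasm_memory` type and the function-pointer type of `e_X5Fstart` are assumptions made for this example; the real declarations in decls.h may differ.

```c
#include <stdint.h>

/* --- decls.h (included by every generated .c file) --- */

/* Hypothetical memory type, for illustration only. */
typedef struct {
    uint8_t *data;
    uint32_t size;
} wasm_memory;

/* Declarations only: no storage is allocated here, so including this
 * header from many files does not produce multiple definitions. */
extern wasm_memory e_memory;
extern void (*e_X5Fstart)(void);

/* --- main.c (the one place where the symbols are defined) --- */

wasm_memory e_memory = { 0, 0 };
void (*e_X5Fstart)(void) = 0;
```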

With this I was able to build multiple rather complex wasi apps.
But for clang.wasm, I'm getting a huge number of these:

    /usr/bin/ld: <artificial>:(.text+0x29a8): undefined reference to `e_X5FZNSt3X5FX5F26locale8X5FX5FglobalEv'
    /usr/bin/ld: <artificial>:(.text+0x29b9): undefined reference to `e_X5FZNSt3X5FX5F26localeC2Ev'
    /usr/bin/ld: <artificial>:(.text+0x29ca): undefined reference to `e_X5FZNSt3X5FX5F26localeC2ERKS0X5F'
    /usr/bin/ld: <artificial>:(.text+0x29db): undefined reference to `e_X5FZNSt3X5FX5F26localeD2Ev'
    /usr/bin/ld: <artificial>:(.text+0x29ec): undefined reference to `e_X5FZNSt3X5FX5F26localeC2EPKc'
    /usr/bin/ld: <artificial>:(.text+0x29fd): undefined reference to `e_X5FZNSt3X5FX5F26localeC2ERKNSX5F12ba
    ...

@vshymanskyy
Contributor Author

Ok, got it working. clang.wasm was compiled and linked (using Clang 12 + LLD) in parallel mode (-j 12) in just 3m34s.
Will send a PR with my changes.

@turbolent
Owner

turbolent commented Dec 29, 2021

@cjihrig that's great! Thank you for the information 👍

https://github.com/libuv/libuv/blob/v1.x/SUPPORTED_PLATFORMS.md looks good too, I'll try on some of my older machines (e.g. Mac OS X < 10.7, especially on PowerPC; IRIX on MIPS; OpenStep/NeXTSTEP on x86/PowerPC/HPPA)

@turbolent
Owner

@vshymanskyy Wow, nice! It didn't even occur to me to use the pre-processor to add compatibility, great idea 👍

Also, thank you for the bug report and also the fix!

Great to hear clang builds in reasonable time 🎉 Maybe it can be sped up with -O0? (see https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster)

@vshymanskyy
Contributor Author

@turbolent do all/any of these platforms support CMake? Do you compile on the target, or do you cross-compile?

wasm2native should conceptually support Big-Endian systems, like Wasm3 does. It needs some debugging, as testing with QEMU showed there are issues (but looks like we're almost there).

@vshymanskyy
Contributor Author

I'd like to get rid of CMake dependency for wasm2native, but libuv only supports autotools or CMake officially.

@cjihrig

cjihrig commented Dec 29, 2021

@vshymanskyy Node.js builds libuv with gyp. I'm not sure if you can use it for inspiration, but here is the gyp file.

@turbolent
Owner

@vshymanskyy For Mac OS X 10.4/10.5 tigerbrew provides CMake 3.6, I'm not sure if that's enough.

As for big-endian support: wasm2c and w2c2 both use "negative memory" (see the end of https://skmp.dev/blog/negative-addressing-bswap/), so an access in memory mem with size size at offset off is at mem + size - off. Values are in native endianness, but that also means that data (e.g. string values) is stored in "reversed order" and needs a reverse before a syscall (e.g. write). I think the first step is to add a big-endian definition for https://github.com/vshymanskyy/wasm2native/blob/afa64bee90b3483e8747f3533906a7332c588a6b/src/wasi-main.c#L16 and then add reverse operations where data is read. I wonder which of the two options I can think of is more efficient:

  • Two reverses in the linear memory that is pointed to
  • One copy of the memory that is pointed to into a temporary allocation and a single reverse of it
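
A sketch of the second option, where the copy and the reverse happen in a single pass into a temporary buffer. The function name `copy_out_reversed` is made up, and the exact indexing (the byte at wasm address a living at mem[size - 1 - a]) is an assumption about the layout, not something checked against the wasm2c or w2c2 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes starting at wasm offset off out of a "reversed" linear
 * memory into dst, restoring forward byte order along the way.
 * Assumption: the byte at wasm address a is stored at mem[size - 1 - a]. */
static void copy_out_reversed(uint8_t *dst, const uint8_t *mem,
                              size_t size, size_t off, size_t len)
{
    size_t i;

    assert(off <= size && len <= size - off);  /* stay inside the memory */
    for (i = 0; i < len; i++) {
        dst[i] = mem[size - 1 - (off + i)];
    }
}
```

The buffer filled this way can be passed to a syscall such as write as-is, and the linear memory itself is left untouched, unlike the first option, which has to reverse the data in place and then reverse it back.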

@turbolent
Owner

@vshymanskyy I now have enough of a WASI implementation that it should be able to run clang.wasm. For example, the input file is checked for existence, but then clang just exits. A simple Rust program exercising all the functionality works as expected, so I'm wondering what I'm missing.

Do you have instructions on how you compiled/created clang.wasm? I would like to build it with assertions enabled, so I can debug it further.

@vshymanskyy
Contributor Author

@turbolent thanks for explanations on Big-Endian, will check it soon.

Just checked my clang compilation. For this test I replaced clang.wasm in my wasm3 self-compilation experiment here: https://github.com/wasm3/wasm3-self-compiling/blob/ee61ccecdf30bee73f3c640764896da8f6ca439d/Makefile#L33
It looks like it is working well.

I didn't push my changes to wasm2native yet. I'll let you know when it's ready.

@vshymanskyy
Contributor Author

vshymanskyy commented Dec 31, 2021

Overall, it may be a good idea to move wasi implementation into a separate project. It's rather complicated, esp. if targeting multiple OS environments.
In this case it could be reused by wasm3, for example.

@vshymanskyy
Contributor Author

vshymanskyy commented Dec 31, 2021

@turbolent just pushed changes to wasm2native. You should be able to:

    git clone https://github.com/vshymanskyy/wasm2native.git
    cd wasm2native
    export CC="clang-12"
    export LDFLAGS="-fuse-ld=lld"
    ./build.sh path/to/clang.wasm

To run the resulting clang.elf, you can replace the compilation command in wasm3-self-compiling.
But it should work as a standalone compiler as well.

@turbolent
Owner

@vshymanskyy Great to see you've added support for w2c2!

I've got clang.wasm working in #4 and documented the status for all WASI syscalls in the README. There are still a few missing and I'll add them as needed, contributions are very welcome! Agreed on moving the WASI implementation to a separate repository once it is in a more complete state.

When compiling clang.wasm I noticed that inits.c is still huge (16MB), so I'll look into how to split it up further, as currently I can only compile it with -O0.

@vshymanskyy
Contributor Author

Awesome! 🙌

@turbolent
Owner

Update: The WASI implementation now has support for big-endian machines and fd_readdir.

QEMU's user mode emulation is really useful for development, but also has some issues, e.g. when using it on a 64-bit host running 32-bit executables and performing filesystem operations.

@vshymanskyy
Contributor Author

vshymanskyy commented Jan 24, 2022

With some effort, it should be possible to w2c2 w2c2.wasm > w2c2.c 😱
Or even... wasm3 w2c2.wasm w2c2.wasm > w2c2.c
But parallel translation won't be possible with wasm2c.wasm atm (no pthreads in WASI yet: WebAssembly/wasi-libc#209).

@vshymanskyy
Contributor Author

@turbolent , wasm3 w2c2.wasm w2c2.wasm > w2c2.c works with #7 ! 🎉 🎉 🎉

@turbolent
Owner

turbolent commented Jan 26, 2022

@vshymanskyy Wow, awesome! This is actually going to be really useful on platforms with bad C compilers (e.g. the SGI MIPSpro compiler on IRIX) 👍

@turbolent
Owner

As mentioned above, I ran into linking errors for clang on Mac OS X 10.4 on PowerPC. I assumed this was due to large static arrays that are generated for the data segments. I found some platform-specific ways of embedding the data segments directly in the binary without having to generate large array literals in the source in #8. This is generally useful, however, I am still getting linker issues for the example above, probably due to the binary being too large and jump instructions overflowing?

turbolent pushed a commit that referenced this issue Nov 8, 2022
Add turbolent's Python example