wasm2c: Optionally support #embed #2325

Open
SoniEx2 opened this issue Nov 11, 2023 · 7 comments

@SoniEx2
Contributor

SoniEx2 commented Nov 11, 2023

Instead of merely emitting data segments as array initializers, it would be neat if we could (optionally) use #embed too.

Too bad offset(...) isn't standard so we'll need to emit separate files for each data segment.
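
For illustration, here is a minimal sketch of what an `#embed`-based data segment could look like next to the array-initializer form. It is not wasm2c's actual output: the names `SKETCH_HAVE_EMBED`, `sketch_segment_0` and `sketch_load_segment` are hypothetical, and `__FILE__` stands in for the per-segment file a real generator would write, so that the sketch compiles without extra files. The standard `limit(...)` parameter caps the number of bytes taken; the `__has_embed` check falls back to a plain initializer on compilers without C23 `#embed`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Detect #embed support. The nested form keeps preprocessors that do not
   know __has_embed from ever evaluating the inner #if. */
#ifdef __has_embed
#  if __has_embed(__FILE__)
#    define SKETCH_HAVE_EMBED 1
#  endif
#endif
#ifndef SKETCH_HAVE_EMBED
#  define SKETCH_HAVE_EMBED 0
#endif

/* Hypothetical data segment. A real generator would name a per-segment
   file here; __FILE__ is only a stand-in resource (assumption). */
static const unsigned char sketch_segment_0[] = {
#if SKETCH_HAVE_EMBED
#  embed __FILE__ limit(4)
#else
  0x00, 0x61, 0x73, 0x6d /* array-initializer fallback, 4 bytes */
#endif
};

/* Hypothetical helper: copy one segment into linear memory at its
   destination offset, with a bounds check. */
static void sketch_load_segment(unsigned char *memory, size_t memory_size,
                                size_t dest_offset,
                                const unsigned char *segment,
                                size_t segment_size) {
  assert(dest_offset <= memory_size);
  assert(segment_size <= memory_size - dest_offset);
  memcpy(memory + dest_offset, segment, segment_size);
}
```

Either branch yields a 4-byte array, so the code that loads the segment does not need to know which form the compiler saw.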

@sbc100
Member

sbc100 commented Nov 11, 2023

Wow, #embed looks awesome. First time I've seen it.

Using it for data segments seems possible, but it would also mean that wasm2c would no longer generate a single C file, but a collection of files. Maybe as an option? What do you think would be the advantage of this option over doing the embedding like we do today?

@sbc100
Member

sbc100 commented Nov 11, 2023

(Doesn't #embed also emit array initializers?)

@SoniEx2
Contributor Author

SoniEx2 commented Nov 11, 2023

We note that wasm2c already outputs more than one file: a .h and a .c. We do think it should be an option, because C23 is, well, we don't think it's even published yet? So yeah, it's not exactly widely supported yet.

We believe ThePhD's blog post has relevant benchmarks: https://thephd.dev/implementing-embed-c-and-c++

We've never personally hit data segments bigger than 48KiB when playing with wasm, but we're almost certain real-world use-cases do. ThePhD's benchmark used a 4MB file, which doesn't seem unreasonable to us: after all, wasm2c is primarily used to take C/C++, compile it to wasm, and then compile it to C again, as part of RLBox; you can have a C program using #embed plus additional static initializers, compile it to wasm, and get fairly sizeable data segments that way.

@sbc100
Member

sbc100 commented Nov 11, 2023

Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

@workingjubilee

workingjubilee commented Nov 11, 2023

Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

Yes, it's basically designed to allow it to be implemented as "just shove the !@#$%^&* bytes into the !@#$%^&* executable already!" instead of anything like using array initializers. Raw byte concatenation, effectively. "The compiler uses fwrite and then creates an appropriate symbol for the spot which was written-to."

@sbc100
Member

sbc100 commented Nov 12, 2023

So does that mean it's not really a pure pre-processor feature? i.e. if you run the pre-processor it doesn't produce the array init expressions that I was imagining? Or is it that if you run a compiler like clang that does both pre-processing and compilation, it's expected to take a shortcut and avoid the array init? I guess the latter.

@workingjubilee

It is, strictly speaking, doing that. If you dump out the preprocessed file, it will contain what you are expecting. If you compile it like a typical C programmer, however, yes, very few C compilers have zero things that exploit the fact that they are both preprocessor and compiler.
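
A small sketch of the "as if" behavior described above: C23 specifies `#embed` as expanding to a comma-delimited list of integer constant expressions, one per byte, so the same directive can also initialize an `int` array, where a raw-bytes shortcut would not apply directly. The names `DEMO_HAVE_EMBED`, `demo_values` and `demo_count_byte_values` are hypothetical, `__FILE__` is only a stand-in resource, and the fallback list is there for compilers without `#embed`.

```c
#include <assert.h>
#include <stddef.h>

/* Detect #embed support without tripping older preprocessors. */
#ifdef __has_embed
#  if __has_embed(__FILE__)
#    define DEMO_HAVE_EMBED 1
#  endif
#endif
#ifndef DEMO_HAVE_EMBED
#  define DEMO_HAVE_EMBED 0
#endif

/* Each embedded byte becomes one int element with a value in 0..255. */
static const int demo_values[] = {
#if DEMO_HAVE_EMBED
#  embed __FILE__ limit(8)
#else
  35, 105, 110, 99, 108, 117, 100, 101 /* fallback: 8 plain integers */
#endif
};

/* Check that every element is a valid byte value and return the count. */
static size_t demo_count_byte_values(void) {
  size_t n = sizeof demo_values / sizeof demo_values[0];
  for (size_t i = 0; i < n; ++i) {
    assert(demo_values[i] >= 0 && demo_values[i] <= 255);
  }
  return n;
}
```

What a given compiler prints for this directive in preprocess-only mode can differ between implementations, so the sketch only relies on the semantics the standard guarantees.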
