wasm2c: Optionally support #embed #2325

SoniEx2 · 2023-11-11T12:36:28Z

Instead of merely emitting data segments as array initializers, it would be neat if we could (optionally) use #embed too.

Too bad offset(...) isn't standard so we'll need to emit separate files for each data segment.


sbc100 · 2023-11-11T17:51:52Z

Wow, #embed looks awesome. First time I've seen it.

Using it for data segments seems possible, but it would also mean that wasm2c no longer generated just one single C file but a collection of files. Maybe as an option? What do you think would be the advantage of this option over doing the embedding like we do today?

sbc100 · 2023-11-11T17:52:25Z

(Doesn't #embed also emit array initializers?)

SoniEx2 · 2023-11-11T18:23:30Z

We note that wasm2c already outputs more than one file: a .h and a .c. We do think it should be an option, because C23 is, well we don't think it's even published yet? So yeah it's not exactly widely supported - yet.

We believe ThePhD's blog post has relevant benchmarks: https://thephd.dev/implementing-embed-c-and-c++

We've never personally hit data segments bigger than 48KiB when playing with wasm, but we're almost certain real-world use-cases do. ThePhD's benchmark used a 4MB file, which doesn't seem unreasonable to us: after all, wasm2c is primarily used to take C/C++, compile it to wasm, and then compile it to C again, as part of RLBox; you can have a C program using #embed plus additional static initializers, compile it to wasm, and get fairly sizeable data segments that way.
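For illustration, a minimal sketch of what an optional #embed path in generated C might look like. Everything named here is hypothetical: the side file `module.data0.bin`, the array `data_segment_0`, and the helper `load_data_segment_0` are made up for this example and are not what wasm2c emits today. The sketch assumes the side file, when present, holds the same bytes as the inline fallback. It uses #embed only when the compiler provides `__has_embed` and the file is actually found, and otherwise falls back to the array-initializer form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opt in to #embed only if the compiler knows __has_embed (C23) and the
 * hypothetical side file exists and is non-empty. The nested #if keeps
 * older preprocessors from ever evaluating __has_embed(...). */
#if defined(__has_embed) && defined(__STDC_EMBED_FOUND__)
#  if __has_embed("module.data0.bin") == __STDC_EMBED_FOUND__
#    define DATA_SEGMENT_0_USE_EMBED 1
#  endif
#endif

/* Hypothetical data segment 0. */
const unsigned char data_segment_0[] = {
#ifdef DATA_SEGMENT_0_USE_EMBED
#embed "module.data0.bin"
#else
  /* Fallback: the bytes spelled out inline ("hello" plus a NUL). */
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x00
#endif
};

/* Copies the segment into linear memory at the given offset.
 * Returns 0 on success, -1 if the segment would not fit. */
int load_data_segment_0(unsigned char *memory, size_t memory_size,
                        size_t offset) {
  assert(memory != NULL);
  if (offset > memory_size || sizeof data_segment_0 > memory_size - offset) {
    return -1;
  }
  memcpy(memory + offset, data_segment_0, sizeof data_segment_0);
  return 0;
}
```

Since standard #embed has no offset parameter, each segment gets its own side file in this sketch. A compiler without #embed support, or a build where the side file is missing, compiles the inline bytes instead.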

sbc100 · 2023-11-11T18:34:53Z

Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

workingjubilee · 2023-11-11T20:38:11Z

> Oh I see, so #embed doesn't just generate array initializers like we currently do? It can use compiler-specific builtins to go faster under the hood? So the advantage would be compile-time improvements for large data segments?

Yes, it's basically designed to allow it to be implemented as "just shove the !@#$%^&* bytes into the !@#$%^&* executable already!" instead of anything like using array initializers. Raw byte concatenation, effectively. "The compiler uses fwrite and then creates an appropriate symbol for the spot which was written-to."

sbc100 · 2023-11-12T05:49:04Z

So does that mean it's not really a pure preprocessor feature? I.e. if you run the preprocessor, it doesn't produce the array init expressions that I was imagining? Or is it that if you run a compiler like clang that does both preprocessing and compilation, it's expected to take a shortcut and avoid the array init? I guess the latter.

workingjubilee · 2023-11-12T07:06:47Z

It is, strictly speaking, doing that. If you dump out the preprocessed file, it will contain what you are expecting. If you compile it like a typical C programmer, however, yes, very few C compilers have zero things that exploit the fact that they are both preprocessor and compiler.
