Upgrade emscripten to 3.X.X? #163

Open
4nthonylin opened this issue Nov 23, 2022 · 7 comments

@4nthonylin
Contributor

The current emscripten (1.38.43) is ~3 years old now. The Docker container used, https://hub.docker.com/r/trzeci/emscripten, is also deprecated. Can we migrate to the official emscripten and use a newer version?

There is a host of improvements and bug fixes from the past three years.

@nrabinowitz
Collaborator

I would like to do so, and can investigate. In the past I had to scrap this because the latest emscripten version at the time produced a library with significant performance degradation.

@4nthonylin
Contributor Author

@nrabinowitz Do you have an example branch showing how you upgraded emscripten? I'm trying locally and seeing some major-version breakage: MODULARIZE_INSTANCE is no longer supported, and multi-environment builds result in an async import.

Note that https://stackoverflow.com/questions/60987493/h3-js-in-a-web-worker-document-is-not-defined would be fixed by a newer version of emscripten, which adds a check that document is defined: emscripten-core/emscripten#12553

@nrabinowitz
Collaborator

Just spent a little time looking into this. We should still be able to get a synchronous module via

emcc -sMODULARIZE -sWASM_ASYNC_COMPILATION=0

This works in emscripten 3.1.30, though for some reason not with EXPORT_ES6, which turns it async again (that looks like an emscripten bug).

You can see my branch here: https://github.com/nrabinowitz/h3-js/tree/update-emscripten

The good news - the output seems massively smaller, with gzipped file sizes going from 48 KB to 14 KB.
The bad news - there's still a serious performance degradation, especially for some of the core functions:

master branch

isValidCell x 3,844,605 ops/sec ±0.61% (90 runs sampled)
latLngToCell x 458,094 ops/sec ±1.17% (90 runs sampled)
cellToLatLng x 1,200,561 ops/sec ±0.60% (87 runs sampled)
cellToLatLng - integers x 1,649,222 ops/sec ±0.53% (89 runs sampled)
cellToBoundary x 381,646 ops/sec ±1.25% (87 runs sampled)
cellToBoundary - integers x 378,033 ops/sec ±1.94% (88 runs sampled)
getIcosahedronFaces x 877,811 ops/sec ±3.20% (79 runs sampled)
gridDisk x 194,921 ops/sec ±1.11% (89 runs sampled)
polygonToCells_9 x 4,957 ops/sec ±0.61% (89 runs sampled)
polygonToCells_11 x 362 ops/sec ±1.11% (86 runs sampled)
polygonToCells_10ring x 71.60 ops/sec ±0.87% (71 runs sampled)
cellsToMultiPolygon x 775 ops/sec ±0.76% (90 runs sampled)
compactCells x 2,472 ops/sec ±0.98% (91 runs sampled)
uncompactCells x 739 ops/sec ±0.44% (89 runs sampled)
areNeighborCells x 1,078,589 ops/sec ±0.56% (92 runs sampled)
cellsToDirectedEdge x 705,669 ops/sec ±0.87% (89 runs sampled)
getDirectedEdgeOrigin x 1,033,246 ops/sec ±0.80% (91 runs sampled)
getDirectedEdgeDestination x 945,509 ops/sec ±1.48% (87 runs sampled)
isValidDirectedEdge x 3,661,669 ops/sec ±0.71% (91 runs sampled)

Updated emscripten

isValidCell x 3,439,617 ops/sec ±1.05% (88 runs sampled)
latLngToCell x 36,071 ops/sec ±0.61% (89 runs sampled)
cellToLatLng x 101,367 ops/sec ±1.29% (92 runs sampled)
cellToLatLng - integers x 103,621 ops/sec ±0.75% (91 runs sampled)
cellToBoundary x 16,883 ops/sec ±0.82% (91 runs sampled)
cellToBoundary - integers x 17,129 ops/sec ±0.60% (91 runs sampled)
getIcosahedronFaces x 953,856 ops/sec ±0.66% (89 runs sampled)
gridDisk x 190,580 ops/sec ±1.09% (91 runs sampled)
polygonToCells_9 x 461 ops/sec ±0.45% (88 runs sampled)
polygonToCells_11 x 59.06 ops/sec ±0.51% (61 runs sampled)
polygonToCells_10ring x 17.39 ops/sec ±0.21% (46 runs sampled)
cellsToMultiPolygon x 51.25 ops/sec ±0.39% (64 runs sampled)
compactCells x 3,462 ops/sec ±0.57% (91 runs sampled)
uncompactCells x 686 ops/sec ±5.37% (86 runs sampled)
areNeighborCells x 1,180,269 ops/sec ±0.77% (88 runs sampled)
cellsToDirectedEdge x 688,882 ops/sec ±5.47% (88 runs sampled)
getDirectedEdgeOrigin x 1,002,533 ops/sec ±0.52% (94 runs sampled)
getDirectedEdgeDestination x 921,541 ops/sec ±0.89% (89 runs sampled)
isValidDirectedEdge x 3,392,697 ops/sec ±0.62% (92 runs sampled)

While some functions are comparable and compactCells actually seems faster with the later version, core functions like latLngToCell and cellToLatLng drop by an order of magnitude, which is a serious blocker for this upgrade.
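
To put a number on "an order of magnitude", here is a small sketch that divides the master-branch figure by the updated-emscripten figure for a few functions. The ops/sec values are copied from the two runs above; the `slowdown` helper is a hypothetical name used only for this illustration.

```javascript
// ops/sec figures copied from the two benchmark runs above
const master = {
  latLngToCell: 458094,
  cellToLatLng: 1200561,
  cellToBoundary: 381646,
  polygonToCells_9: 4957,
  compactCells: 2472
};
const updated = {
  latLngToCell: 36071,
  cellToLatLng: 101367,
  cellToBoundary: 16883,
  polygonToCells_9: 461,
  compactCells: 3462
};

// Hypothetical helper: ratio > 1 means the updated build is slower,
// ratio < 1 means it is faster.
function slowdown(before, after) {
  const ratios = {};
  for (const name of Object.keys(before)) {
    ratios[name] = before[name] / after[name];
  }
  return ratios;
}

const ratios = slowdown(master, updated);
for (const [name, ratio] of Object.entries(ratios)) {
  console.log(`${name}: ${ratio.toFixed(1)}x`);
}
```

With these inputs, latLngToCell and cellToLatLng land between 11x and 13x slower, cellToBoundary is over 20x slower, and compactCells comes out below 1, i.e. faster.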

@4nthonylin
Contributor Author

Thanks for sharing the benchmarks. Agreed, we definitely don't want to regress performance. Out of curiosity, could we output a wasm bundle? Shouldn't that be faster than a pure JavaScript implementation? It would be great to be able to choose between a pure JavaScript and a wasm library.

There is a workaround to fix the compilation error for Web workers and React-Native without upgrading emscripten. How would you feel about this patch?

diff --git a/scripts/update-emscripten.sh b/scripts/update-emscripten.sh
index 4c93e1c..273a39a 100755
--- a/scripts/update-emscripten.sh
+++ b/scripts/update-emscripten.sh
@@ -57,7 +57,9 @@ emcc -O3 -I ../include *.c -DH3_HAVE_VLA --memory-init-file 0 \
     "$@"
 
 for file in *.js ; do
-  cat ../../../../build/pre.js "$file" > ../../../../out/"$file"
+  cat ../../../../build/pre.js "$file" \
+  | sed 's/else if(document.currentScript)/else if(typeof document != "undefined" \&\& document.currentScript)/' \
+  > ../../../../out/"$file"
 done
 
 popd

Just manually add in the typeof document != "undefined" check. This is the same check done in newer versions of emscripten: https://github.com/emscripten-core/emscripten/blob/main/src/shell.js#L378

Validated that adding this check allows h3-js to be used in a web worker.
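
For context on why the extra check matters, here is a minimal sketch, not the actual emscripten shell code. Reading a global that was never declared throws a ReferenceError, while `typeof` on the same name does not. The function names `unguarded` and `guarded` are hypothetical.

```javascript
// Mirrors the original condition: touches `document` directly.
// In a web worker or Node there is no global `document`, so this throws.
function unguarded() {
  if (document.currentScript) {
    return document.currentScript.src;
  }
  return undefined;
}

// Mirrors the patched condition: `typeof` is safe on undeclared globals,
// and `&&` short-circuits before `document.currentScript` is evaluated.
function guarded() {
  if (typeof document != "undefined" && document.currentScript) {
    return document.currentScript.src;
  }
  return undefined;
}

try {
  unguarded();
} catch (err) {
  console.log("unguarded failed:", err.name);
}
console.log("guarded returned:", guarded());
```

In an environment without `document`, the first call fails with a ReferenceError and the second simply returns undefined.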

@nrabinowitz
Collaborator

> Thanks for sharing the benchmarks. Agreed, we definitely don't want to regress performance. Out of curiosity, could we output a wasm bundle? Shouldn't that be faster than a pure JavaScript implementation? It would be great to be able to choose between a pure JavaScript and a wasm library.

I've been meaning to target wasm as well, I just haven't had a chance to look into this - it should be fairly straightforward, but I don't know how much the binding code would need to change (i.e. getting data into/out of wasm might be substantially different than it is for the transpiled JS).

I definitely still want a pure JS bundle as the primary output, however. The library is used in a number of contexts now where only synchronous pure JS would be supported.

> There is a workaround to fix the compilation error for Web workers and React-Native without upgrading emscripten. How would you feel about this patch?

This looks great if that's all it takes! Do you want to submit this as a PR? I think the only hard part here is testing it - I doubt there's an easy way to add it to CI :/.

@4nthonylin
Contributor Author

Yea! I'll get a PR open and will figure out a good way to test it. I'm thinking of grepping the bundles afterwards to validate that they contain the patch.
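
One way the grep idea could look as a Node script, as a rough sketch only. It assumes the bundles are written to an `out` directory as `.js` files, as in the patched script above; `bundleHasGuard` and `checkBundleDir` are hypothetical names, not part of h3-js.

```javascript
const fs = require("fs");
const path = require("path");

// Strings taken from the sed expression in the patch above.
const UNGUARDED = "else if(document.currentScript)";
const GUARDED = 'else if(typeof document != "undefined" && document.currentScript)';

// True when the bundle text contains the guarded check and no unguarded one.
function bundleHasGuard(source) {
  return source.includes(GUARDED) && !source.includes(UNGUARDED);
}

// Returns the names of .js files in `dir` that are missing the guard.
function checkBundleDir(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".js"))
    .filter((name) => !bundleHasGuard(fs.readFileSync(path.join(dir, name), "utf8")));
}
```

A CI step could call `checkBundleDir("out")` and fail the build when the returned list is non-empty. This only confirms the text substitution happened; it does not exercise the bundle inside an actual web worker.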

Side note: I was able to benchmark WASM=1 with WASM_ASYNC_COMPILATION=0 without any changes. I can post benchmarks later, but tl;dr: wasm was faster than pure JS compiled with emscripten 3.x (about 2x for latLngToCell), but it was still an order of magnitude (10x) slower than the current version. I'm wondering if there are different compilation flags or settings that we would have to play with.
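
For readers unfamiliar with what synchronous wasm loading looks like at the JavaScript level, here is a minimal sketch using the standard `WebAssembly.Module` and `WebAssembly.Instance` constructors. It is not the code emscripten generates. The bytes are a tiny hand-assembled module exporting an `add` function, and `instantiateSync` is a hypothetical helper name.

```javascript
// A minimal wasm binary: one function (i32, i32) -> i32, exported as "add".
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section
  0x03, 0x02, 0x01, 0x00, // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b // code
]);

// Hypothetical helper: compile and instantiate without any Promise.
function instantiateSync(bytes, imports = {}) {
  const compiled = new WebAssembly.Module(bytes); // synchronous compile
  return new WebAssembly.Instance(compiled, imports); // synchronous instantiate
}

const { add } = instantiateSync(wasmBytes).exports;
console.log(add(2, 3)); // 5
```

This keeps the API synchronous, which matches the requirement discussed in this thread. One caveat to verify for the target environments: some browsers restrict synchronous compilation of larger modules on the main thread.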

@nrabinowitz
Collaborator

> Side note: I was able to benchmark WASM=1 with WASM_ASYNC_COMPILATION=0 without any changes. I can post benchmarks later, but tl;dr: wasm was faster than pure JS compiled with emscripten 3.x (about 2x for latLngToCell), but it was still an order of magnitude (10x) slower than the current version. I'm wondering if there are different compilation flags or settings that we would have to play with.

Very interesting, thanks for the info! I'm going to open a ticket for Emscripten - we'll see if that produces any ideas.
