Proposal: Add a benchmark. #235

martinfrances107 · 2021-04-18T16:36:18Z

In my spare time I am porting this module over to rust ( rustlang ) so I can have a wasm binary that I run in the browser.

As I work I am judging my process by how many test I have ported over .. but now I need something else... I need to start profiling. ( it is still very much pre-alpha level code but it is showing signs of working.)

What I need is a benchmark .. that is

The simplest code that exercises the largest amount of code paths a produces a number so that I can measure improvement or degradations in performance.

Also what I need is a javascript equivalent for comparison.

So , I have converted this tutorial

https://www.d3-graph-gallery.com/graph/backgroundmap_canvas_basic.html

to run d3-geo with the orthographic projection.

I have a PR .. that I am about to post.

In terms of discussion ... perhaps future work will have the projection selected from a pull-down menu

I am open to suggestions .. If anyone has an idea for other benchmarks ... I am wiling to do the work.

Currently the RenderTime on my laptop for the benchmark is 64ms

If the rust code runs 10 times faster, as I hope, then we can have a detailed globe spinning well below the 16ms needed for jank free animation.

jrus · 2021-04-19T04:32:51Z

The time spent in projecting a map is a combination of function call overhead, expensive arithmetic (mostly a lot of trigonometric function evaluations), and probably some branch mispredictions. Some parts of the code can be more careful about avoiding unnecessary branches, but I would expect the other two to be more significant.

If you want to speed everything up, you probably first want to re-architect the code to do tight loops of pure arithmetic over large homogeneous arrays of data, re-using already allocated blocks of memory whenever possible. Switching programming languages may make that more convenient, but it can also be done in javascript if you are careful and write in a low-level C style (the javascript JITs should usually do a pretty good job with such code). But it is a substantially different code style than the current "stream" concept.

Breaking data into chunks and processing them in parallel on different CPU cores would also speed everything up roughly linearly with the number of cores you have available. (Unfortunately SharedArrayBuffer has been disabled in some browsers for security reasons.) I’m not sure what the latest wasm multicore story is.

Second, once you have cut the overhead down, you can save most of the trigonometry (in some map projections a bit of trig is still unavoidable, but fast approximate trig functions could be used) by switching from using longitude/latitude to using stereographic coordinates as a canonical internal data representation (I have been working out the details in the past couple years at https://observablehq.com/@jrus/planisphere). You can save a large amount of computation on rotation, interpolation, clipping, etc.

Once wasm reliably supports SIMD, adopting SIMD could speed some parts of the pipeline by another 2x or more, depending on how wide the SIMD support is. But every other kind of optimization should theoretically be accessible from plain javascript.

If you just want a spinning globe in orthographic projection and don't need arbitrary map projections etc., you can make something more specialized than d3-geo which will be a whole lot faster. You can use lower precision arithmetic, switch everything to 3d cartesian coordinates, and do rotation by a matrix product. You could even use a GPU.

martinfrances107 · 2021-04-19T09:42:55Z

I think I should start with a alternate explanation of why I think the benchmark is valid.

I want to spend 90% of my time writing application level code...

d3 is a great eco-system for this .. from application to application I say can just bolt one of the many other packages as needed.

( I am simultaneously porting
https://github.com/martinfrances107/rust_d3_delaunay
https://github.com/martinfrances107/rust_d3_geo
https://github.com/martinfrances107/rust_d3_geo_voronoi
)

As you say I could refactor the loops, suffer the browser incompatibilities of shared buffer arrays. and write GPU centric code ... but I that goes against the spirit of a rapid development.

Let me express my expectations ...

if I port to rust then my expectation is to a raw x10 speedup...
but my experience is different the FFI between rust and javascript is a major bottleneck

if for example as I think will happen in this benchmark : -
Approximately 240kb will be passed down into the rust library via the shared memory buffer to rust and then 240kb of data will be passed up again via the shared memory buffer ..

Well If you don't take the time to mitigate the bottlenecks then speedup will drop to only x2.
( if the output is via a "2d" context rather than rendering to SVG the data can be drawn on the screen without the return data path)

From my blinkered rapid-development mindset .. if I can rewrite a small part of my call to call the equivalent FFI function ... and just get a doubling in speed. then I don't care about the waste. So even writing to an SVG and suffering the bottleneck is maybe valid.

I can only do that will confidence if I know there is : -

The API's look identical ..
The underlying implementations are s close a possible.
A one-to-one mapping between d3's javascript tests and the rust equivalent.

Anyway that is what I mean by port.

jrus · 2021-04-19T20:04:57Z

Sorry, I didn’t mean to suggest that making a benchmark is ‘invalid’. Indeed it sounds pretty useful: Go for it!

I would be pretty surprised if you see anything close to 10x improvement if you port the code 1:1 to Rust. You might even see a slowdown. But indeed, try and see!

If you want to significantly beat the current version, I think the place to start is putting all of the data in a single large homogeneous data array (think of a numpy or Matlab type array here), and then doing pure arithmetic in simple loops. You'll save overhead from chasing down pointers, allocating small arrays, nested function calls in the inner loops, and so on. I could imagine getting a several times speedup from that, though perhaps not 10x. (Again, someone would have to try to find out.) This could be done from Javascript if you write in a low-level C style; Rust is probably a more convenient language for writing at higher level and still emitting tight low-overhead assembly for numerical workloads. (It’s also a lot easier in a compiled language to avoid some of the traps that can accidentally make javascript jump out of the fully JITed version and start running an order of magnitude slower than necessary.)

After that the bottleneck becomes expensive arithmetic, mostly a bunch of trigonometry.

Fully-JITed Javascript can be pretty darn fast though. See e.g. this recent thing, https://surma.dev/things/js-to-asc/

martinfrances107 · 2021-04-20T05:22:09Z

The benchmark I want to add to this module ready...

#236

I really don't want to refactor my set of applications into performance tuned loops...

Can I speak up for the modular approach of the cached stream pipeline

  projection.stream = function(stream) {
    return cache && cacheStream === stream ? cache : cache = transformRadians(transformRotate(rotate)(preclip(projectResample(postclip(cacheStream = stream)))));
  };

src/projection/index.js

I think this is a well considered approach ( all credit to whoever wrote it ) and once you unravelled it a bit and can see the in-depth molecularity it becomes easy to maintain and debug ... it looks like enterprise grade library code.
I have learnt to appreciate easy to maintain and debug over performance ... it is easier to trust it.

I appreciate good code like this takes hundreds of hours to get right...
So if I port that code .. I am getting that goodness for free.

Regarding rust

If you read anything about rust and its zero cost abstraction ...which is new compared to C

If all type information is determined statically .. ie known at compile time .. then the compiler's optimiser can remove the branch checks on certain "if-then" statements.

I just want to say the modular approach fits in really nicely with the "zero-cost" language improvement.
So I am expecting good performance and good readability

jrus · 2021-04-20T19:07:18Z

easy to maintain and debug ... it looks like enterprise grade library code.
I have learnt to appreciate easy to maintain and debug over performance

Yeah, Mike Bostock makes excellent modular code, and d3-geo is certainly elegant, carefully thought-out, and pleasant to work with.

It’s just a different style than often seen in e.g. scientific computing, image processing, or video game code where people are aiming for peak numerical performance or have tight per-frame compute budgets.

martinfrances107 added a commit to martinfrances107/d3-geo that referenced this issue Apr 18, 2021

Proposal: Add a benchmark. d3#235

2a6ab6b

martinfrances107 added a commit to martinfrances107/d3-geo that referenced this issue Apr 18, 2021

Proposal: Add a benchmark. d3#235

8d4e16c

martinfrances107 added a commit to martinfrances107/d3-geo that referenced this issue Apr 18, 2021

Proposal: Add a benchmark. d3#235

014a4ae

martinfrances107 mentioned this issue Apr 18, 2021

Proposal: Add a benchmark. #235 #236

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Add a benchmark. #235

Proposal: Add a benchmark. #235

martinfrances107 commented Apr 18, 2021

jrus commented Apr 19, 2021 •

edited

martinfrances107 commented Apr 19, 2021

jrus commented Apr 19, 2021 •

edited

martinfrances107 commented Apr 20, 2021

jrus commented Apr 20, 2021 •

edited

Proposal: Add a benchmark. #235

Proposal: Add a benchmark. #235

Comments

martinfrances107 commented Apr 18, 2021

jrus commented Apr 19, 2021 • edited

martinfrances107 commented Apr 19, 2021

jrus commented Apr 19, 2021 • edited

martinfrances107 commented Apr 20, 2021

jrus commented Apr 20, 2021 • edited

jrus commented Apr 19, 2021 •

edited

jrus commented Apr 19, 2021 •

edited

jrus commented Apr 20, 2021 •

edited