Multithread JSON parsing library in pure C11

batonius/threason

threason

An experimental pure C11 multithreaded JSON parser library. DOM parsing only, no streaming and no creation/modification.

Features

  1. Written in standard C11 (-xc -std=c11 -pedantic) with no non-standard libraries or extensions. (I had to use -D_POSIX_C_SOURCE=199309L to get CLOCK_MONOTONIC in the testing binary, though.)
  2. High-quality C code with tests and no warnings, leaks, data races, or undefined behavior (at least according to valgrind and the address and UB sanitizers).
  3. Stores DOM in contiguous buffers (one per thread), thus minimizing allocations.
  4. Single-threaded mode is ~3x slower than simdjson, ~2x slower than yyjson, and ~2x faster than rapidjson.
  5. Multithreaded mode with 4 threads can be as fast as simdjson and somewhat faster than yyjson, both single-threaded.
  6. Indexing an array is O(1), indexing an object is O(log(n)).
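
The lookup costs in the last item can be illustrated with a minimal sketch. It assumes array elements sit contiguously and object members are kept sorted by key, so an array index is a direct offset and a key lookup is a binary search. All type and function names here (sketch_array, sketch_object, sketch_member, sketch_array_at, sketch_object_get) are hypothetical and are not threason's actual API or memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for illustration only, not threason's real types. */
typedef struct {
    const char *key; /* member name, NUL-terminated */
    int value;       /* stand-in for a real JSON value */
} sketch_member;

typedef struct {
    const sketch_member *members; /* contiguous, assumed sorted by key */
    size_t count;
} sketch_object;

typedef struct {
    const int *items; /* contiguous elements */
    size_t count;
} sketch_array;

/* O(1): a bounds check plus a direct offset into the contiguous buffer. */
static const int *sketch_array_at(const sketch_array *arr, size_t index)
{
    assert(arr != NULL);
    return index < arr->count ? &arr->items[index] : NULL;
}

/* O(log n): binary search over members sorted by key (strcmp order). */
static const sketch_member *sketch_object_get(const sketch_object *obj,
                                              const char *key)
{
    assert(obj != NULL && key != NULL);
    size_t lo = 0;
    size_t hi = obj->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(obj->members[mid].key, key);
        if (cmp == 0) {
            return &obj->members[mid];
        }
        if (cmp < 0) {
            lo = mid + 1; /* key sorts after the middle member */
        } else {
            hi = mid;     /* key sorts before the middle member */
        }
    }
    return NULL; /* key not present */
}
```

Both functions return NULL on a miss, so a caller can tell "absent" apart from a stored value without a separate error channel.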

Whys

Can I use it in production?

You can, but why would you? There are thoroughly-tested, fine-tuned, much faster implementations like simdjson and yyjson out there. I personally would stick to them.

Why then?

I had an idea about how to efficiently parallelize JSON parsing, and I wanted to see if it's practical. Turns out it is: it provides a 3x speedup on 4 threads, which is a great result.

Will you work on improving the single-threaded performance? That way threason's multithreaded results would be competitive with these other libraries.

Nah. But maybe you can use the same approach to make these other libraries scalable.

Why C?

I dislike the language, so I wanted to use it for a non-trivial project to see if I'm being unfair. That's also the reason I tried to produce high-quality UB-free code with lots of checks. Turns out I wasn't being unfair, and my distaste is more nuanced now.