
Use mapChunks in CirceInstances.streamedJsonArray #7010

Open · wants to merge 2 commits into base: series/0.23
Conversation

peterneyens (Contributor) commented:

Small clean-up to use `mapChunks` instead of an implementation using `repeatPull`.
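As a rough illustration of the refactoring (this is a sketch, not the actual http4s code): a hand-rolled `repeatPull` loop that transforms each chunk collapses into a single `mapChunks` call.

```scala
import fs2.{Chunk, Pull, Pure, Stream}

// Illustrative sketch: a chunk-by-chunk transformation written with repeatPull...
def viaRepeatPull[A, B](s: Stream[Pure, A])(f: Chunk[A] => Chunk[B]): Stream[Pure, B] =
  s.repeatPull(_.uncons.flatMap {
    case Some((hd, tl)) => Pull.output(f(hd)) >> Pull.pure(Some(tl))
    case None           => Pull.pure(None)
  })

// ...is equivalent to a single mapChunks call, which is the shape this PR
// moves streamedJsonArray towards.
def viaMapChunks[A, B](s: Stream[Pure, A])(f: Chunk[A] => Chunk[B]): Stream[Pure, B] =
  s.mapChunks(f)
```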

@mergify bot added the labels series/0.23 (PRs targeting 0.23.x) and module:circe on Mar 1, 2023
@armanbilge closed this on Mar 2, 2023
@armanbilge reopened this on Mar 2, 2023
@armanbilge changed the title to "Use mapChunks in CirceInstances.streamedJsonArray" on Mar 2, 2023
@danicheg (Member) left a comment:

Really like this refactoring 👍🏻, but would you mind adding a small micro-benchmark to the CirceJsonBench suite to prove we aren't losing any microseconds with this implementation?

Comment on lines +299 to +304:

```scala
val bldr = Chunk.newBuilder[Byte]
c.foreach { o =>
  bldr += CirceInstances.comma
  bldr += fromJsonToChunk(printer)(o)
}
bldr.result
```
A contributor commented:

Perhaps this can be a local `def nonFirst(c: Chunk[Json]): Chunk[Byte]` function?
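A sketch of that suggestion, reusing the names from the diff above (`fromJsonToChunk`, `printer`, and `CirceInstances.comma` are assumed from the surrounding `CirceInstances` code):

```scala
import fs2.Chunk
import io.circe.Json

// Sketch only: fromJsonToChunk, printer, and CirceInstances.comma come from
// the surrounding CirceInstances code and are assumed here.
def nonFirst(c: Chunk[Json]): Chunk[Byte] = {
  val bldr = Chunk.newBuilder[Byte]
  c.foreach { o =>
    bldr += CirceInstances.comma
    bldr += fromJsonToChunk(printer)(o)
  }
  bldr.result
}
```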

@diesalbla (Contributor) commented on Mar 7, 2023:

Additionally, rather than calling `fromJsonToChunk` on every element and concatenating chunks in the `bldr` builder, would it also help to use one byte buffer per source chunk, print each `Json` directly into that buffer, and expose it as a single view?

Edit: perhaps that would be an optimisation to make in the circe library instead, with a `printInterspersed` or `printCommaSeparated` method that takes any iterable of `Json`, or similar. https://github.com/circe/circe/blob/6bbb7e73a529916faec89f1455d11356af03ea64/modules/core/shared/src/main/scala/io/circe/Printer.scala#L209-L221.
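One way to sketch the "one buffer per source chunk" idea (illustrative only, using `printer.print`, which renders a `Json` to a `String` in circe, and a `StringBuilder` as the per-chunk buffer):

```scala
import java.nio.charset.StandardCharsets

import fs2.Chunk
import io.circe.{Json, Printer}

// Illustrative sketch: render every Json in the chunk into a single
// StringBuilder, interspersing commas, and convert to bytes once at the end,
// instead of concatenating one byte chunk per element.
def nonFirstBuffered(printer: Printer)(c: Chunk[Json]): Chunk[Byte] = {
  val sb = new StringBuilder
  c.foreach { o =>
    sb.append(',').append(printer.print(o))
  }
  Chunk.array(sb.toString.getBytes(StandardCharsets.UTF_8))
}
```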

@peterneyens (Author) replied:

> Additionally, rather than using the fromJsonToChunk on every element, and using chunk concatenation (in the bldr builder), would it also help to use one bytebuffer per source chunk, directly print each json to that buffer, and use one view?

Benchmarking streamJsonArrayEncoder and jsonEncoder makes it clear that there is a lot of room for improvement, but that feels out of scope for the small refactoring in this PR.

@peterneyens (Author) commented:
Finally found some time to add a benchmark.

Previous:

```
[info] Benchmark                           (elems)  (elemsPerChunk)  Mode  Cnt        Score        Error  Units
[info] CirceJsonStreamBench.encode_stream      100               50  avgt    5    54294.630 ±  13951.153  ns/op
[info] CirceJsonStreamBench.encode_stream      100              500  avgt    5    50007.465 ±   2632.255  ns/op
[info] CirceJsonStreamBench.encode_stream     1000               50  avgt    5   299525.990 ±  54286.551  ns/op
[info] CirceJsonStreamBench.encode_stream     1000              500  avgt    5   251252.929 ± 101122.138  ns/op
[info] CirceJsonStreamBench.encode_stream    10000               50  avgt    5  2731966.198 ± 760253.727  ns/op
[info] CirceJsonStreamBench.encode_stream    10000              500  avgt    5  2224727.347 ± 327952.214  ns/op
```

This PR:

```
[info] Benchmark                           (elems)  (elemsPerChunk)  Mode  Cnt        Score        Error  Units
[info] CirceJsonStreamBench.encode_stream      100               50  avgt    5    52789.592 ±   7598.824  ns/op
[info] CirceJsonStreamBench.encode_stream      100              500  avgt    5    49110.386 ±   1697.552  ns/op
[info] CirceJsonStreamBench.encode_stream     1000               50  avgt    5   280540.460 ±  30941.520  ns/op
[info] CirceJsonStreamBench.encode_stream     1000              500  avgt    5   237874.710 ±   6312.693  ns/op
[info] CirceJsonStreamBench.encode_stream    10000               50  avgt    5  2564971.598 ±  92078.114  ns/op
[info] CirceJsonStreamBench.encode_stream    10000              500  avgt    5  2214598.872 ± 685056.424  ns/op
```

Benchmark streamJsonArrayEncoder (vs jsonEncoder):

```scala
import java.util.concurrent.TimeUnit

// sbt "bench/jmh:run -i 10 -wi 10 -f 2 -t 1 org.http4s.bench.CirceJsonStreamBench"
@BenchmarkMode(Array(Mode.AverageTime))
```
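The collapsed snippet above only shows the first few lines of the benchmark. A hedged reconstruction of its probable shape (the `elems`/`elemsPerChunk` parameters and `encode_stream` method name are taken from the results table; the body and the remaining annotations are assumptions):

```scala
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

// Hedged reconstruction: parameter names and the encode_stream method name
// come from the results table above; the body is a placeholder.
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class CirceJsonStreamBench {
  @Param(Array("100", "1000", "10000"))
  var elems: Int = _

  @Param(Array("50", "500"))
  var elemsPerChunk: Int = _

  @Benchmark
  def encode_stream: Unit = {
    // encode a Stream of `elems` Json values, chunked by `elemsPerChunk`,
    // through streamJsonArrayEncoder and drain the resulting bytes
  }
}
```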
@diesalbla (Contributor) commented on May 13, 2023:

Benchmarks often use "Throughput" mode rather than average time. Perhaps that would help smooth out the error in the results?
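If that suggestion were taken, the change would be as small as swapping the mode annotation (sketch; the class name here is a hypothetical variant, and `Mode.Throughput` reports operations per unit time):

```scala
import org.openjdk.jmh.annotations.{BenchmarkMode, Mode}

// Throughput mode measures operations per time unit rather than average time
// per operation; its error bars are often steadier for fast operations.
@BenchmarkMode(Array(Mode.Throughput))
class CirceJsonStreamBenchThroughput // hypothetical variant of the benchmark
```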

Labels: module:circe, series/0.23 (PRs targeting 0.23.x)

4 participants