
Use mapChunks in CirceInstances.streamedJsonArray #7010

Open · wants to merge 2 commits into base: series/0.23
Conversation

peterneyens (Contributor) commented:

Small clean-up to use `mapChunks` instead of an implementation using `repeatPull`.
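As a rough illustration of the refactoring (this is a sketch, not the actual http4s code): a hand-rolled `repeatPull` loop that transforms each chunk collapses into a single `mapChunks` call.

```scala
import fs2.{Chunk, Pull, Pure, Stream}

// Illustrative sketch: a chunk-by-chunk transformation written with repeatPull...
def viaRepeatPull[A, B](s: Stream[Pure, A])(f: Chunk[A] => Chunk[B]): Stream[Pure, B] =
  s.repeatPull(_.uncons.flatMap {
    case Some((hd, tl)) => Pull.output(f(hd)) >> Pull.pure(Some(tl))
    case None           => Pull.pure(None)
  })

// ...is equivalent to a single mapChunks call, which is the shape this PR
// moves streamedJsonArray towards.
def viaMapChunks[A, B](s: Stream[Pure, A])(f: Chunk[A] => Chunk[B]): Stream[Pure, B] =
  s.mapChunks(f)
```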

@mergify bot added the labels series/0.23 (PRs targeting 0.23.x) and module:circe on Mar 1, 2023
@armanbilge closed this on Mar 2, 2023
@armanbilge reopened this on Mar 2, 2023
@armanbilge changed the title to "Use mapChunks in CirceInstances.streamedJsonArray" on Mar 2, 2023
@danicheg (Member) left a comment:

Really like this refactoring 👍🏻, but would you mind adding a small micro-benchmark to the CirceJsonBench suite to prove we aren't losing any microseconds with this implementation?

Comment on lines +299 to +304:

```scala
val bldr = Chunk.newBuilder[Byte]
c.foreach { o =>
  bldr += CirceInstances.comma
  bldr += fromJsonToChunk(printer)(o)
}
bldr.result
```
A contributor commented:

Perhaps this can be a local `def nonFirst(c: Chunk[Json]): Chunk[Byte]` function?
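A sketch of that suggestion, reusing the names from the diff above (`fromJsonToChunk`, `printer`, and `CirceInstances.comma` are assumed from the surrounding `CirceInstances` code):

```scala
import fs2.Chunk
import io.circe.Json

// Sketch only: fromJsonToChunk, printer, and CirceInstances.comma come from
// the surrounding CirceInstances code and are assumed here.
def nonFirst(c: Chunk[Json]): Chunk[Byte] = {
  val bldr = Chunk.newBuilder[Byte]
  c.foreach { o =>
    bldr += CirceInstances.comma
    bldr += fromJsonToChunk(printer)(o)
  }
  bldr.result
}
```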

@diesalbla (Contributor) commented on Mar 7, 2023:

Additionally, rather than calling `fromJsonToChunk` on every element and concatenating chunks in the `bldr` builder, would it also help to use one byte buffer per source chunk, print each `Json` directly into that buffer, and expose it as a single view?

Edit: perhaps that would be an optimisation to make in the circe library instead, with a `printInterspersed` or `printCommaSeparated` method that takes any iterable of `Json`, or similar. https://github.com/circe/circe/blob/6bbb7e73a529916faec89f1455d11356af03ea64/modules/core/shared/src/main/scala/io/circe/Printer.scala#L209-L221.
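One way to sketch the "one buffer per source chunk" idea (illustrative only, using `printer.print`, which renders a `Json` to a `String` in circe, and a `StringBuilder` as the per-chunk buffer):

```scala
import java.nio.charset.StandardCharsets

import fs2.Chunk
import io.circe.{Json, Printer}

// Illustrative sketch: render every Json in the chunk into a single
// StringBuilder, interspersing commas, and convert to bytes once at the end,
// instead of concatenating one byte chunk per element.
def nonFirstBuffered(printer: Printer)(c: Chunk[Json]): Chunk[Byte] = {
  val sb = new StringBuilder
  c.foreach { o =>
    sb.append(',').append(printer.print(o))
  }
  Chunk.array(sb.toString.getBytes(StandardCharsets.UTF_8))
}
```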

@peterneyens (Author) replied:

> Additionally, rather than using the fromJsonToChunk on every element, and using chunk concatenation (in the bldr builder), would it also help to use one bytebuffer per source chunk, directly print each json to that buffer, and use one view?

Benchmarking streamJsonArrayEncoder and jsonEncoder makes it clear that there is a lot of room for improvement, but that feels out of scope for the small refactoring in this PR.

@peterneyens (Author) commented:
Finally found some time to add a benchmark.

Previous:

```
[info] Benchmark                           (elems)  (elemsPerChunk)  Mode  Cnt        Score        Error  Units
[info] CirceJsonStreamBench.encode_stream      100               50  avgt    5    54294.630 ±  13951.153  ns/op
[info] CirceJsonStreamBench.encode_stream      100              500  avgt    5    50007.465 ±   2632.255  ns/op
[info] CirceJsonStreamBench.encode_stream     1000               50  avgt    5   299525.990 ±  54286.551  ns/op
[info] CirceJsonStreamBench.encode_stream     1000              500  avgt    5   251252.929 ± 101122.138  ns/op
[info] CirceJsonStreamBench.encode_stream    10000               50  avgt    5  2731966.198 ± 760253.727  ns/op
[info] CirceJsonStreamBench.encode_stream    10000              500  avgt    5  2224727.347 ± 327952.214  ns/op
```

This PR:

```
[info] Benchmark                           (elems)  (elemsPerChunk)  Mode  Cnt        Score        Error  Units
[info] CirceJsonStreamBench.encode_stream      100               50  avgt    5    52789.592 ±   7598.824  ns/op
[info] CirceJsonStreamBench.encode_stream      100              500  avgt    5    49110.386 ±   1697.552  ns/op
[info] CirceJsonStreamBench.encode_stream     1000               50  avgt    5   280540.460 ±  30941.520  ns/op
[info] CirceJsonStreamBench.encode_stream     1000              500  avgt    5   237874.710 ±   6312.693  ns/op
[info] CirceJsonStreamBench.encode_stream    10000               50  avgt    5  2564971.598 ±  92078.114  ns/op
[info] CirceJsonStreamBench.encode_stream    10000              500  avgt    5  2214598.872 ± 685056.424  ns/op
```

Benchmark streamJsonArrayEncoder (vs jsonEncoder):

```scala
import java.util.concurrent.TimeUnit

// sbt "bench/jmh:run -i 10 -wi 10 -f 2 -t 1 org.http4s.bench.CirceJsonStreamBench"
@BenchmarkMode(Array(Mode.AverageTime))
```
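The collapsed snippet above only shows the first few lines of the benchmark. A hedged reconstruction of its probable shape (the `elems`/`elemsPerChunk` parameters and `encode_stream` method name are taken from the results table; the body and the remaining annotations are assumptions):

```scala
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._

// Hedged reconstruction: parameter names and the encode_stream method name
// come from the results table above; the body is a placeholder.
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
class CirceJsonStreamBench {
  @Param(Array("100", "1000", "10000"))
  var elems: Int = _

  @Param(Array("50", "500"))
  var elemsPerChunk: Int = _

  @Benchmark
  def encode_stream: Unit = {
    // encode a Stream of `elems` Json values, chunked by `elemsPerChunk`,
    // through streamJsonArrayEncoder and drain the resulting bytes
  }
}
```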
@diesalbla (Contributor) commented on May 13, 2023:

Benchmarks often use "Throughput" mode rather than average time. Perhaps that would help smooth out the error in the results?
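If that suggestion were taken, the change would be as small as swapping the mode annotation (sketch; the class name here is a hypothetical variant, and `Mode.Throughput` reports operations per unit time):

```scala
import org.openjdk.jmh.annotations.{BenchmarkMode, Mode}

// Throughput mode measures operations per time unit rather than average time
// per operation; its error bars are often steadier for fast operations.
@BenchmarkMode(Array(Mode.Throughput))
class CirceJsonStreamBenchThroughput // hypothetical variant of the benchmark
```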

Labels: module:circe, series/0.23 (PRs targeting 0.23.x)

4 participants