
perf(coord): Improve performance of result streaming #1549

Merged 4 commits into filodb:develop on Jun 13, 2023

Conversation

@vishramachandran (Member) commented on Apr 3, 2023

Pull Request checklist

  • The commit message(s) follow the contribution guidelines?
  • Tests for the changes have been added (for bug fixes / features)?
  • Docs have been added / updated (for bug fixes / features)?

Improve performance of Query Result Streaming by:

  1. Adding multiple (a configurable number of) RVs within one streaming StreamQueryResult message. This was needed because performance tests with one RV per Akka message created a bottleneck in Akka remoting (see the first sketch after this list): [2023-03-23 12:45:13,666] WARN filo-standalone-akka.actor.default-dispatcher-34 akka.remote.EndpointWriter [akka.tcp://filo-standalone@127.0.0.1:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Ffilo-standalone%40127.0.0.1%3A63524-1/endpointWriter] - [79284] buffered messages in EndpointWriter for [akka.tcp://filo-standalone@127.0.0.1:63524]. You should probably implement flow control to avoid flooding the remote connection.
  2. Using one result actor to receive streaming query results from the callee, since heap dumps showed an overload of LocalActorRefs. This required (a) externalizing the QueryScheduler outside of QueryActor so it can also be used from the ResultActor, and (b) adding a new unique planId UUID to each execPlan so we can distinguish streamed query results across plans and route them to the right consumer query pipeline (see the second sketch below).
  3. Invoking child plans in parallel rather than sequentially.
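
To make items 1 and 3 concrete, here is a minimal Monix/Akka sketch of batching range vectors into one streaming message and running child plans concurrently. StreamQueryResult is named in this PR, but its shape here, along with ChildPlan, StreamQueryResultFooter, and rvsPerMessage, are illustrative assumptions rather than the actual FiloDB classes.

```scala
import java.util.UUID

import akka.actor.ActorRef
import monix.eval.Task
import monix.reactive.Observable

// Hypothetical stand-ins for FiloDB types; the real classes in this PR may differ.
trait RangeVector
final case class StreamQueryResult(planId: UUID, rvs: Seq[RangeVector])  // batched RVs per Akka message
final case class StreamQueryResultFooter(planId: UUID)                   // assumed end-of-stream marker

// A child plan identified by the new planId UUID; execute() yields its RV stream.
class ChildPlan(val planId: UUID) {
  def execute(): Observable[RangeVector] = Observable.empty  // placeholder
}

object StreamingDispatch {
  /** Streams results of all child plans to the shared result actor,
    * batching `rvsPerMessage` RVs into each StreamQueryResult message
    * and executing the children concurrently instead of one after another.
    */
  def streamChildren(children: Seq[ChildPlan],
                     resultActor: ActorRef,
                     rvsPerMessage: Int): Task[Unit] =
    Task.parSequenceUnordered(                  // item 3: children run in parallel
      children.map { child =>
        child.execute()
          .bufferTumbling(rvsPerMessage)        // item 1: group RVs, not one message per RV
          .foreachL(batch => resultActor ! StreamQueryResult(child.planId, batch))
          .map(_ => resultActor ! StreamQueryResultFooter(child.planId))
      }
    ).map(_ => ())
}
```

Batching this way keeps the number of Akka messages proportional to result size divided by the batch size, which is what avoids the EndpointWriter backlog shown in the warning above.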
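And a minimal sketch of item 2: a single shared result actor that uses the per-plan planId UUID to route incoming batches to the right consumer pipeline. The message and type names mirror the sketch above and are assumptions, not the PR's exact classes.

```scala
import java.util.UUID

import akka.actor.Actor

// Hypothetical sketch of the single shared result actor.
trait RangeVector
final case class StreamQueryResult(planId: UUID, rvs: Seq[RangeVector])
final case class StreamQueryResultFooter(planId: UUID)
final case class SubscribePlan(planId: UUID, onBatch: Seq[RangeVector] => Unit)

class ResultActor extends Actor {
  // planId -> callback of the query pipeline waiting for that plan's results
  private var routes = Map.empty[UUID, Seq[RangeVector] => Unit]

  def receive: Receive = {
    case SubscribePlan(planId, onBatch) =>
      routes += planId -> onBatch
    case StreamQueryResult(planId, rvs) =>
      // The unique planId carried by each plan lets one actor demultiplex
      // results from many concurrently running plans.
      routes.get(planId).foreach(cb => cb(rvs))
    case StreamQueryResultFooter(planId) =>
      routes -= planId                  // plan finished; drop its route
  }
}
```

Routing everything through one long-lived actor avoids allocating a short-lived LocalActorRef per query, which is what showed up as an overload in the heap dumps.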

With this change I was able to bring raw query throughput and latency on a one-node setup on par with the non-streaming solution. More iterations are needed to improve performance further. The bottleneck appears to be that the Prom HTTP API is non-streaming and requires accumulating large swaths of data in memory. In any case, FiloDB itself is relieved of this accumulation and should be more reliable than without streaming.

Briefly, here are the remaining TODOs:

  1. On a setup with one FiloDB node and one query facade service node, I saw equivalent latencies at 5 QPS. GC performance improved on FiloDB, but memory usage increased on the query facade process. Scaling the streaming setup to 6 QPS produced more timeouts than the non-streaming (fat) response did.
  2. There are some functional issues on Apple M1: the Monix query pipeline seems to be canceled abruptly, whereas the same setup works correctly on an Intel Mac.
  3. There is probably a lurking bug: I occasionally see more result RVs than expected.

alextheimer (Contributor) previously approved these changes on Apr 18, 2023:

Approved with a couple comments / questions 👍

amolnayak311 (Contributor) previously approved these changes on Jun 5, 2023:

Thanks @vishramachandran for the PR. If you can rebase, I will be happy to take a quick look again and approve.

@vishramachandran merged commit 08fb4e0 into filodb:develop on Jun 13, 2023
1 check passed