perf(coord): Improve performance of result streaming #1549
Approved with a couple comments / questions 👍
coordinator/src/main/scala/filodb.coordinator/queryengine/Utils.scala
Thanks @vishramachandran for the PR. If you can rebase, I will be happy to take a quick look again and approve.
Pull Request checklist
Improve performance of Query Result Streaming by
[2023-03-23 12:45:13,666] WARN filo-standalone-akka.actor.default-dispatcher-34 akka.remote.EndpointWriter [akka.tcp://filo-standalone@127.0.0.1:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Ffilo-standalone%40127.0.0.1%3A63524-1/endpointWriter] - [79284] buffered messages in EndpointWriter for [akka.tcp://filo-standalone@127.0.0.1:63524]. You should probably implement flow control to avoid flooding the remote connection.
LocalActorRefs in heap dumps. This required (a) externalizing QueryScheduler from QueryActor so that it can be used from ResultActor as well, and (b) adding a new unique planId UUID to each execPlan so that streamed query results from different plans can be distinguished and routed to the right consumer query pipeline.
With this change I was able to bring raw query throughput and latency on a one-node setup on par with the non-streaming solution. More iterations are needed to improve performance further. The bottleneck appears to arise from the fact that the Prom HTTP API is non-streaming and requires accumulating new swaths of data in memory. In any case, FiloDB is relieved of this accumulation and would be more reliable than without streaming.
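To illustrate point (b), here is a minimal sketch of routing streamed results by planId. It is not the code in this PR: StreamedChunk and ResultRouter are hypothetical names invented for the example, and the real change routes results through actors and query pipelines rather than plain callbacks. The sketch only assumes that every streamed message carries the UUID of the plan that produced it.

```scala
import java.util.UUID
import scala.collection.concurrent.TrieMap

// Hypothetical message type: one chunk of streamed results, tagged with
// the planId of the plan that produced it.
final case class StreamedChunk(planId: UUID, payload: String)

// Hypothetical router: keeps one consumer callback per planId and forwards
// each incoming chunk to the consumer registered for its plan.
final class ResultRouter {
  // TrieMap is a thread-safe map from the Scala standard library.
  private val consumers = TrieMap.empty[UUID, StreamedChunk => Unit]

  // Register the consumer for a plan before its results start arriving.
  def register(planId: UUID)(consumer: StreamedChunk => Unit): Unit =
    consumers.update(planId, consumer)

  // Drop the consumer once the plan has completed or failed.
  def unregister(planId: UUID): Unit = {
    consumers.remove(planId)
    ()
  }

  // Forward the chunk; returns false when no consumer is registered for it.
  def route(chunk: StreamedChunk): Boolean =
    consumers.get(chunk.planId) match {
      case Some(consumer) =>
        consumer(chunk)
        true
      case None =>
        false
    }
}
```

Because the lookup key is the planId alone, results from two plans that are streamed concurrently cannot be mixed up, and a chunk that arrives after its plan was unregistered is reported rather than silently delivered elsewhere.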
Briefly, here are the remaining TODOs: