
coprocessor for query planning #4966

Open
samuelAndalon opened this issue Apr 16, 2024 · 2 comments

samuelAndalon commented Apr 16, 2024

Is your feature request related to a problem? Please describe.
Coprocessors allow hooking into the router lifecycle, extending the behavior of the router service, supergraph service, execution service, and subgraph service.

Describe the solution you'd like
It would be nice if query planning could be decorated with a coprocessor; that way we could decide where and how a query plan is built.

If a coprocessor for query planning is not detected, spawn the Deno process as today; if one is detected, don't spawn it.
I wrote a small Bun server capable of building query plans and ran some benchmarks comparing it against node.

[benchmark screenshot: node on top, Bun on the bottom, 100 TPS for 10 s]
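As a rough illustration of what such a planning sidecar's request handling could look like, here is a minimal TypeScript sketch. The request/response shapes and the `buildQueryPlan` helper are hypothetical stand-ins, not the actual Bun server or a real planner library:

```typescript
// Hypothetical request/response shapes for a sidecar planning service.
interface PlanRequest {
  query: string;           // the client operation to plan
  operationName?: string;
}

interface PlanResponse {
  queryPlan: unknown;      // serialized plan the router would execute
  errors?: string[];
}

// Stand-in for a real planner (whatever library the server wraps); it just
// wraps the query in a trivial "plan" so the handler logic can be exercised.
function buildQueryPlan(query: string): unknown {
  return { kind: "QueryPlan", node: { kind: "Fetch", operation: query } };
}

// Pure handler: easy to mount under Bun.serve, node:http, or Express.
export function handlePlanRequest(body: PlanRequest): PlanResponse {
  if (!body.query || body.query.trim() === "") {
    return { queryPlan: null, errors: ["missing query"] };
  }
  return { queryPlan: buildQueryPlan(body.query) };
}
```

Keeping the handler pure makes it easy to benchmark planning cost separately from HTTP overhead.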

Unlike other coprocessors, a query planning coprocessor won't be called as often, since the query plan cache remains in the main router app.

Describe alternatives you've considered
Rely on federation; however, building query plans and updating supergraphs is synchronous, so it doesn't seem like a bad idea to delegate those responsibilities to a "sidecar".



garypen commented Apr 18, 2024

Hi @samuelAndalon.

That looks like an interesting project and it's nice to see how the coprocessor API can be extended to provide functionality like this.

This is a problem area that we have thought about several times over the last couple of years and we have started to form some opinions and work up some ideas about how to provide this kind of functionality. Here are some of the main problems we'd like to be able to address:

  1. Minimize redundant Query Planning across a fleet of routers.
  2. Support independent scaling of Query Planning resources from other router functionality.
  3. Support a massively scalable Query Planner
  4. Minimize latency between main router and shared Query Planning resource
  5. Support the evolution of the Query Planning format
  6. Support smart cache invalidation for a router when a schema is published

(there are more, but this paints a broad picture)

If I test out your idea across those 6 areas it would probably look a little like this:

  1. Doesn't help.
  2. Does help, but not using resources effectively across a fleet since CPU/Memory requirements duplicated per pod.
  3. Unsure. Probably not due to limitations of V8/JSC.
  4. Does help. Should have good latency, but perhaps the protocol isn't as high performance as we might like.
  5. Doesn't help, but is probably not an issue since it's router pod specific.
  6. Not required. The existing cache invalidation would apply.

That's a fairly positive picture, it looks like this would help reduce latency and provide a way to scale QP resource independently. Could we do better?

For a long time we've been aware of the benefits of caching query plans to be shared between routers. That provides a variety of benefits, but this is still a sub-optimal use of resource, since there is still a duplication of query planning across a fleet of routers.

One approach could be to provide a "standalone query planner with associated cache". The standalone planner could be deployed as a resilient component with a pluggable cache, accessible by a fleet of routers. How would this score against our criteria?

  1. Very well. We'd be doing the planning right next to the shared cache.
  2. Very well. It's a totally independent deployment.
  3. Very well. We'd centralise all that redundant planning CPU/Memory resource and use it for multiple routers.
  4. Probably slightly worse due to the latency to the planner. Similar to the existing distributed cache.
  5. Very well. This would be part of the design.
  6. Possibly enables optimisation across a fleet of routers.

This would require a careful evaluation of the implications of items 4 and 6, but, on the whole, I'm confident that a dedicated, distributed, caching query planner would be a much more scalable solution than the alternatives: the status quo or sidecar planning. One other drawback is that it would potentially be a SPOF (single point of failure), but careful design of the distributed planner should minimise that risk.
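The cache-first flow a standalone planner implies could be sketched as follows. This is an assumption-laden sketch: the in-memory Map stands in for the shared/distributed cache, and `fetchPlanFromService` stands in for the HTTP call to the standalone planner:

```typescript
// Cache-first plan lookup: a router consulting a shared cache before asking
// the standalone planner. The Map stands in for a distributed cache.
type QueryPlan = { kind: string; operation: string };

const planCache = new Map<string, QueryPlan>();
let plannerCalls = 0; // instrumentation to show cache hits skip the planner

// Stand-in for an HTTP round trip to the standalone planner service.
async function fetchPlanFromService(query: string): Promise<QueryPlan> {
  plannerCalls++;
  return { kind: "QueryPlan", operation: query };
}

// Keying on schema version + operation means a schema publish naturally
// retires stale entries (criterion 6) without explicit invalidation.
export async function getPlan(schemaId: string, query: string): Promise<QueryPlan> {
  const key = `${schemaId}:${query}`;
  const cached = planCache.get(key);
  if (cached) return cached;                       // hit: no planner round trip
  const plan = await fetchPlanFromService(query);  // miss: plan once, share it
  planCache.set(key, plan);
  return plan;
}
```

With the cache next to the planner, a fleet of routers pays the planning cost once per (schema, operation) pair instead of once per router.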

For this kind of scenario, scalability is probably far more important than raw performance, but we would continue to support query planning in the router for those deployments where performance far outweighs scalability requirements.


samuelAndalon commented Apr 18, 2024

  1. Minimize redundant Query Planning across a fleet of routers.

The advantage of having a query planning service is that it can be shipped as a sidecar, or as a completely independent service outside the pod. If it is deployed as a single entity, then multiple router instances will call the same query planner service.

  2. Support independent scaling of Query Planning resources from other router functionality.

If the query planner service is deployed separately, then this won't be a problem.

  3. Support a massively scalable Query Planner

Based on some benchmarks I performed, Bun is extremely fast compared with node and deno (see the screenshots shared above). Query planning is 100% CPU-bound, synchronous execution, and the Bun runtime is extremely fast. Yes, it is still JavaScript, but similar to the experimental parallelism feature in the router, Bun provides workers that can help execute query planning in parallel.

  4. Minimize latency between main router and shared Query Planning resource

I believe this can be tuned: HTTP/2, TCP keep-alive connections, etc. On top of that, the cache in the routers will avoid making the same requests over and over again.
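For the node side of that hop, the connection-reuse part of this tuning can be sketched with a keep-alive agent (the numbers are illustrative, not recommendations; HTTP/2 via node:http2 would be the analogous step up):

```typescript
// Connection reuse for the router -> planner hop: keep sockets open so each
// planning request skips the TCP (and TLS) handshake.
import http from "node:http";

export const plannerAgent = new http.Agent({
  keepAlive: true,  // reuse TCP connections across plan requests
  maxSockets: 32,   // bound concurrent connections to the planner
});

// Usage: pass `agent: plannerAgent` in the http.request options when
// calling the planner service.
```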

The best thing about this is that, based on their needs, the router user can decide how to ship the router and the query planner service:

  1. as a sidecar
  2. as an independent service

Both options are agnostic of the router; the router would just need a query planner URL for the planning coprocessor.

@Geal Geal changed the title coproccesor for query planning coprocessor for query planning Apr 22, 2024