Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the query planner options to the query plan cache key #5100

Merged
merged 23 commits into from May 14, 2024

Conversation

Geal
Copy link
Contributor

@Geal Geal commented May 7, 2024

Fix #5093

Some query plan options can result in incompatible generated plans if they are enabled or disabled, so we must take them into account in the cache key, otherwise a router might get a plan from the cache that does not match its configuration.
We cannot cover the entire apollo-federation config in the hash for now, this will have to be done as a follow up


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This comment has been minimized.

@router-perf
Copy link

router-perf bot commented May 7, 2024

CI performance tests

  • step - Basic stress test that steps up the number of users over time
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • large-request - Stress test with a 1 MB request payload
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • xxlarge-request - Stress test with 100 MB request payload
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • xlarge-request - Stress test with 10 MB request payload
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • no-graphos - Basic stress test, no GraphOS.
  • reload - Reload test over a long period of time at a constant rate of users
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • const - Basic stress test that runs with a constant number of users

@BrynCooke
Copy link
Contributor

I added some testing for this which meant extending the integration test runner a little.

I'm not massively keen on the way that we obtain the config for the query planners. Once the rust query planner is hashable maybe we could consider implementing From<Configuration> for QueryPlannerConfig

abernix
abernix previously requested changes May 9, 2024
Copy link
Member

@abernix abernix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed some softer bits and requested changes around them on account of looking for answers to my questions, but I will leave the rest of the review to others for now. Happy to revisit.

.changesets/fix_geal_cache_key_qp_options.md Outdated Show resolved Hide resolved
.changesets/fix_geal_cache_key_qp_options.md Outdated Show resolved Hide resolved
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/caching_query_planner.rs Outdated Show resolved Hide resolved
@BrynCooke BrynCooke requested a review from abernix May 9, 2024 13:40
"plan:{}:{}:{}:{}",
FEDERATION_VERSION, self.hash, operation, metadata,
"plan:{}:{}:{}:{}:{}",
CACHE_KEY_VERSION, FEDERATION_VERSION, self.hash, operation, metadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unrelated to this particular PR, but it took me a while to understand that operation is a hash of self.operation, which is actually an operation name.

Also, self.hash is hash of query and relevent parts of schema, but taking into account only operation matching operation_name.
At the same time, self.hash doesn't include a hash of "operation_name" itself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also self.metadata is actually "authorization metadata".

P.S. Same as the previous comment, this is unrelated to this PR.
Just comments for others to also understand what is happening in this code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this refactored and keys become self describing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When/where does this change get itemized/made?

hasher.update(&serde_json::to_vec(&self.sdl).expect("serialization should not fail"));
hasher.update([self.introspection as u8]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is introspection added here but not in the Hash implementation below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch: 43ccd9d

I am wondering though why we are using a manual implementation of Hash rather than using derive? There seems to be nothing special about this structure that would warrant a custom impl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to dig it up, but I think there was something in the key that could not implement Hash in the past

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we figure that out, is there somewhere we could add a comment to make that more intuitive?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need a custom implementation now let's remove it and use derive. That way we don't have to worry about it in the future. If we do eventually need a custom version would need to comment and possibly fuzz test.

Co-authored-by: Jesse Rosenberger <git@jro.cc>
@Geal
Copy link
Contributor Author

Geal commented May 13, 2024

thank you all for the feedback, I will make another pass today or tomorrow to fix up the small issues. @BrynCooke do you want to see the hash prefxies in this PR or in a follow up? I plan to make another PR right after to fix other things, like removing the JSON usage when possible

@BrynCooke
Copy link
Contributor

@Geal Happy for the key change to be a follow up.

@Geal
Copy link
Contributor Author

Geal commented May 13, 2024

@abernix can we merge this?

@abernix
Copy link
Member

abernix commented May 13, 2024

@Geal Does this completely supersede @fotoetienne's #5094? If so, curious if @xuorig / @fotoetienne could take a look.

@abernix abernix dismissed their stale review May 13, 2024 12:50

My previous review is no longer relevant based on many new changes.

@Geal
Copy link
Contributor Author

Geal commented May 13, 2024

yes it supersedes that PR

"plan:{}:{}:{}:{}",
FEDERATION_VERSION, self.hash, operation, metadata,
"plan:{}:{}:{}:{}:{}",
CACHE_KEY_VERSION, FEDERATION_VERSION, self.hash, operation, metadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR: Why are operation and metadata separate vs combined in the same hasher?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was a way to see how many variants there can be in cache for a single query

@Geal Geal merged commit f24ae3a into dev May 14, 2024
13 of 14 checks passed
@Geal Geal deleted the geal/cache-key-qp-options branch May 14, 2024 07:43
@Geal
Copy link
Contributor Author

Geal commented May 14, 2024

follow up issue with the fixes we discussed: #5160

@Geal Geal mentioned this pull request May 27, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

distributed Query Plan cache may be poisoned by experimental query plan features
7 participants