Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove legacy validation #4551

Merged
merged 63 commits into from
Apr 11, 2024
Merged

Remove legacy validation #4551

merged 63 commits into from
Apr 11, 2024

Conversation

Geal
Copy link
Contributor

@Geal Geal commented Jan 26, 2024

Fix #4407
Fix #4409
This removes GraphQL validation from the query planner, to use the Rust version instead. Validation has now moved to the query analysis layer, which means we can remove a lot of code that was there to accommodate parsing and validating the query but not rejecting it outright, because we needed to compare the validation result with the planner's.
This will greatly reduce the load on the planner, which will now only be used for planning queries, not validating.
This new validation process has been running in production for months concurrently with the JavaScript version, allowing us to detect and fix any discrepancies in the new implementation. We now have enough confidence in the new Rust-based validation to entirely switch off the less performant, JavaScript validation.

Remaining issues:

  • usage reporting: since invalid queries are rejected in query analysis, inside the router service, then usage reporting for invalid queries is no longer reported, because the telemetry plugins handles usage reporting at the supergraph response level
  • persisted queries: apparently persisted queries use the query analysis transformation of supergraph requests, now one test fails on an invalid query, I have not yet looked at why
  • @defer directive: a test is failing because the directive is used in the query but not found by the schema. @lrlna what is apollo-compiler expecting here? Is @defer considered builtin, or does it need to be declared in the schema?

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This comment has been minimized.

@router-perf
Copy link

router-perf bot commented Jan 26, 2024

CI performance tests

  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • large-request - Stress test with a 1 MB request payload
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • reload - Reload test over a long period of time at a constant rate of users
  • no-graphos - Basic stress test, no GraphOS.
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • xxlarge-request - Stress test with 100 MB request payload
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • const - Basic stress test that runs with a constant number of users
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode

@lrlna
Copy link
Member

lrlna commented Jan 29, 2024

@defer directive: a test is failing because the directive is used in the query but not found by the schema. @lrlna what is apollo-compiler expecting here? Is @defer considered builtin, or does it need to be declared in the schema?

@Geal we do not define it as a built-in in apollo-compiler. The supergraphs that support defer, will have a defer definition already defined. And it looked like a bunch of tests [1] [2] [3] in the router already had them defined as part of the schema. I assume this test also needs it defined.

@Geal
Copy link
Contributor Author

Geal commented Jan 29, 2024

alright, that's an easy fix then

@Geal Geal marked this pull request as ready for review January 30, 2024 09:14
- Using `Schema::parse_test()` to instantiate a supergraph + API schema
- Using the router types for the cost tests that use federated schemas
- Using the apollo-compiler types for tests that use single schemas
- Fixing some validation errors in the test queries
Comment on lines -132 to +140
operation.name.as_ref(),
root_type_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just a small mistake, the operation name isn't a type name, the type is always the root type (eg. for any query it's always Query, for any mutation it's always Mutation, or whatever the schema has specified)

@Geal Geal mentioned this pull request Apr 11, 2024
6 tasks
@Geal Geal enabled auto-merge (squash) April 11, 2024 07:23
Copy link
Contributor

@SimonSapin SimonSapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but I still think we should use .to_json() over .to_string() for parse errors: #4551 (comment)

The latter formats errors for a CLI-like output with a monospace font

{
"request_id": "[REDACTED]",
"stats": {
"## GraphQLValidationFailure\n": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to replace this metric with something else?

@SimonSapin
Copy link
Contributor

Marked as approved to mean I don’t need to approve again after this change, but I do think the parse error thing should be changed.

Copy link
Member

@goto-bus-stop goto-bus-stop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 🤩 everything LGTM except a few places in tests where we should use Schema::parse_test

apollo-router/src/plugins/progressive_override/tests.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/progressive_override/tests.rs Outdated Show resolved Hide resolved
apollo-router/src/plugins/progressive_override/tests.rs Outdated Show resolved Hide resolved
apollo-router/src/services/supergraph/tests.rs Outdated Show resolved Hide resolved
@SimonSapin SimonSapin enabled auto-merge (squash) April 11, 2024 13:16
@SimonSapin SimonSapin merged commit a501e49 into dev Apr 11, 2024
13 checks passed
@SimonSapin SimonSapin deleted the geal/remove-legacy-validation branch April 11, 2024 13:30
bnjjj added a commit that referenced this pull request Apr 16, 2024
…on_mode (#4964)

Fixup of #4551 

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
@abernix abernix mentioned this pull request Apr 22, 2024
@yanns
Copy link
Contributor

yanns commented Apr 26, 2024

We are using a plugin defining a supergraph_service . Before this change, this method was called, we could do some request validation in it. After this change, it seems that this method is not called when the GraphQL query is invalid.
Can you confirm that this PR could change this behavior? Is it intended?

@abernix
Copy link
Member

abernix commented Apr 30, 2024

@yanns What was the request validation that you were doing at the supergraph service? Something GraphQL-specific?

@yanns
Copy link
Contributor

yanns commented Apr 30, 2024

@yanns What was the request validation that you were doing at the supergraph service? Something GraphQL-specific?

There are 2 use-cases:

  • (1) as we have a public GraphQL API, we need to translate some error code. So we need a plugin that translates the error codes emitted by the router to "our" error codes. If the router rejects invalid GraphQL queries without calling plugins, we cannot modify the response.
  • (2) we are using feature toggles. One GraphQL query does not use exactly the same GraphQL schema as another GraphQL query. So we need to validate requests depending on the GraphQL schema that is valid for this particular request. For this, I don't think that this PR is an issue. We configure the router with a GraphQL schema that has all possible fields. After the router has accepted the query, we validate it again with another GraphQL that is a subset of the one used by the router.

So I guess our main issue is that our plugin cannot adapt the response in case of invalid GraphQL queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants