
Releases: apollographql/router

v1.47.0

21 May 14:58
2bb67a0

🚀 Features

Support telemetry selectors with errors (Issue #5027)

The router now supports telemetry selectors that take into account the occurrence of errors. This capability enables you to create metrics, events, or span attributes that contain error messages.

For example, you can create a counter for the number of timed-out requests for subgraphs:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        requests.timeout:
          value: unit
          type: counter
          unit: request
          description: "subgraph requests containing subgraph timeout"
          attributes:
            subgraph.name: true
          condition:
            eq:
              - "request timed out"
              - error: reason
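As a rough illustration of what the counter above records, the following Rust sketch counts finished subgraph requests per subgraph.name whenever the error reason equals "request timed out". It is not the router's internal code: `SubgraphOutcome` and `count_by_error_reason` are hypothetical names used only for this example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a finished subgraph request.
pub struct SubgraphOutcome {
    pub subgraph_name: String,
    /// Error reason, if the request failed (for example "request timed out").
    pub error_reason: Option<String>,
}

/// Counts requests per subgraph whose error reason equals `reason`,
/// mirroring the `eq: ["request timed out", error: reason]` condition
/// combined with the `subgraph.name` attribute.
pub fn count_by_error_reason(outcomes: &[SubgraphOutcome], reason: &str) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for outcome in outcomes {
        if outcome.error_reason.as_deref() == Some(reason) {
            *counts.entry(outcome.subgraph_name.clone()).or_insert(0) += 1;
        }
    }
    counts
}
```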

The router can also now compute new attributes upon receiving a new event in a supergraph response. With this capability, you can fetch data directly from the supergraph response body:

telemetry:
  instrumentation:
    instruments:
      supergraph:
        acme.request.on_graphql_error:
          value: event_unit
          type: counter
          unit: error
          description: my description
          condition:
            eq:
            - MY_ERROR_CODE
            - response_errors: "$.[0].extensions.code"
          attributes:
            response_errors:
              response_errors: "$.*"
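The selector "$.[0].extensions.code" reads the extensions.code of the first GraphQL error in a response event. A minimal Rust sketch of that condition, assuming a simplified, hypothetical `GraphqlError` type rather than the router's real response types; direct field access stands in for JSONPath evaluation:

```rust
/// Hypothetical, simplified GraphQL error as it appears in a response event.
pub struct GraphqlError {
    pub message: String,
    /// Value of `extensions.code`, if present.
    pub code: Option<String>,
}

/// Rough equivalent of the selector `$.[0].extensions.code`:
/// the `extensions.code` of the first error in the event, if any.
pub fn first_error_code(errors: &[GraphqlError]) -> Option<&str> {
    errors.first().and_then(|e| e.code.as_deref())
}

/// With `value: event_unit`, the counter grows by one for each response
/// event whose first error code equals `expected`.
pub fn count_matching_events(events: &[Vec<GraphqlError>], expected: &str) -> u64 {
    events
        .iter()
        .filter(|errors| first_error_code(errors) == Some(expected))
        .count() as u64
}
```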

By @bnjjj in #5022

Add support for status_code response to Rhai (Issue #5042)

The router now supports response.status_code on the Response interface in Rhai.

Examples using the response status code:

  • Converting a response status code to a string:

    if response.status_code.to_string() == "200" {
        print(`ok`);
    }

  • Converting a response status code to a number:

    if parse_int(response.status_code.to_string()) == 200 {
        print(`ok`);
    }

By @bnjjj in #5045

Add gt and lt operators for telemetry conditions (PR #5048)

The router now supports greater than (gt) and less than (lt) operators for telemetry conditions. Like the eq operator, gt and lt each take two arguments as a list. The gt operator checks that the first argument is greater than the second, and the lt operator checks that the first argument is less than the second. Other conditions such as gte, lte, and range can be made from combinations of gt, lt, eq, and all.
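To show how composite conditions can be derived from the primitives, here is a small Rust model. The `Condition` enum, `gte`, and `in_range` are hypothetical; real telemetry conditions compare selector values rather than plain numbers, and this sketch builds gte with an any combinator alongside eq and all.

```rust
/// Hypothetical model of telemetry conditions over numeric values.
pub enum Condition {
    Eq(f64, f64),
    Gt(f64, f64),
    Lt(f64, f64),
    All(Vec<Condition>),
    Any(Vec<Condition>),
}

impl Condition {
    pub fn evaluate(&self) -> bool {
        match self {
            Condition::Eq(a, b) => a == b,
            Condition::Gt(a, b) => a > b, // first argument greater than the second
            Condition::Lt(a, b) => a < b, // first argument less than the second
            Condition::All(cs) => cs.iter().all(Condition::evaluate),
            Condition::Any(cs) => cs.iter().any(Condition::evaluate),
        }
    }
}

/// gte expressed as "greater than OR equal".
pub fn gte(a: f64, b: f64) -> Condition {
    Condition::Any(vec![Condition::Gt(a, b), Condition::Eq(a, b)])
}

/// Half-open range [lo, hi) expressed as "gte lo AND lt hi".
pub fn in_range(x: f64, lo: f64, hi: f64) -> Condition {
    Condition::All(vec![gte(x, lo), Condition::Lt(x, hi)])
}
```

For example, `in_range(status, 200.0, 300.0)` would match any 2xx status code.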

By @tninesling in #5048

Expose busy timer APIs (PR #4989)

The router now exposes public APIs that native plugins can use to control when the router's busy timer runs.

The router's busy timer measures the time spent working on a request outside of waiting for external calls, like coprocessors and subgraph calls. It includes the time spent waiting for other concurrent requests to be handled (the wait time in the executor) to show the actual router overhead when handling requests.

The public methods are Context::enter_active_request and Context::busy_time. The result is reported in the apollo_router_processing_time metric.

For details on using the APIs, see the documentation for enter_active_request.
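The accounting described above can be sketched as follows: busy time is the elapsed time minus every interval during which at least one external call was pending, so overlapping calls (for example, parallel subgraph fetches) are subtracted only once. This is a hypothetical, single-threaded model with explicit timestamps for testability; it is not the router's Context implementation, and `BusyTimer`, `enter_active`, and `exit_active` are illustrative names.

```rust
use std::time::{Duration, Instant};

/// Hypothetical busy timer: elapsed time minus time spent waiting on external calls.
pub struct BusyTimer {
    start: Instant,
    active: u32,                  // number of pending external calls
    wait_started: Option<Instant>, // when the active count last went from 0 to 1
    waited: Duration,             // total completed waiting time
}

impl BusyTimer {
    pub fn new(start: Instant) -> Self {
        BusyTimer { start, active: 0, wait_started: None, waited: Duration::ZERO }
    }

    /// An external call begins; measurement pauses if this is the first one.
    pub fn enter_active(&mut self, now: Instant) {
        if self.active == 0 {
            self.wait_started = Some(now);
        }
        self.active += 1;
    }

    /// An external call ends; measurement resumes once no calls remain.
    pub fn exit_active(&mut self, now: Instant) {
        assert!(self.active > 0, "exit_active without matching enter_active");
        self.active -= 1;
        if self.active == 0 {
            if let Some(started) = self.wait_started.take() {
                self.waited += now.duration_since(started);
            }
        }
    }

    /// Busy time up to `now`, excluding any wait still in progress.
    pub fn busy_time(&self, now: Instant) -> Duration {
        let mut waited = self.waited;
        if let Some(started) = self.wait_started {
            waited += now.duration_since(started);
        }
        now.duration_since(self.start) - waited
    }
}
```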

By @Geal in #4989

🐛 Fixes

Reduce JSON schema size and Router memory footprint (PR #5061)

As we add more features to the Router, the size of the JSON schema for the router configuration file continues to grow. In particular, adding conditionals to telemetry in v1.46.0 significantly increased the size of the schema. This has a noticeable impact on the initial memory footprint, although it does not affect how requests are served.

The JSON schema for the router configuration file has been optimized from approximately 100k lines down to just over 7k.

This reduces the startup time of the Router, and the smaller schema is friendlier to code editors.

By @BrynCooke in #5061

Prevent query plan cache collision when planning options change (Issue #5093)

The router's hashing algorithm has been updated to prevent cache collisions when the router's configuration changes.

Important

If you have enabled distributed query plan caching, this release changes the hashing algorithm used for the cache keys. As a result, expect additional cache regeneration cost when upgrading to this version while the new hashing algorithm comes into service.

The router supports multiple options that affect the generated query plans, including:

  • defer_support
  • generate_query_fragments
  • experimental_reuse_query_fragments
  • experimental_type_conditioned_fetching
  • experimental_query_planner_mode

If distributed query plan caching is enabled, changing any of these options results in different query plans being generated and cached.

This could be problematic in the following scenarios:

  1. The router configuration changes and a query plan is loaded from cache which is incompatible with the new configuration.
  2. Routers with different configurations share the same cache, which causes them to cache and load incompatible query plans.

To prevent these from happening, the router now creates a hash for the entire query planner configuration and includes it in the cache key.
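A minimal sketch of the idea, assuming a hypothetical `PlannerConfig` struct holding the options listed above and a made-up "plan:" key layout. It uses std's DefaultHasher only to stay dependency-free; because that hasher's output isn't guaranteed stable across Rust releases, a cache shared between router instances would need a stable hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of query planner options that influence generated plans.
#[derive(Hash, Clone)]
pub struct PlannerConfig {
    pub defer_support: bool,
    pub generate_query_fragments: bool,
    pub experimental_reuse_query_fragments: bool,
    pub experimental_type_conditioned_fetching: bool,
    pub experimental_query_planner_mode: String,
}

/// Builds a cache key that includes a hash of the whole planner configuration,
/// so plans produced under different options never share a key.
pub fn cache_key(schema_hash: &str, operation_hash: &str, config: &PlannerConfig) -> String {
    let mut hasher = DefaultHasher::new();
    config.hash(&mut hasher);
    format!("plan:{}:{}:{:016x}", schema_hash, operation_hash, hasher.finish())
}
```

Routers sharing a cache with identical planner configurations still produce identical keys, while any difference in one of these options yields a distinct key.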

By @Geal in #5100

5xx internal server error responses returned as GraphQL structured errors (PR #5159)

Previously, the router returned internal server errors (5xx class) to clients as plaintext. Starting with this release, the router returns these 5xx errors as structured GraphQL errors (for example, {"errors": [...]}).

Internal server errors are returned upon unexpected or unrecoverable disruptions to the GraphQL request lifecycle execution. When these occur, the underlying error messages are logged at the ERROR level to the router's logs.

By @BrynCooke in #5159

Custom telemetry events not created when logging is disabled (PR #5165)

The router has been fixed to not create custom telemetry events when the log level is set to off.

An example configuration with level set to off for a custom event:

telemetry:
  instrumentation:
    events:
      router:
        # Standard events
        request: info
        response: info
        error: info

        # Custom events
        my.disabled_request_event:
          message: "my event message"
          level: off # Disabled because we set the level to off
          on: request
          attributes:
            http.request.body.size: true

By @bnjjj in #5165

Ensure that batch entry contexts are correctly preserved (PR #5162)

Previously, the router didn't use contexts correctly when processing batches. A representative context was chosen (the first item in a batch of items) and used to provide context functionality for all the generated responses.

The router now correctly preserves request contexts and uses them during response creation.

By @garypen in #5162

Validate enum values in input variables (Issue #4633)

The router now validates enum values provided in JSON variables. Invalid enum values result in GRAPHQL_VALIDATION_FAILED errors.
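Conceptually, the check compares each enum-typed variable against the values the schema declares for that enum. A hypothetical Rust sketch; the function name and the error message format are illustrative and not the router's actual output:

```rust
/// Hypothetical check of a JSON variable value (already decoded to a string)
/// against the enum values declared in the schema. Matching is case-sensitive,
/// as GraphQL enum values are.
pub fn validate_enum_variable(
    variable: &str,
    value: &str,
    allowed: &[&str],
) -> Result<(), String> {
    if allowed.contains(&value) {
        Ok(())
    } else {
        Err(format!(
            "GRAPHQL_VALIDATION_FAILED: value \"{}\" for variable ${} is not a valid enum value",
            value, variable
        ))
    }
}
```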

By @Geal in #4753

Strip dashes from trace_id in CustomTraceIdPropagator (Issue #4892)

The router now strips dashes from trace IDs to ensure conformance with OpenTelemetry.

In OpenTelemetry, trace IDs are 128-bit values represented as 32-character hex strings without dashes, following the W3C trace ID format.

This has been applied within the router to trace_id in CustomTraceIdPropagator.

Note that if raw trace IDs from headers are UUIDv4 values containing dashes, the dashes are stripped so that the raw trace ID value can be parsed into a valid trace_id.
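A sketch of the normalization, assuming a value is accepted when dash removal leaves 32 hex digits that are not all zero (the W3C Trace Context specification treats the all-zero ID as invalid). `normalize_trace_id` and `trace_id_value` are hypothetical helpers, not CustomTraceIdPropagator's code:

```rust
/// Strips dashes from a raw trace ID (e.g. a UUIDv4 taken from a header) and
/// checks that the result is 32 hex characters and not all zero.
/// Returns the normalized lowercase ID.
pub fn normalize_trace_id(raw: &str) -> Option<String> {
    let stripped: String = raw.chars().filter(|c| *c != '-').collect();
    let valid = stripped.len() == 32
        && stripped.chars().all(|c| c.is_ascii_hexdigit())
        && stripped.chars().any(|c| c != '0');
    if valid {
        Some(stripped.to_ascii_lowercase())
    } else {
        None
    }
}

/// Parses the normalized ID into the 128-bit value it represents.
pub fn trace_id_value(raw: &str) -> Option<u128> {
    normalize_trace_id(raw).and_then(|id| u128::from_str_radix(&id, 16).ok())
}
```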

By @kindermax in #5071

v1.47.0-rc.0

16 May 08:53
Pre-release
1.47.0-rc.0

v1.46.0

07 May 12:45
63e9fa2

🚀 Features

Entity cache preview: support queries with private scope (PR #4855)

This feature is part of the work on subgraph entity caching, currently in preview.

The router now supports caching responses marked with private scope. This caching currently works only on subgraph responses without any schema-level information.

For details about the caching behavior, see PR #4855.

By @Geal in #4855

Add support of custom events defined by YAML for telemetry (Issue #4320)

Users can now configure telemetry events via YAML to log that something has happened (e.g. a request had errors of a particular type) without reaching for Rhai or a custom plugin.

Events may be triggered on conditions and can include information in the request/response pipeline as attributes.

Here is an example configuration:

telemetry:
  instrumentation:
    events:
      router:
        # Standard events
        request: info
        response: info
        error: info

        # Custom events
        my.event:
          message: "my event message"
          level: info
          on: request
          attributes:
            http.response.body.size: false
          # Only log when the x-log-request header is `log` 
          condition:
            eq:
              - "log"
              - request_header: "x-log-request"
          
      supergraph:
        # Custom event configuration for supergraph service ...
      subgraph:
        # Custom event configuration for subgraph service ...

By @bnjjj in #4956

Ability to ignore auth prefixes in the JWT plugin

The router now supports a configuration to ignore header prefixes with the JWT plugin. Given that many application headers use the format of Authorization: <scheme> <token>, this option enables the router to process requests for specific schemes within the Authorization header while ignoring others.

For example, you can configure the router to process requests with Authorization: Bearer <token> defined while ignoring others such as Authorization: Basic <token>:

authentication:
  router:
    jwt:
      header_name: authorization
      header_value_prefix: "Bearer"
      ignore_mismatched_prefix: true

If the header prefix is an empty string, this option is ignored.
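The decision logic can be sketched in Rust as follows. `JwtExtraction` and `extract_token` are hypothetical names, and the sketch assumes the prefix is matched case-sensitively and must be followed by a space; the router's exact matching rules may differ.

```rust
/// Hypothetical outcome of inspecting the configured authorization header.
#[derive(Debug, PartialEq)]
pub enum JwtExtraction<'a> {
    /// A token to validate as a JWT.
    Token(&'a str),
    /// Different scheme with ignore_mismatched_prefix: true; no JWT processing.
    Ignored,
    /// Different scheme with ignore_mismatched_prefix: false; the request is rejected.
    Rejected,
}

pub fn extract_token<'a>(
    header_value: &'a str,
    prefix: &str,
    ignore_mismatched_prefix: bool,
) -> JwtExtraction<'a> {
    // With an empty prefix, the whole header value is the token and
    // ignore_mismatched_prefix has no effect.
    if prefix.is_empty() {
        return JwtExtraction::Token(header_value.trim());
    }
    match header_value.strip_prefix(prefix) {
        // Require whitespace between the scheme and the token.
        Some(rest) if rest.starts_with(' ') => JwtExtraction::Token(rest.trim_start()),
        _ if ignore_mismatched_prefix => JwtExtraction::Ignored,
        _ => JwtExtraction::Rejected,
    }
}
```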

By @lleadbet in #4718

Support conditions on custom attributes for spans and a new selector for GraphQL errors (Issue #4336)

The router now supports conditionally adding attributes on a span, as well as a new on_graphql_error selector that is set to true if the response body contains GraphQL errors.

An example configuration using condition in attributes and on_graphql_error:

telemetry:
  instrumentation:
    spans: 
      router: 
        attributes:    
          otel.status_description: 
            static: "there was an error"
            condition:
              any:
              - not:
                  eq:
                  - response_status: code
                  - 200
              - eq:
                - on_graphql_error
                - true

By @bnjjj in #4987

🐛 Fixes

Federation v2.7.5 (PR #5064)

This brings in a query planner fix released in v2.7.5 of Apollo Federation. Notably, from its changelog:

  • Fix issue with missing fragment definitions due to generateQueryFragments. (#2993)

    An incorrect implementation detail in generateQueryFragments caused certain queries to be missing fragment definitions, causing the operation to be invalid and fail early in the request life-cycle (before execution). Specifically, subsequent fragment "candidates" with the same type condition and the same length of selections as a previous fragment weren't correctly added to the list of fragments. An example of an affected query is:

    query {
      t {
        ... on A {
          x
          y
        }
      }
      t2 {
        ... on A {
          y
          z
        }
      }
    }

    In this case, the second selection set would be converted to an inline fragment spread to subgraph fetches, but the fragment definition would be missing.

By @garypen in #5064

Use supergraph schema to extract authorization info (PR #5047)

The router now uses the supergraph schema to extract authorization info, as authorization information may not be available on the query planner's subgraph schemas. This reverts the authorization changes made in PR #4975.

By @tninesling in #5047

Filter fetches added to batch during batch creation (PR #5034)

Previously, the router didn't filter query hashes when creating batches. This could result in failed queries because the additional hashes could incorrectly make a query appear to be committed when it wasn't actually registered in a batch.

This release fixes this issue by filtering query hashes during batch creation.

By @garypen in #5034

Use subgraph.name attribute instead of apollo.subgraph.name (PR #5012)

In router v1.45.0, subgraph name mapping didn't work correctly in the Datadog exporter.

The Datadog exporter explicitly maps some attributes and was using apollo.subgraph.name, an attribute that recent versions of the router no longer use. The correct attribute is subgraph.name.

This release updates the mapping to reflect the change and fixes subgraph name mapping for Datadog.

By @garypen in #5012

📚 Documentation

Document traffic shaping default configuration (PR #4953)

The documentation for configuring traffic shaping has been updated to clarify that it's enabled by default with preset values. This setting has been the default since PR #3330, which landed in v1.23.0.

By @bnjjj in #4953

🧪 Experimental

Experimental type conditioned fetching (PR #4748)

This release introduces an experimental configuration to enable type-conditioned fetching.

Previously, when querying a field that was in a path of two or more unions, the query planner wasn't able to handle different selections and would aggressively collapse selections in fetches. This resulted in incorrect plans.

Enabling the experimental_type_conditioned_fetching option can fix this issue by configuring the query planner to fetch with type conditions.

experimental_type_conditioned_fetching: true # false by default

By @o0Ignition0o in #4748

v1.46.0-rc.3

06 May 16:20
Pre-release
1.46.0-rc.3

v1.46.0-rc.1

02 May 13:12
Pre-release
1.46.0-rc.1

v1.46.0-rc.0

01 May 08:45
Pre-release
1.46.0-rc.0

v1.45.1

26 Apr 23:09
ff9f666

🐛 Fixes

Correct v1.44.0 regression in query plan cache (PR #5028)

Correct a critical regression that was introduced in v1.44.0 which could lead to execution of an incorrect query plan. This issue only affects Routers that use distributed query plan caching, enabled via the supergraph.query_planning.cache.redis.urls configuration property.

By @o0Ignition0o in #5028

Use entire schema when hashing an introspection query (Issue #5006)

This release also corrects a separate hashing bug affecting introspection queries, which was likewise introduced in v1.44.0. The hashing failed to account for introspection queries, so introspection results could be misaligned with the current schema. This issue only affects Routers that use distributed query plan caching, enabled via the supergraph.query_planning.cache.redis.urls configuration property.

This release fixes the hashing mechanism by adding the schema string to hashed data if an introspection field is encountered. As a result, the entire schema is taken into account and the correct introspection result is returned.
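A simplified sketch of the fix: the operation is always hashed, and the full schema text is mixed in only when an introspection field (__schema or __type) appears. Assumptions: `operation_hash` is hypothetical, a token scan stands in for inspecting the parsed selection set, DefaultHasher stands in for the router's real hash, and the schema-dependent parts of ordinary query hashing are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache-key hash for an operation. If the operation selects an
/// introspection field, the full schema text is added to the hashed data so
/// that a schema change produces a different key.
pub fn operation_hash(operation: &str, schema_sdl: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    operation.hash(&mut hasher);

    // Split on non-identifier characters so `__typename` is not mistaken for `__type`.
    let has_introspection = operation
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .any(|token| token == "__schema" || token == "__type");

    if has_introspection {
        schema_sdl.hash(&mut hasher);
    }
    hasher.finish()
}
```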

By @Geal in #5007

Fix subgraph name mapping of Datadog exporter (PR #5012)

Previously in the router v1.45.0, subgraph name mapping didn't work correctly in the router's Datadog exporter. The exporter used the incorrect value apollo.subgraph.name for mapping attributes when it should have used the value subgraph.name. This issue has been fixed in this release.

By @garypen in #5012

v1.45.1-rc.1

26 Apr 20:22
Pre-release
1.45.1-rc.1

v1.45.1-rc.0

25 Apr 18:02
Pre-release
1.45.1-rc.0

v1.45.0

22 Apr 14:30
e569688

Caution

This version has a critical bug impacting users of distributed query plan caching. See the Fixes in v1.45.1 for details. We highly recommend using v1.45.1 (or newer) or v1.43.2 over v1.45.0.

🚀 Features

Query validation process with Rust (PR #4551)

The router has been updated with a new Rust-based query validation process using apollo-compiler from the apollo-rs project. It replaces the JavaScript implementation in the query planner. It improves query planner performance by moving the validation out of the query planner and into the router service, which frees up space in the query planner cache.

Because validation now happens earlier in the router service and not in the query planner, error paths in the query planner are no longer encountered. Some messages in error responses returned from invalid queries should now be clearer.

We've tested the new validation process by running it for months in production, concurrently with the JavaScript implementation, and have now completely transitioned to the Rust-based implementation.

By @Geal in #4551

Add support for SHA256 hashing in Rhai (Issue #4939)

The router supports a new sha256 module to create SHA256 hashes in Rhai scripts. The module supports the sha256::digest function.

An example script that uses the module:

fn supergraph_service(service){
    service.map_request(|request|{
        log_info("hello world");
        let sha = sha256::digest("hello world");
        log_info(sha);
    });
}

By @lleadbet in #4940

Subgraph support for query batching (Issue #2002)

As an extension to the ongoing work to support client-side query batching in the router, the router now supports batching of subgraph requests. Each subgraph batch request retains the same external format as a client batch request. This optimization reduces the number of round-trip requests from the router to subgraphs.

Also, batching in the router is now a generally available feature: the experimental_batching router configuration option has been deprecated and is replaced by the batching option.

Previously, the router preserved the concept of a batch only until a RouterRequest finished processing. From that point, the router converted each batch request item into a separate SupergraphRequest, planned and executed those requests concurrently, and then reassembled the results into a batch of RouterResponse to return to the client. With this release, the concept of a batch is extended so that batches are issued to configured subgraphs (all or named). Each batch request item is still planned and executed separately, but the queries issued to subgraphs are optimally assembled into batches that observe the query constraints of the various batch items.

To configure subgraph batching, you can enable batching.subgraph.all for all subgraphs. You can also enable batching per subgraph with batching.subgraph.subgraphs.*. For example:

batching:
  enabled: true
  mode: batch_http_link
  subgraph:
    # Enable batching on all subgraphs
    all:
      enabled: true

Alternatively, you can disable batching for all subgraphs and enable it only for specific ones:

batching:
  enabled: true
  mode: batch_http_link
  subgraph:
    # Disable batching on all subgraphs
    all:
      enabled: false
    # Configure (override) batching support per subgraph
    subgraphs:
      subgraph_1:
        enabled: true
      subgraph_2:
        enabled: true
Note: all can be overridden by subgraphs. This applies in general for all router subgraph configuration options.
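The override rule and the resulting grouping can be sketched as follows. `SubgraphBatching`, `is_enabled`, and `group` are hypothetical names; the sketch ignores the query constraints the router applies when assembling real batches and simply collects fetches per batching-enabled subgraph.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical mirror of the `batching.subgraph` configuration.
pub struct SubgraphBatching {
    /// `batching.subgraph.all.enabled`
    pub all_enabled: bool,
    /// `batching.subgraph.subgraphs.<name>.enabled` overrides
    pub per_subgraph: HashMap<String, bool>,
}

impl SubgraphBatching {
    /// A per-subgraph setting, when present, overrides `all`.
    pub fn is_enabled(&self, subgraph: &str) -> bool {
        self.per_subgraph.get(subgraph).copied().unwrap_or(self.all_enabled)
    }

    /// Splits (subgraph, query) fetches into one batch per batching-enabled
    /// subgraph and a list of fetches that are sent individually.
    pub fn group<'a>(
        &self,
        fetches: &[(&'a str, &'a str)],
    ) -> (BTreeMap<&'a str, Vec<&'a str>>, Vec<(&'a str, &'a str)>) {
        let mut batches: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
        let mut singles = Vec::new();
        for &(subgraph, query) in fetches {
            if self.is_enabled(subgraph) {
                batches.entry(subgraph).or_default().push(query);
            } else {
                singles.push((subgraph, query));
            }
        }
        (batches, singles)
    }
}
```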

To learn more, see query batching in Apollo docs.

By @garypen in #4661

🐛 Fixes

Update rustls to v0.21.11, the latest v0.21.x patch (PR #4993)

While the Router does use rustls, RUSTSEC-2024-0336 (also known as CVE-2024-32650 and GHSA-6g7w-8wpp-frhj) DOES NOT affect the Router since it uses tokio-rustls which is specifically called out in the advisory as unaffected.

Despite the lack of impact, we've updated rustls from v0.21.10 to v0.21.11, which includes a patch.

By @tninesling in #4993

Performance improvements for Apollo usage report field generation (PR #4951)

The performance of generating Apollo usage report signatures, stats keys, and referenced fields has been improved.

By @bonnici in #4951

Apply alias rewrites to arrays (PR #4958)

The automatic aliasing rules introduced in #2489 to support @interfaceObject are now properly applied to lists.

By @o0Ignition0o in #4958

Fix compatibility of coprocessor metric creation (PR #4930)

Previously, the router's execution stage created coprocessor metrics differently than other stages. This produced metrics with slight incompatibilities.

This release fixes the issue by creating coprocessor metrics in the same way as all other stages.

By @Geal in #4930

📚 Documentation

Documentation updates for caching and metrics instruments (PR #4872)

Router documentation has been updated for a couple of topics:

By @smyrick in #4872

🧪 Experimental

Experimental: Introduce a pool of query planners (PR #4897)

The router supports a new experimental feature: a pool of query planners to parallelize query planning.

You can configure query planner pools with the supergraph.query_planning.experimental_parallelism option:

supergraph:
  query_planning:
    experimental_parallelism: auto # number of available CPUs

Its value is the number of query planners that run in parallel, and its default value is 1. You can set it to the special value auto to automatically set it equal to the number of available CPUs.
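A sketch of how such a setting might resolve to a pool size, assuming `auto` maps to Rust's std::thread::available_parallelism (which may report fewer CPUs than the machine has, for example under container limits). `Parallelism`, `parse_parallelism`, and `pool_size` are hypothetical; the router's own resolution logic may differ.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical representation of the `experimental_parallelism` setting.
pub enum Parallelism {
    Auto,
    Fixed(NonZeroUsize),
}

/// Accepts "auto" or a positive integer; rejects 0 and anything else.
pub fn parse_parallelism(value: &str) -> Option<Parallelism> {
    match value.trim() {
        "auto" => Some(Parallelism::Auto),
        other => other
            .parse::<usize>()
            .ok()
            .and_then(NonZeroUsize::new)
            .map(Parallelism::Fixed),
    }
}

/// Resolves the setting to a planner pool size; `auto` uses the CPU count
/// reported by the standard library, falling back to 1 if it is unavailable.
pub fn pool_size(setting: &Parallelism) -> NonZeroUsize {
    match setting {
        Parallelism::Auto => thread::available_parallelism()
            .unwrap_or(NonZeroUsize::new(1).unwrap()),
        Parallelism::Fixed(n) => *n,
    }
}
```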

You can discuss and comment about query planner pools in this GitHub discussion.

By @xuorig and @o0Ignition0o in #4897

Experimental: Rust implementation of Apollo usage report field generation (PR #4796)

The router supports a new experimental Rust implementation for generating the stats report keys and referenced fields that are sent in Apollo usage reports. This implementation is one part of the effort to replace the router-bridge with native Rust code.

The feature is configured with the experimental_apollo_metrics_generation_mode setting. We recommend that you use its default value, so we can verify that it generates the same payloads as the previous implementation.

By @bonnici in #4796