Initial cost metrics #4990

Merged
merged 55 commits into from
Apr 29, 2024
Changes from 52 commits
fb2d0b6
wip
bnjjj Apr 3, 2024
ae546b8
add support of events
bnjjj Apr 5, 2024
62d400d
add tests for events
bnjjj Apr 11, 2024
28d93cd
fix test
bnjjj Apr 12, 2024
5bc0287
fix test and ordering of attributes and headers
bnjjj Apr 15, 2024
2e56947
fix lint
bnjjj Apr 15, 2024
4ac8833
Merge branch 'dev' of github.com:apollographql/router into bnjjj/feat…
bnjjj Apr 15, 2024
8ef1a15
delete useless commment
bnjjj Apr 15, 2024
9bba758
update metrics
bnjjj Apr 15, 2024
3235c57
wip
bnjjj Apr 15, 2024
c3151bf
wip
bnjjj Apr 15, 2024
3f73f8b
refactor event attributes
bnjjj Apr 15, 2024
91402d7
tests wip
bnjjj Apr 16, 2024
d3d7740
Merge branch 'dev' of github.com:apollographql/router into bnjjj/feat…
bnjjj Apr 16, 2024
81f7e87
fix custom event detection in otel
bnjjj Apr 16, 2024
6a82c3e
add tests
bnjjj Apr 17, 2024
c83c6d9
improve handling of object as an otel attribute
bnjjj Apr 17, 2024
b513b51
fix snapshot
bnjjj Apr 17, 2024
84d3655
Merge branch 'bnjjj/feat_4320' of github.com:apollographql/router int…
bnjjj Apr 18, 2024
7c41055
add more tests for events
bnjjj Apr 18, 2024
6d0c232
fix otel value test
bnjjj Apr 18, 2024
60ca42e
Merge branch 'dev' of github.com:apollographql/router into bnjjj/feat…
bnjjj Apr 18, 2024
080ef4c
fixes
bnjjj Apr 19, 2024
c9ce9bd
fix lint
bnjjj Apr 19, 2024
514092f
add conditions on custom attributes for spans + a new selector for gr…
bnjjj Apr 19, 2024
566cd51
add static field selector
bnjjj Apr 19, 2024
f27bbf1
update docs
bnjjj Apr 19, 2024
0f5d6bd
fix lint
bnjjj Apr 19, 2024
1ef4b2e
Merge dev
Apr 22, 2024
a7b3a56
Condition attribute schema
Apr 22, 2024
af2ad47
Move condition attribute to separate module
Apr 22, 2024
5f43729
Rename `ConditionAttribute` to `Conditional` to match `Extendable`
Apr 22, 2024
340d779
Add tests, add deserializer.
Apr 22, 2024
5978025
Fix test. The issue is that you may have a selector on request that h…
Apr 22, 2024
fb8bc4c
Improve condition logic to avoid computation unless absolutely necess…
Apr 22, 2024
875036b
Add tests
Apr 22, 2024
da40912
Schema update
Apr 22, 2024
8e99b39
Add more tests
Apr 22, 2024
58f9b17
Allow non-object selectors
Apr 22, 2024
4895316
Remove commented out code and add comment
Apr 22, 2024
5c87015
Add failing test: `test_extendable_serde_conditional`
Apr 23, 2024
1bcfcc2
Fix deserialization, modify test to exercise multiple fields on attri…
Apr 23, 2024
ac1e733
Improve errors
Apr 19, 2024
2808f8a
Feed metrics through to selectors.
Apr 23, 2024
e9fff4a
Rename `CostAttributes` to `CostInstruments`.
Apr 23, 2024
03e5b39
Metrics now hooked up, but needs tests
Apr 23, 2024
5527727
Add missing events
Apr 23, 2024
9517515
Update metric names
Apr 24, 2024
751b231
Take in conditional fix
Apr 24, 2024
f5ac2c3
Merge dev
Apr 25, 2024
c9cb6a4
Add tests. Lint fixes
Apr 25, 2024
43326a1
Snapshot updates
Apr 25, 2024
e78fb5b
Merge branch 'dev' into bryn/demand-control-metrics
BrynCooke Apr 29, 2024
a28b9ce
Rename `CostResult` to `CostContext`
Apr 29, 2024
c9ca854
Lock only once for context
Apr 29, 2024
12 changes: 12 additions & 0 deletions apollo-router/src/context/extensions/mod.rs
@@ -85,6 +85,18 @@ impl Extensions {
.and_then(|boxed| (&mut **boxed as &mut (dyn Any + 'static)).downcast_mut())
}

/// Get a mutable reference to a type or insert and return the value if it does not exist
pub fn get_or_default_mut<T: Default + Send + Sync + 'static>(&mut self) -> &mut T {
let map = self.map.get_or_insert_with(Box::default);
let value = map
.entry(TypeId::of::<T>())
.or_insert_with(|| Box::<T>::default());
// It should be impossible for the entry to be the wrong type as we don't allow direct access to the map.
value
.downcast_mut()
.expect("default value should be inserted and we should be able to downcast it")
}
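
Note: the get-or-insert pattern above can be tried in isolation. The following is a minimal sketch, assuming a plain `HashMap` keyed by `TypeId`; `TypeMap` and `Cost` are hypothetical stand-ins for illustration and are not the router's `Extensions` type.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for `Extensions`: at most one value per concrete type.
#[derive(Default)]
struct TypeMap {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl TypeMap {
    /// Return a mutable reference to the stored `T`, inserting `T::default()` first if absent.
    fn get_or_default_mut<T: Default + Send + Sync + 'static>(&mut self) -> &mut T {
        self.map
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::<T>::default())
            // The key is the TypeId of `T`, so the boxed value is always a `T`.
            .downcast_mut::<T>()
            .expect("entry is keyed by TypeId, so the downcast cannot fail")
    }
}

/// Hypothetical payload type used only for this sketch.
#[derive(Default, Debug, PartialEq)]
struct Cost {
    estimated: f64,
}
```

The first call for a given type inserts its default value and later calls return the same slot, which is what lets separate pipeline stages write into one shared record without checking whether it exists yet.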

/// Returns `true` if the type has been stored in `Extensions`.
pub fn contains_key<T: Send + Sync + 'static>(&self) -> bool {
self.map
116 changes: 97 additions & 19 deletions apollo-router/src/plugins/demand_control/mod.rs
@@ -22,6 +22,7 @@ use tower::ServiceExt;
use crate::error::Error;
use crate::graphql;
use crate::graphql::IntoGraphQLErrors;
use crate::json_ext::Object;
use crate::layers::ServiceBuilderExt;
use crate::plugin::Plugin;
use crate::plugin::PluginInit;
@@ -35,6 +36,34 @@ use crate::services::subgraph;
pub(crate) mod cost_calculator;
pub(crate) mod strategy;

/// The results of cost calculations for use in telemetry
pub(crate) struct CostResult {
pub(crate) estimated: f64,
pub(crate) actual: f64,
pub(crate) result: &'static str,
}

impl Default for CostResult {
fn default() -> Self {
Self {
estimated: 0.0,
actual: 0.0,
result: "COST_OK",
}
}
}

impl CostResult {
pub(crate) fn delta(&self) -> f64 {
self.estimated - self.actual
}

pub(crate) fn result(&mut self, error: DemandControlError) -> DemandControlError {
self.result = error.code();
error
}
}
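
Note: a minimal sketch of how `CostResult` is meant to be driven, assuming a simplified error type. `SketchError` and `check_estimate` are hypothetical names introduced for illustration; only the shape of `CostResult` follows the diff.

```rust
/// Simplified stand-in for `DemandControlError` (hypothetical, one variant only).
#[derive(Debug, PartialEq)]
enum SketchError {
    EstimatedCostTooExpensive { estimated_cost: f64, max_cost: f64 },
}

impl SketchError {
    fn code(&self) -> &'static str {
        match self {
            SketchError::EstimatedCostTooExpensive { .. } => "COST_ESTIMATED_TOO_EXPENSIVE",
        }
    }
}

/// Same fields as the `CostResult` struct in the diff.
struct CostResult {
    estimated: f64,
    actual: f64,
    result: &'static str,
}

impl Default for CostResult {
    fn default() -> Self {
        Self { estimated: 0.0, actual: 0.0, result: "COST_OK" }
    }
}

impl CostResult {
    /// How far the estimate overshot (positive) or undershot (negative) the actual cost.
    fn delta(&self) -> f64 {
        self.estimated - self.actual
    }

    /// Record the error code for telemetry and hand the error back unchanged.
    fn result(&mut self, error: SketchError) -> SketchError {
        self.result = error.code();
        error
    }
}

/// Hypothetical helper: store the estimate, then reject it when it exceeds `max`.
fn check_estimate(cost_result: &mut CostResult, cost: f64, max: f64) -> Result<(), SketchError> {
    cost_result.estimated = cost;
    if cost > max {
        Err(cost_result.result(SketchError::EstimatedCostTooExpensive {
            estimated_cost: cost,
            max_cost: max,
        }))
    } else {
        Ok(())
    }
}
```

Because `result` passes the error through, a call site can record the code and return the error in one expression.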

/// Algorithm for calculating the cost of an incoming query.
#[derive(Clone, Debug, Deserialize, JsonSchema)]
#[serde(deny_unknown_fields, rename_all = "snake_case")]
@@ -87,11 +116,21 @@ pub(crate) struct DemandControlConfig {

#[derive(Debug, Display, Error)]
pub(crate) enum DemandControlError {
/// Query estimated cost exceeded configured maximum
EstimatedCostTooExpensive,
/// Query actual cost exceeded configured maximum
/// query estimated cost {estimated_cost} exceeded configured maximum {max_cost}
EstimatedCostTooExpensive {
/// The estimated cost of the query
estimated_cost: f64,
/// The maximum cost of the query
max_cost: f64,
},
/// query actual cost {actual_cost} exceeded configured maximum {max_cost}
#[allow(dead_code)]
ActualCostTooExpensive,
ActualCostTooExpensive {
/// The actual cost of the query
actual_cost: f64,
/// The maximum cost of the query
max_cost: f64,
},
/// Query could not be parsed: {0}
QueryParseFailure(String),
/// The response body could not be properly matched with its query's structure: {0}
@@ -101,26 +140,55 @@ pub(crate) enum DemandControlError {
impl IntoGraphQLErrors for DemandControlError {
fn into_graphql_errors(self) -> Result<Vec<Error>, Self> {
match self {
DemandControlError::EstimatedCostTooExpensive => Ok(vec![graphql::Error::builder()
.extension_code("COST_ESTIMATED_TOO_EXPENSIVE")
.message(self.to_string())
.build()]),
DemandControlError::ActualCostTooExpensive => Ok(vec![graphql::Error::builder()
.extension_code("COST_ACTUAL_TOO_EXPENSIVE")
.message(self.to_string())
.build()]),
DemandControlError::EstimatedCostTooExpensive {
estimated_cost,
max_cost,
} => {
let mut extensions = Object::new();
extensions.insert("cost.estimated", estimated_cost.into());
extensions.insert("cost.max", max_cost.into());
Ok(vec![graphql::Error::builder()
.extension_code(self.code())
.extensions(extensions)
.message(self.to_string())
.build()])
}
DemandControlError::ActualCostTooExpensive {
actual_cost,
max_cost,
} => {
let mut extensions = Object::new();
extensions.insert("cost.actual", actual_cost.into());
extensions.insert("cost.max", max_cost.into());
Ok(vec![graphql::Error::builder()
.extension_code(self.code())
.extensions(extensions)
.message(self.to_string())
.build()])
}
DemandControlError::QueryParseFailure(_) => Ok(vec![graphql::Error::builder()
.extension_code("COST_QUERY_PARSE_FAILURE")
.extension_code(self.code())
.message(self.to_string())
.build()]),
DemandControlError::ResponseTypingFailure(_) => Ok(vec![graphql::Error::builder()
.extension_code("COST_RESPONSE_TYPING_FAILURE")
.extension_code(self.code())
.message(self.to_string())
.build()]),
}
}
}

impl DemandControlError {
fn code(&self) -> &'static str {
match self {
DemandControlError::EstimatedCostTooExpensive { .. } => "COST_ESTIMATED_TOO_EXPENSIVE",
DemandControlError::ActualCostTooExpensive { .. } => "COST_ACTUAL_TOO_EXPENSIVE",
DemandControlError::QueryParseFailure(_) => "COST_QUERY_PARSE_FAILURE",
DemandControlError::ResponseTypingFailure(_) => "COST_RESPONSE_TYPING_FAILURE",
}
}
}
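
Note: the relationship between the error message, the extension code, and the `cost.*` extension entries can be sketched with the standard library alone. This is an illustration under assumptions: `CostError` is a hypothetical two-variant stand-in, `Display` is written by hand instead of being derived, and a `BTreeMap` replaces the router's JSON `Object`.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical, trimmed-down version of `DemandControlError`.
#[derive(Debug)]
enum CostError {
    EstimatedCostTooExpensive { estimated_cost: f64, max_cost: f64 },
    QueryParseFailure(String),
}

impl fmt::Display for CostError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CostError::EstimatedCostTooExpensive { estimated_cost, max_cost } => write!(
                f,
                "query estimated cost {estimated_cost} exceeded configured maximum {max_cost}"
            ),
            CostError::QueryParseFailure(reason) => {
                write!(f, "Query could not be parsed: {reason}")
            }
        }
    }
}

impl CostError {
    /// Single source of truth for the extension code of each variant.
    fn code(&self) -> &'static str {
        match self {
            CostError::EstimatedCostTooExpensive { .. } => "COST_ESTIMATED_TOO_EXPENSIVE",
            CostError::QueryParseFailure(_) => "COST_QUERY_PARSE_FAILURE",
        }
    }

    /// Extra extension entries; empty for variants that carry no cost data.
    fn extensions(&self) -> BTreeMap<&'static str, f64> {
        let mut extensions = BTreeMap::new();
        if let CostError::EstimatedCostTooExpensive { estimated_cost, max_cost } = self {
            extensions.insert("cost.estimated", *estimated_cost);
            extensions.insert("cost.max", *max_cost);
        }
        extensions
    }
}
```

With `estimated_cost: 2.0` and `max_cost: 1.0` the message reads "query estimated cost 2 exceeded configured maximum 1", matching the updated snapshots in this PR.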

impl<T> From<WithErrors<T>> for DemandControlError {
fn from(value: WithErrors<T>) -> Self {
DemandControlError::QueryParseFailure(format!("{}", value))
@@ -182,12 +250,13 @@ impl Plugin for DemandControl {
.get::<Strategy>()
.expect("must have strategy")
.clone();
let context = resp.context.clone();
resp.response = resp.response.map(move |resp| {
// Here we are going to abort the stream if the cost is too high
// First we map based on cost, then we use take while to abort the stream if an error is emitted.
// When we terminate the stream we still want to emit a graphql error, so the error response is emitted first before a termination error.
resp.flat_map(move |resp| {
match strategy.on_execution_response(req.as_ref(), &resp) {
match strategy.on_execution_response(&context, req.as_ref(), &resp) {
Ok(_) => Either::Left(stream::once(future::ready(Ok(resp)))),
Err(err) => Either::Right(stream::iter(vec![
// This is the error we are returning to the user
@@ -251,7 +320,10 @@ impl Plugin for DemandControl {
})
.map_future_with_request_data(
|req: &subgraph::Request| {
req.executable_document.clone().expect("must have document")
//TODO convert this to expect
req.executable_document.clone().unwrap_or_else(|| {
Arc::new(Valid::assume_valid(ExecutableDocument::new()))
})
},
|req: Arc<Valid<ExecutableDocument>>, fut| async move {
let resp: subgraph::Response = fut.await?;
@@ -270,7 +342,7 @@ impl Plugin for DemandControl {
.expect("must be able to convert to graphql error"),
)
.context(resp.context.clone())
.extensions(crate::json_ext::Object::new())
.extensions(Object::new())
.build(),
})
},
@@ -464,10 +536,16 @@ mod test {
fn from(value: &TestError) -> Self {
match value {
TestError::EstimatedCostTooExpensive => {
DemandControlError::EstimatedCostTooExpensive
DemandControlError::EstimatedCostTooExpensive {
max_cost: 1.0,
estimated_cost: 2.0,
}
}

TestError::ActualCostTooExpensive => DemandControlError::ActualCostTooExpensive,
TestError::ActualCostTooExpensive => DemandControlError::ActualCostTooExpensive {
actual_cost: 1.0,
max_cost: 2.0,
},
}
}
}
@@ -3,6 +3,8 @@ source: apollo-router/src/plugins/demand_control/mod.rs
expression: body
---
- errors:
- message: Query estimated cost exceeded configured maximum
- message: query estimated cost 2 exceeded configured maximum 1
extensions:
cost.estimated: 2
cost.max: 1
code: COST_ESTIMATED_TOO_EXPENSIVE
@@ -3,6 +3,8 @@ source: apollo-router/src/plugins/demand_control/mod.rs
expression: body
---
- errors:
- message: Query estimated cost exceeded configured maximum
- message: query estimated cost 2 exceeded configured maximum 1
extensions:
cost.estimated: 2
cost.max: 1
code: COST_ESTIMATED_TOO_EXPENSIVE
@@ -4,6 +4,8 @@ expression: body
---
data: ~
errors:
- message: Query estimated cost exceeded configured maximum
- message: query estimated cost 2 exceeded configured maximum 1
extensions:
cost.estimated: 2
cost.max: 1
code: COST_ESTIMATED_TOO_EXPENSIVE
@@ -4,6 +4,8 @@ expression: body
---
data: ~
errors:
- message: Query estimated cost exceeded configured maximum
- message: query estimated cost 2 exceeded configured maximum 1
extensions:
cost.estimated: 2
cost.max: 1
code: COST_ESTIMATED_TOO_EXPENSIVE
5 changes: 4 additions & 1 deletion apollo-router/src/plugins/demand_control/strategy/mod.rs
@@ -14,6 +14,7 @@ use crate::plugins::demand_control::Mode;
use crate::plugins::demand_control::StrategyConfig;
use crate::services::execution;
use crate::services::subgraph;
use crate::Context;

mod static_estimated;
#[cfg(test)]
@@ -59,10 +60,11 @@ impl Strategy {
}
pub(crate) fn on_execution_response(
&self,
context: &Context,
request: &ExecutableDocument,
response: &graphql::Response,
) -> Result<(), DemandControlError> {
match self.inner.on_execution_response(request, response) {
match self.inner.on_execution_response(context, request, response) {
Err(e) if self.mode == Mode::Enforce => Err(e),
_ => Ok(()),
}
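
Note: the match above only surfaces an error when the mode is `Enforce`; in any other mode the cost is still computed but the request is allowed through. A minimal sketch of that gating follows. The `Measure` variant name, `TooExpensive`, `check`, and `apply_mode` are assumptions made for illustration, since only `Mode::Enforce` appears in this diff.

```rust
/// Assumed shape of the plugin's mode; only `Enforce` is visible in the diff.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Measure,
    Enforce,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
struct TooExpensive {
    cost: f64,
    max: f64,
}

/// Hypothetical inner strategy check: always evaluated, regardless of mode.
fn check(cost: f64, max: f64) -> Result<(), TooExpensive> {
    if cost > max {
        Err(TooExpensive { cost, max })
    } else {
        Ok(())
    }
}

/// Pass errors through only when enforcing; otherwise swallow them.
fn apply_mode(mode: Mode, outcome: Result<(), TooExpensive>) -> Result<(), TooExpensive> {
    match outcome {
        Err(e) if mode == Mode::Enforce => Err(e),
        _ => Ok(()),
    }
}
```

Running the inner check unconditionally means any cost values it records are still available to telemetry even when nothing is rejected.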
@@ -119,6 +121,7 @@ pub(crate) trait StrategyImpl: Send + Sync {
) -> Result<(), DemandControlError>;
fn on_execution_response(
&self,
context: &Context,
request: &ExecutableDocument,
response: &graphql::Response,
) -> Result<(), DemandControlError>;
@@ -3,6 +3,7 @@ use apollo_compiler::ExecutableDocument;
use crate::graphql;
use crate::plugins::demand_control::cost_calculator::static_cost::StaticCostCalculator;
use crate::plugins::demand_control::strategy::StrategyImpl;
use crate::plugins::demand_control::CostResult;
use crate::plugins::demand_control::DemandControlError;
use crate::services::execution;
use crate::services::subgraph;
@@ -19,8 +20,16 @@ impl StrategyImpl for StaticEstimated {
self.cost_calculator
.planned(&request.query_plan)
.and_then(|cost| {
let mut extensions = request.context.extensions().lock();
let cost_result = extensions.get_or_default_mut::<CostResult>();
cost_result.estimated = cost;
if cost > self.max {
Err(DemandControlError::EstimatedCostTooExpensive)
Err(
cost_result.result(DemandControlError::EstimatedCostTooExpensive {
estimated_cost: cost,
max_cost: self.max,
}),
)
} else {
Ok(())
}
@@ -41,12 +50,15 @@ impl StrategyImpl for StaticEstimated {

fn on_execution_response(
&self,
context: &crate::Context,
request: &ExecutableDocument,
response: &graphql::Response,
) -> Result<(), DemandControlError> {
if response.data.is_some() {
let _cost = self.cost_calculator.actual(request, response)?;
// Todo metrics
let cost = self.cost_calculator.actual(request, response)?;
let mut extensions = context.extensions().lock();
let cost_result = extensions.get_or_default_mut::<CostResult>();
cost_result.actual = cost;
}
Ok(())
}
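
Note: the estimated cost is written on the request path and the actual cost on the response path, both into the same locked record on the shared context. A rough sketch of that flow, assuming `std::sync::Mutex` (whose `lock()` returns a `Result`, unlike the `lock()` call in the diff); `SharedContext`, `CostState`, and the three functions are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical record shared between the request and response stages.
#[derive(Default, Debug)]
struct CostState {
    estimated: f64,
    actual: f64,
}

/// Hypothetical stand-in for the request `Context`: clones share the same state.
#[derive(Clone, Default)]
struct SharedContext {
    cost: Arc<Mutex<CostState>>,
}

/// Request stage: record the estimate.
fn on_request(context: &SharedContext, estimated: f64) {
    let mut cost = context.cost.lock().expect("cost lock poisoned");
    cost.estimated = estimated;
}

/// Response stage: record the actual cost only when the response carried data.
fn on_response(context: &SharedContext, actual: Option<f64>) {
    if let Some(actual) = actual {
        // Take the lock once and do all writes under the same guard.
        let mut cost = context.cost.lock().expect("cost lock poisoned");
        cost.actual = actual;
    }
}

/// Telemetry stage: read both values under a single lock acquisition.
fn delta(context: &SharedContext) -> f64 {
    let cost = context.cost.lock().expect("cost lock poisoned");
    cost.estimated - cost.actual
}
```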
@@ -54,6 +54,7 @@ impl StrategyImpl for Test {

fn on_execution_response(
&self,
_context: &crate::Context,
_request: &ExecutableDocument,
_response: &crate::graphql::Response,
) -> Result<(), DemandControlError> {
15 changes: 10 additions & 5 deletions apollo-router/src/plugins/telemetry/config_new/attributes.rs
@@ -31,14 +31,13 @@ use opentelemetry_semantic_conventions::trace::URL_SCHEME;
use opentelemetry_semantic_conventions::trace::USER_AGENT_ORIGINAL;
use schemars::JsonSchema;
use serde::Deserialize;
#[cfg(test)]
use serde::Serialize;
use tower::BoxError;
use tracing::Span;

use crate::axum_factory::utils::ConnectionInfo;
use crate::context::OPERATION_KIND;
use crate::context::OPERATION_NAME;
use crate::plugins::telemetry::config_new::cost::SupergraphCostAttributes;
use crate::plugins::telemetry::config_new::trace_id;
use crate::plugins::telemetry::config_new::DatadogId;
use crate::plugins::telemetry::config_new::DefaultForLevel;
@@ -114,7 +113,7 @@ impl DefaultForLevel for RouterAttributes {
}

#[derive(Deserialize, JsonSchema, Clone, Default, Debug)]
#[cfg_attr(test, derive(Serialize, PartialEq))]
#[cfg_attr(test, derive(PartialEq))]
#[serde(deny_unknown_fields, default)]
pub(crate) struct SupergraphAttributes {
/// The GraphQL document being executed.
@@ -137,6 +136,10 @@ pub(crate) struct SupergraphAttributes {
/// Requirement level: Recommended
#[serde(rename = "graphql.operation.type")]
pub(crate) graphql_operation_type: Option<bool>,

/// Cost attributes for the operation being executed
#[serde(flatten)]
pub(crate) cost: SupergraphCostAttributes,
}

impl DefaultForLevel for SupergraphAttributes {
@@ -890,8 +893,10 @@ impl Selectors for SupergraphAttributes {
attrs
}

fn on_response(&self, _response: &supergraph::Response) -> Vec<KeyValue> {
Vec::default()
fn on_response(&self, response: &supergraph::Response) -> Vec<KeyValue> {
let mut attrs = Vec::new();
attrs.append(&mut self.cost.on_response(response));
attrs
}

fn on_error(&self, _error: &BoxError) -> Vec<KeyValue> {
@@ -0,0 +1,5 @@
telemetry:
instrumentation:
instruments:
supergraph:
cost.actual: true
@@ -0,0 +1,7 @@
telemetry:
instrumentation:
instruments:
supergraph:
cost.actual:
attributes:
cost.result: true