Experimental support for Apollo tracing over OTLP #4982

Open
Wants to merge 91 commits into base: dev
Showing changes from 21 commits
Commits (91 commits):
5cf5750
Add a new configuration for setting the tracing protocol for Apollo S…
timbotnik Apr 18, 2024
d1024dc
Modify some exports that we’ll need
timbotnik Apr 18, 2024
360f8fb
Add the OTLP path
timbotnik Apr 18, 2024
376f480
Fix up some loose ends, review notes
timbotnik Apr 18, 2024
f4973e4
fixes
bnjjj Apr 19, 2024
2d5c59b
Turn the ApollOtlpExporter into a SpanExporter, try direct ref instea…
timbotnik Apr 23, 2024
738e2b3
Use interior mutability with Arcs to improve lifetimes
timbotnik Apr 23, 2024
0728090
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik Apr 23, 2024
e35b241
Add shutdown handling
timbotnik Apr 24, 2024
2004a6e
Code cleanup
timbotnik Apr 24, 2024
0730668
Fix an issue where we’d be stealing the spans away from the Apollo ex…
timbotnik Apr 24, 2024
2ce4d1a
Prepare SpanData’s during collection, prevents making additional copi…
timbotnik Apr 24, 2024
9a5fa87
Introduce “OTLP” only option, refactor to support peek vs. pop on the…
timbotnik Apr 25, 2024
51c43bf
Run cargo fmt
timbotnik Apr 25, 2024
e7ac75d
Run xtask lint --fmt
timbotnik Apr 25, 2024
6b08eb8
Manual lint fixes
timbotnik Apr 25, 2024
c418e2a
Clippy is sometimes wrong
timbotnik Apr 25, 2024
02246e8
Stop filtering spans for now, can revisit this later.
timbotnik Apr 25, 2024
8837262
Turn off compression temporarily till we enable on the collector
timbotnik Apr 25, 2024
c487ffa
Move ROUTER_ID to a higher order crate
timbotnik Apr 26, 2024
cd3328f
Add attribute-level filtering support for OTel spans.
timbotnik May 1, 2024
9ae6b1c
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 17, 2024
b40e3f4
Review notes: use parking lot mutex
timbotnik May 17, 2024
c6730d7
Add “subscribe” span to the allow list.
timbotnik May 17, 2024
89a6abe
Review notes: use reference instead of clone in shutdown
timbotnik May 17, 2024
ae0b3c3
Review notes: use expect instead of unwrap
timbotnik May 17, 2024
08d3eb7
Lint
timbotnik May 17, 2024
2b838c9
cargo fmt
timbotnik May 17, 2024
73e31b7
Update the experimental config to use a percentage rollout style flag
timbotnik May 17, 2024
13d93a7
Use more consistent naming
timbotnik May 19, 2024
3b573b2
Update snapshot test
timbotnik May 19, 2024
d352ed4
Simplify some of our boolean logic; also, don’t send if there is not…
timbotnik May 20, 2024
59a1b3e
Add config option for OTLP tracing protocol (HTTP v. GRPC)
timbotnik May 20, 2024
ebc9b83
Add integration tests for Otel traces
timbotnik May 20, 2024
b6eb5b5
Updated with new snapshots
timbotnik May 20, 2024
588509c
Formatting
timbotnik May 20, 2024
a1bb06a
Formatting
timbotnik May 20, 2024
03b45e1
Exclude snapshots from gitleaks
timbotnik May 20, 2024
be3b24c
Attempt to redact some high entropy values in snapshots
timbotnik May 21, 2024
0b0656c
Formatting
timbotnik May 21, 2024
a9d41ae
Try awaiting the task abort to ensure that the address is unbound.
timbotnik May 21, 2024
68da5df
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 21, 2024
445e1a3
Workaround: redact all attribute values in OTel snapshots
timbotnik May 22, 2024
99c7bbc
Refactor: remove features only needed for “both” scenario
timbotnik May 22, 2024
654a637
Feature: redact errors if needed in OTel path; also track error count…
timbotnik May 22, 2024
12aaca6
Formatting
timbotnik May 22, 2024
03a7c85
Gitleaks: ignore commits
timbotnik May 22, 2024
2d83349
Formatting
timbotnik May 22, 2024
8395532
Another gitleaks attempt
timbotnik May 22, 2024
93e9b50
Clean up some TODOs
timbotnik May 23, 2024
450b521
Remove an unnecessary clone of attributes
timbotnik May 23, 2024
b7197f7
Track the original span status through the cache as well
timbotnik May 23, 2024
ce3a0ee
Add more span export translation logic for subscriptions and errors
timbotnik May 23, 2024
2a1ef25
fix fp secret detection?
peakematt May 23, 2024
52d29a7
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 23, 2024
0f552ea
Merge branch 'timbotnik/apollo-otlp/initial-support' of github.com:ap…
timbotnik May 23, 2024
86b22fa
Update snapshots with code
timbotnik May 23, 2024
2fd1ecd
Fix http headers for reports
timbotnik May 24, 2024
ee4b8ec
Move back to dynamic port for integration tests
timbotnik May 24, 2024
d6d9cb2
Implement synthetic spans for subscription events
timbotnik May 24, 2024
4949652
Include more span names
timbotnik May 24, 2024
46cf2d5
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 24, 2024
28f64eb
Merge branch 'dev' of github.com:apollographql/router into timbotnik/…
bnjjj May 24, 2024
08000bd
Refactor: now that we only send one or the other, we don’t need to “j…
timbotnik May 27, 2024
91780a8
Analytics: measure the # of traces sent via OTLP vs. Apollo reporting
timbotnik May 27, 2024
a006989
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 27, 2024
5a74d6d
Drop SUBSCRIPTION_EVENT spans for now.
timbotnik May 27, 2024
51a720f
Add a changeset
timbotnik May 27, 2024
1c4fb92
Merge branch 'timbotnik/apollo-otlp/initial-support' of github.com:ap…
bnjjj May 27, 2024
ba92780
fix snapshot redactions
bnjjj May 27, 2024
bad69e3
Mimic trace filtering of operations without a signature attribute.
timbotnik May 28, 2024
5f80df4
Merge branch 'timbotnik/apollo-otlp/initial-support' of github.com:ap…
timbotnik May 28, 2024
a130b5e
Integration tests: solidify redactions
timbotnik May 28, 2024
d2f4b05
Refactor: move duplicate tracing fixtures to a tracing_common module
timbotnik May 28, 2024
08bb07c
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 28, 2024
2027c95
TBD cleanups
timbotnik May 28, 2024
b2b66c2
fix gzip compression issue
bnjjj May 29, 2024
5d87f3f
add comment
bnjjj May 29, 2024
99bb0c0
Allow apollo.telemetry metrics to be sent
timbotnik May 30, 2024
81a7e75
Merge branch 'timbotnik/apollo-otlp/initial-support' of github.com:ap…
timbotnik May 30, 2024
6e15217
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 30, 2024
2b6739b
Rename operation subtype attribute to apollo_private.operation.subtype
timbotnik May 30, 2024
3cb48ec
Promote some span names to be exported from their respective usage sites
timbotnik May 30, 2024
86ec930
Discard subscription events for now.
timbotnik May 30, 2024
671fcca
Clean up unit tests
timbotnik May 30, 2024
9c6debb
Remove superfluous enum ApolloTracingProtocol now that we are using a…
timbotnik May 30, 2024
389b27e
Exclude experimental_otlp_tracing_protocol from json schema
timbotnik May 30, 2024
cc911db
Removing the last TBD since so far we haven’t seen a need for another…
timbotnik May 30, 2024
940c166
Merge remote-tracking branch 'origin/dev' into timbotnik/apollo-otlp/…
timbotnik May 30, 2024
e19ed23
Revert "Exclude experimental_otlp_tracing_protocol from json schema"
timbotnik May 31, 2024
71a03f6
Merge branch 'dev' into timbotnik/apollo-otlp/initial-support
bnjjj Jun 3, 2024
1 change: 1 addition & 0 deletions apollo-router/Cargo.toml
@@ -164,6 +164,7 @@ opentelemetry-otlp = { version = "0.13.0", default-features = false, features =
"http-proto",
"metrics",
"reqwest-client",
"trace"
] }
opentelemetry-semantic-conventions = "0.12.0"
opentelemetry-zipkin = { version = "0.18.0", default-features = false, features = [
@@ -1,5 +1,6 @@
---
source: apollo-router/src/configuration/tests.rs
assertion_line: 31
expression: "&schema"
---
{
@@ -3213,6 +3214,32 @@ expression: "&schema"
"default": "https://usage-reporting.api.apollographql.com/",
"type": "string"
},
"experimental_tracing_protocol": {
"description": "The protocol used for sending traces to Apollo Studio.",
"oneOf": [
{
"description": "Use only the Apollo usage reporting protobuf over http",
"type": "string",
"enum": [
"apollo"
]
},
{
"description": "Use only OTLP over GRPC",
"type": "string",
"enum": [
"otlp"
]
},
{
"description": "Use both the Apollo usage reporting protobuf AND OTLP (note this is a testing mode and not intended for use in production)",
"type": "string",
"enum": [
"apollo_and_otlp"
]
}
]
},
"field_level_instrumentation_sampler": {
"description": "Field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses.",
"anyOf": [
21 changes: 21 additions & 0 deletions apollo-router/src/plugins/telemetry/apollo.rs
@@ -3,6 +3,7 @@ use std::collections::HashMap;
use std::fmt::Display;
use std::num::NonZeroUsize;
use std::ops::AddAssign;
use std::sync::OnceLock;
use std::time::SystemTime;

use http::header::HeaderName;
@@ -12,6 +13,7 @@ use serde::ser::SerializeMap;
use serde::Deserialize;
use serde::Serialize;
use url::Url;
use uuid::Uuid;

use super::metrics::apollo::studio::ContextualizedStats;
use super::metrics::apollo::studio::SingleStats;
@@ -34,6 +36,9 @@ pub(crate) const ENDPOINT_DEFAULT: &str =

pub(crate) const OTLP_ENDPOINT_DEFAULT: &str = "https://usage-reporting.api.apollographql.com";

// Random unique UUID for the Router. This doesn't actually identify the router, it just allows disambiguation between multiple routers with the same metadata.
pub(crate) static ROUTER_ID: OnceLock<Uuid> = OnceLock::new();

#[derive(Clone, Deserialize, JsonSchema, Debug)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct Config {
@@ -69,6 +74,9 @@ pub(crate) struct Config {
/// Field level instrumentation for subgraphs via ftv1. ftv1 tracing can cause performance issues as it is transmitted in band with subgraph responses.
pub(crate) field_level_instrumentation_sampler: SamplerOption,

/// The protocol used for sending traces to Apollo Studio.
pub(crate) experimental_tracing_protocol: ApolloTracingProtocol,

/// To configure which request header names and values are included in trace data that's sent to Apollo Studio.
pub(crate) send_headers: ForwardHeaders,
/// To configure which GraphQL variable values are included in trace data that's sent to Apollo Studio
@@ -174,6 +182,7 @@ impl Default for Config {
schema_id: "<no_schema_id>".to_string(),
buffer_size: default_buffer_size(),
field_level_instrumentation_sampler: default_field_level_instrumentation_sampler(),
experimental_tracing_protocol: ApolloTracingProtocol::Apollo,
send_headers: ForwardHeaders::None,
send_variable_values: ForwardValues::None,
batch_processor: BatchProcessorConfig::default(),
@@ -182,6 +191,18 @@ impl Default for Config {
}
}

#[derive(Copy, Clone, Debug, Deserialize, JsonSchema, PartialEq)]
#[serde(deny_unknown_fields, rename_all = "snake_case")]
pub(crate) enum ApolloTracingProtocol {
    /// Use only the Apollo usage reporting protobuf over http
    Apollo,
    /// Use only OTLP over GRPC
    Otlp,
    /// Use both the Apollo usage reporting protobuf AND OTLP
    /// (note this is a testing mode and not intended for use in production)
    ApolloAndOtlp,
}
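
As a rough illustration of how `rename_all = "snake_case"` maps the variants above to the config strings listed in the schema snapshot (`apollo`, `otlp`, `apollo_and_otlp`), here is a std-only sketch. The `FromStr` impl is a hand-written stand-in for the serde derive, and the `sends_apollo_reports` / `sends_otlp` helpers are hypothetical names that do not appear in this PR.

```rust
use std::str::FromStr;

/// Mirrors the three variants added in this PR (without the serde/schemars derives).
#[derive(Copy, Clone, Debug, PartialEq)]
enum ApolloTracingProtocol {
    Apollo,
    Otlp,
    ApolloAndOtlp,
}

impl FromStr for ApolloTracingProtocol {
    type Err = String;

    // Stand-in for what `#[serde(rename_all = "snake_case")]` accepts:
    // only the snake_case spelling of each variant name is valid.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "apollo" => Ok(Self::Apollo),
            "otlp" => Ok(Self::Otlp),
            "apollo_and_otlp" => Ok(Self::ApolloAndOtlp),
            other => Err(format!("unknown tracing protocol: {other}")),
        }
    }
}

impl ApolloTracingProtocol {
    /// Hypothetical helper: true when traces go out via Apollo usage reporting.
    fn sends_apollo_reports(self) -> bool {
        matches!(self, Self::Apollo | Self::ApolloAndOtlp)
    }

    /// Hypothetical helper: true when traces go out via OTLP.
    fn sends_otlp(self) -> bool {
        matches!(self, Self::Otlp | Self::ApolloAndOtlp)
    }
}
```

With this sketch, `"otlp".parse::<ApolloTracingProtocol>()` yields `Ok(ApolloTracingProtocol::Otlp)`, while the Rust-style spelling `"ApolloAndOtlp"` is rejected.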

schemar_fn!(
forward_headers_only,
Vec<String>,
140 changes: 140 additions & 0 deletions apollo-router/src/plugins/telemetry/apollo_otlp_exporter.rs
@@ -0,0 +1,140 @@
use std::borrow::Cow;
use std::sync::Arc;
use std::sync::Mutex;

use derivative::Derivative;
use futures::future::BoxFuture;
use opentelemetry::sdk::export::trace::ExportResult;
use opentelemetry::sdk::export::trace::SpanData;
use opentelemetry::sdk::export::trace::SpanExporter;
use opentelemetry::sdk::trace::EvictedQueue;
use opentelemetry::sdk::Resource;
use opentelemetry::trace::SpanContext;
use opentelemetry::trace::Status;
use opentelemetry::trace::TraceFlags;
use opentelemetry::trace::TraceState;
use opentelemetry::InstrumentationLibrary;
use opentelemetry::KeyValue;
use opentelemetry_otlp::SpanExporterBuilder;
use opentelemetry_otlp::WithExportConfig;
use sys_info::hostname;
use tonic::metadata::MetadataMap;
use tonic::metadata::MetadataValue;
use tower::BoxError;
use url::Url;
use uuid::Uuid;

use super::tracing::apollo_telemetry::LightSpanData;
use crate::plugins::telemetry::apollo::ROUTER_ID;
use crate::plugins::telemetry::apollo_exporter::get_uname;
use crate::plugins::telemetry::tracing::BatchProcessorConfig;
use crate::plugins::telemetry::GLOBAL_TRACER_NAME;

/// The Apollo Otlp exporter is a thin wrapper around the OTLP SpanExporter.
#[derive(Clone, Derivative)]
#[derivative(Debug)]
pub(crate) struct ApolloOtlpExporter {
    batch_config: BatchProcessorConfig,
    endpoint: Url,
    apollo_key: String,
    resource_template: Resource,
    intrumentation_library: InstrumentationLibrary,
    #[derivative(Debug = "ignore")]
    otlp_exporter: Arc<Mutex<opentelemetry_otlp::SpanExporter>>,
}

impl ApolloOtlpExporter {
    pub(crate) fn new(
        endpoint: &Url,
        batch_config: &BatchProcessorConfig,
        apollo_key: &str,
        apollo_graph_ref: &str,
        schema_id: &str,
    ) -> Result<ApolloOtlpExporter, BoxError> {
        tracing::debug!(endpoint = %endpoint, "creating Apollo OTLP traces exporter");

        let mut metadata = MetadataMap::new();
        metadata.insert("apollo.api.key", MetadataValue::try_from(apollo_key)?);

        return Ok(Self {
            endpoint: endpoint.clone(),
            batch_config: batch_config.clone(),
            apollo_key: apollo_key.to_string(),
            resource_template: Resource::new([
                KeyValue::new(
                    "apollo.router.id",
                    ROUTER_ID.get_or_init(Uuid::new_v4).to_string(),
Contributor commented:
This is initialised in a couple of places now. Let's have a fn router_id() that contains ROUTER_ID.get_or_init(||Uuid::new_v4().to_string())

                ),
                KeyValue::new("apollo.graph.ref", apollo_graph_ref.to_string()),
                KeyValue::new("apollo.schema.id", schema_id.to_string()),
                KeyValue::new(
                    "apollo.user.agent",
                    format!(
                        "{}@{}",
                        std::env!("CARGO_PKG_NAME"),
                        std::env!("CARGO_PKG_VERSION")
                    ),
                ),
                KeyValue::new("apollo.client.host", hostname()?),
                KeyValue::new("apollo.client.uname", get_uname()?),
            ]),
            intrumentation_library: InstrumentationLibrary::new(
                GLOBAL_TRACER_NAME,
                Some(format!(
                    "{}@{}",
                    std::env!("CARGO_PKG_NAME"),
                    std::env!("CARGO_PKG_VERSION")
                )),
                Option::<String>::None,
                None,
            ),
            otlp_exporter: Arc::new(Mutex::new(
                SpanExporterBuilder::from(
                    opentelemetry_otlp::new_exporter()
                        .tonic()
                        .with_timeout(batch_config.max_export_timeout)
                        .with_endpoint(endpoint.to_string())
                        .with_metadata(metadata),
                    // TBD(tim): figure out why compression seems to be turned off on our collector
                    // .with_compression(opentelemetry_otlp::Compression::Gzip),
                )
                .build_span_exporter()?,
            )),
            // TBD(tim): do we need another batch processor for this?
            // Seems like we've already set up a batcher earlier in the pipe but not quite sure.
Contributor commented:
cc @BrynCooke you might be the best one to answer this

        });
    }
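
The review suggestion on the `apollo.router.id` attribute (a single `router_id()` function wrapping `ROUTER_ID.get_or_init(...)`) could look roughly like the sketch below. It is std-only so that it runs without the `uuid` crate: `new_random_id` is a hypothetical stand-in for `Uuid::new_v4().to_string()` and does not produce a real UUID, and the static here is a `OnceLock<String>` rather than the `OnceLock<Uuid>` declared in this diff. With the `OnceLock<Uuid>` as declared, the `to_string()` call would go after `get_or_init`, not inside the closure.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::OnceLock;

// Process-wide ID, initialised lazily on first use.
// Assumption: stored as a String so callers can borrow it directly.
static ROUTER_ID: OnceLock<String> = OnceLock::new();

/// Hypothetical stand-in for `Uuid::new_v4().to_string()`.
/// NOT a real UUID: it only produces a random-looking 32-character hex string
/// from the randomly seeded std hasher.
fn new_random_id() -> String {
    let a = RandomState::new().build_hasher().finish();
    let b = RandomState::new().build_hasher().finish();
    format!("{a:016x}{b:016x}")
}

/// Single access point, so every call site shares one initialisation path.
pub fn router_id() -> &'static str {
    ROUTER_ID.get_or_init(new_random_id)
}
```

Every call to `router_id()` returns the same value for the lifetime of the process, which is the property both the metrics and tracing exporters rely on when they tag data with `apollo.router.id`.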

    pub(crate) fn prepare_for_export(&self, span: &LightSpanData) -> SpanData {
        SpanData {
            span_context: SpanContext::new(
                span.trace_id,
                span.span_id,
                TraceFlags::default().with_sampled(true),
                true,
                TraceState::default(),
            ),
            parent_span_id: span.parent_span_id,
            span_kind: span.span_kind.clone(),
            name: span.name.clone(),
            start_time: span.start_time,
            end_time: span.end_time,
            attributes: span.attributes.clone(),
            events: EvictedQueue::new(0),
            links: EvictedQueue::new(0),
            status: Status::Unset,
            resource: Cow::Owned(self.resource_template.to_owned()),
            instrumentation_lib: self.intrumentation_library.clone(),
        }
    }

    pub(crate) fn export(&self, spans: Vec<SpanData>) -> BoxFuture<'static, ExportResult> {
        let mut exporter = self.otlp_exporter.lock().unwrap();
        exporter.export(spans)
    }

    pub(crate) fn shutdown(&self) {
        let mut exporter = self.otlp_exporter.lock().unwrap();
        exporter.shutdown()
    }
}
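
The `Arc<Mutex<...>>` field above is what lets `export` and `shutdown` take `&self` even though the wrapped exporter is locked mutably (the `let mut exporter` guards). A minimal std-only sketch of that interior-mutability pattern follows. `InnerExporter`, `SharedExporter`, and `exported_len` are hypothetical stand-ins, not types from this PR, and the sketch returns an error on a poisoned lock instead of calling `unwrap()`, which is one possible design choice rather than what the PR does.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an exporter whose methods need `&mut self`.
#[derive(Default)]
struct InnerExporter {
    exported: Vec<String>,
    shut_down: bool,
}

impl InnerExporter {
    fn export(&mut self, batch: Vec<String>) -> usize {
        let count = batch.len();
        self.exported.extend(batch);
        count
    }

    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

/// Cloneable wrapper: all clones share the same inner exporter.
#[derive(Clone, Default)]
struct SharedExporter {
    inner: Arc<Mutex<InnerExporter>>,
}

impl SharedExporter {
    /// Takes `&self`; mutation happens behind the mutex guard.
    fn export(&self, batch: Vec<String>) -> Result<usize, String> {
        let mut guard = self
            .inner
            .lock()
            .map_err(|e| format!("exporter lock poisoned: {e}"))?;
        Ok(guard.export(batch))
    }

    fn shutdown(&self) -> Result<(), String> {
        let mut guard = self
            .inner
            .lock()
            .map_err(|e| format!("exporter lock poisoned: {e}"))?;
        guard.shutdown();
        Ok(())
    }

    /// Hypothetical inspection helper used only to observe shared state.
    fn exported_len(&self) -> usize {
        self.inner.lock().map(|g| g.exported.len()).unwrap_or(0)
    }
}
```

Because the `Arc` is cloned rather than the exporter itself, a batch exported through one clone is visible through every other clone, and `shutdown` on any clone affects the shared exporter.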
5 changes: 1 addition & 4 deletions apollo-router/src/plugins/telemetry/metrics/apollo.rs
@@ -1,7 +1,6 @@
//! Apollo metrics
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering;
use std::sync::OnceLock;
use std::time::Duration;

use opentelemetry::runtime;
@@ -17,6 +16,7 @@ use url::Url;
use uuid::Uuid;

use crate::plugins::telemetry::apollo::Config;
use crate::plugins::telemetry::apollo::ROUTER_ID;
use crate::plugins::telemetry::apollo_exporter::get_uname;
use crate::plugins::telemetry::apollo_exporter::ApolloExporter;
use crate::plugins::telemetry::config::MetricsCommon;
@@ -35,9 +35,6 @@ fn default_buckets() -> Vec<f64> {
]
}

// Random unique UUID for the Router. This doesn't actually identify the router, it just allows disambiguation between multiple routers with the same metadata.
static ROUTER_ID: OnceLock<Uuid> = OnceLock::new();

impl MetricsConfigurator for Config {
fn enabled(&self) -> bool {
self.apollo_key.is_some() && self.apollo_graph_ref.is_some()
1 change: 1 addition & 0 deletions apollo-router/src/plugins/telemetry/mod.rs
@@ -129,6 +129,7 @@ use crate::ListenAddr;

pub(crate) mod apollo;
pub(crate) mod apollo_exporter;
pub(crate) mod apollo_otlp_exporter;
pub(crate) mod config;
pub(crate) mod config_new;
pub(crate) mod dynamic_attribute;
2 changes: 2 additions & 0 deletions apollo-router/src/plugins/telemetry/tracing/apollo.rs
@@ -26,6 +26,8 @@ impl TracingConfigurator for Config {
tracing::debug!("configuring Apollo tracing");
let exporter = apollo_telemetry::Exporter::builder()
.endpoint(&self.endpoint)
.otlp_endpoint(&self.experimental_otlp_endpoint)
.apollo_tracing_protocol(self.experimental_tracing_protocol)
.apollo_key(
self.apollo_key
.as_ref()