New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: experiment to support Apollo tracing over OTLP #4982
base: dev
Are you sure you want to change the base?
Conversation
@timbotnik, please consider creating a changeset entry in |
CI performance tests
|
apollo-router/src/plugins/telemetry/tracing/apollo_telemetry.rs
Outdated
Show resolved
Hide resolved
// TBD(tim): maybe we should move this? | ||
// Also, maybe we can just use the actual SpanData instead of the light one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea of having this LightSpanData is to consume less memory in the LRU cache. it was a good improvement when introduced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah another approach would be grouping by trace and attributes to get rid of the duplications, which would be nice for the wire protocol as well. Probably something to visit in the next phase when we feel good about moving entirely to OTLP.
apollo-router/src/plugins/telemetry/tracing/apollo_telemetry.rs
Outdated
Show resolved
Hide resolved
apollo-router/src/plugins/telemetry/tracing/apollo_telemetry.rs
Outdated
Show resolved
Hide resolved
Note that this is very much unvetted and doesn’t quite compile yet.
ea8c4a2
to
b0755b2
Compare
b0755b2
to
376f480
Compare
// Use only the Apollo usage reporting protobuf over http | ||
Apollo, | ||
// Use both the Apollo usage reporting protobuf AND Apollo OTLP | ||
ApolloAndOtlp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but there's a pretty common pattern here of using New/Legacy/Both as the enum names (though maybe we should use Otlp/Apollo/Both instead?) https://docs.google.com/document/d/1Zk6PJyCwgDIMTRBSt3UoCKPfefM25saKCDn_AtEHQnc/edit#heading=h.am5dyhwon4gw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool - happy to use a convention if there is one established 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also could you change testing
by experimental
to be consistent with what we currently have
…porter Which with my current skills means we’ll have to clone all of the spans. Possibly there is a way to use shared span pointers here instead but I’ll leave that for an optimization pass.
…es of LightSpanData’s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing integration and unit tests and don't use unwrap, if you're 100% sure it's ok to unwrap then use expect()
instead with your message inside explaining why it's safe to do this
const REPORTS_INCLUDE_ATTRS: [Key; 17] = [ | ||
APOLLO_PRIVATE_REQUEST, | ||
APOLLO_PRIVATE_DURATION_NS_KEY, | ||
APOLLO_PRIVATE_SENT_TIME_OFFSET, | ||
APOLLO_PRIVATE_GRAPHQL_VARIABLES, | ||
APOLLO_PRIVATE_HTTP_REQUEST_HEADERS, | ||
APOLLO_PRIVATE_HTTP_RESPONSE_HEADERS, | ||
APOLLO_PRIVATE_OPERATION_SIGNATURE, | ||
APOLLO_PRIVATE_FTV1, | ||
PATH, | ||
SUBGRAPH_NAME, | ||
CLIENT_NAME_KEY, | ||
CLIENT_VERSION_KEY, | ||
DEPENDS, | ||
LABEL, | ||
CONDITION, | ||
OPERATION_NAME, | ||
OPERATION_TYPE, | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this make sure to use an HashSet at the end because perfs are better when you're checking if it contains an element. Here is an example here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think you already did it. Just make sure it's done that way
// Use only the Apollo usage reporting protobuf over http | ||
Apollo, | ||
// Use both the Apollo usage reporting protobuf AND Apollo OTLP | ||
ApolloAndOtlp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also could you change testing
by experimental
to be consistent with what we currently have
// TBD(tim): do we need another batch processor for this? | ||
// Seems like we've already set up a batcher earlier in the pipe but not quite sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @BrynCooke you might be the best one to answer this
resource_template: Resource, | ||
intrumentation_library: InstrumentationLibrary, | ||
#[derivative(Debug = "ignore")] | ||
otlp_exporter: Arc<Mutex<opentelemetry_otlp::SpanExporter>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use https://github.com/Amanieu/parking_lot mutex ? It's already in our deps and usually more perfs
exporter.export(spans) | ||
} | ||
|
||
pub(crate) fn shutdown(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a shutdown and we don't use the exporter itself after maybe we should take ownership ? Just a suggestion I don't know if it's valid
pub(crate) fn shutdown(&self) { | |
pub(crate) fn shutdown(self) { |
]; | ||
|
||
/// Additional attributes to include when sending to the OTLP protocol. | ||
const OTLP_EXT_INCLUDE_ATTRS: [Key; 6] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for hashset
// https://github.com/open-telemetry/opentelemetry-rust/blob/943bb7a03f9cd17a0b6b53c2eb12acf77764c122/opentelemetry-sdk/CHANGELOG.md?plain=1#L157-L159 | ||
let max_attr_len = std::cmp::min(attr_names.len(), value.attributes.len()); | ||
let mut new_attrs = EvictedHashMap::new(max_attr_len.try_into().unwrap(), max_attr_len); | ||
value.attributes.iter().for_each(|(key, value)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to avoid copies then use into_iter()
instead and then in the loop you'll be able to delete the clone
value.attributes.iter().for_each(|(key, value)| { | |
value.attributes.into_iter().for_each(|(key, value)| { |
@@ -229,6 +321,13 @@ impl Exporter { | |||
errors_configuration: errors_configuration.clone(), | |||
use_legacy_request_span: use_legacy_request_span.unwrap_or_default(), | |||
include_span_names: INCLUDE_SPANS.into(), | |||
include_attr_names: match apollo_tracing_protocol { | |||
// TBD(tim): consider filtering the current implementation for memory purposes to HashSet::from(REPORTS_INCLUDE_ATTRS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this comment
@@ -322,6 +421,63 @@ impl Exporter { | |||
Ok(results) | |||
} | |||
|
|||
fn init_spans_for_tree(&self, root_span: &LightSpanData) -> Vec<SpanData> { | |||
let exporter = self.otlp_exporter.as_ref().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use unwrap
|
||
fn shutdown(&mut self) { | ||
// currently only handled in the OTLP case | ||
if let Some(exporter) = self.otlp_exporter.clone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Some(exporter) = self.otlp_exporter.clone() { | |
if let Some(exporter) = &self.otlp_exporter { |
or you can .take()
it if you don't need it anymore ?
Very early in the process here. Just trying to get some feedback on the approach and figure out why there is some complaints about a threading issue 👀
Fixes #issue_number
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩