
[Feature Request] Improve detection of parent-child relationships #537

Open
worldtiki opened this issue Sep 28, 2018 · 13 comments
@worldtiki
Contributor

A service is only displayed in the service graph if a parent-child relationship is found.
This seems to be done in SpanAccumulator.scala, which looks for the 'cs'/'cr' and 'sr'/'ss' tags (to identify a span as client or server) and then tries to find span pairs based on whether they share the same IDs.

If a server-side service does not report tracing data (e.g. an external third party, or a system that doesn't support distributed tracing) and we only have spans coming from the client service, this parent-child relationship is never established and the server service is not represented in the service graph.

Apart from its name, there is enough information to represent this service in the service graph, as well as its stats (rate of calls, % of failures, etc.).

I'm not really sure what the implementation would look like. One option would be to extend the IDL.
Currently it only supports the name of the local service. Adding the name of the remote service as an optional field would allow us to display the correct name on the service graph (as well as in the traces view).

@shreyaa-sharma
Member

Thanks for sharing, Daniel. The team will discuss and get back to you here.

@ashishagg
Contributor

There is some legacy code that uses tags to generate the service graph, which we hope to get rid of very soon. However, there is an alternate and better mechanism working in parallel that computes the service graph using three IDs, namely TraceId, SpanId and ParentSpanId, plus a serviceName.

How does it work?
Say service 'A' calls 'B'. 'A' submits a client span with three IDs: T1, S1, P1. 'B' receives these IDs (in particular T1 and S1) over the HTTP headers, and then submits a server span with three IDs: T1, S2, S1.

If you notice, B's span is a child of A's, and haystack uses this parent-child relationship to generate an edge between the two service nodes. Note that when B calls its own downstream service (say C), B's client span will be a child of B's server span. But since those two spans come from the same service (B in this case), we don't create a loopback edge.
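The pairing logic described above can be sketched as follows (an illustrative approximation in Python, not Haystack's actual Scala implementation; all names here are made up):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: str
    service_name: str

def build_edges(spans):
    """Pair each span with its parent via (span_id, parent_span_id) and
    emit service-graph edges, skipping same-service loopbacks."""
    by_span_id = {s.span_id: s for s in spans}
    edges = set()
    for child in spans:
        parent = by_span_id.get(child.parent_span_id)
        if parent is None:
            continue  # unpaired span: currently dropped entirely
        if parent.service_name == child.service_name:
            continue  # e.g. B's client span under B's server span: no loopback edge
        edges.add((parent.service_name, child.service_name))
    return edges

# A's client span (T1, S1, P1) pairs with B's server span (T1, S2, S1):
spans = [Span("T1", "S1", "P1", "A"), Span("T1", "S2", "S1", "B")]
print(build_edges(spans))  # {('A', 'B')}
```

Note how the unpaired branch is exactly where the feature request bites: a client span whose server never reports is silently dropped.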

Regarding your ask to show a third-party system that doesn't support tracing as a graph node, we can think of using the peer.service tag instead of extending the IDL.

Now, if you look at the current implementation: for every span, we wait a few seconds (configurable) for its parent or child span to arrive. If we don't receive a pair, we drop that span altogether. We could add a fallback to use the 'peer.service' tag, but that might conflict with the case where the span pair simply arrived late and was missed.

In such a scenario we would show two edges, one for the peer.service name and one for the actual service name, if they differ: A calls the peer service 'foo', but foo calls itself 'bar'.

I need to evaluate whether the fallback strategy is OK; maybe turn it on behind a feature flag?
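A minimal sketch of that fallback (purely illustrative; the function name and flag are assumptions, not Haystack code):

```python
def edge_for_unpaired_span(local_service, tags, fallback_enabled=False):
    """Fallback for a span whose pair never arrived within the wait window:
    use its 'peer.service' tag to name the remote node, if present.
    Off by default, mirroring the feature-flag idea above."""
    peer = tags.get("peer.service")
    if not fallback_enabled or peer is None:
        return None  # current behaviour: the unpaired span is dropped
    # Caveat from above: if 'foo' later reports itself as 'bar', the graph
    # may show both A -> foo (from this fallback) and A -> bar.
    return (local_service, peer)

print(edge_for_unpaired_span("A", {"peer.service": "foo"}, fallback_enabled=True))
# ('A', 'foo')
```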

@worldtiki
Contributor Author

Thanks for your time @ashishagg.
I'm happy to help test this feature if you go with the feature flag approach.

@codefromthecrypt
Contributor

codefromthecrypt commented Oct 10, 2018

This field is too important to be relegated to a tag, and even if it weren't, the OpenTracing peer convention is neither widely nor consistently used.

You should consider a field remoteServiceName to go next to your existing field serviceName. It is more consistent with your existing model and likely to result in less trouble.

Long history and context below

Bear in mind that the original concept of peer.service came literally from Zipkin-based conversions. For example, it was conventionally the "sa" (server address) or "ca" (client address) binary annotation in Zipkin. When Yuri decided to make the tag named "peer.service", Uber was still a Zipkin site and probably wanted something easier than the old binary annotations we had. They knew the limitations of looking up based on different names ("sa" vs "ca"), so one name ("peer.service") was a bit of an improvement back then. Note that Yuri and Ben started and decided this in 2 days while a lot of folks were on holiday (27-29 Dec 2015). Notably, "service" was left out until several years later!

Meanwhile, Zipkin knew where the bodies were buried... we all suffered them, but wanted to always remain backwards compatible from a data point of view. Fixing the root problem requires buy-in from more than two people, though; at least that's been the case in Zipkin. We also need tests to pass: for example, we can't just define something and not use it in practice (such as in a service graph).

In early 2016, we started an effort to get rid of the confusion around the remote side and fix the model. This was called v2. Through that process, we noted some other things. Firstly, "peer" is an awkward and bad name; it likely wouldn't have been used if others had been involved in OpenTracing's PR besides just Ben and Yuri. Anyway, "remote" was a far more accessible term. Next, indexing nested collections has performance problems and merging complexity, and the concept of remote is quite a primary concern.

So, a top-level remote service field is both more efficient and easier to map to APIs. PS.. yes APIs! because this is too important to be relegated to a tag. In early 2017 we had the v2 format including this field, and Brave 4, which had an API for it as well. This made things much, much simpler on the user and data sides.
https://github.com/openzipkin/zipkin-api/blob/master/zipkin2-api.yaml#L412

Anyway.. instead of using a tag naming convention, which would only be present on a subset of instrumentation, I'd just add a remoteServiceName field to your data instead.

You already have a serviceName field, and a sibling remoteServiceName makes the most sense intuitively, versus having the local service as a field while relegating the remote one to a tag naming convention! Plus, you can always map "peer.service" into it in the rare case it is present, just like you map into your span service field today. Meanwhile, many Zipkin-based instrumentations have top-level APIs for the service name, including the remote service, so you'd get a stronger mapping when that's the case as well.

So, would you consider just doing this with a model field? I'd be more likely to help with work involved if you agree.
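As a sketch, the suggested model change might look like this (field names assumed by analogy with Zipkin v2; not an actual Haystack or Zipkin API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    service_name: str
    remote_service_name: Optional[str] = None  # proposed first-class field
    tags: dict = field(default_factory=dict)

def remote_service(span):
    """Prefer the top-level field; map the OpenTracing 'peer.service' tag
    into it in the rare case it is present."""
    return span.remote_service_name or span.tags.get("peer.service")

print(remote_service(Span("A", remote_service_name="B")))       # B
print(remote_service(Span("A", tags={"peer.service": "foo"})))  # foo
```

The point of the accessor is that consumers such as the service-graph builder never query nested tags directly; the tag is folded in once, at ingestion or lookup.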

@codefromthecrypt
Contributor

PS if it helps, here's our dependency linker code, which covers a bunch of edge cases as well. For example, people making circular links etc.

https://github.com/openzipkin/zipkin/blob/master/zipkin/src/main/java/zipkin2/internal/DependencyLinker.java
https://github.com/openzipkin/zipkin/blob/master/zipkin/src/test/java/zipkin2/internal/DependencyLinkerTest.java

Since the haystack data model is somewhat like Zipkin v1.5 ( :) ), the following mapper might also be of interest: https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/mysql-v1/src/main/java/zipkin2/storage/mysql/v1/DependencyLinkV2SpanIterator.java

@ashishagg
Contributor

Good to read the history behind these 'tags' 👍

My concern with putting remoteServiceName in the data model is about ownership and uniformity. I will illustrate this with an example of two services, A calling B. The choice of remoteServiceName is entirely up to A, and B may not call itself 'B' in its own span's serviceName.

The two services can set whatever they desire, which is correct. As long as the service graph depends only on serviceName, the semantics remain intact. However, using both attributes (serviceName and remoteServiceName) with different naming may complicate things?

remoteServiceName makes absolute sense if service names are uniform across the service mesh.

Your thoughts?

@codefromthecrypt
Contributor

codefromthecrypt commented Oct 10, 2018 via email

@ashishagg
Contributor

Makes sense.
If we think of remoteServiceName as an optional attribute, then it would be roughly equivalent to a tag on the span, because this tag key is documented in the OpenTracing spec.

But as you mentioned, there is a history behind its introduction in the spec. Anyhow, give us a couple of days to think it through and we will create a work item for it.

@codefromthecrypt
Contributor

codefromthecrypt commented Oct 10, 2018

yep: the key here will be likely a decision of..

span.serviceName and span.tags[] where key = "peer.service"

  • a nested query, and only valid in OpenTracing, not Zipkin, Census, etc.

vs

span.serviceName and span.remoteServiceName

  • matches your model and what you need

:)
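In other words, the two lookup styles contrast like this (illustrative dict-shaped spans, not a real schema):

```python
# Option 1: nested tag query; only populated by OpenTracing-style instrumentation.
def remote_via_tag(span):
    return span.get("tags", {}).get("peer.service")

# Option 2: top-level field; uniform regardless of instrumentation.
def remote_via_field(span):
    return span.get("remoteServiceName")

span = {"serviceName": "A",
        "remoteServiceName": "B",
        "tags": {"peer.service": "B"}}
print(remote_via_tag(span), remote_via_field(span))  # B B
```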

@mchandramouli
Contributor

@ashishagg @ayansen I was looking at this again as I ran into a scenario where I need to record a span for an AWS call. Re-reading the thread, I think we should add remoteServiceName as a first-class field in the data model and go with the overriding logic mentioned in the thread. Can we discuss again and make the change?

@shreyaa-sharma
Member

@abhinavtripathi7 and @ashishagg to look into and prioritize this issue.

@jickyasu

jickyasu commented Oct 11, 2019

Any idea when we would be able to get this implemented? We are seeing some holes in our service graphs as a result of this.

@ashishagg
Contributor

Did you get a chance to try our ServiceInsights (beta) feature? You need to install a newer haystack UI version.
