Processor equivalence with 3.x to drop dependency based on a criteria #3102

mercer · 2023-05-29T16:51:28Z

Is your feature request related to a problem? Please describe.
I'd like to drop SQL dependency spans whose duration is under a certain threshold. In 2.x and in dotnet this is easy to do using a TelemetryProcessor or ITelemetryProcessor.

I have digested https://learn.microsoft.com/en-us/azure/azure-monitor/app/java-standalone-telemetry-processors and I don't see how this would work.

Describe the solution you would like
An example would be great. The documentation could also include more real-world examples.

Describe alternatives you have considered
I considered downgrading to 2.x, but we need 3.x. We only have this problem in the java stack, not in .net

Additional context
Nothing else I can think of.

heyams · 2023-05-30T18:39:16Z

@mercer can you try sampling overrides?

mercer · 2023-05-31T13:51:43Z

@heyams can you provide an example where a SQL dependency gets sampled only if its duration is > 50 ms? There are two parts to this problem:

the duration attribute
the logic to match for sampling with a threshold, for example, value < 50

I'd appreciate an example here. (Already tried to get inspired from "make noisy dependency call example").

In the meantime, I will turn self-diagnostics up to debug level. However, I'd prefer not to reverse engineer this, and to work from documentation if possible.

mercer · 2023-05-31T13:54:16Z

So, the equivalent in 2.x would be something like

public class SqlDependencyFilterProcessor implements TelemetryProcessor {
    private final TelemetryProcessor next;
    private final SqlDependencyFilterOptions options;

    public SqlDependencyFilterProcessor(TelemetryProcessor next, SqlDependencyFilterOptions options) {
        this.next = next;
        this.options = options;
    }

    @Override
    public boolean process(com.microsoft.applicationinsights.telemetry.Telemetry telemetry) {
        if (options.isEnabled()
                && telemetry instanceof RemoteDependencyTelemetry
                && ((RemoteDependencyTelemetry) telemetry).getSuccess()
                && ((RemoteDependencyTelemetry) telemetry).getDuration().toMillis() <= options.getDurationThresholdMSecs()
                && "SQL".equalsIgnoreCase(((RemoteDependencyTelemetry) telemetry).getType())) {
            return false;
        }
        return next == null || next.process(telemetry);
    }
}

wired with

@Configuration
@EnableConfigurationProperties(SqlDependencyFilterOptions.class)
public class ApplicationInsightsConfiguration {
    
    @Bean
    public SqlDependencyFilterProcessor createSqlDependencyFilterProcessor(TelemetryProcessor next, SqlDependencyFilterOptions options) {
        return new SqlDependencyFilterProcessor(next, options);
    }

    @Bean
    public TelemetryProcessor telemetryProcessorChain(SqlDependencyFilterProcessor processor) {
        TelemetryProcessor baseProcessor = TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().getBaseTelemetryProcessor();
        TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().addLast(processor);
        TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().build();
        return baseProcessor;
    }
}

mercer · 2023-05-31T13:58:43Z

A bit more context:

Sometimes we have batch jobs. What we noticed is that the extra dependency calls add about $150 in cost for each hour of batch processing. That data is not particularly useful unless these dependency calls have unexpected latency or fail. Sometimes these batches take 5-24 hours.
Now, in dotnet, we already solved this problem with an equivalent approach (using an ITelemetryProcessor)
And, as we have already upgraded to 3.x in java, we want to fix this in the java stack 3.x as well.

heyams · 2023-05-31T17:09:22Z

@mercer i can come up with an example, but it will be helpful if you can share a sample app so that i can create a fix based on your app? My sql example's attributes will be different from yours. Or even better, let's have a quick call and I can show you how to locate the attributes and then apply sampling override? please email me at helen.yang@microsoft.com to further discuss.

heyams · 2023-06-01T18:14:09Z

@mercer can you try DCR?

You can apply a filter rule to dependencies. It works through Log Analytics, and the equivalent table is AppDependencies.
Please try the following rule and let us know if that works for your scenario:

source
| where Type != "SQL" or DurationMs > 100

Currently, we do not have any filtering mechanism in the agent for dependencies based on duration.
If a data collection rule doesn't work for you, please get back to me so that my team can find an alternative solution.
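
For reference, the rule above keeps every non-SQL dependency plus any SQL dependency slower than 100 ms. A minimal Java sketch of the same predicate over in-memory rows may make that easier to see. The `DependencyRow` class and its field names are hypothetical stand-ins for illustration only; the real filtering happens in the KQL transformation, not in application code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one row of dependency telemetry. */
final class DependencyRow {
    final String type;
    final double durationMs;

    DependencyRow(String type, double durationMs) {
        this.type = type;
        this.durationMs = durationMs;
    }
}

final class DcrRuleSketch {
    /** Mirrors: where Type != "SQL" or DurationMs > 100 (true means the row is kept). */
    static boolean keep(DependencyRow row) {
        return !"SQL".equals(row.type) || row.durationMs > 100;
    }

    /** Returns only the rows that pass the rule, in their original order. */
    static List<DependencyRow> apply(List<DependencyRow> rows) {
        List<DependencyRow> kept = new ArrayList<>();
        for (DependencyRow row : rows) {
            if (keep(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DependencyRow> rows = List.of(
                new DependencyRow("SQL", 12),    // fast SQL call: dropped
                new DependencyRow("SQL", 250),   // slow SQL call: kept
                new DependencyRow("HTTP", 12));  // non-SQL call: kept
        System.out.println(apply(rows).size() + " of " + rows.size() + " rows kept");
    }
}
```

Put differently, the only rows discarded are SQL dependencies that took 100 ms or less.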

mercer · 2023-06-06T06:58:43Z

@heyams thanks for your swift answer, I will try your suggestion for data collection rules today. I hope this solution solves the cost problem -- batches introduce anomalies in cost patterns with low-value telemetry data, and these anomalies need different sampling rules than "normal" traffic.

In the meantime, I have a few other questions regarding potential options, all in the context of the 3.x Java client.

Is there a way to add a field at runtime in 3.x for dependencies (or any other traces)? For my use case, I could record the fact that it is a bulk operation, and then in applicationinsights.json I would sample on the custom field.
Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.
Because applicationinsights-agent-3.4.13.jar includes the generic io.opentelemetry.javaagent code, is there a way to extend the code and override the behavior? I know you already answered that there is no programmatic filtering available, but I wondered if there is an option for us to build it ourselves, given that the underlying library follows an open principle.

mercer · 2023-06-06T08:00:56Z

@heyams I did an evaluation for adding a rule, but I don't see how I can configure a rule to apply to data to be sent to an appinsights instance, as targeted by the connection string.

I'm prompted to provide a datasource, and I can't match any option to my expectation, that is, to have the rule apply to the appinsights instance.

For instance, I'd like to test the setup from a local instance of the app, connecting to a custom appinsights instance, and see the rule in action.

heyams · 2023-06-07T00:27:35Z

@mercer there are 3 ways to create a DCR.
can you follow this tutorial?

Each App Insights Resource has a link to workspace, which is on the overview blade on the Azure Portal.

heyams · 2023-06-07T00:51:00Z

Is there a way to add a field at runtime in 3.x for dependencies (or any other traces)? For my use case, I could record the fact that it is a bulk operation, and then in applicationinsights.json I would sample on the custom field.

[heyams] you can try custom dimensions and then use sampling overrides to filter telemetry

Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.

[heyams] Can you try something like this:

1. Create an attribute key for the different modes of the app:

Span.current().setAttribute("mode", "mode1");

2. Put the following in applicationinsights.json (more details on inherited attributes):

{
  "inheritedAttributes": [
    {
      "key": "mode",
      "type": "string"
    }
  ]
}

3. Each mode of the app will then be tagged with "mode=mode1", where "mode1" is the value that was set in step 1.
4. You can then use a sampling override to change the sampling rate based on that attribute key-value pair. Please give it a try.

Because applicationinsights-agent-3.4.13.jar includes the generic io.opentelemetry.javaagent code, is there a way to extend the code and override the behavior? I know you already answered that there is no programmatic filtering available, but I wondered if there is an option for us to build it ourselves, given that the underlying library follows an open principle.

[heyams] Please try out a data collection rule. If that doesn't work, we can discuss further to find a solution that meets your needs. If you use a custom version of our agent, you will need to update it whenever we have a new release.

heyams · 2023-06-07T17:06:40Z

Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.

@mercer regarding this question, I suggested inherited attributes above.
However, there is a better approach that doesn't require any code changes.

You can use custom dimensions

{
  "customDimensions": {
    "mytag": "appMode",
    "anothertag": "${ANOTHER_VALUE}"
  }
}

ANOTHER_VALUE is an environment variable that you set for your app. For each mode of your app, you can set it to a different value.
Then you can use a sampling override to change the sampling rate based on this custom dimension. Hope that helps.

microsoft-github-policy-service · 2023-06-14T17:23:08Z

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

mercer · 2023-06-15T17:20:26Z

@heyams sorry for not responding earlier.

We felt like we couldn't make this work in a straightforward way, and downgrading to 2.x wasn't the right call, as we already had some things set up in the 3.x fashion.

The way we mitigated the cost was a simple 50% sampling of SQL dependencies:

{
  "preview": {
    "sampling": {
      "overrides": [
        {
          "telemetryType": "dependency",
          "attributes": [
            {
              "key": "db.system",
              "value": "mssql",
              "matchType": "strict"
            }
          ],
          "percentage": 50
        }
      ]
    }
  }
}
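
To illustrate what an override like this can and cannot express, here is a rough conceptual model in Java. It is not the agent's implementation: the class names are hypothetical, the "first matching override wins" ordering is an assumption of this sketch, and `score` stands in for whatever per-trace value the agent actually uses to make the percentage decision.

```java
import java.util.List;
import java.util.Map;

/** Conceptual model of one override: strict attribute matches plus a percentage. */
final class OverrideSketch {
    final Map<String, String> strictMatches;
    final double percentage;

    OverrideSketch(Map<String, String> strictMatches, double percentage) {
        this.strictMatches = strictMatches;
        this.percentage = percentage;
    }

    /** True when every configured key is present with exactly the configured value. */
    boolean matches(Map<String, String> spanAttributes) {
        for (Map.Entry<String, String> e : strictMatches.entrySet()) {
            if (!e.getValue().equals(spanAttributes.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}

final class SamplingSketch {
    /** Assumption: the first matching override wins, otherwise the default applies. */
    static double effectivePercentage(
            List<OverrideSketch> overrides,
            Map<String, String> spanAttributes,
            double defaultPercentage) {
        for (OverrideSketch override : overrides) {
            if (override.matches(spanAttributes)) {
                return override.percentage;
            }
        }
        return defaultPercentage;
    }

    /** score is assumed to lie in [0, 100); how it is derived is out of scope here. */
    static boolean isSampled(double percentage, double score) {
        return score < percentage;
    }

    public static void main(String[] args) {
        List<OverrideSketch> overrides =
                List.of(new OverrideSketch(Map.of("db.system", "mssql"), 50));
        double p = effectivePercentage(overrides, Map.of("db.system", "mssql"), 100);
        System.out.println("SQL dependency sampling percentage: " + p);
    }
}
```

Note that the decision depends only on attribute values and a percentage. Nothing in this model looks at the span's duration, which is why a "drop if faster than N ms" rule cannot be written as an override.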

I think the 3.x rewrite is missing functionality, especially around custom processors. Sampling overrides are inferior to the 2.x TelemetryProcessor and to dotnet's ITelemetryProcessor. Before, you could apply any logic to sampling (or anything else, really), while now only a few predefined scenarios are supported. I hope that this system will not be ported as is to dotnet.

Also, I believe the documentation can be improved. For example, it should list the fields (attributes) that sampling overrides can be configured on.

In any case, thanks for all the time you put into answering my questions @heyams, I hope this ticket may help improve the 3.x appinsights client for java!

heyams · 2023-06-15T18:14:38Z

@mercer does DCR work?
I will experiment with something on the upstream side to see if I can come up with an alternative solution. In the meantime, please give DCR a try if you haven't tried it yet. Thanks.

mercer · 2023-06-20T08:14:43Z

@heyams we did not invest more time into making DCR work either, because it seems too heavy for us.

We would need to provision these rules at subscription level, while this is just a service. So in order to have this in prod, we would need:

decide ownership over the rules
have a pipeline to provision the generic rules
document the process
test across environments
train DRIs
and of course, make it work in the first place

mattmccleary · 2023-06-30T23:20:53Z

@mercer - Are you open to a 30-minute meeting to discuss why KQL ingestion transforms are too heavyweight? We want to understand your scenario a bit better so we can improve. If so, can you shoot me a quick email at mmcc@microsoft.com? I'll be back in the office 7/5 to respond and set up a call.

mercer · 2023-07-11T05:20:44Z

The scenario is the same as the initial description.

I'd like to drop SQL dependency spans whose duration is under a certain threshold. In 2.x and in dotnet this is easy to do using a TelemetryProcessor or ITelemetryProcessor.

Given that "drop SQL dependency spans whose duration is under a certain threshold" is already possible in 2.x of the Java client and in the current dotnet appinsights client, needing to add more infrastructure to solve the problem in 3.x is too heavyweight, even if it works.

I should be able to decide in code which spans leave my process.

I'm happy to discuss this requirement, but if the answer is to add or configure infrastructure, then the process will remain heavyweight. Why shouldn't I be allowed to stop 99% of telemetry traffic at the source? I understand that there is an option to "fix" the problem further down the pipeline, in a generic way, for all data collected, and this may even be a way to prevent costs. However, this should be an option, not "the only way" to sample data.

I should be able to sample data at the source based on any criteria -- again, this already works in the 2.x Java client and the dotnet client; the capability was removed in the 3.x Java client by the rewrite to follow OpenTelemetry.

heyams · 2024-03-28T17:44:16Z

@mercer since 3.5 GA, we added support for the OpenTelemetry java extensions.

Now, you can use the extension to have your own span exporter and filter data based on any criteria.
Here is my sample on filtering out spans based on duration. Please let me know if you can give it a try.
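
The general shape of such a filter is a wrapper around the real exporter that forwards only the spans you want to keep. The following is a minimal, dependency-free Java sketch of that pattern, not the linked sample itself. `SpanSketch`, `ExporterSketch` and `FilteringExporterSketch` are hypothetical stand-ins for the span and exporter types an actual extension would work with, and the 5-second threshold is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified span: just a name and a duration. */
final class SpanSketch {
    final String name;
    final long durationMillis;

    SpanSketch(String name, long durationMillis) {
        this.name = name;
        this.durationMillis = durationMillis;
    }
}

/** Hypothetical stand-in for an exporter that ships spans to a backend. */
interface ExporterSketch {
    void export(List<SpanSketch> spans);
}

/** Wraps another exporter and forwards only the spans that pass the predicate. */
final class FilteringExporterSketch implements ExporterSketch {
    private final ExporterSketch delegate;
    private final Predicate<SpanSketch> keep;

    FilteringExporterSketch(ExporterSketch delegate, Predicate<SpanSketch> keep) {
        this.delegate = delegate;
        this.keep = keep;
    }

    @Override
    public void export(List<SpanSketch> spans) {
        List<SpanSketch> kept = new ArrayList<>();
        for (SpanSketch span : spans) {
            if (keep.test(span)) {
                kept.add(span);
            }
        }
        if (!kept.isEmpty()) {
            delegate.export(kept); // skip empty batches entirely
        }
    }
}

final class FilteringExporterDemo {
    public static void main(String[] args) {
        long thresholdMillis = 5_000; // example threshold only
        ExporterSketch backend =
                spans -> spans.forEach(s -> System.out.println("exported " + s.name));
        ExporterSketch filtering =
                new FilteringExporterSketch(backend, s -> s.durationMillis >= thresholdMillis);
        filtering.export(List.of(
                new SpanSketch("GET /fast", 120),
                new SpanSketch("GET /slow", 7_300)));
    }
}
```

Because the predicate is ordinary code, the threshold could come from configuration or an environment variable rather than being hard-coded as it is here.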

Sorry for taking this long to unblock you.

mercer · 2024-03-29T12:13:48Z

Had a look at https://github.com/Azure-Samples/ApplicationInsights-Java-Samples/tree/main/opentelemetry-api/java-agent/TelemetryFilteredBaseOnRequestDuration but I can't seem to find where I would configure that requests under 5s should not be ingested.

mercer · 2024-03-29T12:14:34Z

Is there a way to configure this for dependencies as well? My initial issue was to sample database dependencies that are under a threshold, say 10ms.

heyams · 2024-03-29T18:34:15Z

Had a look at https://github.com/Azure-Samples/ApplicationInsights-Java-Samples/tree/main/opentelemetry-api/java-agent/TelemetryFilteredBaseOnRequestDuration but I can't seem to find where I would configure that requests under 5s should not be ingested.

it's under extensions folder DurationSpanExporter

please read the readme.

-Dotel.javaagent.extensions=../extensions/FilterSpanBasedOnDuration/target/FilterSpanBasedOnDuration-1.0-SNAPSHOT.jar

main logic is in the ../extensions/FilterSpanBasedOnDuration.

heyams · 2024-03-29T18:35:00Z

Is there a way to configure this for dependencies as well? My initial issue was to sample database dependencies that are under a threshold, say 10ms.

yes, same idea. it's creating your own span exporter. you can filter any span based on any criteria.
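
As a rough illustration of the decision logic for the database case, the sketch below ports the rule from the 2.x processor earlier in this thread to a plain Java predicate. It is dependency-free and makes several assumptions: `FinishedSpanSketch` is a hypothetical simplified view of a span, a SQL dependency is assumed to be a CLIENT-kind span carrying the `db.system` attribute (the attribute used in the sampling override earlier in this thread), and the 10 ms threshold is only an example. In a real extension the same check would run inside your own span exporter.

```java
import java.util.Map;

/** Hypothetical, simplified view of a finished span. */
final class FinishedSpanSketch {
    final String kind; // e.g. "CLIENT", "SERVER", "INTERNAL"
    final Map<String, String> attributes;
    final long durationMillis;
    final boolean success;

    FinishedSpanSketch(
            String kind, Map<String, String> attributes, long durationMillis, boolean success) {
        this.kind = kind;
        this.attributes = attributes;
        this.durationMillis = durationMillis;
        this.success = success;
    }
}

final class SqlDependencyFilterSketch {
    private final long thresholdMillis;

    SqlDependencyFilterSketch(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Assumption: database calls are CLIENT spans that carry a db.system attribute. */
    static boolean isSqlDependency(FinishedSpanSketch span) {
        return "CLIENT".equals(span.kind) && span.attributes.containsKey("db.system");
    }

    /** Same rule as the 2.x processor: drop successful SQL calls at or under the threshold. */
    boolean shouldDrop(FinishedSpanSketch span) {
        return isSqlDependency(span)
                && span.success
                && span.durationMillis <= thresholdMillis;
    }

    public static void main(String[] args) {
        SqlDependencyFilterSketch filter = new SqlDependencyFilterSketch(10);
        FinishedSpanSketch fastQuery =
                new FinishedSpanSketch("CLIENT", Map.of("db.system", "mssql"), 3, true);
        FinishedSpanSketch slowQuery =
                new FinishedSpanSketch("CLIENT", Map.of("db.system", "mssql"), 480, true);
        System.out.println("drop fast query: " + filter.shouldDrop(fastQuery));
        System.out.println("drop slow query: " + filter.shouldDrop(slowQuery));
    }
}
```

Failed calls and slow calls are always kept, which matches the stated goal of retaining only the dependency telemetry that signals unexpected latency or errors.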

mercer · 2024-04-01T07:16:43Z

Ok, do you have an example of how I would differentiate a dependency from a trace?

In other words, using the example #3102 (comment), how would one port the code from 2.x to 3.x for this particular use case?

mercer · 2024-04-01T07:16:46Z

As a side note, I think you should popularize how the 3.x Java agent works with blog posts, technical documentation and so on. For example, I can find no blog posts today about AutoConfigurationCustomizerProvider. From the outside, it gives me the impression that no one uses the 3.x Java version.

