Processor equivalence with 3.x to drop dependency based on a criteria #3102

Open
mercer opened this issue May 29, 2023 · 24 comments

@mercer

mercer commented May 29, 2023

Is your feature request related to a problem? Please describe.
I'd like to drop SQL dependency spans whose duration is under a certain threshold. In 2.x and in .NET, this is easy to do using a TelemetryProcessor or ITelemetryProcessor.

I have digested https://learn.microsoft.com/en-us/azure/azure-monitor/app/java-standalone-telemetry-processors and I don't see how this would work.

Describe the solution you would like
An example would be great. The documentation could also include more real-world examples.

Describe alternatives you have considered
I considered downgrading to 2.x, but we need 3.x. We only have this problem in the Java stack, not in .NET.

Additional context
Nothing else I can think of.

@heyams
Contributor

heyams commented May 30, 2023

@mercer can you try sampling overrides?


@mercer
Author

mercer commented May 31, 2023

@heyams can you provide an example where SQL dependencies get sampled if duration > 50 ms? There are two parts to this problem:

  1. the duration attribute
  2. the logic to match for sampling with a threshold, for example, value < 50

I'd appreciate an example here. (I already tried to take inspiration from the "make noisy dependency call example".)

In the meantime, I will turn self-diagnostics to debug. However, I'd prefer not to reverse engineer this, and to work from documentation if possible.

@mercer
Author

mercer commented May 31, 2023

So, the equivalent in 2.x would be something like

import com.microsoft.applicationinsights.extensibility.TelemetryProcessor;
import com.microsoft.applicationinsights.telemetry.RemoteDependencyTelemetry;
import com.microsoft.applicationinsights.telemetry.Telemetry;

// Drops successful SQL dependencies whose duration is at or below the configured threshold.
public class SqlDependencyFilterProcessor implements TelemetryProcessor {
    private final TelemetryProcessor next;
    private final SqlDependencyFilterOptions options;

    public SqlDependencyFilterProcessor(TelemetryProcessor next, SqlDependencyFilterOptions options) {
        this.next = next;
        this.options = options;
    }

    @Override
    public boolean process(Telemetry telemetry) {
        if (options.isEnabled() && telemetry instanceof RemoteDependencyTelemetry) {
            RemoteDependencyTelemetry dependency = (RemoteDependencyTelemetry) telemetry;
            if (dependency.getSuccess()
                    && dependency.getDuration().toMillis() <= options.getDurationThresholdMSecs()
                    && "SQL".equalsIgnoreCase(dependency.getType())) {
                return false; // returning false drops the telemetry item
            }
        }
        // Otherwise pass the item on to the next processor, if any.
        return next == null || next.process(telemetry);
    }
}

wired with

@Configuration
@EnableConfigurationProperties(SqlDependencyFilterOptions.class)
public class ApplicationInsightsConfiguration {
    
    @Bean
    public SqlDependencyFilterProcessor createSqlDependencyFilterProcessor(TelemetryProcessor next, SqlDependencyFilterOptions options) {
        return new SqlDependencyFilterProcessor(next, options);
    }

    @Bean
    public TelemetryProcessor telemetryProcessorChain(SqlDependencyFilterProcessor processor) {
        TelemetryProcessor baseProcessor = TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().getBaseTelemetryProcessor();
        TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().addLast(processor);
        TelemetryConfiguration.getActive().getTelemetryProcessorChainBuilder().build();
        return baseProcessor;
    }
}
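
For readers who want to reason about the rule without the 2.x SDK on the classpath, the drop decision made by the processor above can be restated as a plain predicate. This is only a sketch: `SqlDependencyDropRule` and its parameters are hypothetical names that do not exist in any SDK, and they merely mirror the conditions in the processor (enabled, successful, type "SQL", duration at or under the threshold).

```java
import java.time.Duration;

/** Hypothetical, dependency-free restatement of the drop rule in the 2.x processor above. */
final class SqlDependencyDropRule {
    private final boolean enabled;
    private final long thresholdMillis;

    SqlDependencyDropRule(boolean enabled, long thresholdMillis) {
        this.enabled = enabled;
        this.thresholdMillis = thresholdMillis;
    }

    /** True when the item should be dropped: enabled, successful, SQL, and at or under the threshold. */
    boolean shouldDrop(String type, boolean success, Duration duration) {
        return enabled
                && success
                && "SQL".equalsIgnoreCase(type)
                && duration.toMillis() <= thresholdMillis;
    }

    public static void main(String[] args) {
        SqlDependencyDropRule rule = new SqlDependencyDropRule(true, 50);
        System.out.println(rule.shouldDrop("SQL", true, Duration.ofMillis(10)));   // fast, successful SQL call
        System.out.println(rule.shouldDrop("SQL", false, Duration.ofMillis(10)));  // failed call is kept
        System.out.println(rule.shouldDrop("SQL", true, Duration.ofMillis(200)));  // slow call is kept
        System.out.println(rule.shouldDrop("HTTP", true, Duration.ofMillis(10)));  // non-SQL call is kept
    }
}
```

Only the first call returns true; failed, slow, and non-SQL dependencies all continue down the chain, which is the behavior the rest of this thread tries to reproduce in 3.x.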

@mercer
Author

mercer commented May 31, 2023

A bit more context:

  1. Sometimes we have batch jobs. What we noticed is that the extra dependency calls add about $150 in cost for each hour of batch. And that data is not particularly useful unless these dependency calls have unexpected latency or fail. Sometimes these batches may take 5-24 hours.
  2. In .NET, we already solved this problem with an equivalent approach (using an ITelemetryProcessor).
  3. As we have already upgraded to 3.x in Java, we want to fix this in the Java 3.x stack as well.

@heyams
Contributor

heyams commented May 31, 2023

@mercer I can come up with an example, but it would be helpful if you could share a sample app so that I can create a fix based on your app. My SQL example's attributes will be different from yours. Or even better, let's have a quick call and I can show you how to locate the attributes and then apply a sampling override. Please email me at helen.yang@microsoft.com to discuss further.

@heyams
Contributor

heyams commented Jun 1, 2023

@mercer can you try DCR (data collection rules)?

You can apply a filter rule to dependencies. It's done via Log Analytics, and the equivalent table is AppDependencies.
Please try the following rule and let us know if that works for your scenario:

source
| where Type != "SQL" or DurationMs > 100

Currently, we do not have any filtering mechanism for dependencies based on duration.
If a data collection rule doesn't work for you, please get back to me so that my team can find an alternative solution.
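
To make the semantics of that `where` clause concrete, here is a small Java restatement of which rows it keeps. This is a sketch only: `DcrFilterSketch` and `keep` are hypothetical names, and the transform itself runs at ingestion, not in the application. The rule keeps a row when it is not of type "SQL" or when it took longer than 100 ms, so it drops exactly the SQL rows with a duration of 100 ms or less.

```java
/** Hypothetical restatement of the KQL filter: where Type != "SQL" or DurationMs > 100 */
final class DcrFilterSketch {

    /** True when the row would survive the filter. KQL's != on strings is case-sensitive, hence equals(). */
    static boolean keep(String type, double durationMs) {
        return !"SQL".equals(type) || durationMs > 100;
    }

    public static void main(String[] args) {
        System.out.println(keep("SQL", 20));    // short SQL call
        System.out.println(keep("SQL", 100));   // exactly at the boundary
        System.out.println(keep("SQL", 250));   // slow SQL call
        System.out.println(keep("HTTP", 20));   // non-SQL dependency
    }
}
```

Note one difference from the 2.x processor earlier in this thread: this rule does not look at success, so a failed SQL call that finishes within 100 ms would also be dropped.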

@mercer
Author

mercer commented Jun 6, 2023

@heyams thanks for your swift answer, I will try your suggestion for data collection rules today. I hope this solution solves the cost problem -- batches introduce anomalies in cost patterns with low-value telemetry data, and this anomaly needs to be dealt with using different sampling rules than "normal" traffic.

In the meantime, I had a few other questions regarding potential options, all in the context of the 3.x Java client.

  1. Is there a way to add a field at runtime in 3.x for dependencies (or any other traces)? For my use case, I could add the fact that it is a bulk job, and then in applicationinsights.json I would sample on the custom field.
  2. Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.
  3. Because applicationinsights-agent-3.4.13.jar includes the generic io.opentelemetry.javaagent code, is there a way to extend the code and override the behavior? I know you already answered that there is no programmatic filtering available, but I wondered if there is an option for us to build it ourselves, given that the underlying library follows an open principle.

@mercer
Author

mercer commented Jun 6, 2023

@heyams I evaluated adding a rule, but I don't see how I can configure a rule to apply to data sent to an appinsights instance, as targeted by the connection string.

I'm prompted to provide a data source, and I can't match any option to my expectation, that is, to have the rule apply to the appinsights instance.

For instance, I'd like to test the setup from a local instance of the app, connecting to a custom appinsights instance, and see the rule in action.


@heyams
Contributor

heyams commented Jun 7, 2023

@mercer there are 3 ways to create a DCR.
Can you follow this tutorial?

Each App Insights resource has a link to its workspace, which is on the Overview blade in the Azure portal.

@heyams
Contributor

heyams commented Jun 7, 2023

@heyams thanks for your swift answer, I will try your suggestion for data collection rules today. I hope this solution solves the cost problem -- batches introduce anomalies in cost patterns with low-value telemetry data, and this anomaly needs to be dealt with using different sampling rules than "normal" traffic.

In the meantime, I had a few other questions regarding potential options, all in the context of the 3.x Java client.

  1. Is there a way to add a field at runtime in 3.x for dependencies (or any other traces)? For my use case, I could add the fact that it is a bulk job, and then in applicationinsights.json I would sample on the custom field.

[heyams] You can try custom dimensions and then use sampling overrides to filter telemetry.

  2. Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.

[heyams] Can you try something like this:

  1. Create an attribute key for the different modes of the app:
Span.current().setAttribute("mode", "mode1");
  2. Put the following in applicationinsights.json (more details on inherited attributes):
{
  "inheritedAttributes": [
    {
      "key": "mode",
      "type": "string"
    }
  ]
}
  3. Each mode of the app will then get tagged with "mode=mode1", where "mode1" is the value that was set in step 1.
  4. Then you can use a sampling override to change the sampling rate based on that attribute key-value pair. Please give it a try.

  3. Because applicationinsights-agent-3.4.13.jar includes the generic io.opentelemetry.javaagent code, is there a way to extend the code and override the behavior? I know you already answered that there is no programmatic filtering available, but I wondered if there is an option for us to build it ourselves, given that the underlying library follows an open principle.

[heyams] Please try out a data collection rule; if that doesn't work, we can discuss further to find a solution that meets your needs. If you use a custom version of our agent, you will need to update it whenever we have a new release.

@heyams
Contributor

heyams commented Jun 7, 2023

Is there a way to change the general sampling value dynamically at runtime? I would use this to react dynamically to the mode of the app, either automatically or with a technical feature flag. I'm thinking here of any option other than regenerating applicationinsights.json and redeploying the app.

@mercer regarding this question, I suggested inherited attributes above.
However, there is a better approach that does not require any code changes.

You can use custom dimensions

{
  "customDimensions": {
    "mytag": "appMode",
    "anothertag": "${ANOTHER_VALUE}"
  }
}

ANOTHER_VALUE is an environment variable you set for your app. For each mode of your app, you can set it to a different value.
Then you can use a sampling override to change the sampling rate based on this configuration. Hope that helps.
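
As an illustration of the idea behind `${ANOTHER_VALUE}`, the following sketch resolves `${NAME}` placeholders against a map of environment variables. It is a conceptual example only and not the agent's implementation: `PlaceholderSketch` and `resolve` are hypothetical names, and leaving unknown placeholders untouched is an assumption of this sketch, not documented agent behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${NAME} substitution; not the agent's actual code. */
final class PlaceholderSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    /** Replaces each ${NAME} with env.get(NAME); unknown names are left as-is (an assumption). */
    static String resolve(String template, Map<String, String> env) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = env.get(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("ANOTHER_VALUE", "batch");
        // The custom dimension "anothertag" would carry the mode chosen at deploy time.
        System.out.println(resolve("${ANOTHER_VALUE}", env));
    }
}
```

With the app started in batch mode (`ANOTHER_VALUE=batch` in this example), every telemetry item would carry `anothertag=batch`, which a sampling override could then match on.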

@microsoft-github-policy-service
Contributor

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@mercer
Author

mercer commented Jun 15, 2023

@heyams sorry for not responding earlier.

We felt that we couldn't make this work in a straightforward way, and downgrading to 2.x wasn't the right call, as we already had some things set up in the 3.x fashion.

The way to mitigate the cost was to apply a simple 50% sampling rate to SQL dependencies:

{
  "preview": {
    "sampling": {
      "overrides": [
        {
          "telemetryType": "dependency",
          "attributes": [
            {
              "key": "db.system",
              "value": "mssql",
              "matchType": "strict"
            }
          ],
          "percentage": 50
        }
      ]
    }
  }
}
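
The following sketch models how an override like the one above could be evaluated, to show why it can match on `db.system` but not on duration. It is a simplified illustration, not the agent's code: `SamplingOverrideSketch`, `OverrideRule`, and `percentageFor` are hypothetical names, `telemetryType` is ignored, and "first matching rule wins" is treated here as an assumption. The inputs are span attributes only; if, as with OpenTelemetry samplers in general, the decision is taken when a span starts, its duration is not yet known at that point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of sampling-override evaluation; not the agent's implementation. */
final class SamplingOverrideSketch {

    /** One override entry: attributes that must match exactly ("strict") plus a percentage. */
    static final class OverrideRule {
        final Map<String, String> strictAttributes;
        final double percentage;

        OverrideRule(Map<String, String> strictAttributes, double percentage) {
            this.strictAttributes = strictAttributes;
            this.percentage = percentage;
        }

        /** True when every configured key is present on the span with exactly the configured value. */
        boolean matches(Map<String, String> spanAttributes) {
            for (Map.Entry<String, String> entry : strictAttributes.entrySet()) {
                if (!entry.getValue().equals(spanAttributes.get(entry.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the percentage of the first matching rule, or the default when none matches. */
    static double percentageFor(Map<String, String> spanAttributes,
                                List<OverrideRule> rules,
                                double defaultPercentage) {
        for (OverrideRule rule : rules) {
            if (rule.matches(spanAttributes)) {
                return rule.percentage;
            }
        }
        return defaultPercentage;
    }

    public static void main(String[] args) {
        Map<String, String> mssql = new HashMap<>();
        mssql.put("db.system", "mssql");
        List<OverrideRule> rules = new ArrayList<>();
        rules.add(new OverrideRule(mssql, 50));

        Map<String, String> sqlSpan = new HashMap<>();
        sqlSpan.put("db.system", "mssql");
        Map<String, String> httpSpan = new HashMap<>();
        httpSpan.put("http.method", "GET");

        System.out.println(percentageFor(sqlSpan, rules, 100));  // 50.0
        System.out.println(percentageFor(httpSpan, rules, 100)); // 100.0
    }
}
```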

I think the 3.x rewrite is missing functionality, especially custom processors. Sampling overrides are inferior to the 2.x TelemetryProcessor or to .NET's ITelemetryProcessor. Before, you could apply any logic to sampling (or anything else, really), while now only a few predefined scenarios are supported. I hope that this system will not be ported as-is to .NET.

Also, I believe the documentation can be improved. For example, it should list the fields (attributes) that sampling overrides can be configured on.

In any case, thanks for all the time you put into answering my questions @heyams, I hope this ticket may help improve the 3.x appinsights client for Java!

@heyams
Contributor

heyams commented Jun 15, 2023

@mercer does DCR work?
I will experiment with something on the upstream side to see if I can come up with an alternative solution. In the meantime, please give DCR a try if you haven't tried it yet. Thanks.

@mercer
Author

mercer commented Jun 20, 2023

@heyams we did not invest more time into making DCR work either, because it seems too heavy for us.

We would need to provision these rules at the subscription level, while this is just one service. So in order to have this in prod, we would need to:

  1. decide ownership over the rules
  2. have a pipeline to provision the generic rules
  3. document the process
  4. test cross environments
  5. train DRIs
  6. and of course, make it work in the first place

@mattmccleary
Member

@mercer - Are you open to a 30-minute meeting to discuss why KQL Ingestion Transforms are too heavyweight? We want to understand your scenario a bit better so we can improve. If so, can you shoot me a quick email at mmcc@microsoft.com? I'll be back in the office 7/5 to respond and set up a call.

@mercer
Author

mercer commented Jul 11, 2023

The scenario is the same as the initial description.

I'd like to drop SQL dependency spans whose duration is under a certain threshold. In 2.x and in .NET, this is easy to do using a TelemetryProcessor or ITelemetryProcessor.

Given that dropping SQL dependency spans whose duration is under a certain threshold is already possible in Java 2.x and in the current .NET appinsights clients, the need to add more infrastructure to solve a problem with 3.x is too heavyweight, even if it works.

I should be able to decide in code which spans leave my process.

I'm happy to discuss this requirement, but if the answer is to add or configure infrastructure, then the process will remain heavyweight. Why shouldn't I be allowed to prevent 99% of telemetry traffic at the source? I understand that there is an option to "fix" the problem further down the pipeline, in a generic way, for all data collected, and this may even be a way to prevent costs. However, this should be an option, not "the only way" to sample data.

I should be able to sample data at the source based on any criteria. Again, this already works in the 2.x Java client and the .NET client; the capability was removed in the 3.x Java client by the rewrite to follow OpenTelemetry.

@heyams
Contributor

heyams commented Mar 28, 2024

@mercer since 3.5 GA, we have added support for OpenTelemetry Java extensions.

Now you can use an extension to supply your own span exporter and filter data based on any criteria.
Here is my sample on filtering out spans based on duration. Please let me know if you can give it a try.

Sorry for taking this long to unblock you.

@mercer
Author

mercer commented Mar 29, 2024

Had a look at https://github.com/Azure-Samples/ApplicationInsights-Java-Samples/tree/main/opentelemetry-api/java-agent/TelemetryFilteredBaseOnRequestDuration but I can't seem to find where I would configure that requests under 5s should not be ingested.

@mercer
Author

mercer commented Mar 29, 2024

Is there a way to configure this for dependencies as well? My initial issue was to sample database dependencies that are under a threshold, say 10ms.

@heyams
Contributor

heyams commented Mar 29, 2024

Had a look at https://github.com/Azure-Samples/ApplicationInsights-Java-Samples/tree/main/opentelemetry-api/java-agent/TelemetryFilteredBaseOnRequestDuration but I can't seem to find where I would configure that requests under 5s should not be ingested.

It's under the extensions folder: DurationSpanExporter.

Please read the readme.

-Dotel.javaagent.extensions=../extensions/FilterSpanBasedOnDuration/target/FilterSpanBasedOnDuration-1.0-SNAPSHOT.jar

The main logic is in ../extensions/FilterSpanBasedOnDuration.

@heyams
Contributor

heyams commented Mar 29, 2024

Is there a way to configure this for dependencies as well? My initial issue was to sample database dependencies that are under a threshold, say 10ms.

Yes, same idea: you create your own span exporter, and you can filter any span based on any criteria.
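
A minimal, dependency-free sketch of that idea applied to database dependencies follows. It is not the linked sample and does not compile against OpenTelemetry: `SpanView` and `Exporter` are hypothetical stand-ins for OpenTelemetry's `SpanData` and `SpanExporter`, reduced to the fields the filter needs. Treating a `CLIENT` span that carries a `db.system` attribute as a "SQL dependency" is an assumption of this sketch; in a real extension the wrapping exporter would presumably be registered through an `AutoConfigurationCustomizerProvider`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class DurationFilterSketch {

    /** Hypothetical stand-in for OpenTelemetry's SpanData, reduced to what the filter needs. */
    static final class SpanView {
        final String name;
        final String kind;                    // "CLIENT" for outgoing calls, "SERVER" for incoming requests
        final boolean error;                  // true when the span ended with an error status
        final long startEpochNanos;
        final long endEpochNanos;
        final Map<String, String> attributes;

        SpanView(String name, String kind, boolean error,
                 long startEpochNanos, long endEpochNanos, Map<String, String> attributes) {
            this.name = name;
            this.kind = kind;
            this.error = error;
            this.startEpochNanos = startEpochNanos;
            this.endEpochNanos = endEpochNanos;
            this.attributes = attributes;
        }

        long durationMillis() {
            return TimeUnit.NANOSECONDS.toMillis(endEpochNanos - startEpochNanos);
        }
    }

    /** Hypothetical stand-in for OpenTelemetry's SpanExporter. */
    interface Exporter {
        void export(Collection<SpanView> spans);
    }

    /** Wraps another exporter and withholds fast, successful database client spans. */
    static final class FilteringExporter implements Exporter {
        private final Exporter delegate;
        private final long thresholdMillis;

        FilteringExporter(Exporter delegate, long thresholdMillis) {
            this.delegate = delegate;
            this.thresholdMillis = thresholdMillis;
        }

        boolean shouldDrop(SpanView span) {
            return "CLIENT".equals(span.kind)                    // outgoing call, i.e. a dependency (assumption)
                    && span.attributes.containsKey("db.system")  // database call
                    && !span.error                               // keep failures
                    && span.durationMillis() <= thresholdMillis; // keep slow calls
        }

        @Override
        public void export(Collection<SpanView> spans) {
            List<SpanView> kept = new ArrayList<>();
            for (SpanView span : spans) {
                if (!shouldDrop(span)) {
                    kept.add(span);
                }
            }
            delegate.export(kept);
        }
    }

    public static void main(String[] args) {
        List<SpanView> exported = new ArrayList<>();
        Exporter exporter = new FilteringExporter(exported::addAll, 10);
        Map<String, String> db = Collections.singletonMap("db.system", "mssql");
        Map<String, String> none = Collections.emptyMap();
        exporter.export(Arrays.asList(
                new SpanView("fast query", "CLIENT", false, 0L, 5_000_000L, db),
                new SpanView("slow query", "CLIENT", false, 0L, 50_000_000L, db),
                new SpanView("failed query", "CLIENT", true, 0L, 5_000_000L, db),
                new SpanView("GET /orders", "SERVER", false, 0L, 5_000_000L, none)));
        for (SpanView span : exported) {
            System.out.println(span.name);  // every span except "fast query"
        }
    }
}
```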

@mercer
Author

mercer commented Apr 1, 2024

OK, do you have an example of how I would differentiate a dependency from a trace?

In other words, using the example in #3102 (comment), how would one port the code from 2.x to 3.x for this particular use case?

@mercer
Author

mercer commented Apr 1, 2024

As a side note, I think you should popularize how the 3.x Java agent works with blog posts, technical documentation, and so on. For example, I find no blog posts today for AutoConfigurationCustomizerProvider. From the outside, it gives me the impression that no one uses Java version 3.x.
