Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: RUM Configuration #214

Open
LikeTheSalad opened this issue Jan 9, 2024 · 12 comments
Open

Proposal: RUM Configuration #214

LikeTheSalad opened this issue Jan 9, 2024 · 12 comments

Comments

@LikeTheSalad
Copy link
Contributor

The OTel Android library is essentially an OTel Java SDK wrapper that adds very useful features for Android apps on top. This provides huge value when it comes to adopting OpenTelemetry in Android apps as some of the features can be really difficult and/or time-consuming to implement manually, and some might even need specialized compilation tools that allow for bytecode instrumentation, which is accomplished very differently than doing the same for regular Java applications due to differences at runtime. So having Android-specific tools handy can provide an inestimable value when using OTel in an Android app, possibly comparable to creating an Android app with Android Jetpack tools vs without using those tools.

The problem

Providing several tools comes with an important challenge though, which is configuration. The challenge comes when we ask ourselves questions such as “What features are meant to be added out of the box?”, and “How should the ‘X’ feature behave by default?”, or “What aspects of the ‘X’ feature should be configurable?” And so on, and the reason why these are difficult questions is because all apps are different and can therefore have different needs, some of which might not be possible to foresee at the time of creating the feature.

There is a risk of making the library too opinionated if there aren’t enough config options available, but there’s also a risk of making it cumbersome to use if there’s a long list of flags and different params associated with several features, sometimes with several params associated to a single feature but only valid when the feature flag is enabled and so on, within the RUM config that would require making different combinations of flags and other params to make the overall agent work as expected.

The proposal

Because of the above, I’d like to propose dividing (and conquering) the overall functionality of the OTel Android RUM agent into 2 categories:

  • Initialization: Covers what’s needed to initialize the OpenTelemetry instance. This may include the app’s info (service.* params) as well as signal processors/exporters customizers, dynamic global attrs, resources, and clock configurations.
  • Instrumentation: Covers features that are not needed to initialize the OpenTelemetry instance. This includes auto-instrumentations though not necessarily limited to bytecode-instrumentation-related instrumentations, it could also cover things like crash reporting, ANR detection, and Activity/Fragment lifecycle instrumentations, which are arguably also auto-instrumentations, is just that they don’t rely on bytecode instrumentation to work.

With the intention of keeping both fully separated (in different artifacts) and allowing desired instrumentations to get installed during the initialization process. This will allow us to:

  • Keep the OtelRumConfig object focused on the initialization process only, preventing it from becoming a sort of cumbersome “god object” to deal with in the future, full of feature flags and a mixture of feature-specific params.
  • Not having to make opinionated decisions like “Which instrumentations are meant to be added out of the box?”.
  • Keeping the “core” lib lightweight and more straightforward to use for projects that need more specific customizations (as they wouldn’t get what to them would essentially be “dead code” packed into a single artifact that they would have to remember to disable before adding their desired features to avoid unexpected behaviors).
  • Users wouldn't need to know which instrumentations are enabled by default to check what they’d like to disable if needed, as nothing would be installed by default. We can add an example in the docs on how to set up the RUM agent with some “basic” or “minimum recommended” instrumentations for people who don’t have custom requirements and just want to quickly set the tool up and running with some basic functionality.

This approach of “installing” an instrumentation is already implemented in this repo, although, based on its javadoc, it seems like it was initially intended to be used as an internal feature. Nevertheless, the functionality is there already so I believe it shouldn’t take too many changes to do the separation of existing instrumentations into their own subprojects should we agree on going with this path.

I have seen similar approaches working well for widely used libs (in Android apps) such as Retrofit for example, where you can add a specific “serializer tool” of your choice into your project’s dependencies and set it as your “converter”. So I think this way of setting up tools shouldn’t be strange to Android devs.

There are more reasons why I think this "separation approach" is worth taking into consideration, but this is already a lot of text so, thanks for reaching the end 🙂 we can follow up in the comments if there are some questions and/or suggestions for this proposal.

@marandaneto
Copy link
Member

Keep the OtelRumConfig object focused on the initialization process only, preventing it from becoming a sort of cumbersome “god object” to deal with in the future, full of feature flags and a mixture of feature-specific params.

I like the idea of separating the setup config. and the instrumentation config.

Keeping the “core” lib lightweight and more straightforward to use for projects that need more specific customizations (as they wouldn’t get what to them would essentially be “dead code” packed into a single artifact that they would have to remember to disable before adding their desired features to avoid unexpected behaviors).

Having a lightweight lib is important, for that, features that are not used can just be removed via minifyEnabled (not sure how well it works though, the code cannot be coupled).

Every feature that depends on an external package, should also be another library (jar/aar) and manually installed (or automatically via bytecode manipulation if the target is available at compile time), this would reduce binary size as well.

Users wouldn't need to know which instrumentations are enabled by default to check what they’d like to disable if needed, as nothing would be installed by default. We can add an example in the docs on how to set up the RUM agent with some “basic” or “minimum recommended” instrumentations for people who don’t have custom requirements and just want to quickly set the tool up and running with some basic functionality.

I'm not so sure about that, usually the friction of finding that something exists via documentation is already a huge entry barrier, plus getting something installed and working is another one (people don't read docs - most of the time).

I prefer the experience that I install something and everything is working by default, this is the best DX, unless there's a risk such as PII, high-performance impact, etc.

I prefer the experience where people opt out of simple stuff that can be installed automatically and opt in for stuff that requires extra setup such as external deps, Gradle plugins, etc.

So we don't need to be opinionated on what should be enabled by default or not, but rather we can do this automatically or not.

@LikeTheSalad
Copy link
Contributor Author

Having a lightweight lib is important, for that, features that are not used can just be removed via minifyEnabled (not sure how well it works though, the code cannot be coupled).

Yeah that's the thing though, the unused code would still be referenced somewhere but behind some conditions that are only evaluated at runtime, so I don't think R8 can help much in this case.

I prefer the experience that I install something and everything is working by default, this is the best DX, unless there's a risk such as PII, high-performance impact, etc.

I understand this, though I think it applies better to some libs than others. So for example, if I use a lib that is only a crash reporter then I'd expect the crash reporting feature to work well out of the box (send the reports to some backend with at least the stacktrace of the crash without having to configure too many things) as you mentioned, though for the case of this OTel Android lib which is meant to provide tools to send any kind of telemetry data, sending data automatically by default means that we know what type of data is a must for every project and also how it is expected to be shaped, the latter might clash with the separation of instrumentation vs setup configs, unless we provide a default config for all the default features that we're sure is the one almost everyone will need.

So In other words, having default features means that we need to be opinionated on what those features are that all, or most, projects need, and then also being opinionated as to how those features should behave by default too, as well as providing and supporting mechanisms to disable them, which in the end might still require people to read some docs anyway, which brings me to:

(people don't read docs - most of the time)

I think this is true to some extent, if we're talking about javadocs then yeah I agree most people don't bother reading it, but there are some places, such as what's in the README and "quick start" type of documentations that I believe most people do read, and the people who don't read it tend to look at online tutorials or StackOverflow posts where usually it takes one person who read the README to sort of copy/paste or rephrase what it says and it should be enough to get people sorted.

So we don't need to be opinionated on what should be enabled by default or not, but rather we can do this automatically or not.

I'm not sure I'm following this sentence, could you elaborate a bit more on the "do this automatically or not" part?

@marandaneto
Copy link
Member

I'm not sure I'm following this sentence, could you elaborate a bit more on the "do this automatically or not" part?

For example, hooking up to an error signal handler, the activity lifecycle observers, and measuring slow and frozen frames, all of that is dependency-free and does not depend on external dependencies such 3rd party SDKs and plugins, so all is possible to be done automatically, I expect all those things to be enabled by default.

Measuring the performance of HTTP requests using the OkHttp lib depends on adding an interceptor, adding an external dependency (OkHttp itself), this cannot be done automatically, I mean it can be done via bytecode manipulation, but it requires adding a Gradle plugin, so it's not fully automatic, hence I don't expect this to be enabled by default, this is just an example though, I know the Android OTel ships OkHttp anyway.

@marandaneto
Copy link
Member

I think this is true to some extent, if we're talking about javadocs then yeah I agree most people don't bother reading it, but there are some places, such as what's in the README and "quick start" type of documentations that I believe most people do read, and the people who don't read it tend to look at online tutorials or StackOverflow posts where usually it takes one person who read the README to sort of copy/paste or rephrase what it says and it should be enough to get people sorted.

My experience with that is the opposite, the majority of people missed most of the features (not enabling them) just because they didn't know they existed, and they were right there in all forms of documentation, it's not everyone, but the majority.

@marandaneto
Copy link
Member

@LikeTheSalad I understand your points and I can relate, I had this discussion in the past, multiple times, and I was in favor of all opt-ins most of the time.

Time proved me wrong, People will go the extra mile to disable what they don't need because they see the data and they don't want it.

The opposite is a much bigger entry barrier, they will have to go the extra mile in configurations and installations to check if something is worthy, and they probably don't even start or give up in the very first lines of code, considering they find out the possible extra features, which was often not the case.

This is just my experience, there's no right or wrong, just the trade-off of considering things opt-in or opt-out.

@breedx-splk
Copy link
Contributor

Hey thanks for this discussion. Sorry it's taken me a little time to be able to share my thoughts.

There is a risk of making the library too opinionated if there aren’t enough config options available, but there’s also a risk of making it cumbersome to use if there’s a long list of flags [...]

I think we should strive to find a balance between ease of use and detailed technical configurability. I suggest, as a guiding principle (with respect to Rich Hickey) that we try to make the simple thing easy. That is, we should try and create tools such that:

Most users, with very little reading, should be able to get up and running quickly, with very few lines of code and achieve results that are not surprising.

  • most users
    • (one size doesn't fit all, but there's probably a Gaussian distribution!)
  • with very little reading (few docs)
    • (you shouldn't have to read multiple pages before getting started)
  • should be able to get up and running quickly
    • (if it's easy it should be fast to get started)
  • with very few lines of code
    • (shouldn't have to rearchitect your application to use this stuff)
  • and very few surprises in the outcomes
    • (the results should make sense to the notive user)

Historically, we did not want users to have to think much about otel initialization, because it can be quite complex. We hid all of the direct otel initialization from them and allowed hooks in a few small places to customize things. We assume that Android developers primarily want to deal with setting up opentelmetry-android and don't want to have to know about its dependencies (like the otel java sdk or api or exporters or all of that). Because of this, I am concerned that having two (or more!) artifacts will introduce more complexity and make the lib slightly harder to use.

Keep the OtelRumConfig object focused on the initialization process only, preventing it from becoming a sort of cumbersome “god object” to deal with in the future, full of feature flags and a mixture of feature-specific params.

I'm less worried about the config evolving into a god object. If we see it going in that direction, I think there are plenty of opportunities to carve off groups of functionality into smaller configs. I agree that config should only be set and used at startup time and should not be long lived. If that's the case right now, let's work on fixing it.

Not having to make opinionated decisions like “Which instrumentations are meant to be added out of the box?”.

I respectfully disagree. I think we should try and find the set of features that most users will want, and then make it easy for users that need extra customization to add/remove/alter features. There may not be a perfect set, but I think we can get it close enough.

nothing would be installed by default

I also respectfully disagree with this. I think a user who has gone out of their way to start using opentelemetry-android would expect some common core set of instrumentation/functionality to be included in the default configuration.

@marandaneto said

My experience with that is the opposite, the majority of people missed most of the features (not enabling them) just because they didn't know they existed, and they were right there in all forms of documentation, it's not everyone, but the majority.

And that is my experience as well. Users have an expectation (no matter how unreasonable 😅 ) that things should "just work". I think that the more we can try and make things "just work" the better.

I appreciate all this discussion! I'd like to better understand what we're actually trying to separate...because I think I've missed something. I see mention of separating "initialization" and "instrumentation" but also some mention of separating "config".

If the idea is to publish separate independent artifacts that contain small features that have some stand-alone merit, then let's have that discussion. I'm not opposed to it, I just want to grok what that looks like in practice. I'm reluctant to give a big menu of unrelated features to users and tell them to ala-carte start adding dependencies. It's too much work for users to pick and choose...and I haven't seen a current/modern/supported way of bundling an uber-aar (similar to how we bundle/repackage shade dependencies in other projects). If we had that, we could do it both -- provide the pieces and provide the pie.

@marandaneto
Copy link
Member

Btw it's possible to install dependencies automatically with gradle plugins.

For example, people add the OTel Gradle plugin, if okhttp is on the classpath, you could install the otel-okthttp package, this is just an example that adding dependencies can also be automatized via plugins.

So in case many small feature packages depend on 3rd party dependencies, this could be a solution.

One day all one needs to do is to install the OTel Gradle plugin and nothing more, besides host, apiKeys, etc.

@LikeTheSalad
Copy link
Contributor Author

Hey thanks for this discussion. Sorry it's taken me a little time to be able to share my thoughts.

No worries, I know it's a lot to ponder about and I'm glad we can discuss these things, even if it's slowly (though I'm looking forward to having a dedicated SIG to avoid the typing 😅)

Most users, with very little reading, should be able to get up and running quickly, with very few lines of code and achieve results that are not surprising.
...
Historically, we did not want users to have to think much about otel initialization, because it can be quite complex.

Definitely, it's also what we had in mind when creating the Elastic agent, setting up OTel Java takes a lot of code, so we also hid the whole OTel setup and added some config options where it made sense and I'm pretty sure it's enhanced the overall user experience, especially when they just want to give it a quick try. So I'm all on board with making things as easy as possible, though I know from experience that there's a fine line between "helping a lot" vs "helping too much", and usually the consequences are maintenance hell and some users complaining about either lib size being big, or unwelcomed automatic behavior which is not too obvious to them, which could be surprising, unless they read the docs... So I think the 'not-reading-docs' issue can cause problems in both cases.

Because of this, I am concerned that having two (or more!) artifacts will introduce more complexity and make the lib slightly harder to use.

I understand where the concern comes from, but I think this is unavoidable, in fact, we already have 2 artifacts (the core one and the okhttp instrumentation) and we have a PR to add one more soon and issues to work on others in the future. And I think the reason is that telemetry isn't a "defined feature" where there's a clear behavior to be expected but rather a tool with virtually unlimited use cases, though not all of them will apply to every project, which is why users need to be able to choose what they want to use and ignore what they don't. Having a catalog of tools to choose from might be annoying at first when you only want to get to know the tool with a quick sample app, but I think it's the most sensible way to make it work in the long run (just image if the OTel Java SDK decided to include all these features into a single artifact, it's just not happening 😅 )

I'm less worried about the config evolving into a god object. If we see it going in that direction.

The god object is no joke, it will for sure turn our lives, and our users', into hell on earth.

I think there are plenty of opportunities to carve off groups of functionality into smaller configs.

Each of those opportunities will become a breaking change for our users, so what's more likely to happen is that we'll be hesitant about touching anything about the god object to avoid affecting them while having to keep maintaining a bunch of config params that might not be needed anymore, so we'll have to have some new configs and some deprecated ones in the same place while still supporting the old ones.. (which is highly likely to happen when there are too many features managed from a single place, at some point one will have to change, rendering the old config params useless, yet still referenced in many projects and that's when hell starts).

I agree that making our users read the docs is going to be annoying for a lot of them, but I don't think anything will be more annoying than having to use an API that keeps changing over and over. I've actually seen a lot of backlash from Android devs when using tools created by Google where they kept deprecating and creating a lot of major versions with breaking changes, so if Google gets backlash for that, I don't think we'll be an exception. That's why I'd like to try and search for some form of API config that's scalable and organized in a way that there's a clear separation of concerns, and frankly having a lot of unrelated config params in a god object seems to be the exact opposite of that.

I think we should try and find the set of features that most users will want, and then make it easy for users that need extra customization to add/remove/alter features.
...
And that is my experience as well. Users have an expectation (no matter how unreasonable 😅 ) that things should "just work". I think that the more we can try and make things "just work" the better.

I understand the benefits of this approach, and to be honest I think it would be great if we could find the "golden set of features" that most people will want in their apps. Though I frankly just wish it wouldn't have to mean adding feature flags, and possibly feature-specific params too, to the initialization config, maybe there's a way we can have our cake and eat it too, probably by going with uber jars as you mentioned (which I know it's possible by using the Gradle shadow plugin), or probably with a dependency-aware Gradle plugin as mentioned by @marandaneto, I'd like to further discuss the options we might have available.

I'd like to better understand what we're actually trying to separate...because I think I've missed something. I see mention of separating "initialization" and "instrumentation" but also some mention of separating "config".

It's just the two ("initialization", which is essentially what would live in OtelRumConfig, and "instrumentation", which is every tool that automatically generates telemetry), the separation of the configs will come naturally once those are separated, since the "instrumentation" part, which will cover the automatic instrumentations, will require each one to have its own config (like it's already happening with the okhttp auto-instrumentation for example). That way, not only OtelRumConfig wouldn't need to be aware of any auto-instrumentation flags, but it also wouldn't care about the configs params that those might need either.

I just want to grok what that looks like in practice

I think a good example is the OTel Java SDK with its auto-instrumentations, or maybe not even the auto-instrumentations, but also the implementations provided within the same SDK repo but in separate artifacts, such as the exporters. I think another example of having to use separate artifacts to set up a tool (although in a much smaller scale) would be Retrofit, where you must add one extra dependency if you want to get automatic response deserialization, Retrofit is a pretty popular tool in Android development so I don't think this concept should be too difficult to grasp in general for Android devs. Also, you can see it in Glide, another popular Android tool, it has extensions to make it work with other tools for which you need to add extra dependencies into your project, and people still use it, and I think the reason is that, after they understand the concept of core + feature catalog, it's easy to know how to do the rest for every existing feature as well for any new ones (since the way to use them all will be the same and there wouldn't be any exceptions to that rule, such as "special features" already in the core lib that would need to be disabled).

I think we all agree that providing an easy-to-use tool is important and I think we will be able to accomplish that, is just that I'd like to avoid falling into the trap of mixing things up to achieve it because, most likely that approach will come to bite us later which is almost certain to happen every time the single responsibility principle is broken.

@breedx-splk
Copy link
Contributor

@LikeTheSalad Thanks for writing out such a thorough and thoughtful response. That definitely helps me to better see what you had in mind and where the priorities/concerns are.

Each of those opportunities will become a breaking change for our users, so what's more likely to happen is that we'll be hesitant about touching anything about the god object to avoid affecting them while having to keep maintaining a bunch of config params that might not be needed anymore

I also respect the desire to not break users when we make changes, but until we reach some kind of stability around 1.0 I think that's going to be challenging. 😁

I think it's fine for us to keep this issue around for some time, as part of the decision record, but also for interested folks to continue chiming in on...but I would like us to figure out a path forward. I'd love to hear ideas about this. As a first step, does it perhaps make sense to

  • rename the current instrumentation module to android-sdk (or similar)
  • introduce a new top-level module called instrumentation-bundle
    • move the code in the io.opentelemetry.android.instrumentation package from android-sdk to the new instrumentation-bundle module.
  • make a new top-level module called instrumentation
  • begin incrementally migrating per-package instrumentations from instrumentation-bundle over to instrumentation as their own module with published artifacts.
    • where applicable, migrate the config for each module as well
  • after the above is done, we can delete instrumentation-bundle or use it to build an uber-jar (shadow jar) of all instrumentations.

I'm sure there will be some rough spots, but does that help get us moving?

@LikeTheSalad
Copy link
Contributor Author

Thanks for coming up with this alternative 🙏 I love it! - I think the migration can also be used as an excuse to convert the code being moved to Kotlin 🙌

In terms of namings (which is always the hardest part) I would suggest just one addition, regarding this part:

make a new top-level module called instrumentation

Which I believe is a good opportunity to just rename the existing top-level module auto-instrumentation to instrumentation instead, and add all the new submodules there.

Generally speaking, I really like this approach for several reasons, for example, it could potentially avoid us these kinds of issues in the future, since we could move some "androidx" dependencies out of the "core", or "android-sdk" module, into their own module (such as the fragment one and the fragment instrumentation) where we could keep upgrading the androidx.fragment dependency to its latest version all the time without having to affect users who might not want to use it but do want to use the "android-sdk" lib (or even allowing them to set a fixed version of the instrumentation module while keeping upgrading the "android-sdk" dependency to its latest version until they're ready to bump their project's compileSdk version). Anyway, it might be a thing or not, but what I'm trying to say is that it would be possible after these changes, unlike the way it is now, and there could be other benefits as well so I'm really looking forward to implementing something like this.

@breedx-splk
Copy link
Contributor

Hey, so I wanted to see how bad or how much work this restructuring would be, so I spent some time today running ahead of this. You can check out my fork's branch that has this suggested structure here: https://github.com/breedx-splk/opentelemetry-android/tree/restructure_repo_modules. I encourage folks to clone it and play with it.

A few points of interest:

  • There were a few oddities that suggested circular dependencies, but I was able to mitigate those by creating the topmost common module. That module contains a few classes that are depended on by both the android-sdk module and the instrumentation modules.
  • The instrumentation/common-api module is only named that way so that it doesn't collide with the topmost common module, which causes this headache.
  • I treated the single annotation as instrumentation, suggesting that we may build more in the future (we've talked about it)
  • There could be a few unnecessary dependencies in the instrumentation libs
  • I tweaked how each library output aar is named. Please check my work and make sure I'm not insane. :)
  • The project builds and tests pass.
  • I had to change the interface of InitializationEvents.recordConfiguration() to take a map instead of config, to prevent improper dependency coupling. Just kicking the can down the road some more.

This structure is feeling surprisingly good to me.

Unless folks end up finding anything really concerning, I'm cool with moving forward with this approach and can probably start doing some piecemeal changes. As that branch stands now, it touches most files in the repo and would be madness to review. 🙃

@bidetofevil
Copy link
Contributor

Finally had a chance to take a look at the whole proposal. At a high level, I definitely agree with the "core + installable instrumentation modules" design. This is very much the direction of how I see the Embrace SDK going - this allows folks who want to leverage the Android Extension to select the feature set they want a la carte. If they only want crashes, they shouldn't have to bare the runtime burden of all the other features. I see the following advantages of such a design:

  • This keeps the core SDK as lightweight as possible, both in terms of install size and runtime burden
  • The scrutiny of adding new modules that pull in additional dependencies can reduced because they wouldn't automatically bloat the size of the core SDK they aren't installed out of the box
  • Managing minimum version of dependencies will be less tricky - if a module requires a new version of some dependency, it doesn't automatically move the entire SDK up a level
  • A modularized architecture should reduce the burden of contributions as you wouldn't have to know the lifecycle and workings of the entire thing to add or modify a single module.

That said, I do agree that the extension shouldn't be completely opinions free - there should be a reasonable set of default modules that are "enabled" out of the box so that the default, no-configuration experience provides a common set of features that the maintainers think unopinionated folks should have. Those that want to add or remove modules based on their needs, they should be able to do so, but for the portion of users who just want what we recommend, they should go with the default, which can be determined later (and does not have to be set in stone).

As the SDK goes down this path, there are probably some points to keep in mind:

  • What is the contract between the instrumentation module and the core module? Is there an additional API that will be exposed so the instrumentation module can interact the the core module, or will the just leverage the OTel Java API? If it's the former, dopes that mean additional versioning??
  • Are there rules about how instrumentation modules can build on top of or enhance each other, or should they be completed independent?
  • How do the instrumentation modules described differ from regular OTel instrumentation libraries? Is this just a packaging issue or is there something fundamentally different between the two?
  • How configurable should the core module be? That is, are the opinions it implements essentially fixed, or is there a desire for some aspects of it (e.g. how sessions are defined) be configurable and/or extensible?

There are probably a lot more issues that will come up as the extension heads down this path, but overall, I think it's a really positive step forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants