Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define OTEL_EXPERIMENTAL_CONFIG_FILE to ignore other env vars, add env var substitution default syntax #3948

Merged
merged 13 commits into from Apr 5, 2024

Conversation

jack-berg
Copy link
Member

@jack-berg jack-berg commented Mar 18, 2024

Fixes #3752.

Implementation of the configuration working group recommendation as described here.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved
Copy link
Contributor

@lmolkova lmolkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in this #3752 (comment) and other comments on the discussion:

This would break the mechanism AWS lambda, otel-operator and potentially other providers/environments outside of otel repositories already use to specify default service name, resource attributes, propagators, and maybe other settings (like exporters).

The alternative suggested in the discussion (asking users to keep boilerplate for env vars in the file) is fragile. It also removes incentive from cloud providers to integrate with otel by providing common env vars. They would rather ask users to handle their existing env vars resulting in degraded user experience.

@jack-berg
Copy link
Member Author

jack-berg commented Mar 20, 2024

I can think of two ways to accommodate the requirement for platforms / distributions to customize configuration:

  1. There is language in this PR that states "Implementations MAY provide a mechanism to customize the configuration model parsed from OTEL_CONFIG_FILE." In java, this would be a SPI akin to AutoConfigurationCustomizer, which would give the implementer the ability to customize the parsed in memory representation of the configuration model with any platform specific requirements. The downside of this is approach is the language specific implementation requirement. But if you're already packaging up some sort of distribution, you probably already are used to language specific customizations so maybe not such a bad thing. The language could be strengthened from "MAY" to "SHOULD" or "MUST".
  2. A platform could detect the presence of OTEL_CONFIG_FILE. If set, copy the contents to a new path, modify the contents to include the platform customizations, and update OTEL_CONFIG_FILE to point to the new location. This should be straight forward to do and in a language agnostic way. This has the benefit of allowing the customizer to have much more control over the configuration than is available with updating env vars. When you view the problem through this lens, the idea that platforms can only customize through the existing env var scheme seems very limiting.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 20, 2024

Ability to customize SDK behavior does apply to cloud providers. Azure Functions don't know which distro users wants to use and can't inject the distro for them.

I see some problems with the second approach as well:

  • complexity. Cloud providers would deal with messy merges we tried to 'avoid' on otel side. I don't trust them us to do it well or consistently
  • permissions. User would need to give providers permissions to modify the file
  • secret management. I might store auth keys in the file and don't want my provider to even see it
  • breaking. This is the change of existing behavior - providers will need to change their existing code to adopt the file - if it's not breaking, then what is?

@jack-berg
Copy link
Member Author

complexity. Cloud providers would deal with messy merges we tried to 'avoid' on otel side. I don't trust them us to do it well or consistently

Here's an example of merging a custom resource attribute into a config file using an open source command line YAML processor called yq (i.e. YAML equivalent of the popular jq).

Given config.yaml with contents:

resource:
  attributes:
    service.name: my-service

To add a new resource attribute foo = bar, we can call yq eval-all '.resource.attributes.foo = "bar"' ./config.yaml > config-new.yaml. The output in config-new.yaml is:

resource:
  attributes:
    service.name: my-service
    foo: bar

Its a trade off. A bit more complex than setting OTEL_RESOURCE_ATTRIBUTES, but much more powerful since file configuration is much more expressive and there are many knobs of configuring how merge works.

permissions. User would need to give providers permissions to modify the file
secret management. I might store auth keys in the file and don't want my provider to even see it

I propose writing to a new file and reassigning OTEL_CONFIG_FILE to the new location rather than modifying in place. I don't buy the secret management piece. Code that already has access to read and write env vars which are expected to contain secrets is already implicitly trusted. Reading and overwriting env vars vs reading / modifying a config file are two sides of the same coin.

breaking. This is the change of existing behavior - providers will need to change their existing code to adopt the file - if it's not breaking, then what is?

You've said that azure functions doesn't know or control which distro a user wants to use. Its entirely possible for a user to choose a distribution that ignores OTEL_RESOURCE_ATTRIBUTES, and the azure functions implementation would be powerless to ensure those attributes are present. Just as its entirely possible for a user to choose a distribution to ignore OTEl_RESOURCE_ATTRIBUTES, a user can choose to set a new env var OTEL_CONFIG_FILE which changes the behavior of existing env vars.

I'll refer to the arguments I previously made about breaking changes for env vars: #3752 (comment) We should not consider a change in behavior resulting from a user opting into something new to be a breaking change. Consider another example:

A vendor provides a distribution of otel which sets env vars to export OTLP to their endpoint, i.e. OTEL_OTLP_EXPORTER_ENDPOINT=https://some-vendor:4318. They support explicit bucket histograms and the SDK produces them by default. We stabilize exponential histograms, and introduce a new env var to opt in OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION=base2_exponential_bucket_histogram. A user sets this new env var. The vendor doesn't support exponential histograms, and fails to ingest the data. Should otel not be permitted to add OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION for fear of breaking the distribution? No, of course we should be allowed to add that. The distribution is expected to adapt, and either: 1. Add support for exponential histograms. 2. Document that users should not set the new env var. 3. Detect the new env var, and override it to revert to explicit bucket histograms.

This convo is reminiscent of the debate we had over whether extending the span interface to include a new addLink operation was breaking. Some argued that extending interfaces breaks custom SDK implementations, which would not compile unless they published a new version implementing the new surface area. We decided this interpretation is wrong, and would jeopardize our ability to innovate and evolve the project. In #3695 we codified that SDK implementers need to keep their implementations up to date. "It is inevitable that OpenTelemetry will release new features, and that application developers and library maintainers will make use of these features." Its not exactly the same as what were talking about, but there are striking similarities: This PR aims to introduce a new features which are opt in, and which platforms like Azure functions would need to adapt to in order to keep the same behavior in the event that a user does opt in.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 20, 2024

there is a problem of conflict resolution. By pushing it away to cloud providers, we don't solve it, but make it worse. Azure will decide to do it differently in every SaaS, PaaS, FaaS, AWS will do it in yet another way.

I do think it makes user experience worse.
If we made a few common env vars interop with the config, it would be easy for us, it would be easy for cloud providers and also easy for users. Dead-simple rule looks like "if property that could come from env var was not specified in the config, SDK MUST check corresponding env var". I still don't understand what's controversial here.

I propose writing to a new file and reassigning OTEL_CONFIG_FILE to the new location rather than modifying in place.

Can you elaborate? User specifies OTEL_CONFIG_FILE value - do you want cloud providers to change its value?
What if user sets OTEL_CONFIG_FILE inside dockerfile and it's not even visible to the cloud provider?

I don't buy the secret management piece. Code that already has access to read and write env vars which are expected to contain secrets is already implicitly trusted. Reading and overwriting env vars vs reading / modifying a config file are two sides of the same coin.

I don't agree. I can use key-vaults to retrieve my secrets from, I can dynamically format my config file in the startup scripts/dockerfile to retrieve data from configuration services. Env vars are not a secure way to do secret management at all.
If the config file is inside the container or otherwise protected from access, cloud providers won't be able to access it.

You've said that azure functions doesn't know or control which distro a user wants to use. Its entirely possible for a user to choose a distribution that ignores OTEL_RESOURCE_ATTRIBUTES, and the azure functions implementation would be powerless to ensure those attributes are present. Just as its entirely possible for a user to choose a distribution to ignore OTEl_RESOURCE_ATTRIBUTES, a user can choose to set a new env var OTEL_CONFIG_FILE which changes the behavior of existing env vars.

Virtually all OTel SDKs support OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES env vars. Distro that choses to ignore them would not have a great experience on Azure functions (AWS lambda or in otel-operator). there are problems now, but this proposal would make things worse. From supported virtually everywhere, env vars would be degraded to "supported until user accidentally opts opt"

back-compat

The subject of back-compat is cloud providers, not users here. The breaking change affects them. Users would opt-in without realizing that it will break their cloud provider integration with otel.

I want to emphasize, that I don't understand why this is necessary - why we can't make critical env vars work.
I understand there was a procedure a decision was made. I respect the decision and the procedure, but I feel it's easy to accommodate my feedback. I also feel that if we did it we would be able to provide much better integration and user experience.
So I feel like my feedback is being dismissed only because the procedure/vote has already happened and not because of any other substantial reasons.

@jack-berg
Copy link
Member Author

jack-berg commented Mar 20, 2024

I want to emphasize, that I don't understand why this is necessary - why we can't make critical env vars work.

The key is that you're making an argument that this is a breaking change to OTEL_RESOURCE_ATTRIBUTES, but because of unintuitive merge semantics excluding things like OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_*, OTEL_BSP_*, etc. Its either a breaking change for all env vars or none. It doesn't make sense to say that its only considered a breaking change to OTEL_RESOURCE_ATTRIBUTES because we know how to cleanly merge that env var. How do you explain that to a platform that was relying on setting one of the env vars which was skipped? Declaring that something is breaking is a much stronger position / stance than declaring that we ought not to do something because of the user experience.

I think its more correct to frame your argument as: merging the set of env vars which can be cleanly merged is a better user experience and that this improved user experience should be given high priority as an evaluation criteria. Framing it as "not doing this is breaking" while cherry-picking which env vars it applies to is self-contradicting.

But as I stated here, the ideas you're discussing are an entirely new proposal, not a modification of the existing proposal. There's virtually nothing left of this recommendation once we adjust it. And that's fine. I don't agree, but its fine for us to disagree and use the escalation path available.

@lmolkova
Copy link
Contributor

I think its more correct to frame your argument as: merging the set of env vars which can be cleanly merged is a better user experience and that this improved user experience should be given high priority as an evaluation criteria. Framing it as "not doing this is breaking" while cherry-picking which env vars it applies to is self-contradicting.

I'm glad you've heard me. Let's frame it as overarching "better user experience" if it makes my point more understandable.

@RohitRanjanMS
Copy link

Hello @jack-berg , I'm part of the Azure Functions team, and we're currently adding support for OpenTelemetry. Our architecture consists of a host process and a worker process, with customer code running in the worker process. We handle the host process, while customers have full control over the worker process.

In the host process, we are developing providers for logs, traces, and metrics that will send telemetry data to the OTLP endpoint. This setup is influenced by environment variables that signal us to connect the OTLP exporter and resources. We expect customers to set up something similar in the worker process.

The configuration seems promising, aiming to offer customers more flexibility in setting up providers in the host process. I appreciate the convenience of using both configuration files and environment variables simultaneously. For instance, customers can use configuration files to establish providers and environment variables like OTEL_EXPORTER_OTLP_ENDPOINT for setting the endpoint. Unlike configurations, environment variables are accessible in both host and worker processes.

However, the proposed changes could significantly affect our implementation. If customers switch to using configurations by setting OTEL_CONFIG_FILE, it might disrupt the host process. We will have to come-up with a solution to resolve the conflict between config and env variables.

I understand the complexities and reasons behind your proposal, but it could significantly complicate our operations. Ideally, I'd like to see a solution where both environment variables and configuration files can exist together.

@marcalff
Copy link
Member

If customers switch to using configurations by setting OTEL_CONFIG_FILE, it might disrupt the host process.

Could you provide more details on this ?

Once the OTEL_CONFIG_FILE feature, namely parsing a configuration from yaml, is supported in every language SIG SDK, why would this host process be impacted in particular ?

@RohitRanjanMS
Copy link

Hi @marcalff , environment variables are accessible by both the host and worker processes. If the customer chooses to use OTEL_CONFIG_FILE, the OTel SDK operating within the host process will disregard all OTEL related environment variables.

@jack-berg
Copy link
Member Author

@RohitRanjanMS I don't understand the problem. I'll summarize my understanding, and leave you to correct anything I misinterpret:

  • The architecture runs two processes: 1. Host process managed by azure. 2. Worker process running user code.
  • Env vars are shared between these two processes.
  • The host process detects whether OTEL_EXPORTER_OTLP_ENDPOINT (and other OTEL_* env vars) is set. If so, the host process exports telemetry data according to these env vars. This allows the host process and worker process to export telemetry data according to the same config.
  • If a user sets OTEL_CONFIG_FILE to configure the worker process and stops setting other OTEL_* env vars, then the host process won't be able to export telemetry according to config in OTEL_CONFIG_FILE because it can't access the file contents. As a result, the telemetry export config between the host process and worker process will diverge.

Is that right?

@lmolkova
Copy link
Contributor

lmolkova commented Mar 21, 2024

BTW I just realized that OTEL_CONFIG_FILE is experimental. Even if cloud providers could access the file and could do the merge conflict resolution that otel cannot do, you can't reasonably tell them us to implement the integration until the config is stable.

And also I realized that the full (correction: fullest possible) backward compatibility with existing env vars results in this config (please correct me if I'm wrong)

Do you expect all users to keep it?!

file_format: "0.1"
disabled: ${OTEL_SDK_DISABLED:-false}
resource:
  attributes:
    service.name: ${OTEL_SERVICE_NAME:-unknown_service}

attribute_limits:
  attribute_value_length_limit: ${OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT}
  attribute_count_limit: ${OTEL_ATTRIBUTE_COUNT_LIMIT:-128}

tracer_provider:
  processors:
    - batch:
        schedule_delay: ${OTEL_BSP_SCHEDULE_DELAY:-5000}
        export_timeout: ${OTEL_BSP_EXPORT_TIMEOUT:-30000}
        max_queue_size: ${OTEL_BSP_MAX_QUEUE_SIZE:-2048}
        max_export_batch_size: ${OTEL_BSP_MAX_EXPORT_BATCH_SIZE:-512}
        exporter:
          otlp:
            protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
            endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
            certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
            client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
            compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
            timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
  limits:
    attribute_value_length_limit: ${OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT}
    attribute_count_limit: ${OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT:-128}
    event_count_limit: ${OTEL_SPAN_EVENT_COUNT_LIMIT:-128}
    link_count_limit: ${OTEL_SPAN_LINK_COUNT_LIMIT:-128}
    event_attribute_count_limit: ${OTEL_EVENT_ATTRIBUTE_COUNT_LIMIT:-128}
    link_attribute_count_limit: ${OTEL_LINK_ATTRIBUTE_COUNT_LIMIT:-128}

meter_provider:
  readers:
    - periodic:
        interval: ${OTEL_METRIC_EXPORT_INTERVAL:-60000}
        timeout: ${OTEL_METRIC_EXPORT_TIMEOUT:-30000}
        exporter:
          otlp:
            protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
            endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
            certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
            client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
            compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
            timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
            temporality_preference: ${OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE:-cumulative}
            default_histogram_aggregation: ${OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION:-explicit_bucket_histogram}

logger_provider:
  processors:
    - batch:
        schedule_delay: ${OTEL_BLRP_SCHEDULE_DELAY:-1000}
        export_timeout: ${OTEL_BLRP_EXPORT_TIMEOUT:-30000}
        max_queue_size: ${OTEL_BLRP_MAX_QUEUE_SIZE:-2048}
        max_export_batch_size: ${OTEL_BLRP_MAX_EXPORT_BATCH_SIZE:-512}
        exporter:
          otlp:
            protocol: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http/protobuf}
            endpoint: ${OTEL_EXPORTER_OTLP_PROTOCOL:-http://localhost:-4318}
            certificate: ${OTEL_EXPORTER_OTLP_CERTIFICATE}
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
            client_certificate: ${OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE}
            compression: ${OTEL_EXPORTER_OTLP_COMPRESSION:-gzip}
            timeout: ${OTEL_EXPORTER_OTLP_TIMEOUT:-10000}
  limits:
    attribute_value_length_limit: ${OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMIT}
    attribute_count_limit: ${OTEL_LOGRECORD_ATTRIBUTE_COUNT_LIMIT:-128}

@jack-berg
Copy link
Member Author

BTW I just realized that OTEL_CONFIG_FILE is experimental.

Its not even experimental! It doesn't exist at all yet! 😁 But yes, file configuration as a whole is still experimental. That's part of the motivation behind the "wait and see" approach of this PR. Since its experimental, we'd be free to evolve the spec if there is strong user feedback.

And also I realized that the full backward compatibility with existing env vars results in this config (please correct me if I'm wrong)
Do you expect all users to keep it?!

Not full backwards compatibility - see this list of env vars not referenced. Its true that not all users will keep it. The idea is that by providing a starter template complete with comments describing the env var behavior, and funneling users to it by referencing prominently in docs and bundling it with distributions, that it will be harder for users to misinterpret. I.e. most users should intuit that explicitly deleting a reference to ${OTEL_SDK_DISABLED:-false} will cause OTEL_SDK_DISALBED to be ignored.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 21, 2024

Its not even experimental! It doesn't exist at all yet! 😁 But yes, file configuration as a whole is still experimental. That's part of the motivation behind the "wait and see" approach of this PR. Since its experimental, we'd be free to evolve the spec if there is strong user feedback.

Yeah and I'd be totally ok with "wait and see" approach if we didn't have existing specific cases where it would be problematic.
Or if the spec language would tell that existing integrations that rely on a TBD list of env vars will be able to keep using them regardless of user's choice to opt in into the config.

I.e. most users should intuit that explicitly deleting a reference to ${OTEL_SDK_DISABLED:-false} will cause OTEL_SDK_DISALBED to be ignored.

So I'm a user. I look into these 70 lines of config and keep only the things I care about, which could be something like

file_format: "0.1"
disabled: ${OTEL_SDK_DISABLED}
resource:
  attributes:
    service.name: ${MY_SERVICE_NAME}

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            protocol: protobuf
            endpoint: http://localhost:-4318
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}

meter_provider:
  readers:
    - periodic:
        interval: 5000
        exporter:
          otlp:
            protocol: protobuf
            endpoint: http://localhost:-4318
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}
            default_histogram_aggregation: explicit_bucket_histogram

logger_provider:
  processors:
    - batch:
        exporter:
          otlp:
            protocol: protobuf
            endpoint: http://localhost
            client_key: ${OTEL_EXPORTER_OTLP_CLIENT_KEY}

Now I have an incident, and I realize I need to specify OTEL_BSP_MAX_QUEUE_SIZE.

  1. In my environment changing env vars is significantly easier than redeploying the config (not a rare case)
  2. I didn't read the docs or forgot that config overwrites everything - I try adding OTEL_BSP_MAX_QUEUE_SIZE env var and restarting my service - it does not help
  3. Or even if I read the docs, now I need to redeploy the config.
    [Update]: so the best practice is to keep all the 70 lines unless you really know what you're doing, right?

Isn't it enough evidence that env var support is kinda necessary?

@lmolkova
Copy link
Contributor

lmolkova commented Mar 21, 2024

Back-compat path forward I can think of is the following:

  • In presence of any existing OTEL_ env var the config is ignored - there is a warning that tell users they have a conflict. The priority is given to the existing stable thing instead of new experimental thing
  • We agree that we need to spend more time figuring out env var story: we might want to deprecate some and figure out merge conflict issues for others.
  • Once we figure out the story for variable X, we allow it to coexist with the config.
  • Eventually we get to the point that we only have env vars that interop with the config well.

@tigrannajaryan
Copy link
Member

isolated worker process (any language)

@lmolkova to clarify, in isolated mode the host process launches the worker process, injecting env variables like OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES?

In case of isolated worker, the telemetry comes from both - the host AND worker process.

Can you please also clarify what telemetry is the host process emitting? Does host process also use Otel SDK and emits telemetry about itself? What values does the host process use for OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES? Are they the same values as injected into worker process or different values? Does user control what destination to use for host's telemetry or is it controlled by cloud provider?

Worker files are not visible to the host process in a general case (e.g. you can run your worker process in a container) and the file can be inside the image

I think this is a key limitation. In this case supposedly the config file provided by the user via OTEL_CONFIG_FILE should reside inside the container and the patching approach @jack-berg suggested seems impossible to perform for the host process since host process can't write to container's filesystem (unless the filesystem is mapped between the host and container?).

@RohitRanjanMS
Copy link

Hi @tigrannajaryan
Currently, our support extends only to ApplicationInsights, and it's implemented through a close integration between Functions and AI.
We handle the host process, which you can think of as a service that manages the worker process. This worker process is where customers' code is executed. Customers maintain complete control over this worker process and have the liberty to instrument it as desired.
It's important to note that environment variables are shared between the Host and Worker processes, but configurations are not.
We plan to set up tracing, logging, and metrics providers in the host process using environment variables. Since these variables are shared, they can also be leveraged by customers in the worker process.
Our system looks for either an ApplicationInsights connection string or an OTEL endpoint in the environment variables. Depending on what is found, we then connect to the appropriate exporter.

@lmolkova
Copy link
Contributor

@lmolkova to clarify, in isolated mode the host process launches the worker process, injecting env variables like OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES?

AFAIK yes, but users can override/specify env vars as well

Can you please also clarify what telemetry is the host process emitting? Does host process also use Otel SDK and emits telemetry about itself?

Host process is doing work on behalf of worker process. E.g. worker can return a byte array, but Function is configured to store this data in an database or object store. Or worker can be triggered by queue message and host process is the one that interacts with the queue and passes the content of the message to worker. These things are called 'bindings' - they are part of user app, but done on behalf of user.

Host collects telemetry from these bindings using OTel instrumentation libraries and OTel SDK

What values does the host process use for OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES? Are they the same values as injected into worker process or different values? Does user control what destination to use for host's telemetry or is it controlled by cloud provider?

Resource attributes describe Function name (as service name), and the cloud resource. They are the same for both processes.
Users can control the destination - they are expected to set OTEL_EXPORTER_OTLP_ENDPOINT (once but for both processes)

@lmolkova
Copy link
Contributor

lmolkova commented Mar 27, 2024

Worker files are not visible to the host process in a general case (e.g. you can run your worker process in a container) and the file can be inside the image

I think this is a key limitation. In this case supposedly the config file provided by the user via OTEL_CONFIG_FILE should reside inside the container and #3948 (comment) @jack-berg suggested seems impossible to perform for the host process since host process can't write to container's filesystem (unless the filesystem is mapped between the host and container?).

I think this is the hardest limitation, even if we can overcome it in one place, there are other services, service meshes, instrumentations for web servers and whatnot, there could be others. We can't assume config is accessible or modifiable.

And even if it is, I argue that softer limitations are even more severe: reading/modifying user data without explicit consent is a red flag for me and I'd have to bring it up with Microsoft legal and privacy folks to guide us if it's possible.

@lmolkova lmolkova self-requested a review March 28, 2024 14:55
@jack-berg
Copy link
Member Author

jack-berg commented Mar 28, 2024

Per @tedsuo’s request, we discussed this issue in the 3/24/27 TC meeting and have made a decision:

Generally, we will follow @trask’s comment, proceeding with this PR with a few changes:

  • Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics around how the value of env var are subject to breaking changes as the file configuration spec and schema continue to evolve.
  • Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. Deprecate environment variables which do not interoperate with file config when ready for stabilization #3967
  • Ensure that file config has an interop where platforms (i.e. Azure functions, otel operator, etc) contribute to config. We should proceed with this PR without being prescriptive about how that mechanism works. In the TC meeting, 4 distinct solutions were discussed which had different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). Ensure that file config has solution for platforms which contribute to config #3966

I've updated this PR to reflect this decision. @open-telemetry/technical-committee, @open-telemetry/configuration-maintainers, and participants in this conversation, please review.

@jack-berg jack-berg changed the title Define OTEL_CONFIG_FILE to ignore other env vars, add env var substitution default syntax Define OTEL_EXPERIMENTAL_CONFIG_FILE to ignore other env vars, add env var substitution default syntax Mar 28, 2024
@marcalff
Copy link
Member

@jack-berg nit: the PR text still contains a reference to OTEL_CONFIG_FILE.

@jack-berg
Copy link
Member Author

@open-telemetry/configuration-maintainers PTAL. I'll plan on merging once I get a couple of approvals from you all.

@jack-berg
Copy link
Member Author

Will merge tomorrow if there are no other comments.

@jack-berg jack-berg merged commit aafb77a into open-telemetry:main Apr 5, 2024
7 checks passed
configuration model. Implementations MAY provide a mechanism to customize the
configuration model parsed from `OTEL_EXPERIMENTAL_CONFIG_FILE`.

Users are encouraged to use the `sdk-config.yaml` (TODO: Add link when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be otel-config.yaml, there can be plenty of SDKs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should file configuration merge environment variable configuration?