Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXAMPLES] Improve ForceFlush call to also work with NoopTracerProvider #2506

Open
ecourreges-orange opened this issue Jan 23, 2024 · 7 comments · May be fixed by #2664
Open

[EXAMPLES] Improve ForceFlush call to also work with NoopTracerProvider #2506

ecourreges-orange opened this issue Jan 23, 2024 · 7 comments · May be fixed by #2664
Assignees
Labels
good first issue Good for newcomers help wanted Good for taking. Extra help will be provided by maintainers triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ecourreges-orange
Copy link
Contributor

Is your feature request related to a problem?
Yes, if you copy and paste code from the examples, which most users will do, you end up with the code from cleanupTracer:

void CleanupTracer()
{
  // We call ForceFlush to prevent to cancel running exportings, It's optional.
  opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
      trace::Provider::GetTracerProvider();
  if (provider)
  {
    static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();
  }

  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
  trace::Provider::SetTracerProvider(none);
}

Which is code that does not work when called before any initialization.
This may happen at least in unit tests of client code (happened to me and made me search for the best solution for 1h).
Also, it is better to call ForceFlush with a time argument so that people know they can use it, and you won't have issues posted like "why does my app not shutdown?".

Describe the solution you'd like
I would like the example code to work in all cases, inited or not.

Describe alternatives you've considered
Changing from static_cast to dynamic cast and checking if the provider exists fixes this,
Also adding a 10s timeout to ForceFlush should cover typical cases gracefully.

void CleanupTracer()
{
  // We call ForceFlush to prevent cancelling running exports, It's optional.
  opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
      trace::Provider::GetTracerProvider();
    if (provider)
    {
        trace_sdk::TracerProvider* tracerProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
        if ( tracerProvider) {
            // Remove arg if you want to wait forever
            tracerProvider->ForceFlush(std::chrono::milliseconds(10000));
        }
    }
  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
  trace::Provider::SetTracerProvider(none);
}

If you would like, I can provide the PR for the 4 examples that i found that have this code:

static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();

static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();

static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();

static_cast<trace_sdk::TracerProvider *>(provider.get())->ForceFlush();

@github-actions github-actions bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 23, 2024
@lalitb
Copy link
Member

lalitb commented Jan 24, 2024

Make sense, but dynamic_cast could only be used when RTTI is enabled. Should we keep both the options, something like this (not tested):

void CleanupTracer() {
    opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> provider =
        trace::Provider::GetTracerProvider();

    if (provider) {
#ifdef OPENTELEMETRY_RTTI_ENABLED
        // When RTTI is enabled, use dynamic_cast for safety
        trace_sdk::TracerProvider* sdkProvider = dynamic_cast<trace_sdk::TracerProvider*>(provider.get());
        if (sdkProvider) {
            sdkProvider->ForceFlush();
        }
#else
        // Without RTTI, use static_cast
        // Note: This is safe only if the TracerProvider is correctly initialized as trace_sdk::TracerProvider
        static_cast<trace_sdk::TracerProvider*>(provider.get())->ForceFlush();
#endif
    }

    std::shared_ptr<opentelemetry::trace::TracerProvider> none;
    trace::Provider::SetTracerProvider(none);
}
   

Assigned to you @ecourreges-orange. Thanks for troubleshooting :)

@marcalff
Copy link
Member

marcalff commented Jan 24, 2024

Calling GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.

A possible change is for the code to keep a global variable on the SDK provider used instead, and never call GetTracerProvider() from the SDK init/cleanup code in the application.

Instrumented code still can call GetTracerProvider() of course.

@marcalff marcalff added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Good for taking. Extra help will be provided by maintainers and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2024
Copy link

This issue is available for anyone to work on. Make sure to reference this issue in your pull request.
✨ Thank you for your contribution! ✨

@marcalff marcalff added the good first issue Good for newcomers label Jan 24, 2024
@marcalff marcalff changed the title In examples: Improve ForceFlush call to also work with NoopTracerProvider [EXAMPLES] Improve ForceFlush call to also work with NoopTracerProvider Jan 24, 2024
@lalitb
Copy link
Member

lalitb commented Jan 24, 2024

Calling GetTracerProvider(), which returns an API level object, and somehow getting at the SDK provider from it, is by definition wrong.

A possible change is for the code to keep a global variable on the SDK provider used instead, and never call GetTracerProvider() from the SDK init/cleanup code in the application.

Instrumented code still can call GetTracerProvider() of course.

Agree. This would be the correct approach.

@marcalff
Copy link
Member

I think there is more to this bug report.

Calling ForceFlush on the NoopTracerProvider / NoopTracer should be illegal in the first place.

The API spec only defines:

  • a TracerProvider::GetTracer() operation
  • a Tracer::CreateSpan() operation

there is no Flush at the API level.

opentelemetry-cpp exposed extra ForceFlush() and Close() methods on the API tracer, which is outside the specs, and not needed to instrument an application.

These methods belong to the SDK only.

For ABI_VERSION_1, removing these will be a breaking change, so the best is to just leave them.

For ABI_VERSION_2, I think ForceFlush() and Close() should be removed from the API, and only be available in the SDK.

cc @open-telemetry/cpp-approvers

marcalff added a commit to marcalff/opentelemetry-cpp that referenced this issue May 9, 2024
@marcalff marcalff linked a pull request May 9, 2024 that will close this issue
3 tasks
@malkia
Copy link

malkia commented May 9, 2024

Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.

dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).

@marcalff
Copy link
Member

Instead of having two code paths (nightmares) for static_cast and dynamic_cast - can't we use something akin to Qt's cast - https://doc.qt.io/qt-6/qobject.html#qobject_cast - it's probably more work to add this - but might be worth doing instead.

dynamic_cast also tends to get much slower with larger apps (though it doesn't look like it's called in the critical path, but may one day).

Or do not cast at all, see PR #2664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Good for taking. Extra help will be provided by maintainers triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants