
[SDK] BatchLogRecordProcessor::ForceFlush hangs for 10 seconds #2583

Open
ggaudeau opened this issue Mar 8, 2024 · 6 comments · May be fixed by #2584
Labels: bug (Something isn't working), triage/accepted (Indicates an issue or PR is ready to be actively worked on.)

Comments

ggaudeau commented Mar 8, 2024

Platform: Ubuntu 23.10
Build: opentelemetry 1.11.0 with WITH_ASYNC_EXPORT_PREVIEW enabled
Deps: curl 7.85.0

Here are the lines of code used to initialize the logger provider:

opentelemetry::exporter::otlp::OtlpHttpLogRecordExporterOptions opts;

auto exporter =
    opentelemetry::exporter::otlp::OtlpHttpLogRecordExporterFactory::Create(
        opts);

opentelemetry::sdk::logs::BatchLogRecordProcessorOptions batchOptions;

auto processor =
    opentelemetry::sdk::logs::BatchLogRecordProcessorFactory::Create(
        std::move(exporter),
        batchOptions);

auto resource = opentelemetry::sdk::resource::Resource::Create(attributes);

std::shared_ptr<opentelemetry::logs::LoggerProvider> provider =
    opentelemetry::sdk::logs::LoggerProviderFactory::Create(
        std::move(processor),
        resource);

opentelemetry::logs::Provider::SetLoggerProvider(provider);
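
Logs are then emitted through the global provider, roughly like this (the logger name "my_app" is just a placeholder):

auto logger =
    opentelemetry::logs::Provider::GetLoggerProvider()->GetLogger("my_app");

// Convenience severity helper; the record ends up in the batch processor's queue.
logger->Info("application started");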

Before quitting my application, I call ForceFlush with a one-second timeout:

auto loggerProvider =
    static_cast<opentelemetry::sdk::logs::LoggerProvider*>(provider.get());

loggerProvider->ForceFlush(std::chrono::seconds(1));

std::shared_ptr<opentelemetry::logs::LoggerProvider> none;
opentelemetry::logs::Provider::SetLoggerProvider(none);

When the network connection is poor, my application can hang for 10 seconds, which corresponds to the timeout of the curl request, and I don't want to change that timeout.

I'd like to know why the timeout is not taken into account.
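
For reference, the 10 seconds seems to match the default HTTP timeout of the OTLP exporter. If one did want to shorten it, it would presumably be done through the exporter options (assuming the timeout field below is the right knob), but that is exactly the setting I'd prefer to keep as-is:

opentelemetry::exporter::otlp::OtlpHttpLogRecordExporterOptions opts;
// Assumed field: per-request HTTP timeout used by the underlying curl client
// (default is 10 seconds, which matches the hang I observe).
opts.timeout = std::chrono::seconds(2);

auto exporter =
    opentelemetry::exporter::otlp::OtlpHttpLogRecordExporterFactory::Create(opts);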

@ggaudeau ggaudeau added the bug Something isn't working label Mar 8, 2024
@github-actions github-actions bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 8, 2024
owent (Member) commented Mar 10, 2024

ForceFlush will wait for 1 second, but opentelemetry::logs::Provider::SetLoggerProvider(none) shuts down the internal exporters, and that shutdown can wait for all pending requests to finish.
Could you please try calling loggerProvider->Shutdown(std::chrono::seconds(1)); instead of loggerProvider->ForceFlush(std::chrono::seconds(1));? Shutdown will try to cancel all pending requests when the progress callback is invoked by libcurl.
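
Roughly, the suggested teardown would look like this sketch (assuming provider is still the shared_ptr from your initialization snippet):

// Shut down with the same 1-second budget instead of flushing,
// then drop the global provider.
auto sdkProvider =
    static_cast<opentelemetry::sdk::logs::LoggerProvider*>(provider.get());
sdkProvider->Shutdown(std::chrono::seconds(1));

std::shared_ptr<opentelemetry::logs::LoggerProvider> none;
opentelemetry::logs::Provider::SetLoggerProvider(none);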

@owent owent linked a pull request Mar 10, 2024 that will close this issue
ggaudeau (Author) commented

Since I am in asynchronous mode, Shutdown would cancel the sending of the last logs, which I'd like to avoid.

[image: source code screenshot; the line numbers below refer to it]

To be more precise, when I set a one-second timeout and there are network problems, the app stays for five seconds at line 119 and almost five seconds again in the block at lines 129-138 of the screenshot.

Basically, I switched to asynchronous mode because the processor was stuck in the Export method when I asked it to ForceFlush or Shutdown.

ggaudeau (Author) commented

I am using the Charles application to simulate throttling.

@esigo esigo added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 11, 2024
esigo (Member) commented Mar 11, 2024

SIG meeting 11.03.2024: we need a decision on which timeout should be used for shutdown, and whether we need to flush the last buffer.
It would be great to check the implementations in the other language repos and arrive at a consistent solution.


This issue was marked as stale due to lack of activity.

@github-actions github-actions bot added the Stale label May 11, 2024
owent (Member) commented May 15, 2024

#2584 may also solve this problem.

@github-actions github-actions bot removed the Stale label May 17, 2024