Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logback appender attaches to local Grpc Context, leading to cancellation failures. #537

Closed
jsuereth opened this issue Sep 14, 2021 · 10 comments
Assignees
Labels
api: logging Issues related to the googleapis/java-logging-logback API. lang: java Issues specific to Java. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jsuereth
Copy link

Currently, the logback appender is attaching its gRPC calls to the thread-local GRPC context. This means that, as a user of gRPC, if I have a tight deadline for my production RPCs, it's likely the gRPC context gets cancelled for my telemetry-plane logs RPCs.

This can be reproduced with the following code:

public class ContextIssues {
  private static final Logger logger =
      LoggerFactory.getLogger(ContextIssues.class.getName());

  private static void runCancelled(Runnable runnable){
    CancellableContext withCancellation = io.grpc.Context.current().withCancellation();
    withCancellation.run(() -> {
      withCancellation.cancel(new RuntimeException("Test cancel"));
      runnable.run();
    });
  }
  public static void main(String[] args) throws IOException {
     runCancelled(() -> {
       logger.info("Please send me");
     });
  }
}

Which throws something like the following:

SEVERE: RuntimeException while executing runnable CallbackListener{com.google.api.core.ApiFutures$1@546a179} with executor MoreExecutors.directExecutor()
java.lang.RuntimeException: com.google.cloud.logging.LoggingException: io.grpc.StatusRuntimeException: CANCELLED: Context cancelled
        at com.google.cloud.logging.LoggingImpl$8.onFailure(LoggingImpl.java:751)
        at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:100)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:126)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
        at com.google.api.gax.retrying.BasicRetryingFuture.handleAttempt(BasicRetryingFuture.java:200)
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:135)
        at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:117)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
        at com.google.api.core.AbstractApiFuture$InternalSettableFuture.setException(AbstractApiFuture.java:95)
        at com.google.api.core.AbstractApiFuture.setException(AbstractApiFuture.java:77)
        at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
        at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
        at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
        at io.grpc.internal.ClientCallImpl$1ClosedByContext.runInContext(ClientCallImpl.java:212)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.google.cloud.logging.LoggingException: io.grpc.StatusRuntimeException: CANCELLED: Context cancelled
        at com.google.cloud.logging.spi.v2.GrpcLoggingRpc$2.apply(GrpcLoggingRpc.java:201)
        at com.google.cloud.logging.spi.v2.GrpcLoggingRpc$2.apply(GrpcLoggingRpc.java:195)
        at com.google.api.core.ApiFutures$ApiFunctionToGuavaFunction.apply(ApiFutures.java:240)
        at com.google.common.util.concurrent.AbstractCatchingFuture$CatchingFuture.doFallback(AbstractCatchingFuture.java:223)
        at com.google.common.util.concurrent.AbstractCatchingFuture$CatchingFuture.doFallback(AbstractCatchingFuture.java:211)
        at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:124)
        ... 30 more
Caused by: com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: Context cancelled
        at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:43)
        at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
        at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
        ... 17 more
Caused by: io.grpc.StatusRuntimeException: CANCELLED: Context cancelled
        at io.grpc.Status.asRuntimeException(Status.java:535)
        ... 9 more

This should be fixable via wrapping all call to the logging API in a new grpc Context (similar to what gRPC does for its own control plane calls).

  • Option Add #1 - brand new context Context prevContext = context.attach(); w/ corresponding detach after sending rpc.
  • Option Add #2 - fork (preserving all information except cancellation): e.g. Context ctx = Context.current().fork();

I'm not sure where you'd like to fix it, but happy to submit a PR with a fix.

@product-auto-label product-auto-label bot added the api: logging Issues related to the googleapis/java-logging-logback API. label Sep 14, 2021
@minherz minherz added lang: java Issues specific to Java. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Sep 15, 2021
@minherz
Copy link
Contributor

minherz commented Sep 15, 2021

After discussing the problem with @jsuereth we come to the following action items:

  • verify whether configuring synchronous log writing resolves the problem
  • if sync writing works, expose the configuration in the appender if it is missing

@jsuereth
Copy link
Author

FYI - I think this is actually a bug. Users of gRPC are unable to successfully use the log-appender without risk of losing logs due to cancellation issues.

I was unable to configure sync writing via the logback.xml, so at a minimum that's missing.

@minherz
Copy link
Contributor

minherz commented Sep 16, 2021

Execution action items has the following results:

The follow up actions planned include:

  • review use of context in java-logging that is used by the appender to write logs
  • identify what can be done to avoid using default execution context in the logging library

@aphinier
Copy link

Any update on this issue? Continuously seeing this issue happened on my service.

@Jonpez2
Copy link

Jonpez2 commented Nov 25, 2021

Hello! We are also experiencing this issue; it took us quite a while to understand what was going on. Is there any plan to fix it?

Thanks!

@eypher
Copy link

eypher commented Nov 29, 2021

We verified that setting writeSynchronicity=SYNC stops the flood of CANCELLED exceptions, but are worried about performance of synchronous logging

@minherz
Copy link
Contributor

minherz commented Dec 9, 2021

@eypher are you running on Google Cloud? In that case you can leverage support for logging to stdout that provide serverless environments and logging agents. With the planned release (ETA January'22) of structured logs support it let you use the library to print the structured logs to stdout and allow to the environment to forward it to Cloud Logging.

We are working to resolve this problem but the changes in core components (like gax and grpc) take time to do.

@minherz
Copy link
Contributor

minherz commented Dec 13, 2021

@jsuereth right now we can only substitute another context in the

protected void append(ILoggingEvent e) {
LogEntry logEntry = logEntryFor(e);
getLogging().write(Collections.singleton(logEntry), defaultWriteOptions);
}

by enclosing the write() call within:

io.grpc.Context loggingContext = io.grpc.Context.current().fork();
io.grpc.Context prevContext = loggingContext.attach();
try {
    getLogging().write(Collections.singleton(logEntry), defaultWriteOptions);
} finally {
    loggingContext.detach(prevContext);
}

However, we do not have a set of tests to validate all possible behaviors in a case of the asynchronous log writing which batching multiple write requests together.
It might be possible that replacing the context will have negative effect on other grpc connections. We are working to investigate if it is possible to implement it in the communication layers (e.g. gax).

@minherz
Copy link
Contributor

minherz commented Jan 20, 2022

@jsuereth using the latest versions of the google-cloud-bom and the library i cannot reproduce the problem any longer. See the gist with pom.xml and TestLogback.java files which work without throwing exceptions. The Java file contains (almost) exact code sample for reproducing the problem.

@simonz130 simonz130 assigned minherz and unassigned simonz130 Jan 26, 2022
@minherz
Copy link
Contributor

minherz commented Jan 31, 2022

Confirmed with the reporter that the issue does not reproduce with the latest versions of the logging library:

  • java-logging v3.6.0
  • java-logging-logback v0.123.0

The fix comes from gax after they implemented internal thread pool for managing rpc communication channels.

@minherz minherz closed this as completed Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: logging Issues related to the googleapis/java-logging-logback API. lang: java Issues specific to Java. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

6 participants