Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Enhance DefaultUnhandledExceptionsReporter for better exception repor… #5601

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

coodercl
Copy link

…ting

Motivation: #5567

Currently, the DefaultUnhandledExceptionsReporter stores only one exception and the number of unhandled exceptions in an interval. However, this implementation lacks sufficient information for effective debugging and analysis. To address this limitation, we propose enhancing the reporter to provide more detailed information about unhandled exceptions.

Proposed Changes:

  • Store Every Unique Exception and Its Count.
    • Modify the reporter to store every unique exception encountered along with its occurrence count. A unique exception refers to instances where the class type and stack traces are identical.
  • Include Corresponding Context
    • Enhance the reporting mechanism to include the corresponding context of each exception.

Modifications:

  • Store Every Unique Exception and Its Count
  • Include Corresponding Context

Result:

@@ -106,6 +109,7 @@ private void reportException() {
"detailed error logs. One of the thrown exceptions:",
newExceptionsCount,
TextFormatter.elapsed(intervalMillis, TimeUnit.MILLISECONDS), exception);
thrownExceptionCounters.computeIfAbsent(exception, k -> new LongAdder()).increment();
Copy link
Author

@coodercl coodercl Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@injae-kim nim, I've implemented recording the exception in the thrownExceptionCounters when it's not null.
Did I understand the intention correctly?
Please let me know if I've misunderstood anything.
Thank you. 🙇

@@ -59,7 +62,7 @@ final class DefaultUnhandledExceptionsReporter implements UnhandledExceptionsRep
}

@Override
public void report(Throwable cause) {
public void report(ServiceRequestContext ctx, Throwable cause) {
if (reportingTaskFuture == null && scheduledUpdater.compareAndSet(this, 0, 1)) {
reportingTaskFuture = CommonPools.workerGroup().next().scheduleAtFixedRate(
this::reportException, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about this code.
Why is this::reportException called repeatedly at regular intervals?
I'm wondering why we don't call this::reportException only once when the report method is invoked.
Thank you. 🙇

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we lazily start the periodic timer task that logs and clears the exceptions and their count mapping every few seconds. The timer is started only once since it's guarded by the compareAndSet() call.

The reason we do this way is because otherwise the recently reported exceptions will not be logged until the 'next' exception is reported, which is not always the case.

@ikhoon ikhoon added improvement sprint Issues for OSS Sprint participants labels Apr 23, 2024
.isEqualTo(1);
assertThat(defaultUnhandledExceptionsReporter.getThrownExceptions().get(cause3).first().longValue())
.isEqualTo(1);
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@injae-kim nim, I have a question 🙇
When referring to this test, even though it's the same IllegalArgumentException, it's being recognized as a new key because they are different objects.
Is this the direction we want? I thought that even if they are different objects, they should be recognized as the same key if they have the same type. 😄
Thank you~!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait maintainer's comment here~! 🙇

Copy link
Contributor

@ikhoon ikhoon Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If stack traces and exception messages differ, the same exception type should be corrected separately. Exceptions.traceText(cause) can be used as keys instead of instances themselves.

@@ -50,7 +55,7 @@ final class DefaultUnhandledExceptionsReporter implements UnhandledExceptionsRep
@Nullable
private ScheduledFuture<?> reportingTaskFuture;
@Nullable
private Throwable thrownException;
private Map<Throwable, Pair<LongAdder, ServiceRequestContext>> thrownExceptions;
Copy link
Contributor

@injae-kim injae-kim Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // akka
    implementation libs.akka.actor.typed.v213
private static class ThrowableEntry {
  LongAdder adder;
  ServiceRequestContext ctx;
}

private Map<Throwable, ThrowableEntry> thrownExceptions;

oh instead of import Pair on akka, can we add some class like ThrowableEntry?
then we don't need to import akka 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to add an additional dependency for this. Should we use Map.Entry or a custom class as suggested by @injae-kim instead?

@@ -37,6 +39,7 @@ final class DefaultUnhandledExceptionsReporter implements UnhandledExceptionsRep
private static final AtomicIntegerFieldUpdater<DefaultUnhandledExceptionsReporter> scheduledUpdater =
AtomicIntegerFieldUpdater.newUpdater(DefaultUnhandledExceptionsReporter.class,
"scheduled");
private Map<Throwable, LongAdder> thrownExceptionCounters = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private Map<Throwable, LongAdder> thrownExceptionCounters = new ConcurrentHashMap<>();
private final Map<Throwable, LongAdder> thrownExceptionCounters = new ConcurrentHashMap<>();

@@ -50,7 +55,7 @@ final class DefaultUnhandledExceptionsReporter implements UnhandledExceptionsRep
@Nullable
private ScheduledFuture<?> reportingTaskFuture;
@Nullable
private Throwable thrownException;
private Map<Throwable, Pair<LongAdder, ServiceRequestContext>> thrownExceptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to add an additional dependency for this. Should we use Map.Entry or a custom class as suggested by @injae-kim instead?

if (thrownException == null) {
thrownException = cause;
if (thrownExceptions == null) {
thrownExceptions = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called by multiple threads for which HashMap can' be used.

TextFormatter.elapsed(intervalMillis, TimeUnit.MILLISECONDS), exception);
thrownException = null;
TextFormatter.elapsed(intervalMillis, TimeUnit.MILLISECONDS), exceptions.toString());
thrownExceptions = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of a dedicated class to collect exceptions and encapsulate the logic?

final class ExceptionStats {
    private Map<String, ExceptionContext> exceptions = ...;
    // A lock is required to thread-safely update `exceptions` field
    private final ReentrantShortLock lock = ...

    void record(ServiceRequestContext ctx, Throwable cause) {
        lock.lock();
        try {
            exceptions.compute(Exceptions.traceText(cause), (key, value) -> {
                ...
                value.increase();
                ...
            });
        } finally {
            lock.unlock();
        }
    }

    Collection<ExceptionContext> dump() {
        lock.lock();
        try {
            final Map<String, ExceptionContext> oldExceptions = exceptions;
            // Allocate a new map
            exceptions = new HashMap<>();
            return oldExceptions.values();
        } finally {
            lock.unlock();
        }
    }

    private static class ExceptionContext {
        private final Throwable exception;
        // Capture the first context when the exception is raised for better debugging
        private final ServiceRequestContext ctx;
        private long counter = 1;
        ...
    }
}

When reportException() is invoked, we may dump the counted exceptions and log all of them.

private void reportException() {
    for (exceptionContext: exceptionStats.dump()) {
          logger.warn("Observed {} exception(s) that ...", exceptionContext.count(), ...);
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce ExceptionStats class looks nice to me too here!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikhoon nim, thank you so much for providing detailed explanations along with the example code.
It has been really helpful in enhancing my understanding.
I would like to verify if I have correctly understood your instructions first and then proceed to write test codes.
Would you please review it again? 🙇

@github-actions github-actions bot added the Stale label May 26, 2024
Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any progress since the last review, @coodercl?

@@ -162,6 +162,9 @@ dependencies {
}
testImplementation libs.brave6.context.slf4j
testImplementation libs.brave6.instrumentation.http.tests

// akka
implementation libs.akka.actor.typed.v213
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As others pointed out, we can go without this dependency.

@@ -59,7 +62,7 @@ final class DefaultUnhandledExceptionsReporter implements UnhandledExceptionsRep
}

@Override
public void report(Throwable cause) {
public void report(ServiceRequestContext ctx, Throwable cause) {
if (reportingTaskFuture == null && scheduledUpdater.compareAndSet(this, 0, 1)) {
reportingTaskFuture = CommonPools.workerGroup().next().scheduleAtFixedRate(
this::reportException, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we lazily start the periodic timer task that logs and clears the exceptions and their count mapping every few seconds. The timer is started only once since it's guarded by the compareAndSet() call.

The reason we do this way is because otherwise the recently reported exceptions will not be logged until the 'next' exception is reported, which is not always the case.

@coodercl
Copy link
Author

coodercl commented Jun 7, 2024

Oh, @trustin nim, sorry for the delay in updating. 😢 🙏
Let me update this PR by next week at the latest!
Thank you so much for the detailed explanation! 🙇 🙇

@github-actions github-actions bot removed the Stale label Jun 8, 2024
@coodercl
Copy link
Author

Let me ping when I'm ready for review~! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement sprint Issues for OSS Sprint participants
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance DefaultUnhandledExceptionsReporter for Better Exception Reporting
4 participants