[API Proposal]: Overriding the default behavior in case of unhandled exceptions and fatal errors. #101560
Comments
Defines |
I wonder if having the messages etc. as 16 bit strings is the right way. Our coreclr_xxx hosting APIs use 8 bit (UTF-8) strings so that users on Unix don't have to convert them using some external library. 16 bit strings are very unusual on Unix and on Windows, it is trivial to do the conversion if needed thanks to the Windows APIs. |
We had to choose between The key observations were -
|
In case the native handler on Unix would want to do anything sensible with these strings, even just writing them to a text log file, I think it would most likely need to convert them. In the runtime, we can easily convert them into stack allocated buffers without disturbing the process state. I still think it would be better if whatever string enters or leaves the runtime from the native side was 8 bit. Similar to the hosting APIs, it would add a little overhead for the handler on Windows, but much less than the overhead that would otherwise be needed on Unix. |
The approach in question was to have a |
How complex is UTF-16 to UTF-8 conversion? Is there IO or allocations? |
There will be allocations for sure. The IO is nil or very limited if done from managed code if I recall. The native side trans-coding is also new and I believe avoids IO. It isn't the worst idea if it is just for logging. The tricky bit is if the callback comes back into managed though - that is my biggest concern. |
It can be ~100 - ~1000 lines, depending on how optimized an implementation you want. The problem is that there is no broadly available standardized method to do this conversion on non-Windows platforms, so everybody using this API on non-Windows would have to copy the conversion routine from somewhere. As a (usability) test for this API, you can write a sample handler that produces a .json file capturing the failure details (similar to the .json that is produced by the native AOT crash handler today). You will hit the UTF-16 conversion problem pretty much immediately.
Calling managed implementation is not an option for the crash handlers. |
If a string comes from native assert (like from GC), running managed code might not be an option. |
Right, they will likely need to use ICU or some other library.
Excellent
Fair. This implies UTF-8 should be the preference then. |
That is not too bad. |
ICU is a big, complicated library. It is not something you want to depend on in a crash handler. I think that the most appropriate solution would be to include source for the simplest possible conversion from somewhere. |
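To illustrate what "the simplest possible conversion" could look like, here is a minimal sketch in C. It is not the runtime's implementation; the function name `utf16_to_utf8` and its contract are assumptions made for this example. It never allocates, writes only into a caller-provided buffer, and replaces unpaired surrogates with U+FFFD, which is the kind of behavior a crash handler would probably want.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a runtime API).
 * Converts `len` UTF-16 code units from `src` into UTF-8 in `dst`.
 * Writes at most `cap` bytes, never allocates, does not null-terminate.
 * Unpaired surrogates are replaced with U+FFFD.
 * Returns the number of bytes the full conversion needs, so a result
 * greater than `cap` means `dst` was too small and the output is truncated. */
size_t utf16_to_utf8(const uint16_t *src, size_t len, char *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[++i] - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Unpaired surrogate: substitute the replacement character. */
            cp = 0xFFFD;
        }

        unsigned char buf[4];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }

        /* Only whole sequences are written; once one does not fit,
         * nothing further is written but the size keeps being counted. */
        if (out + n <= cap) {
            for (size_t k = 0; k < n; k++)
                dst[out + k] = (char)buf[k];
        }
        out += n;
    }
    return out;
}
```

A handler could call this with a stack buffer and treat a return value larger than the buffer size as "truncated". An optimized implementation would be considerably longer, which matches the ~100 to ~1000 line estimate given earlier in the thread.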
Is it possible to get content of I suggest to do it on demand because it will create issues for |
If conversion to UTF-8 is going to be needed in the majority of scenarios (especially on Unix), I think we should do UTF-8 rather than expect that the end user will convert. I'll update the proposal to have |
On the other hand, a fatal error can happen only once in the life of an app, so perhaps performance of this API will not be as critical as in scenarios that can repeat. |
@vladimir-cheverdyuk-altium has a good point about having an option to suppress computation of the textual stacktrace or other non-trivial operations if the crash handler is not interested in the information. Such non-trivial operations can hit secondary crashes or otherwise interfere with the diagnosability of the original unhandled exception. I had a similar concern in the original discussion. |
Runtime's implementation (which is pretty much standalone) could also be an option for users, e.g. this test code
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <minipal/utf8.h>
// Returns the number of UTF-16 code units in str, including the terminating null,
// so the converted UTF-8 output ends up null-terminated as well.
size_t wstrlen(CHAR16_T* str)
{
    size_t len = 0;
    while (str[len++]);
    return len;
}
// Aborts if the previous minipal call reported an error through errno.
void handleErrors()
{
    if (errno == MINIPAL_ERROR_INSUFFICIENT_BUFFER)
    {
        printf ("Allocation failed (%d)", errno);
        abort();
    }
    else if (errno == MINIPAL_ERROR_NO_UNICODE_TRANSLATION)
    {
        printf ("Illegal byte sequence encountered in the input. (%d)", errno);
        abort();
    }
}
int main(void)
{
    CHAR16_T wstr[] = u"ハローワールド! 👋 🌏";
    size_t wlen = wstrlen(wstr);
    // First pass: compute the required UTF-8 size (terminator included).
    size_t mblen = minipal_get_length_utf16_to_utf8(wstr, wlen, 0);
    handleErrors();
    char* mbstr = (char *)malloc((mblen + 1) * sizeof(char));
    // Second pass: perform the actual conversion.
    size_t written = minipal_convert_utf16_to_utf8 (wstr, wlen, mbstr, mblen, 0);
    handleErrors();
    printf("Conversion completed. mblen: %zu, mbstr: %s\n", written, mbstr);
    return 0;
}
can be built with sources acquired from tag link:
$ curl --create-dirs --output-dir external/minipal -sSLO \
"https://raw.githubusercontent.com/dotnet/runtime/v8.0.4/src/native/minipal/{utf8.c,utf8.h,utils.h}"
$ clang -Iexternal test.c external/minipal/utf8.c -o testutf8
$ ./testutf8
Conversion completed. mblen: 33, mbstr: ハローワールド! 👋 🌏 |
At the time of reporting the crash it would be too late to decide whether FailFast should have captured the exception message for the unhandled exception or not. We could add a bool parameter to The benefits of that would be slightly faster handling of the crash and more robust behavior in case Cases like OOM or stack overflow are always tricky. Also Basically - we can add a knob to Is it really that often that |
As it was stated in #98788, ex.ToString will make COM calls slow if it was in the call stack and it is why I'm concerned. |
Perhaps I do not understand the scenario... When a managed exception is unhandled(*), the runtime will call
I understand the reliability concern about (*) Unhandled here means - "really unhandled". If there was |
I am not worried about performance. I am worried about providing the right information to the crash handler, without introducing unnecessary failure points between the failure point and the call of the crash handler. On one end, you can have crash handlers that just want to capture crashdump for offline processing and do nothing else. Computing the textual stacktrace is source of unnecessary failure points for those. On the other end, you can have crash handlers that want to capture many details like stacktrace with file and line numbers if possible. The design should accommodate the whole spectrum. (We do not need to implement everything in the first round, but we should have an idea for how we would go about covering the whole spectrum.) |
@VSadov
COM calls slow down after ex.ToString is called if some module is on the call stack. Some code that walks the stack converts the module from read-only mode to read/write mode, and as a result the search becomes linear instead of binary. |
I think the suggested API shape is sufficient for the first round. Then we can see how the API will be used and if/how it can be made better. If we find that we really need to allow configuring Basically - I think there is a way to react to the need to configure, but maybe there is no need. In such case having a simpler API would be better. |
It assumes that the handler wants the fixed information for all types of crashes. It is not necessarily the case. I am coming back to the callback idea: By default, the crash handler should only get information that does not require any extra memory allocations or substantial effort to compute. Everything else should be provided via callbacks. I think it is the most future proof and flexible design. |
Callbacks are more flexible, but may require that there is some state that can be interrogated. Right now we can store all the relevant info that we have to the stack and then invoke the handler while passing the pointers. If the handler can come back for more info, we would need to store that info ahead of time somewhere (likely not on stack). We could defer some things like converting to UTF-8, but assuming that is relatively trivial, it may be not something that matters to defer. Alternatively, we could provide the extra info by digging through the memory of the dead process - like debugger does. Maybe that would be better served by just exposing debugging APIs? An additional advantage for that would be that it might also work over a core dump, provided that debug info is available. As for configuring the behavior of This question would be asked only once per lifetime of the process, so there is not a lot of advantages over pre-configuring that via a bool parameter when setting up a handler. |
Maybe that callback can return flags that will specify what optional fields of FatalErrorInfo it should populate? |
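The flags idea above could be sketched roughly as follows. Everything here is hypothetical and only meant to make the suggestion concrete: the flag names, the `OptionalErrorInfo` struct, and `populate_optional_fields` are invented for this example and are not part of the proposal. The point is that the runtime would ask the handler once which optional fields it wants and would skip computing the rest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags: which optional fields the handler wants populated. */
enum {
    WANT_NOTHING     = 0,
    WANT_MESSAGE     = 1u << 0,
    WANT_STACK_TRACE = 1u << 1
};

/* Hypothetical stand-in for the optional part of FatalErrorInfo. */
typedef struct {
    const char *message;    /* NULL when not requested */
    const char *stackTrace; /* NULL when not requested */
} OptionalErrorInfo;

/* Hypothetical query callback supplied by the crash handler. */
typedef uint32_t (*QueryWantedFields)(int32_t hresult);

/* Runtime side of the sketch: compute only what was asked for.
 * The string literals stand in for the (potentially expensive and
 * failure-prone) work of formatting the real message and stack trace. */
void populate_optional_fields(QueryWantedFields query, int32_t hresult,
                              OptionalErrorInfo *info)
{
    uint32_t wanted = (query != NULL) ? query(hresult) : WANT_NOTHING;
    info->message    = (wanted & WANT_MESSAGE)     ? "Unhandled exception." : NULL;
    info->stackTrace = (wanted & WANT_STACK_TRACE) ? "   at Program.Main()" : NULL;
}

/* A handler that only wants a crash dump and no textual details. */
uint32_t dump_only_query(int32_t hresult)
{
    (void)hresult;
    return WANT_NOTHING;
}

/* A handler that wants the message but not the stack trace. */
uint32_t message_only_query(int32_t hresult)
{
    (void)hresult;
    return WANT_MESSAGE;
}
```

With `dump_only_query`, neither field is computed, which addresses the concern about unnecessary failure points between the crash and the handler call.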
Can this proposal be updated to use a
public delegate void UnhandledExceptionHandler(UnhandledExceptionHandlerEventArgs e);
public sealed class UnhandledExceptionHandlerEventArgs
{
    private TaskCompletionSource? m_tcs;
    private sealed class UnhandledExceptionHandlerDeferral(TaskCompletionSource tcs) : IDisposable
    {
        public void Dispose() => tcs.TrySetResult();
    }
    public Exception Exception { get; }
    public IDisposable GetDeferral() => new UnhandledExceptionHandlerDeferral(m_tcs ??= new());
    public bool Handled { get; set; }
}
Usage 1: synchronous handler
void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
    e.Handled = true;
}
Usage 2: asynchronous handler
async void MyHandler(UnhandledExceptionHandlerEventArgs e)
{
    using var deferral = e.GetDeferral();
    await MyAsyncHandlerMethod();
    e.Handled = true;
} |
There is nothing else that the thread dealing with the unhandled exception can do until the unhandled exception has been dealt with, so there is no advantage in complicating this API with async. If your handler needs to call async methods, it should use Task.Wait. |
I've made changes to the fatal error handling API that allow querying for the error text if needed.
If the custom logger prints the provided text to console, it will achieve the same behavior as in the default handler. The same pattern may be followed in the future to allow querying for some other data, not necessarily textual. |
Nit: |
Yes, that may be desirable. I’ll modify the description to allow for sending that string in pieces. |
Updated for possibility of multiple calls to the |
void (*pfnGetFatalErrorLog)(
FatalErrorInfo* errorData,
void (*pfnLogAction)(void *userContext, char8_t* logString),
void* userContext);
The above signature needs explicit calling conventions. I assume we want
The |
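As an illustration of how a native handler might consume the `pfnGetFatalErrorLog` callback when the text arrives in several pieces, here is a sketch in C. It makes several assumptions: the `FatalErrorInfo` struct is trimmed down to the one member quoted above, `const char*` is used in place of `char8_t*`, calling conventions are omitted, and `LogBuffer`, `append_log`, `collect_fatal_error_log`, and `fake_get_fatal_error_log` are names invented for this example. The fake function stands in for the runtime so that the sketch is self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the proposed FatalErrorInfo. */
typedef struct FatalErrorInfo FatalErrorInfo;
typedef void (*LogAction)(void *userContext, const char *logString);
struct FatalErrorInfo {
    void (*pfnGetFatalErrorLog)(FatalErrorInfo *errorData,
                                LogAction pfnLogAction,
                                void *userContext);
};

/* Fixed-size buffer so the handler never allocates while crashing. */
typedef struct {
    char   text[512];
    size_t used;
} LogBuffer;

/* Log action: appends one piece of the log, truncating when full. */
static void append_log(void *userContext, const char *logString)
{
    LogBuffer *buf = (LogBuffer *)userContext;
    size_t room = sizeof buf->text - 1 - buf->used;
    size_t n = strlen(logString);
    if (n > room)
        n = room;
    memcpy(buf->text + buf->used, logString, n);
    buf->used += n;
    buf->text[buf->used] = '\0';
}

/* Handler-side helper: asks for the log text only when it is wanted.
 * Returns the number of bytes collected. */
size_t collect_fatal_error_log(FatalErrorInfo *info, LogBuffer *buf)
{
    buf->used = 0;
    buf->text[0] = '\0';
    if (info->pfnGetFatalErrorLog != NULL)
        info->pfnGetFatalErrorLog(info, append_log, buf);
    return buf->used;
}

/* Stand-in for the runtime: delivers the log in two pieces. */
static void fake_get_fatal_error_log(FatalErrorInfo *errorData,
                                     LogAction pfnLogAction,
                                     void *userContext)
{
    (void)errorData;
    pfnLogAction(userContext, "Unhandled exception. ");
    pfnLogAction(userContext, "boom");
}
```

The `userContext` pointer is what lets the log action stay stateless: the handler passes its own buffer through the runtime and gets it back on every call, so no globals are needed.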
Re: The previous proposal and a discussion that led to this proposal - (#42275)
Background and motivation
The current default behavior in the case of an unhandled exception is termination of the process.
The current default behavior in the case of a fatal error is to print the exception to the console and invoke Watson/CrashDump.
While satisfactory for the majority of uses, the scheme is not flexible enough for some classes of scenarios.
Scenarios like designers, REPLs, or game scripting that host user-provided code are not able to handle unhandled exceptions thrown by that code. Unhandled exceptions on the finalizer thread, threadpool threads, or user-created threads will take down the whole process. This is not a desirable experience for these types of scenarios.
In addition, there are customers that have existing infrastructure for postmortem analysis of failures and inclusion of .NET components requires interfacing with or overriding the way the fatal errors are handled.
API Proposal
API for process-wide handling of unhandled exception
The semantics of the unhandled exception handler follow the model of an imaginary handler like the following, inserted in places where the exception will not lead to process termination regardless of what handler() returns.
In particular:
false. (Whether the infrastructure thread continues or is restarted is unspecified, but the process should be able to proceed.)
main() will not install the try/catch like above
API Proposal for custom handling of fatal errors
Managed API to set up the handler.
The shape of the FatalErrorHandler, if implemented in c++
(the default calling convention for the given platform is used)
With FatalErrorHandlerResult and FatalErrorInfo defined in "FatalErrorHandling.h" under src/native/public:
API Usage
Setting up a handler for unhandled exceptions:
Setting up a handler for fatal errors:
Setting up the handler for the process (C# code in the actual app):
The handler (C++ in myCustomCrashHandler.dll):
Alternative Designs
Unmanaged hosting API that enables this behavior. (CoreCLR has an undocumented and poorly tested configuration option for this today: #39587. This option is going to be replaced by this API.)
Extending the AppDomain.CurrentDomain.UnhandledException API and making the IsTerminating property writable to allow "handling".
Upon scanning the existing uses of this API, it was found that IsTerminating is often used as a fact: whether an exception is terminal or not. Changing the behavior to mean "configurable" would be a breaking change to those uses.
Risks
These APIs can be abused to ignore unhandled exceptions or fatal errors in scenarios where it is not warranted.