This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

Question: Is there a special reason for using libunwind on systems with an ELF-based file format? #8305

Open
RalfKornmannEnvision opened this issue Sep 9, 2020 · 10 comments

Comments

@RalfKornmannEnvision
Contributor

I fully understand that emitting the unwind information as DWARF is the natural choice when the object file format is ELF, and that libunwind is then used to consume this data. LLVM-based compilers do it the same way.

But while stepping through the actual unwind function (stepWithDwarf) in the past, I could not help noticing that the whole process is quite inefficient. First it has to decode and parse the DWARF FDE. After that it loops over all possible registers, checking whether each one is even used and, if so, what type of register it is and how it needs to be handled. This loop might not be that bad on x86 and x64, with only 8 and 32 possible registers. But ARM64 already has 95, and it gets even worse on ARM, which uses 287.
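To make the cost concrete, here is a minimal sketch contrasting the two loop shapes under a heavily simplified model; it is not libunwind's actual data structure or stepWithDwarf itself. `RegRule`, `SavedReg`, `restoreGeneric`, `restoreCompact` and the simulated stack are hypothetical, and the 96-slot table is only an ARM64-sized illustration. Both paths restore the same values, but the generic one inspects every slot while the compact one only touches registers that were actually saved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative table size, roughly the ARM64 register count quoted above.
constexpr std::size_t kNumDwarfRegs = 96;

// Hypothetical, heavily simplified CFI row: each slot is either "not saved" or
// "saved at CFA + cfaSlot" (slots are 8-byte words on a simulated stack).
struct RegRule {
    bool saved = false;
    int32_t cfaSlot = 0;
};

using RuleRow = std::array<RegRule, kNumDwarfRegs>;
using Regs = std::array<uint64_t, kNumDwarfRegs>;

// Generic path, in the spirit of a table-driven DWARF step: inspect every slot.
std::size_t restoreGeneric(const RuleRow& row, const std::vector<uint64_t>& stack,
                           std::ptrdiff_t cfa, Regs& regs) {
    std::size_t inspected = 0;
    for (std::size_t r = 0; r < kNumDwarfRegs; ++r) {
        ++inspected;
        if (!row[r].saved) continue;  // most slots end up here
        regs[r] = stack.at(static_cast<std::size_t>(cfa + row[r].cfaSlot));
    }
    return inspected;
}

// Compact path: the (few) saved registers were collected once, up front.
struct SavedReg {
    uint16_t reg;
    int32_t cfaSlot;
};

std::size_t restoreCompact(const std::vector<SavedReg>& saved,
                           const std::vector<uint64_t>& stack,
                           std::ptrdiff_t cfa, Regs& regs) {
    for (const SavedReg& s : saved) {
        assert(s.reg < kNumDwarfRegs);
        regs[s.reg] = stack.at(static_cast<std::size_t>(cfa + s.cfaSlot));
    }
    return saved.size();
}
```

In a typical ARM64-style frame that saves only x29 and x30 (DWARF registers 29 and 30) just below the CFA, the generic path inspects all 96 slots to restore two registers.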

This might not be a big issue for C++ and similar languages, where exceptions are literally exceptional or not used at all. But with C# and .NET we have a GC that needs to walk the stack quite regularly.

If I understand correctly, the regular VM for .NET Core/.NET 5 has a custom stack walker that is more in line with what MSVC does:
ARM: https://docs.microsoft.com/en-us/cpp/build/arm-exception-handling?view=vs-2019
AMD64: https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2019
ARM64: https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=vs-2019

This looks way more efficient. 
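For comparison, a Windows-style unwinder interprets a short list of codes derived from the prolog, so the work per frame scales with the number of prolog instructions rather than with the size of the register file. The sketch below is a simplified illustration based on the documented x64 `UNWIND_CODE` format: only `UWOP_PUSH_NONVOL` (0) and `UWOP_ALLOC_SMALL` (2) are handled, `CodeOffset` is ignored on the assumption that the prolog has fully executed, and frame registers and chained unwind info are left out. `SimStack`, `Context` and `unwindFrame` are hypothetical helpers, not CoreCLR code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Subset of the documented x64 unwind operations.
enum UnwindOp : uint8_t {
    UWOP_PUSH_NONVOL = 0,  // OpInfo = number of the register that was pushed
    UWOP_ALLOC_SMALL = 2,  // OpInfo * 8 + 8 bytes were subtracted from RSP
};

struct UnwindCode {
    uint8_t codeOffset;  // prolog offset; ignored here (prolog assumed complete)
    UnwindOp op;
    uint8_t opInfo;
};

// Simplified context: RSP is tracked separately from the other registers.
struct Context {
    uint64_t rsp;
    uint64_t regs[16];  // indexed by x64 register number (RAX = 0 ... R15 = 15)
    uint64_t rip;
};

// Simulated stack memory: words[0] lives at address `base`.
struct SimStack {
    uint64_t base;
    std::vector<uint64_t> words;
    uint64_t read(uint64_t addr) const {
        assert(addr >= base && (addr - base) % 8 == 0);
        return words.at(static_cast<std::size_t>((addr - base) / 8));
    }
};

// Codes are stored in descending prolog-offset order, so walking the array
// front to back undoes the prolog one instruction at a time.
void unwindFrame(const std::vector<UnwindCode>& codes, const SimStack& stack,
                 Context& ctx) {
    for (const UnwindCode& c : codes) {
        switch (c.op) {
        case UWOP_PUSH_NONVOL:
            ctx.regs[c.opInfo] = stack.read(ctx.rsp);  // "pop" the saved value
            ctx.rsp += 8;
            break;
        case UWOP_ALLOC_SMALL:
            ctx.rsp += c.opInfo * 8u + 8u;  // release the fixed allocation
            break;
        default:
            assert(false && "operation not handled in this sketch");
            break;
        }
    }
    // With the prolog undone, the return address is on top of the stack.
    ctx.rip = stack.read(ctx.rsp);
    ctx.rsp += 8;
}
```

For a prolog of `push rbx; push rbp; sub rsp, 0x20`, the codes are stored as ALLOC_SMALL(3), PUSH_NONVOL(rbp), PUSH_NONVOL(rbx), so a single front-to-back pass restores rbp and rbx and then pops the return address. No per-register scan is involved.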

Therefore I'd like to ask whether there is a reason not to use a custom stack walker in CoreRT, too.
I must confess that I haven't profiled how bad it is in practice. But as GC pauses are one of the bigger issues for game development in C#, I get a little uneasy when I see all these loops in the stepping code.

@MichalStrehovsky
Member

I think the main reason was that it was the fastest to bring up. Same reason why we use LLVM to write out object files instead of a handwritten PE/ELF/Mach-O emitter.

There's some discussion around that here: #3784. Again, I'm deferring to @jkotas's opinion, but I think we would still want a handwritten unwinder eventually.

@RalfKornmannEnvision
Contributor Author

Sorry, I should probably search old issues more thoroughly.

If there is still interest in replacing libunwind for the managed code parts, I might take this on for AMD64 and ARM64 on ELF/DWARF-based systems. We have a strong interest in making the GC as fast as possible on these platforms.

@jkotas
Member

jkotas commented Sep 9, 2020

As @MichalStrehovsky said.

@RalfKornmannEnvision
Contributor Author

I took another look at this and think the best way to implement it is by adding some "Fast Unwind Data" to the LSDA. 64 or even 32 bits should be enough. This would only need a new flag and some additional data at the end of the current LSDA block.

When the code needs to unwind, it can check whether the flag is set and use the fast path; if not, it can fall back to the existing libunwind solution.

This way we don't need to touch the DWARF unwind data at all at runtime for most of the managed code. The "Fast Unwind Data" could be computed in the compiler instead. If the compiler detects that a method is not a simple case, the flag and data are simply not set.
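A minimal sketch of what such a descriptor and the fast/fallback dispatch could look like, assuming an ARM64 target. The bit layout is entirely hypothetical (no such format exists in CoreRT), and `FastUnwindInfo`, `encodeFast`, `decodeFast` and `choosePath` are illustrative names. The compiler would only emit a descriptor with the valid bit set for frames it can describe this way; anything else decodes to `std::nullopt` and takes the existing libunwind path.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 64-bit "Fast Unwind Data" layout (not an existing format):
//   bit  0      : valid flag, set by the compiler only for simple frames
//   bits 1..10  : mask of saved callee-saved registers x19..x28
//   bit  11     : frame saves the FP/LR pair
//   bits 12..43 : frame size in 16-byte units
struct FastUnwindInfo {
    uint16_t savedRegMask;  // bit i set => x(19 + i) was saved
    bool savesFpLr;
    uint32_t frameSize16;   // frame size / 16
};

constexpr uint64_t kValidBit = 1;

uint64_t encodeFast(const FastUnwindInfo& info) {
    assert(info.savedRegMask < (1u << 10));  // only x19..x28 fit in the mask
    return kValidBit
         | (uint64_t{info.savedRegMask} << 1)
         | (uint64_t{info.savesFpLr} << 11)
         | (uint64_t{info.frameSize16} << 12);
}

std::optional<FastUnwindInfo> decodeFast(uint64_t bits) {
    if ((bits & kValidBit) == 0) return std::nullopt;  // not a simple frame
    FastUnwindInfo info;
    info.savedRegMask = static_cast<uint16_t>((bits >> 1) & 0x3FF);
    info.savesFpLr = ((bits >> 11) & 1) != 0;
    info.frameSize16 = static_cast<uint32_t>((bits >> 12) & 0xFFFFFFFFu);
    return info;
}

enum class UnwindPath { Fast, DwarfFallback };

// Dispatch: use the fast descriptor when present, otherwise parse DWARF.
UnwindPath choosePath(uint64_t fastBits) {
    return decodeFast(fastBits) ? UnwindPath::Fast : UnwindPath::DwarfFallback;
}
```

Keeping the valid flag in bit 0 means an all-zero field, which is what a method without fast data would carry, automatically selects the fallback.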

@jkotas
Member

jkotas commented Sep 14, 2020

I would prefer to avoid having two different variants of unwind data. Stack unwinding is a bug farm for hard-to-debug crashes. Having two very different paths makes it much worse.

@RalfKornmannEnvision
Contributor Author

I understand your concerns. If we need to handle the unwinding of all managed and unmanaged code with a single code path, the options for optimization will be limited. The only thing I can see is the loop over all registers. If this is done per processor, it can be unrolled and the check for the register type is no longer needed. As we have no control over the unwind data that the C compiler generates, we would still need the full parsing engine for the unwind data.

@jkotas
Member

jkotas commented Sep 14, 2020

libunwind in CoreRT is only used to unwind RyuJIT generated code. It just needs to handle unwind codes that RyuJIT is producing.

@RalfKornmannEnvision
Contributor Author

Then I misunderstood the linked old discussion. It sounded like there was a need to walk both code generated by RyuJIT and code coming from the C compiler. If it's just RyuJIT code, we can solve this with a single code path.

@MichalStrehovsky
Member

Yeah, the conversation in #3784 diverged to CoreCLR in the second post. CoreCLR has a lot of "manually managed" C++ code that the GC needs to operate on. CoreRT avoided all the trouble associated with that by just writing everything in C#.

@RalfKornmannEnvision
Contributor Author

Implemented a custom unwinder for ARM64: #8345
I tested it by running it in parallel with the libunwind solution and comparing the REGDISPLAY values produced by both.
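A differential test like this can be sketched as follows: step both unwinders from the same starting state and report the first frame where their register sets disagree. `RegDisplaySketch` is a hypothetical, trimmed-down stand-in for the runtime's REGDISPLAY (SP, IP, FP and x19..x28 only), and `StepFn` abstracts over the libunwind-based and custom step functions; none of these names come from the actual PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for an ARM64 REGDISPLAY.
struct RegDisplaySketch {
    uint64_t sp = 0;
    uint64_t ip = 0;
    uint64_t fp = 0;
    std::array<uint64_t, 10> calleeSaved{};  // x19..x28
};

// Returns the names of all fields where the two register sets disagree.
std::vector<std::string> diffRegDisplay(const RegDisplaySketch& a,
                                        const RegDisplaySketch& b) {
    std::vector<std::string> diffs;
    if (a.sp != b.sp) diffs.push_back("SP");
    if (a.ip != b.ip) diffs.push_back("IP");
    if (a.fp != b.fp) diffs.push_back("FP");
    for (std::size_t i = 0; i < a.calleeSaved.size(); ++i)
        if (a.calleeSaved[i] != b.calleeSaved[i])
            diffs.push_back("X" + std::to_string(19 + i));
    return diffs;
}

// One unwind step; returns false once the end of the stack is reached.
using StepFn = std::function<bool(RegDisplaySketch&)>;

// Steps both unwinders in lockstep and returns the index of the first frame
// where they diverge, or -1 if they agree on every frame.
int firstDivergentFrame(const RegDisplaySketch& start, const StepFn& reference,
                        const StepFn& candidate, int maxFrames) {
    assert(maxFrames >= 0);
    RegDisplaySketch a = start;
    RegDisplaySketch b = start;
    for (int frame = 0; frame < maxFrames; ++frame) {
        const bool movedA = reference(a);
        const bool movedB = candidate(b);
        if (movedA != movedB) return frame;  // one unwinder stopped early
        if (!movedA) return -1;              // both reached the end together
        if (!diffRegDisplay(a, b).empty()) return frame;
    }
    return -1;
}
```

Reporting the first divergent frame, rather than a plain pass/fail, narrows a mismatch down to the unwind info of a single method.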
