Rdotnet application crash when GetFunction<setup_Rmainloop>()(); #139

Open
armgong opened this issue Jan 9, 2021 · 14 comments

@armgong
armgong commented Jan 9, 2021

I wrote a very simple C# application (both it and R.NET are built as x64). R 4.0.3 x86_64 is installed at c:/soft/r, but during installation I chose not to store information in the registry.

using System;
using RDotNet;
namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            REngine.SetEnvironmentVariables("c:/soft/r/bin/x64", "c:/soft/r");
            var engine = REngine.GetInstance(); //crash here 
            engine.Evaluate("letters[1:26]");
            Console.WriteLine("Hello World!");
        }
    }
}
I traced the crash to this line in the source file REngine.cs:
public void Initialize(StartupParameter parameter = null, ICharacterDevice device = null, bool setupMainLoop = true)
{
....
 GetFunction<setup_Rmainloop>()(); //crash here

@sWW26
sWW26 commented Mar 20, 2021

I'm hitting the same problem using R 4.0.4 in the default install location on Windows

@GeorgeS2019
GeorgeS2019 commented Mar 29, 2021

R 4.0.3 and R 4.0.4 fail; R 4.0.2 works on Windows.
GetFunction<setup_Rmainloop>()(); //crash here

Reported previously

Possible relevant discussion

@lrasmus
Contributor

lrasmus commented Aug 5, 2021

Have (finally, sorry it took so long...) started looking into this. Not sure if anyone else had made progress or has suggestions on places to look. I started by scanning the R changelog for 4.0.3, but nothing jumped out. Spent some time peeking at commits, but nothing jumped out either. Will dive more into individual commits, but realize that may just be a time sink.

@lrasmus
Contributor

lrasmus commented Aug 13, 2021

I'm not sure if I'm chasing a red herring here, but here's what I've found so far.
SPOILER: No answers or solutions yet.

First, just reproduced everything that was reported across various threads (not that I doubted any of you!)

  • Windows 10 + R 4.0.3 x64 - crash
  • Windows 10 + R 4.0.3 x86 - no crash
  • Windows 10 + R 4.0.2 x64 - no crash

Next step, let's see what we can get from the debugger. Visual Studio was showing me a crash coming from msvcrt.dll (TestCLI.exe is just my simple wrapper program):

Unhandled exception at 0x00007FF8FB540A70 (msvcrt.dll) in TestCLI.exe: RangeChecks instrumentation code detected an out of range array access.

I was hoping to get more information, so I loaded up Dr. Mingw and could see a more complete stack trace.

[Image: stack trace captured with Dr. Mingw]

Here's my first hint that I may be chasing something wrong - when I look at setup_Rmainloop, I don't see it directly calling locale2charset, so it's not clear to me how it got there. I tried to fumble my way around that, but came up empty.

Then, I thought I'd spend some time looking at the changes from R 4.0.2 and 4.0.3. Nothing really jumped out at me around the call to locale2charset.

But... one thing that DID catch my eye was a change in src/gnuwin32/fixed/h/psignal.h in how it calls setjmp. We don't see that in the stack trace, but we DO see longjmp, so it seems like it could be related, especially as one Go-related blog post discussed something similar.

Where this ended up then is a Trend Micro post that details changes in Windows 10 Anniversary Update with how it handles jumps, and explicitly mentions __except_validate_jump_buffer (which is where the actual exception comes from).

What this led me to is the question: what if it's related to Windows 10 or the Anniversary Update? Again, this may be totally off base, but I wanted to think through it. Just for fun, I loaded up a Windows 7 VM and discovered there is no crash with R 4.0.3 x64 in Windows 7.

As I promised, this doesn't answer anything yet. It gives some clues and evidence that this may be related to the setjmp change in 4.0.3 and specifically affects Windows 10, but I can't explain why that is or why it's working for the same 32-bit build.

I'm going to keep going down this path, but wanted to share findings to date and see if anyone thinks I'm headed in the wrong direction. I'm still not comfortable with the fact that my stack trace from Dr. Mingw doesn't really match with the R source code, but it's consistently coming back to that call stack across other debuggers like procdump so... ?

@lrasmus
Contributor

lrasmus commented Aug 16, 2021

Some progress in narrowing down the issue (I will work on a resolution next). It turns out I was mostly right. The stack trace I had posted was correct about the msvcrt.dll routines ultimately halting the program (but I'm sure it was wrong about the call from the main loop to locale2charset).

With a million thanks to r-windows/r-base, I was able to quickly and easily get a local copy of R built. I confirmed the crash occurred as we've seen. I then commented out the setjmp defines that were introduced in R 4.0.3, rebuilt R and.... no crash on x64.

So it does seem this change in R 4.0.3 is what's causing our issue. It's still not 100% clear why it's only affecting x64, but these are the confirmations I needed to start figuring that out. In addition, having the ability to build R locally will allow me to enable debug symbols.

@lrasmus
Contributor

lrasmus commented Aug 20, 2021

Still no answers, but I'm making my way through the investigation.

I adapted a simple C program from the main R code base that sets up and calls setup_Rmainloop. This works - it does not crash. I then created a more stripped down C# program that mimics what RDotNet does, but it strips out a lot of extra code and removes the dependency on DynamicInterop. I did not have any reason to suspect that package, but I wanted to remove potential confounding factors. This still crashes in the same spot.

Code is here - https://github.com/lrasmus/rdotnet-r403-repro (it's dirty, just set up to work for me, but you can at least see what I'm doing).

So while we appear to be calling the R.dll in the 'same' way from a C and a C# application, one works (C) and the other doesn't (C#).

@lrasmus
Contributor

lrasmus commented Aug 21, 2021

I was able to find the root cause: this is apparently tied to the Windows security feature Control Flow Guard (CFG).

Using dumpbin /HEADERS on my C# test application, we can see that it has CFG enabled:

           C160 DLL characteristics
                   High Entropy Virtual Addresses
                   Dynamic base
                   NX compatible
                   Control Flow Guard   <------ Here it is
                   Terminal Server Aware

Running dumpbin /HEADERS on the C test application (which wasn't crashing), there is no CFG enabled:

              0 DLL characteristics
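
As a rough illustration of the two dumpbin values above, here is a minimal C# sketch that decodes a DllCharacteristics value into the flag names dumpbin prints. The class and method names are hypothetical (they are not part of R.NET or dumpbin), and only the five bits that appear in the output above are listed; the bit values are the ones defined for the PE optional header.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of R.NET.
public static class DllCharacteristicsDecoder
{
    // Bit of the PE optional header DllCharacteristics field that marks CFG.
    public const int GuardCf = 0x4000;

    // Only the flags that show up in the dumpbin output above.
    private static readonly (int Mask, string Name)[] Flags =
    {
        (0x0020, "High Entropy Virtual Addresses"),
        (0x0040, "Dynamic base"),
        (0x0100, "NX compatible"),
        (GuardCf, "Control Flow Guard"),
        (0x8000, "Terminal Server Aware"),
    };

    // True when the Control Flow Guard bit is set in the given value.
    public static bool HasControlFlowGuard(ushort characteristics) =>
        (characteristics & GuardCf) != 0;

    // Returns the names of all known flags set in the given value.
    public static List<string> Describe(ushort characteristics)
    {
        var names = new List<string>();
        foreach (var (mask, name) in Flags)
        {
            if ((characteristics & mask) != 0)
            {
                names.Add(name);
            }
        }
        return names;
    }

    public static void Main()
    {
        // 0xC160 is the C# test app, 0x0000 is the C test app.
        foreach (ushort value in new ushort[] { 0xC160, 0x0000 })
        {
            Console.WriteLine(
                $"{value:X4}: CFG={HasControlFlowGuard(value)} " +
                $"[{string.Join(", ", Describe(value))}]");
        }
    }
}
```

The five masks sum to 0xC160, which is why the C# executable lists exactly those five characteristics while the C executable (value 0) lists none.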

I am not going to recommend this en masse because it is decreasing security checks, but I can confirm that following these instructions and disabling CFG, my C# application no longer crashes. And just for fun, I enabled CFG on the C test app that was working... you guessed it - the crash starts appearing.

Probably an easier way to do this is to disable CFG for any dotnet program.

@lrasmus
Contributor

lrasmus commented Aug 24, 2021

Proposed Solution - however, please know that this is removing the Control Flow Guard (CFG) security feature enabled by default in .NET apps, so it is up to you to perform a risk assessment and make an informed decision.

This is following the previously posted steps to disable CFG for a dotnet program. From what I can tell, this must be done on the host application. So I don't believe this can be done on the official rdotnet assembly (I had tested this at least, and it didn't work for me).

Here's what you'll need to do if you are building from Visual Studio:

  1. Either add to your system path the location of the link.exe command, or find the full path of that utility. This path is set in the x64 Native Tools Command Prompt, but was not set in Visual Studio 2019. If I run the Native Tools Command prompt, running which link gave me the path I needed.
  2. For your project, in Solution Explorer right click and select "Properties".
  3. Select "Build Events"
  4. Add an entry to the Post-build event command line. Depending on if you're using the full path or if you have link available from the command line, you'll need to adjust this accordingly:
    "c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\bin\HostX64\x64\link.exe" /EDIT /GUARD:NO $(TargetDir)$(TargetName).exe
  5. Clean and rebuild the project

A few things about step 4. In my setup, I was doing this for a test executable. The TargetPath macro actually goes to the DLL that is built as part of the compilation, not the EXE. We need the link command to run on the host, which is the EXE and not the DLL. So I had to put this together with a few other macros. Also, it appears that CFG is inherited, so I didn't need to do this on every .NET assembly that my application is using.

@lrasmus
Contributor

lrasmus commented Aug 27, 2021

FYI - this has been reported to R-core: https://bugs.r-project.org/show_bug.cgi?id=18180

@armgong
Author

armgong commented Sep 20, 2021

Good to hear the cause of this issue was found. Now the ball is in the court of the R-core, MinGW, or GCC team.

@UnclAlDeveloper

lrasmus, I had the same issue trying to convert to .NET 5.0 with R 4.1.1 on Windows Server 2016. I had cloned the rdotnet and dynamic-interop repositories and was seeing the problem at exactly the same place. REngine worked fine up to .NET 4.7.2 but not under .NET 5.0. Your workaround solved the problem (at least until a permanent fix is made).

@lrasmus
Contributor

lrasmus commented Dec 1, 2021

Progress - it sounds like the R team has found and proposed a fix! This is still in the development branch of R, but exciting that it seems we have a resolution on the horizon: https://bugs.r-project.org/show_bug.cgi?id=18180#c15

@nhirschey

@lrasmus, thanks for all your work to get this debugged and fixed with the R team. Looks like we're all good now on the recently released 4.2.0:

#r "nuget: R.NET"

open RDotNet    

REngine.SetEnvironmentVariables("c:/program files/r/r-4.2.0/bin/x64", "c:/program files/r/r-4.2.0")

let  engine = REngine.GetInstance()

engine
(* // output confirms on 4.2.0
val it: REngine = RDotNet.REngine {AutoPrint = false;
                                   BaseNamespace = RDotNet.REnvironment;
                                   Compatibility = ALTREP;
                                   Disposed = false;
                                   DllVersion = "4.2.0";
                                   EmptyEnvironment = RDotNet.REnvironment;
                                   EnableLock = true;
                                   Filename = "R.dll";
                                   GlobalEnvironment = RDotNet.REnvironment;
                                   ID = "R.NET";
                                   IsRunning = true;
                                   NaString = RDotNet.SymbolicExpression;
                                   NaStringPointer = 2240532559840n;
                                   NilValue = RDotNet.SymbolicExpression;
                                   UnboundValue = RDotNet.SymbolicExpression;}
*)
engine.Evaluate("letters[1:26]").AsCharacter()
// val it: CharacterVector = seq ["a"; "b"; "c"; "d"; ...]
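
For anyone who wants to guard against the affected R releases before calling REngine.GetInstance(), here is a minimal C# sketch of a pre-flight check based only on the versions reported in this thread (4.0.2 works; 4.0.3, 4.0.4, and 4.1.1 crash on x64; 4.2.0 works). The class and method names are hypothetical and not part of R.NET. Treating every version from 4.0.3 up to, but not including, 4.2.0 as affected is an assumption, since not every release in that range was tested here. The sketch also does not check the Windows version or whether CFG is enabled on the host executable.

```csharp
using System;

// Hypothetical helper for illustration only; not part of R.NET.
public static class RVersionCheck
{
    // First R release reported as crashing in this thread.
    private static readonly Version FirstAffected = new Version(4, 0, 3);

    // First R release reported as working again in this thread.
    private static readonly Version FirstFixed = new Version(4, 2, 0);

    // Assumption: all releases in [4.0.3, 4.2.0) behave like the ones reported,
    // and only 64-bit processes are affected (the x86 build did not crash).
    public static bool IsLikelyAffected(Version rVersion, bool is64BitProcess) =>
        is64BitProcess && rVersion >= FirstAffected && rVersion < FirstFixed;

    public static void Main()
    {
        foreach (var text in new[] { "4.0.2", "4.0.3", "4.1.1", "4.2.0" })
        {
            var version = Version.Parse(text);
            bool affected = IsLikelyAffected(version, Environment.Is64BitProcess);
            Console.WriteLine($"R {text}: likely affected = {affected}");
        }
    }
}
```

A host application could call such a check with the version of the R installation it is about to load and warn the user instead of crashing inside setup_Rmainloop.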

@lrasmus
Contributor

lrasmus commented Apr 29, 2022

That's so good to hear @nhirschey, thanks!!
