Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seemingly random NullReferenceException in async state machine #234

Closed
Skyppid opened this issue May 10, 2024 · 13 comments
Closed

Seemingly random NullReferenceException in async state machine #234

Skyppid opened this issue May 10, 2024 · 13 comments
Assignees
Labels
bug Something isn't working

Comments

@Skyppid
Copy link

Skyppid commented May 10, 2024

For our DTO mappings I generate an async method that builds a proxy around a generic base implementation. Basically it takes the untyped version of a source model, casts it into it's actual model, passes it to the base method and then returns the generic DTO instead of a specific implementation of it.

So far so good, the method itself works. The odd issue is that when I run the compiled method against multiple models of the same type it sometimes fails with a NullReferenceException and sometimes it does not. I added logging instructions and null-checks in the generated method to be sure nothing is null. I checked before calling the method and also added try-catch handlers in the methods that are called from inside this generated method.

From the exception I don't get any information as to what could be the reason. By now I eliminated all sources I could possibly see and guess it might be an issue in the generated state machine itself.

@sakno Is there a way to debug the generated output? Some way to generate PDBs for it so I can step into the methods? Or is there any other way I can trace this down? I would guess there must be a slight oversight as dotnext seems to be well tested and established. But I can't find the source of this exception and stack traces are not really helpful:

Object reference not set to an instance of an object.

at lambda_method131(Closure, PoolingAsyncStateMachine2&) at DotNext.Runtime.CompilerServices.PoolingAsyncStateMachine2.System.Runtime.CompilerServices.IAsyncStateMachine.MoveNext() in /_/src/DotNext.Metaprogramming/Runtime/CompilerServices/AsyncStateMachine.Pooling.cs:line 363
--- End of stack trace from previous location ---
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder1.StateMachineBox1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at System.Threading.Tasks.ValueTask1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) --- End of stack trace from previous location --- at ChangeHistoryService.Utilities.PolymorphicMapper.MapChanges(IEnumerable1 changes, IServiceProvider services, CancellationToken cancellationToken)+MoveNext() in ...\ChangeHistoryService\Utilities\PolymorphicMapper.cs:line 152

I can provide you with the lambda body if needed to check the generated state machine. Or whatever information you need.

@Skyppid
Copy link
Author

Skyppid commented May 13, 2024

I could nail it down to the following. This snippet shows a part of the generated transition lambda:

    .Block() {
        .Switch ($var2.StateId) {
        .Case (1U):
                .Goto state_1 { }
        .Case (2U):
                .Goto state_2 { }
        .Default:
                .Default(System.Void)
        };
        // Other code
        .Block() {
            (($var2.State).Rest).Item1 = .Call (.Call (($var2.State).Item7).MapEvent(
                ($var2.State).Item1,
                (ChangeHistoryService.IterationModel.Changes.EventChangeDtoBase`1[ChangeHistoryService.IterationModel.Changes.Projects.FileCreatedChange])($var2.State).Item7,
                ($var2.State).Item2,
                ($var2.State).Item3)).GetAwaiter();
            .If (
                .Call $var2.MoveNext(
                    (($var2.State).Rest).Item1,
                    1U)
            ) {
                .Default(System.Void)
            } .Else {
                .Return end_async_method { }
            };
            .Label
            .LabelTarget state_1:;
            .Call ((($var2.State).Rest).Item1).GetResult()

The call fails when trying to execute the last line. In this case it seems like the state is not defined. I'm not really getting much insight as to what is causing this:
image

Any idea?

PS: When I call the compiled method with .Result no exception is thrown. There must be something off with the state. I have not much insight into the workings and it's hard to guess. I ran through the entire code down into the CLR. I can't find any reason why the State becomes missing. During the creation of the state machine box it seems to be there.

Help would really be appreciated here. Thanks!

@sakno sakno self-assigned this May 13, 2024
@sakno sakno added this to Opened in Metaprogramming via automation May 13, 2024
@sakno sakno added the bug Something isn't working label May 13, 2024
@sakno
Copy link
Collaborator

sakno commented May 13, 2024

Do you have repro code?

@sakno
Copy link
Collaborator

sakno commented May 13, 2024

Some way to generate PDBs for it so I can step into the methods?

Unfortunately, no. This is supported in .NET Framework only.

@Skyppid
Copy link
Author

Skyppid commented May 14, 2024

@sakno I added you to a small reproduction repo in our organization. I guess you should be notified with a link, if not tell me please.

When you run it you can see that the resulting DTO has all the "steps" but each step has only one change although it should have multiple. If you set a breakpoint in the catch handler of the PolymorphicMapper.MapChanges method, then you'll see the exception thrown.

Hope you can figure things out with this :) Thanks for the help!

Tip: Using https://github.com/MapsterMapper/ExpressionDebugger I managed to at least get a bit more information out of it. Using Rider I could step into most parts (except the generated method unfortunately). Helps a lot. It then also actually hit an exception breakpoint itself when the mapping was done where you could do a single step and land inside the AsyncStateMachine.Next method where you could inspect the state of the machine (and the empty State).

@sakno
Copy link
Collaborator

sakno commented May 14, 2024

I see the link, thanks. The problem is that I need an isolated repro code to convert it into the test.

@Skyppid
Copy link
Author

Skyppid commented May 14, 2024

Ah okay. Well, it's hard to say what is required to reproduce it. Not sure what can be stripped and how a test for that should look like... I'll try to strip everything around it as much as possible.

@Skyppid
Copy link
Author

Skyppid commented May 14, 2024

Alright, I removed everything that is not needed. Still reproduces. Should be few enough to convert it into a test, I hope. Can you work with that now?

@sakno
Copy link
Collaborator

sakno commented May 14, 2024

Reproduced with the following test:

[Fact]
public static async Task RegressionIssue234()
{
    var method1 = typeof(LambdaTests).GetMethod(nameof(DoNothingAsync), BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
    var method2 = typeof(LambdaTests).GetMethod(nameof(DoYieldAsync), BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
    var lambda = AsyncLambda<Func<Task<int>>>((ctx, result) =>
    {
        Await(Expression.Call(method2));
        Await(Expression.Call(method1));
        Assign(result, 42.Const());
    }).Compile();

    for (var i = 0; i < 100; i++)
    {
        Equal(42, await lambda.Invoke());
    }
}

private static Task DoNothingAsync() => Task.CompletedTask;

private static async Task DoYieldAsync()
{
    await Task.Yield();
}

@Skyppid
Copy link
Author

Skyppid commented May 14, 2024

Wonderful, good to hear it helped you nail it down to a small test :) I'm really curious as to what is causing this issue. I couldn't identify any issue in the control flow yet.

Btw: You should maybe consider adding some kind of donation option to your repository (if just for a coffee) ;)

sakno added a commit that referenced this issue May 14, 2024
@sakno
Copy link
Collaborator

sakno commented May 14, 2024

It's fixed. Could you check it from develop branch? The main problem was cleanup inside of IAsyncStateMachine.MoveNext implementation for pooling versions of state machine. When pooling is used, the fields should not be set after SetResult to avoid concurrency with the .NET pool of value tasks. The root case is the following:

  1. Async method reaches final state with StateId = 0
  2. It calls SetResult that resumes the caller
  3. Caller consumes ValueTask using GetResult. This method returns boxed state machine back.
  4. A new execution takes this cached state machine
  5. In the same time, the execution flow from step 2 continues and erases the fields of the boxed state machine which is taken by another async flow.

@sakno
Copy link
Collaborator

sakno commented May 14, 2024

Moreover, when PoolingAsyncValueTaskMethodBuilder returning back the boxed state machine, it sets all fields to default. No need to do that twice.

@Skyppid
Copy link
Author

Skyppid commented May 15, 2024

Looks splendid, works like a charm. I really expected it to be more, but it makes perfect sense that a pooled state machine might be reused before it should be.

Thank you very much!

@sakno
Copy link
Collaborator

sakno commented May 15, 2024

Release 5.3.1 has been published

@sakno sakno closed this as completed May 15, 2024
Metaprogramming automation moved this from Opened to Closed May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Metaprogramming
  
Closed
Development

No branches or pull requests

2 participants