[NOREVIEW][NOMERGE] HTTP/3 stress data corruption hunt #101624

Closed
wants to merge 21 commits into from


@rzikm (Member) commented Apr 26, 2024

This PR exists only to run the HTTP stress pipeline with custom diagnostic code, which will hopefully let us diagnose #76183.

@rzikm added the labels NO-MERGE (The PR is not ready for merge yet; see discussion for detailed reasons) and NO-REVIEW (Experimental/testing PR, do NOT review it) on Apr 26, 2024
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@rzikm (Member Author) commented Apr 26, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Apr 26, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Apr 27, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Apr 27, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Apr 27, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Apr 27, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 1, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 1, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 1, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 2, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 2, 2024

Okay, this is interesting.

server_1  | Process terminated. Diverging at offset 163, expected 0x23, got 0x24
server_1  |    at System.Environment.FailFast(System.Runtime.CompilerServices.StackCrawlMarkHandle, System.String, System.Runtime.CompilerServices.ObjectHandleOnStack, System.String)
server_1  |    at System.Environment.FailFast(System.Threading.StackCrawlMark ByRef, System.String, System.Exception, System.String)
server_1  |    at System.Environment.FailFast(System.String)
server_1  |    at System.IO.Pipelines.Pipe.ValidateWritten(System.Span`1<Byte>)
server_1  |    at System.IO.Pipelines.Pipe.Advance(Int32)
server_1  |    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Stream.ProcessDataFrameAsync(System.Buffers.ReadOnlySequence`1<Byte> ByRef)
server_1  |    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Stream.ProcessHttp3Stream[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>, System.Buffers.ReadOnlySequence`1<Byte> ByRef, Boolean)
server_1  |    at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Stream+<ProcessRequestAsync>d__99`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
server_1  |    at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
server_1  |    at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http3.Http3Stream+<ProcessRequestAsync>d__99`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], Microsoft.AspNetCore.Server.Kestrel.Core, Version=9.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]].MoveNext(System.Threading.Thread)
server_1  |    at System.Threading.ThreadPoolWorkQueue.Dispatch()
server_1  |    at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

We hit the validation failure because a byte was apparently lost (validation failed on the pipe that carries the request body contents). When I trace back to what was read from the pipe that carries the raw HTTP/3 frames, I see:

0:010> !DumpObj /d 00007fb31197f898
Name:        System.IO.Pipelines.BufferSegment
MethodTable: 00007ff324d154f8
EEClass:     00007ff324d035d0
Tracked Type: false
Size:        96(0x60) bytes
File:        /live-runtime-artifacts/testhost/net9.0-linux-Release-x64/shared/Microsoft.AspNetCore.App/9.0.0-preview.4.24223.1/System.IO.Pipelines.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff323e8bf70  400002e       18 ...Private.CoreLib]]  1 instance 00007fb31197f8b0 <Memory>k__BackingField
00007ff324d15200  400002f        8 ...Private.CoreLib]]  0 instance 00007fb311919c38 <Next>k__BackingField
00007ff32290a6e0  4000030       10         System.Int64  1 instance              521 <RunningIndex>k__BackingField
00007ff32482b268  4000014       28 ...Private.CoreLib]]  0 instance 0000000000000000 _memoryOwner
00007ff322c38860  4000015       30        System.Byte[]  0 instance 00007fb30e0b9b08 _array
00007ff324d154f8  4000016       38 ...nes.BufferSegment  0 instance 00007fb311919c38 _next
00007ff3228e1188  4000017       40         System.Int32  1 instance                2 _end
00007ff32482bf20  4000018       48 ...Private.CoreLib]]  1 instance 00007fb31197f8e0 <AvailableMemory>k__BackingField
0:010> !DumpObj /d 00007fb30e0b9b08
Name:        System.Byte[]
MethodTable: 00007ff322c38860
EEClass:     00007ff322c38810
Tracked Type: false
Size:        4120(0x1018) bytes
Array:       Rank 1, Number of elements 4096, Type Byte (Print Array)
Content:     ........................22:26:44 GMT_M.Kestrel..........................................................rLClDa1Vp5Wc937S8R42sVCE
Fields:
None
0:010> db 00007fb30e0b9b18 L10
00007fb3`0e0b9b18  0e 01 01 06 00 01 07 00-01 08 00 01 09 00 01 0a  ................
0:010> !DumpObj /d 00007fb311919c38
Name:        System.IO.Pipelines.BufferSegment
MethodTable: 00007ff324d154f8
EEClass:     00007ff324d035d0
Tracked Type: false
Size:        96(0x60) bytes
File:        /live-runtime-artifacts/testhost/net9.0-linux-Release-x64/shared/Microsoft.AspNetCore.App/9.0.0-preview.4.24223.1/System.IO.Pipelines.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff323e8bf70  400002e       18 ...Private.CoreLib]]  1 instance 00007fb311919c50 <Memory>k__BackingField
00007ff324d15200  400002f        8 ...Private.CoreLib]]  0 instance 00007fb31192e918 <Next>k__BackingField
00007ff32290a6e0  4000030       10         System.Int64  1 instance              523 <RunningIndex>k__BackingField
00007ff32482b268  4000014       28 ...Private.CoreLib]]  0 instance 0000000000000000 _memoryOwner
00007ff322c38860  4000015       30        System.Byte[]  0 instance 00007fb30e1562f0 _array
00007ff324d154f8  4000016       38 ...nes.BufferSegment  0 instance 00007fb31192e918 _next
00007ff3228e1188  4000017       40         System.Int32  1 instance               24 _end
00007ff32482bf20  4000018       48 ...Private.CoreLib]]  1 instance 00007fb311919c80 <AvailableMemory>k__BackingField
0:010> !DumpObj /d 00007fb30e1562f0
Name:        System.Byte[]
MethodTable: 00007ff322c38860
EEClass:     00007ff322c38810
Tracked Type: false
Size:        4120(0x1018) bytes
Array:       Rank 1, Number of elements 4096, Type Byte (Print Array)
Content:     #..$..%..&..'..(..)..*..Xgx8uNdGaPE57gSXUiA46pIKKxUh7H3lm37b1JTVCNhU993WJqwdZ4phNcowtw9JLSI4YqVBEesi8nABuxOHKQChDQDPUWcGEN9gqyyS
Fields:
None
0:010> db 00007fb30e156300 L10
00007fb3`0e156300  23 00 01 24 00 01 25 00-01 26 00 01 27 00 01 28  #..$..%..&..'..(

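Stitching the two segments together (the first holds 2 valid bytes at RunningIndex 521, the second continues at RunningIndex 523), the contents in a more readable form are: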

0e 01 23 00 01 24 00 01 25 00 ...
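The stitching step can be sketched in a few lines of Python. This is a minimal illustration, assuming each segment's valid data starts at index 0 of its backing array (consistent with the db dumps starting at the array data); the Segment type and stitch() helper are hypothetical and are not part of System.IO.Pipelines. Only the 16 bytes of each array that were dumped with db are used.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a BufferSegment, reduced to the fields used here.
    running_index: int  # offset of this segment's first byte in the whole sequence
    data: bytes         # backing array contents (assumed to start at index 0)
    end: int            # number of valid bytes, like BufferSegment._end


def stitch(segments):
    """Concatenate the valid bytes of each segment, checking that they are contiguous."""
    if not segments:
        return b""
    out = bytearray()
    expected_index = segments[0].running_index
    for seg in segments:
        if seg.running_index != expected_index:
            raise ValueError(f"gap: expected index {expected_index}, got {seg.running_index}")
        out += seg.data[:seg.end]
        expected_index += seg.end
    return bytes(out)


seg1 = Segment(521, bytes.fromhex("0e01010600010700010800010900010a"), end=2)
# Only the first 16 bytes of the second array were dumped with db, so end is capped at 16 here.
seg2 = Segment(523, bytes.fromhex("23000124000125000126000127000128"), end=16)

print(stitch([seg1, seg2]).hex(" "))  # 0e 01 23 00 01 24 00 01 25 00 01 26 00 01 27 00 01 28
```

The contiguity check uses RunningIndex the same way the dumps do: the first segment covers indices 521 and 522, so the next one must start at 523.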

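The HTTP/3 frame type byte got corrupted somehow: it reads 0x0e where 0x00 (the DATA frame type, matching the 00 01 24 and 00 01 25 frames that follow) was expected. The server therefore treats 0e 01 23 as an "unknown" frame and skips it together with its one-byte payload 0x23, then processes the 00 01 24 frame. That is why the body validation saw 0x24 where it expected 0x23.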
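To see why a single corrupted type byte loses exactly one body byte, here is a minimal Python sketch of HTTP/3 frame parsing. It assumes RFC 9114 framing (a QUIC variable-length integer type, a variable-length integer length, then the payload) and the RFC 9114 rule that frames of unknown type are ignored. The helper names are hypothetical and this is not Kestrel's parser; for brevity it skips every non-DATA frame instead of handling HEADERS and other known types. The intact input assumes the sender wrote 00 01 23, following the pattern of the frames after it.

```python
DATA = 0x00  # HTTP/3 DATA frame type (RFC 9114, Section 7.2.1)


def read_varint(buf, pos):
    """Decode a QUIC variable-length integer (RFC 9000, Section 16) starting at buf[pos]."""
    first = buf[pos]
    length = 1 << (first >> 6)  # the two high bits select 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def data_payload(buf):
    """Return the concatenated DATA payloads, skipping frames of any other type (simplified)."""
    body = bytearray()
    pos = 0
    while pos < len(buf):
        frame_type, pos = read_varint(buf, pos)
        length, pos = read_varint(buf, pos)
        payload = buf[pos:pos + length]
        if len(payload) < length:
            break  # incomplete frame at the end of the captured bytes
        pos += length
        if frame_type == DATA:
            body += payload
        # Any other type, such as 0x0e here, is dropped together with its payload.
    return bytes(body)


intact = bytes.fromhex("000123000124000125")
corrupted = bytes.fromhex("0e0123000124000125")
print(data_payload(intact).hex(" "))     # 23 24 25
print(data_payload(corrupted).hex(" "))  # 24 25
```

The corrupted frame still has a well-formed length, so the parser stays in sync and silently drops one payload byte, which matches the single-byte divergence reported by the body validation.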

Now the question is how the frame could have been corrupted. The HTTP/3 frame pipe runs validation code that asserts the expected HTTP/3 frame bytes have been received, and that check passed, so the corruption must have happened after the contents were written to the pipe.
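The reasoning about where the corruption happened can be illustrated with a toy pipe that validates bytes at write time. This is a hypothetical sketch, not the PR's actual ValidateWritten code: ValidatingPipe and first_divergence are made-up names, and the corruption is simulated by directly overwriting a committed byte. It shows that a write-time check which passes, followed by a read that diverges, places the corruption after the write.

```python
class ValidatingPipe:
    """Hypothetical toy pipe that checks each written chunk against a known expected stream.

    It does not mirror System.IO.Pipelines internals; it only illustrates write-time validation.
    """

    def __init__(self, expected: bytes):
        self.expected = expected
        self.buffer = bytearray()  # committed bytes, later handed to the reader

    def write(self, chunk: bytes) -> None:
        start = len(self.buffer)
        for i, b in enumerate(chunk):
            offset = start + i
            if offset >= len(self.expected):
                raise AssertionError(f"wrote past the expected data at offset {offset}")
            if b != self.expected[offset]:
                raise AssertionError(
                    f"Diverging at offset {offset}, expected {self.expected[offset]:#04x}, got {b:#04x}"
                )
        self.buffer += chunk  # only chunks that passed validation are committed

    def read_all(self) -> bytes:
        return bytes(self.buffer)


def first_divergence(expected: bytes, actual: bytes):
    """Return (offset, expected_byte, actual_byte) for the first mismatch, or None."""
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return offset, e, a
    return None


frames = bytes.fromhex("000123000124")
pipe = ValidatingPipe(frames)
pipe.write(frames[:3])  # passes the write-time check
pipe.write(frames[3:])  # passes the write-time check
pipe.buffer[0] = 0x0E   # simulate the committed bytes being modified after validation
print(first_divergence(frames, pipe.read_all()))  # (0, 0, 14)
```

Because every write passed its check, a mismatch seen only on the read side can only come from something touching the buffer between commit and read.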

@rzikm (Member Author) commented May 2, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 2, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 2, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 3, 2024

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented May 3, 2024

Hunt successful, root cause found.

@rzikm closed this May 3, 2024
Labels
area-System.Net.Http NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it

1 participant