Session lock does not expire if SessionReceiver's PrefetchSize is larger than the number of messages in the session #694

Open
michaelmcmaster opened this issue Jan 26, 2024 · 2 comments

michaelmcmaster commented Jan 26, 2024

Description

If the Service Bus receiver uses a PrefetchSize that is larger than the number of messages in the Service Bus session, the session lock seems to be renewed automatically (and indefinitely) until the client connection is closed. This allows a malfunctioning client to "hang" a session indefinitely.

Related Observations

  • It does not matter if the session is from a Queue or a Subscription - the failure mode exists with both
  • Since the session lock never actually expires, the DeliveryCount on the received (but not completed) message(s) is not incremented
  • Doing a (single) ReceiveMessage with a non-zero PrefetchSize appears to behave the same as a (batch) ReceiveMessages with a maxMessages of the same value

I cannot determine if the session lock is being automatically renewed by the client or by something server-side. I don't see any activity in the client logs that indicates the client is (automatically) renewing the session lock. The session lock is released (on the server) if the connection between the client and the server is severed.

This issue appears to be related to the AMQP transport. Running a similar test using the older WindowsAzure.ServiceBus client with the SBMP transport works as expected (i.e. the session lock is lost regardless of the number of messages), but switching the transport to AMQP behaves identically to the Azure.Messaging.ServiceBus client (i.e. the session lock doesn't expire, as outlined in this issue).

Recreate

I posted a Visual Studio 2022 solution (console application) that recreates the issue to GitHub. Informational logs are written to the console, while trace logs are written to a file in the working directory.

This application can be pointed to a ServiceBus, and it will:

  • Bootstrap (create or delete and recreate) a fresh queue (default name of 'session_lock_failure')
    • RequiresSession set to true
    • LockDuration set to 15 seconds (intentionally short for quicker testing, but the issue recreates with any time span)
  • Send a configurable number of messages to the queue
    • all messages have the same session identifier
  • Using the Service Bus client, performs an AcceptNextSession operation
    • ReceiverOptions set to a configurable PrefetchSize
  • Using the sessionReceiver from ^^^, performs a ReceiveMessage operation
  • Simulates a "slow receiver" by delaying until the sessionReceiver's SessionLockedUntil is reached
  • Delays for an additional amount of time (amount of additional delay doesn't seem to matter)
  • Using the sessionReceiver and receivedMessage from ^^^, performs a CompleteMessage operation

ISSUE: In this ^^^ scenario, the CompleteMessage should always result in a SessionLockLost exception, but if the PrefetchSize is larger than the number of messages in the session, the session lock is never lost (remains locked indefinitely) and the message(s) are successfully completed (removed from the queue).
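
The "slow receiver" timing in the steps above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the reporter's .NET code; the function names and the timestamps are hypothetical and chosen only to show the two waits and the documented expectation.

```python
from datetime import datetime, timedelta, timezone


def slow_receiver_delays(locked_until, now, extra):
    """Return the two waits performed before CompleteMessage is attempted.

    First wait: time remaining until the session lock is due to expire.
    Second wait: the additional delay added after the expiry time.
    """
    until_expiry = max(locked_until - now, timedelta(0))
    return until_expiry, extra


def lock_should_be_lost(locked_until, complete_at):
    """Documented expectation: completing after expiry raises SessionLockLost."""
    return complete_at > locked_until


# Hypothetical timestamps: a 15 second lock accepted at 12:00:00 UTC.
accepted = datetime(2024, 1, 26, 12, 0, 0, tzinfo=timezone.utc)
locked_until = accepted + timedelta(seconds=15)

first, second = slow_receiver_delays(locked_until, accepted, timedelta(minutes=5))
complete_at = accepted + first + second

print(first, second, lock_should_be_lost(locked_until, complete_at))
```

Under this model, any CompleteMessage issued after the second wait happens strictly after SessionLockedUntil, so the expectation is a SessionLockLost failure regardless of the prefetch setting.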

Command Line Options

-c, --connection    Required. Service Bus connection string (Manage, Send, Listen)
-m, --messages      (Default: 1) Number of messages to put into Service Bus (single session)
-p, --prefetch      (Default: 2) Service Bus receive prefetch size
-q, --queue         (Default: session_lock_failure) Service Bus queue name
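
For readers who want to mirror these options in a script of their own, here is a rough Python equivalent built on `argparse`. It is only a sketch: the real tool is a .NET console application, and `build_parser` and `reproduces_issue` are hypothetical names. The predicate encodes the condition described in this report (fewer messages than the prefetch size).

```python
import argparse


def build_parser():
    """Build a parser with the same flags and defaults as listed above."""
    parser = argparse.ArgumentParser(prog="SessionLockFailure")
    parser.add_argument("-c", "--connection", required=True,
                        help="Service Bus connection string (Manage, Send, Listen)")
    parser.add_argument("-m", "--messages", type=int, default=1,
                        help="Number of messages to put into Service Bus (single session)")
    parser.add_argument("-p", "--prefetch", type=int, default=2,
                        help="Service Bus receive prefetch size")
    parser.add_argument("-q", "--queue", default="session_lock_failure",
                        help="Service Bus queue name")
    return parser


def reproduces_issue(args):
    """Per this report, the lock is held indefinitely when messages < prefetch."""
    return args.messages < args.prefetch


# Placeholder connection string; only the defaults are inspected here.
args = build_parser().parse_args(["-c", "<connection-string>"])
print(args.messages, args.prefetch, args.queue, reproduces_issue(args))
```

Note that the defaults (1 message, prefetch of 2) already fall into the failing case described in Scenario 2.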

Scenario 1 (OK) : messages >= prefetch

With this scenario, the Service Bus behaves according to the official documentation. During the delay, the server expires the session lock, and a SessionLockLost exception is thrown when the client attempts (after the delay) to complete the messages.

Command Line: SessionLockFailure.exe -c "******" -m 2 -p 2

2024-01-26T15:34:45.8599208-06:00 [INF] [1] SessionLockFailure running
2024-01-26T15:34:47.8922086-06:00 [INF] [10] ServiceBus client connected
2024-01-26T15:34:47.9612183-06:00 [INF] [10] Sending partial batch: [1]
2024-01-26T15:34:48.4635329-06:00 [INF] [5] Sent [2] messages in [0.57] seconds (3.49 msg/s).
2024-01-26T15:34:48.5755560-06:00 [INF] [10] AcceptNextSession: SessionId:[0], LockedUntil:[2024-01-26T15:35:03.5221389-06:00]
2024-01-26T15:34:48.6002566-06:00 [INF] [7] Delay:[00:00:14.9220584] to allow session lock to expire
2024-01-26T15:35:03.5301752-06:00 [INF] [7] Delay:[00:05:00] for extra measure
2024-01-26T15:40:03.5212820-06:00 [INF] [34] CompleteMessage: SessionId:[0], SequenceNumber:[1]
2024-01-26T15:40:03.5328858-06:00 [WRN] [34] CompleteMessage: Session lock lost (expected)
Azure.Messaging.ServiceBus.ServiceBusException: The session lock has expired on the MessageSession. Accept a new MessageSession. TrackingId:*****, SystemTracker:***:***:amqps://******/***;0:7:8:source(address:/session_lock_failure,filter:[com.microsoft:session-filter:]), Timestamp:2024-01-26T21:35:03 (SessionLockLost).

Scenario 2 (Failure) : messages < prefetch

With this scenario, the Service Bus misbehaves: the session lock is held indefinitely. During the delay, the server does not expire the session lock. The server-side session lock appears to be maintained indefinitely by the client connection, so the session stays stalled until the client connection is terminated. This can be further confirmed by attempting an AcceptNextSession + Receive (e.g. from ServiceBusExplorer) during the delay period. When the client attempts (after the delay) to complete the messages, they are completed successfully, but they shouldn't be, because the session lock should have been lost.

Command Line: SessionLockFailure.exe -c "*****" -m 1 -p 2

2024-01-26T14:48:22.8077862-06:00 [INF] [1] SessionLockFailure running
2024-01-26T14:48:25.0585689-06:00 [INF] [10] ServiceBus client connected
2024-01-26T14:48:25.1456060-06:00 [INF] [10] Sending partial batch: [1]
2024-01-26T14:48:25.7762166-06:00 [INF] [10] Sent [1] messages in [0.72] seconds (1.39 msg/s).
2024-01-26T14:48:25.8877994-06:00 [INF] [10] AcceptNextSession: SessionId:[0], LockedUntil:[2024-01-26T14:48:40.7684016-06:00]
2024-01-26T14:48:25.9198234-06:00 [INF] [10] Delay:[00:00:14.8487970] to allow session lock to expire
2024-01-26T14:48:40.7726083-06:00 [INF] [10] Delay:[00:30:00] for extra measure
2024-01-26T15:18:40.7755738-06:00 [INF] [137] CompleteMessage: SessionId:[0], SequenceNumber:[1]
2024-01-26T15:18:41.9229045-06:00 [ERR] [138] FAILURE: The sesion lock *should* have been lost, but was not
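
To make the comparison between the two runs concrete, the following Python sketch computes how long after LockedUntil each CompleteMessage call was made, using timestamps copied from the two logs above. The helper name `parse_ts` is hypothetical, and the trimming step is an assumption about the log format (.NET writes seven fractional digits, while Python's `datetime` keeps six).

```python
import re
from datetime import datetime


def parse_ts(text):
    """Parse a log timestamp, trimming the 7-digit fraction to microseconds."""
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", text)
    return datetime.fromisoformat(trimmed)


# (LockedUntil, time of CompleteMessage, observed outcome), copied from the logs.
runs = {
    "scenario 1": ("2024-01-26T15:35:03.5221389-06:00",
                   "2024-01-26T15:40:03.5212820-06:00",
                   "SessionLockLost"),
    "scenario 2": ("2024-01-26T14:48:40.7684016-06:00",
                   "2024-01-26T15:18:40.7755738-06:00",
                   "completed"),
}

for name, (locked_until, completed_at, outcome) in runs.items():
    overdue = parse_ts(completed_at) - parse_ts(locked_until)
    print(f"{name}: CompleteMessage {overdue} after LockedUntil -> {outcome}")
```

In both runs the completion attempt came well after the reported lock expiry (roughly 5 minutes and 30 minutes respectively), yet only the first run lost the lock.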
@michaelmcmaster michaelmcmaster changed the title Session lock does not expire if receiver's PrefetchSize is larger than the number of messages in the session Session lock does not expire if SessionReceiver's PrefetchSize is larger than the number of messages in the session Jan 26, 2024
@EldertGrootenboer
Contributor

Thank you for your feedback. We have opened an investigation task for this in our backlog, and will update this issue when we have more information.

@EldertGrootenboer
Contributor

This item is in our backlog; however, we currently don't have an ETA for when development might start on it. For now, to help us give this the right priority, it would be helpful to see others vote for and support this item.
