Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channels that have had exceptions but not explicitly closed can be recovered via connection recovery #1085

Open
lukebakken opened this issue Aug 3, 2023 · 3 comments
Assignees
Labels

Comments

@lukebakken
Copy link
Contributor

Describe the bug

Reported here: https://groups.google.com/g/rabbitmq-users/c/lhLt0pkCFv4


Observed the following and wondering if it is expected or considered to be a possible issue.

Using Java 5.13.1 rabbitmq client library against RabbitMQ 3.11.18 Erlang 25.3.2.2 cluster.

Testing several recovery scenarios, specifically the documented behavior "Channel-level exceptions will not trigger any kind of recovery as they usually indicate a semantic issue in the application (e.g. an attempt to consume from a non-existent queue)."

In my test:

  • Allocate auto-recovering connection
  • Allocate channel from the connection
  • basicConsume() from a busy queue
  • Deliberately app-error the channel closed so it won't autorecover (in my case I called basicAck() with an invalid msg id on the channel)
  • As expected, consumption stops, incoming events to the queue do not fire consumer callbacks

What surprised me:

With my app in the above state, in the Rabbit UI, I performed a "Force Close" of the connection. When the connection recovered, it also recovered the channel and consumer callbacks started firing again with new events.

I had not explicitly closed the channel after the application error since I assumed it to be closed "underneath" me, and it did not recover as long as its parent connection remained stable. But when the parent connection was forced to recover it did recover the channel closed by protocol error which seems to go against the documentation.

Reproduction steps

See above.

Expected behavior

The documentation should be changed to reflect this edge case.

Additional context

No response

@lukebakken lukebakken added the bug label Aug 3, 2023
@michaelklishin
Copy link
Member

Channel exceptions could delete the channel from its connection's known channel list.

I cannot immediately think of any risks around doing so.

@rbramante
Copy link

I think deleting the channel from the connection's known channel list in this case is the best fix if possible as otherwise I think this runs the risk of channel leaks.

I tried to explicitly close the channel with the protocol error but you cannot close it because:

com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 9, class-id=60, method-id=80)
	at com.rabbitmq.client.impl.AMQChannel.processShutdownSignal(AMQChannel.java:401) ~[amqp-client-5.13.1.jar:5.13.1]
	at com.rabbitmq.client.impl.ChannelN.startProcessShutdownSignal(ChannelN.java:287) ~[amqp-client-5.13.1.jar:5.13.1]
	at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:608) ~[amqp-client-5.13.1.jar:5.13.1]
	at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:542) ~[amqp-client-5.13.1.jar:5.13.1]
	at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:535) ~[amqp-client-5.13.1.jar:5.13.1]

I also tried to abort the channel which does not result in AlreadyClosedException but does not otherwise change behavior -- if the underlying connection were to error and recover, this channel is "resurrected".

If the application wants to try and recover from its error it must allocate a new channel since this channel is currently defunct. It cannot close the current channel. If the connection has a recovery event later then this old channel will reactivate. This seems like a leak? The only way I currently could see around this would be for the application to track all channels with protocol errors and track recovery events for these channels so that if they recover at a later date it can close them then to avoid a leak (and connection recovery will never come if the environment is otherwise stable).

I'm not sure most SDK users would expect this. I think the assumption would be the tracking of Channels for the Connection in the SDK layers would remove application-errored Channels from its lists of Recoverables.

@michaelklishin
Copy link
Member

Making Channel#close idempotent also seems to be a good idea. Connection#close is, for example. I see some logic to reporting such asynchronously reported errors but it may be doing more harm than good.

@lukebakken lukebakken self-assigned this Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants