Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client will reopen secure channel with new ID without reactivating session #2538

Open
1 of 5 tasks
einarmo opened this issue Feb 26, 2024 · 6 comments · May be fixed by #2577
Open
1 of 5 tasks

Client will reopen secure channel with new ID without reactivating session #2538

einarmo opened this issue Feb 26, 2024 · 6 comments · May be fixed by #2577
Assignees
Labels
bug A bug was identified and should be fixed. compliance An issue was found which is not compliant with the OPC UA specification.
Milestone

Comments

@einarmo
Copy link
Contributor

einarmo commented Feb 26, 2024

Type of issue

  • Bug
  • Enhancement
  • Compliance
  • Question
  • Help wanted

Current Behavior

If the TCP connection is unexpectedly killed, the client will open a new secure channel with SecureChannelId 0, but it will not follow that up with ActivateSession. From what I can tell this is in violation of the standard.

What will eventually happen is that the new conection will fail, and the client will open another new secure channel where it will attempt to reactivate the session, this will only happen after keep alive timed out on the half-opened session.

Expected Behavior

The client should always reactivate sessions when creating a new secure channel before sending normal OPC UA requests on the new channel.

Steps To Reproduce

This is reproducible with a local server and a local client, including the sample client/server.

  1. Connect to an OPC-UA server.
  2. Monitor traffic in a tool like wireshark.
  3. Use tcpkill or a similar tool to send RST packets on the open connection, using the outgoing port for the client as the target, so that you only kill one connection, and not any subsequent connections the client creates.
  4. Observe the OPC-UA packets sent immediately after.

Environment

- OS: Fedora 38 (6.6.14), the original issue happened on Windows.
- Environment: Command line
- Runtime: .NET 8.0.101
- Nuget Version: 1.5.373.121
- Component: Opc.Ua.Client
- Server: ConsoleReferenceServer
- Client: ConsoleReferenceClient

Anything else?

Here's a screenshot from wireshark:

image

Note how it takes several seconds after opening a new channel before the client discovers that the channel is bad and recreates it.

@einarmo
Copy link
Contributor Author

einarmo commented Feb 26, 2024

It doesn't look to me like there is any mechanism to recover after a loss of connection at all? Presumably this happening hits here:

// check if reconnecting after a socket failure.

but there is no mechanism to communicate back to the client that the channel was reopened, from what I can tell, so the client would only be reactivated when using the SessionReconnectHandler later on. Should there be a mechanism to let the channel inform the client that it must reactivate the session? The current reconnect mechanism in the channel implementation seems to be somewhere between useless and actively harmful.

@KircMax
Copy link
Contributor

KircMax commented Mar 8, 2024

I also added some Logs for the Client in the related #2544 that I created which is a duplicate of this one.

@audunmidttunsystad
Copy link

Hi @mregen, thank you for getting back to us on this! How do you see the way forward here – and would you have any estimates or updates?

@KircMax
Copy link
Contributor

KircMax commented Mar 22, 2024

If you could maybe point out some things where I could do some pre-work for you I'd like to try to help @mregen - currently I am on vacation but afterwards I'd happily try.

@mregen mregen added bug A bug was identified and should be fixed. compliance An issue was found which is not compliant with the OPC UA specification. and removed needs investigation labels Apr 2, 2024
@mregen
Copy link
Contributor

mregen commented Apr 2, 2024

Hi @einarmo, @KircMax, @audunmidttunsystad, I was able to reproduce the issue with the sysinternals tcpview tool, close the connection and I see it happening. Thanks for reporting this special case!

I think currently of multiple ways to fix this:
i) when we go into reconnect and there is no valid service call in the beginning (e.g. CreateSession, ActivateSession, GetEndpoints etc.) fail the service calls which are currently queued with BadConnectionClosedetc. Proceed only to TCP BeginConnect once a compliant service call is made.
ii) when we go into reconnect and there is no valid service call in the beginning (e.g. CreateSession, ActivateSession, GetEndpoints etc.) queue the service calls. If there is a valid service call, queue it in the beginning and start with TCP BeginConnect. If the connect attempt times out, all queued servcie calls time out.

I think the latter option might be smoother if there is really just a small network interruption. Problem in both cases is that the reconnect handler must trigger a reconnect to get the Activate message sent and queued before the TCP BeginConnect starts.

So better were the reconnect handler is immediately triggered when the socket closes, it may also need some additional wiring. Looking into the fix now...

@mregen mregen added this to the April Update milestone Apr 3, 2024
@mregen mregen linked a pull request Apr 3, 2024 that will close this issue
13 tasks
@KircMax
Copy link
Contributor

KircMax commented Apr 16, 2024

Hi @mregen
Thanks a lot for tackling this issue - it will be really helpful for short network interruptions
Or - to give you another direct use case - when e.g. a failover of a redundant system takes place ;).

Both ways seem like valid fixes to me - From my side I'd leave this up to you to decide, you clearly have more experience and knowledge of the codebase.
Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug was identified and should be fixed. compliance An issue was found which is not compliant with the OPC UA specification.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants