Aggregation server does not reconnect to downstream servers when server is restarted or network connection is lost #312

Open
pc-avatar-7076 opened this issue Apr 12, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@pc-avatar-7076

pc-avatar-7076 commented Apr 12, 2022

Scenario 1: Downstream Server Disable Network Adapter

Test:

  1. Configure Aggregator (dev host machine) to connect to the OPCF reference server (VM running on dev host)
  2. Start Aggregator (Debug with IDE) and verify connection/steady state behavior of “metadata” session
  3. Disable VM network adapter and observe behavior of metadata session
  4. Wait until the Aggregator keep alive status is “late”
  5. Enable VM network adapter and observe behavior of metadata session

Results:
TL;DR - Once the adapter is re-enabled, the Aggregator keeps trying to “renew” the secure channel, but every OpenSecureChannelRequest fails with a ServiceFault (BadTcpSecureChannelUnknown). See “opcf-reference-aggregator-southbound-disable-adapter-opcf-reference-server.pcapng”, attached (as zip).

Note: Using UA-.NETStandard tag 1.4.368.53 and UA-.NETStandard-Samples tag preview on commit e17387d

Wireshark Capture:
opcf-reference-aggregator-southbound-disable-adapter-opcf-reference-server.zip

Scenario 2: Downstream Server Stop/Start
Test:

  1. Configure Aggregator (dev host machine) to connect to the OPCF reference server (VM running on dev host)
  2. Start Aggregator (Debug with IDE) and verify connection/steady state behavior of “metadata” session
  3. Stop OPCF reference server and observe behavior of metadata session
  4. Wait until the Aggregator keep alive status is “late”
  5. Start OPCF reference server and observe behavior of metadata session

Preliminary Note:
This investigation required an OPCF UA stack bugfix for the following issues:

The bug fix is available in UA-.NETStandard tag 1.4.368.53 and UA-.NETStandard-Samples tag preview on commit e17387d.

Results:
TL;DR - The Aggregator has a keep alive handler with logic to reestablish the Session once the keep alive is “late”. That logic is never executed because the secure channel cannot be reestablished after the downstream server is restarted. This seems to be because the Aggregator attempts to ‘renew’ the channel instead of recreating it.
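To make the renew-versus-recreate distinction concrete, here is a minimal toy model in Python. It is not UA-.NETStandard code: `ToyServer`, `ToyClient`, and both reconnect methods are hypothetical names invented for illustration. The only things taken from this report are the "renew" behavior and the BadTcpSecureChannelUnknown fault. The model assumes a restarted server forgets every channel it issued before the restart.

```python
class BadTcpSecureChannelUnknown(Exception):
    """Toy stand-in for the ServiceFault reported above."""


class ToyServer:
    """Hypothetical server that only knows channels issued since its last start."""

    def __init__(self):
        self._channels = set()
        self._next_id = 1

    def restart(self):
        # Assumption: a restart drops all secure channel state.
        self._channels.clear()

    def open_secure_channel(self, request_type, channel_id=0):
        if request_type == "issue":
            # A fresh channel always gets a new id.
            channel_id = self._next_id
            self._next_id += 1
            self._channels.add(channel_id)
            return channel_id
        if request_type == "renew":
            # Renewing only works for a channel the server still knows.
            if channel_id not in self._channels:
                raise BadTcpSecureChannelUnknown(channel_id)
            return channel_id
        raise ValueError(f"unknown request type: {request_type}")


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.channel_id = server.open_secure_channel("issue")

    def reconnect_renew_only(self):
        # Models the observed behavior: always ask to renew the old channel.
        self.channel_id = self.server.open_secure_channel("renew", self.channel_id)

    def reconnect_with_fallback(self):
        # Hypothetical alternative: fall back to a fresh channel when renew fails.
        try:
            self.reconnect_renew_only()
        except BadTcpSecureChannelUnknown:
            self.channel_id = self.server.open_secure_channel("issue")
```

In this model, `reconnect_renew_only()` raises on every attempt after `server.restart()`, which mirrors the repeated failures in the capture, while `reconnect_with_fallback()` recovers on the first attempt. Whether a fallback like this belongs in the stack or in the Aggregator is a separate question.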

Technical Details:
In steady state (Aggregator connected to OPCF reference server), the keep alive timer calls BeginRead, which increments m_outstandingRequests to 1. The counter is decremented to 0 when the read response is successfully received.

When the downstream server is shut down, Session.OnKeepAlive calls BeginRead and the operation fails with an exception ("Could not send keep alive request: Opc.Ua.ServiceResultException BadConnectionClosed"). AsyncRequestStarted is never called, so m_outstandingRequests is never incremented.

After the downstream server has been restarted, the client cannot re-establish the secure channel (it tries to 'renew' the old channel rather than create a fresh one). Session.OnKeepAlive continues to call BeginRead, which continues to throw the same exception ("Could not send keep alive request: Opc.Ua.ServiceResultException BadConnectionClosed"). Again, AsyncRequestStarted is never called, and m_outstandingRequests is never incremented.
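The counter behavior described in these three paragraphs can be modeled in a few lines of Python. This is only a sketch of the sequence as I observed it, not the stack's implementation: `ToySession` and its members are hypothetical names that loosely mirror Session.OnKeepAlive, BeginRead, AsyncRequestStarted, and m_outstandingRequests.

```python
class BadConnectionClosed(Exception):
    """Toy stand-in for Opc.Ua.ServiceResultException BadConnectionClosed."""


class ToySession:
    """Hypothetical model of the keep alive bookkeeping described above."""

    def __init__(self, channel_open=True):
        self.channel_open = channel_open
        self.outstanding_requests = 0
        self.log = []

    def begin_read(self):
        # Sending fails immediately when the channel is gone.
        if not self.channel_open:
            raise BadConnectionClosed()

    def async_request_started(self):
        self.outstanding_requests += 1

    def async_request_completed(self):
        self.outstanding_requests -= 1

    def on_keep_alive(self):
        try:
            self.begin_read()
            # Only reached when the send succeeded.
            self.async_request_started()
        except BadConnectionClosed:
            # The failure is logged and the counter is left untouched.
            self.log.append("Could not send keep alive request: BadConnectionClosed")
```

With the channel open, one tick moves the counter to 1 and the response moves it back to 0. With the channel closed, any number of ticks leaves the counter at 0 and only adds log lines, so any recovery logic that depends on outstanding requests piling up would never trigger in this model.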

@mregen
Contributor

mregen commented May 9, 2022

This issue may be related to OPCFoundation/UA-.NETStandard#1802. Please recheck if fix is available.

@mregen mregen added the bug Something isn't working label May 9, 2022
@mregen
Contributor

mregen commented May 9, 2022

Hi @pcameron-ptc, I think the case you describe may not be covered by the fix in #1802.
Thanks for the detailed writeup to repro. We will check if this is fixed.

@pc-avatar-7076
Author

Thank you @mregen. I pulled down UA-.NETStandard branch release/1.4.368 with the recent reconnect changes (i.e. #1802) and updated the Aggregation sample to reference my local build of the stack. I did not observe any change in behavior with regard to this defect.
