IOC receives camera connection failed occasionally #4

Open
AbdallaDalleh opened this issue Dec 10, 2023 · 23 comments

@AbdallaDalleh

We have the following setup:

Camera Model: Basler acA1300
Pylon SDK 7.3
Docker-based IOC (minimal Alpine Linux)

Every couple of days, the IOC receives the following error:
[image: basler-error]

This is exactly the same error I get from the Pylon Viewer when connecting to a camera that is already being controlled elsewhere. The problem is that it happens every few days; to resolve it, I have to stop and start the acquisition. It then works again, and after a few days I get the same error.

@xiaoqiangwang
Collaborator

It would be helpful to attach the full log in text.

@AbdallaDalleh
Author

Here is a sample log output from the IOC shell:

2023/12/10 13:18:54.400 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/10 13:18:54.400 ADPylon:connect:  camera connection failed (3)
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolUsedMem devAsynFloat64::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolAllocBuffers devAsynInt32::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:18:54.532 SRC16-DI-PNHL:CAM:PoolFreeBuffers devAsynInt32::reportQueueRequestStatus queueRequest error port pinhole not connected
2023/12/10 13:19:14.763 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
2023/12/11 00:50:16.640 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/11 00:50:16.640 ADPylon:connect:  camera connection failed (3)
2023/12/11 00:50:36.999 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber
2023/12/11 02:10:26.165 ADPylon::connectCamera error opening camera 23186682: Failed to open 'Basler acA1300-30gm#003053309FFA#10.2.4.70:3956'. The device is controlled by another application. Err: An attempt was made to access an address location which is currently/momentary not accessible. (0xE1018006)

2023/12/11 02:10:26.165 ADPylon:connect:  camera connection failed (3)
2023/12/11 02:10:46.524 PylonFeature::initialize error input feature type=6 != Pylon feature type=0 for featurename=DeviceSerialNumber

@xiaoqiangwang
Collaborator

Judging from the message timestamps, it looks like the Docker container network has intermittent interruptions.

  • The network disruption triggers the camera lost event.
  • The EPICS driver will try to reconnect.
    • The first attempt fails with error code "0xE1018006", possibly because the camera hardware has not yet released control.
    • The second attempt, 20 seconds later, succeeds and the connection is resumed.
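
The timing described above can be checked directly from the log. The following is a minimal sketch, not part of ADPylon: it pairs each `ADPylon::connectCamera error` line with the next `PylonFeature::initialize` line and reports the gap. Treating the `PylonFeature::initialize` message as the sign that the camera was reopened is an assumption, and the function name `reconnect_delays` is made up for illustration. The log lines are shortened copies of the ones posted in this thread.

```python
from datetime import datetime

# Shortened lines copied from the IOC log posted in this thread.
LOG = """\
2023/12/10 13:18:54.400 ADPylon::connectCamera error opening camera 23186682
2023/12/10 13:19:14.763 PylonFeature::initialize error input feature type=6
2023/12/11 00:50:16.640 ADPylon::connectCamera error opening camera 23186682
2023/12/11 00:50:36.999 PylonFeature::initialize error input feature type=6
2023/12/11 02:10:26.165 ADPylon::connectCamera error opening camera 23186682
2023/12/11 02:10:46.524 PylonFeature::initialize error input feature type=6
"""

TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def reconnect_delays(log_text):
    """Return (failure_time, seconds_until_next_initialize) pairs.

    Assumption: the next PylonFeature::initialize line after a failed
    connectCamera call marks a successful reconnection.
    """
    delays = []
    pending_failure = None
    for line in log_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
        except ValueError:
            continue  # not a timestamped line
        message = parts[2]
        if "ADPylon::connectCamera error" in message:
            pending_failure = stamp
        elif "PylonFeature::initialize" in message and pending_failure is not None:
            gap = (stamp - pending_failure).total_seconds()
            delays.append((pending_failure, gap))
            pending_failure = None
    return delays


for failed_at, seconds in reconnect_delays(LOG):
    print(f"{failed_at}  reconnected after {seconds:.3f} s")
```

Running the same function over a longer log would also show how far apart the failures are, which helps to tell a periodic cause from a random one.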

@AbdallaDalleh
Author

Thanks for the feedback. The issue started to appear around the time we migrated the IOC to Docker. For testing purposes, we switched back to the standard IOC setup, with the MTU set to 9000 along the entire path. I'll provide feedback soon.

@AbdallaDalleh
Author

AbdallaDalleh commented Dec 13, 2023

The issue happened again outside of Docker. We ran a basic IOC with ADPylon integrated, and it ran fine until a few hours ago, when I got the same error. Note that it did not resume, so I had to stop and start acquiring. Could there be some parameters in the OS or in the features GUI that I need to modify?

@xiaoqiangwang
Collaborator

During operation, was there anything unusual related to buffers/packets in the status section?
[image: Screenshot 2023-12-13 at 08 26 43]

@AbdallaDalleh
Author

I haven't checked the status section recently, but I don't remember getting any failed buffers or packets.

@AbdallaDalleh
Author

I just remembered that we are running ADPylon with Pylon SDK 7.3. The tools in 7.3 do not run under Rocky Linux 8 because they require a newer version of libstdc++. Could this be an issue for the IOC?

@xiaoqiangwang
Collaborator

Pylon SDK 7.3 does not run on RHEL/CentOS 7 because of the libc and libstdc++ versions.

For Rocky Linux 8, there is only the crash-on-exit issue #1, and it has been fixed.

@AbdallaDalleh
Author

AbdallaDalleh commented Dec 14, 2023

Actually, Pylon SDK 7.3 does not work with Rocky Linux 8 due to the libstdc++ version; the latest functional SDK on Rocky Linux 8 is 7.2.1, which was working fine a few months ago. Yesterday, we installed the IOC with SDK 7.2.1 on a laptop on the same switch as the camera, with the MTU on all nodes set to 9000, and just this morning I got the same error. We suspect an issue with the camera itself; we will try rebooting it through PoE and test again. What do you think?
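
Since the MTU on every node matters here, one way to double-check the host side is to read the configured MTU of each interface from sysfs. This is only a sketch under stated assumptions: it relies on the Linux `/sys/class/net/<interface>/mtu` files, it says nothing about the switch or the camera, and the helper names `interface_mtus` and `below_jumbo` are made up for illustration.

```python
from pathlib import Path


def interface_mtus(sysfs_root="/sys/class/net"):
    """Map each network interface name to its configured MTU."""
    mtus = {}
    for entry in sorted(Path(sysfs_root).iterdir()):
        try:
            mtus[entry.name] = int((entry / "mtu").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries without a readable MTU file
    return mtus


def below_jumbo(mtus, wanted=9000):
    """Return the interfaces whose MTU is smaller than the wanted value."""
    return {name: mtu for name, mtu in mtus.items() if mtu < wanted}


if __name__ == "__main__":
    # Only meaningful on Linux hosts that expose /sys/class/net.
    if Path("/sys/class/net").is_dir():
        for name, mtu in below_jumbo(interface_mtus()).items():
            print(f"{name}: MTU {mtu} is below 9000")
```

The loopback interface and any interface not used for the camera will show up too, so the output needs to be read with the actual camera NIC in mind.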

@xiaoqiangwang
Collaborator

I am able to build and run the ADPylon IOC using Pylon SDK 7.3 on RHEL 8.9 and Rocky Linux 8.7. What does not work is the pylonviewer client program, which requires libc>=2.29.
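
To see whether a given host meets the libc>=2.29 requirement mentioned above, the C library version can be queried from Python. This is a rough sketch: `platform.libc_ver()` is standard library, while `parse_version` and `meets_minimum` are illustrative helpers, and the check covers only libc, not libstdc++.

```python
import platform


def parse_version(text):
    """Turn '2.28' into (2, 28); return None if it is not dotted integers."""
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def meets_minimum(libc_version, minimum=(2, 29)):
    """True when libc_version (for example '2.28') is at least the minimum."""
    parsed = parse_version(libc_version)
    return parsed is not None and parsed >= minimum


name, version = platform.libc_ver()
if name == "glibc":
    print(f"glibc {version}; >= 2.29: {meets_minimum(version)}")
else:
    # platform.libc_ver() reports an empty name when glibc is not found,
    # which is what one would expect on a musl-based system such as Alpine.
    print("glibc not detected")
```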

If you could run two cameras on the same host, that would be definite proof. But I suspect, as much as you do, that the camera is failing.

@AbdallaDalleh
Author

We added a second camera on the same switch and it failed with the same error. We now suspect the PoE on the switch, so we will set up external power, turn off PoE for both cameras, and test again.

@AbdallaDalleh
Author

With all cameras set to the same acquisition settings, I got the same error on a test camera connected directly to the same switch, with the MTU set to 9000 and with external power supplies. I also got the same error on a different camera running on a different switch with the MTU set to 1500, but only after about 700K frames. I am thinking of two things:

  • Upgrade the SDK to 7.4 and test again.
  • Play with the transport layer 1 parameters on the features-3 GUI.

What do you think?

@xiaoqiangwang
Collaborator

So far, all tests have involved network switches. Would it be possible to test with a direct connection between the PC and the camera? See Peer-to-Peer Network Architecture and Changing the Network Adapter Properties (Linux) in https://docs.baslerweb.com/network-configuration-(gige-cameras)

@AbdallaDalleh
Author

Peer-to-peer was tested multiple times, but only with the Pylon Viewer. One time it acquired 7M+ frames, and I can't recall any failed frames, if there were any.

@xiaoqiangwang
Collaborator

For comparison, it would still be good to

  • either run the EPICS IOC with a peer-to-peer connection,
  • or run Pylon Viewer in the current setup.

Either way, one would identify whether the network switch is part of the problem.

@AbdallaDalleh
Author

Not all parameters mentioned in the network configuration page are supported in RL8, because support depends on a combination of the kernel version and the NIC driver. It looks like we might need an RL9-based setup.

@xiaoqiangwang
Collaborator

> Not all parameters mentioned in the network configuration page are supported in RL8 because of a mix between kernel version and NIC driver support for these parameters. Looks like we might need an RL9-based setup ....

What error messages do you observe?

@AbdallaDalleh
Author

The ethtool command reports the following errors on different PCs:

netlink error: cannot modify an unsupported parameter (offset xx)
netlink error: Invalid arguments

Here the offset is just the position of the parameter in the command. From what I found online, ethtool has a list of parameters, but support depends on the NIC model and the Linux driver; even the list of parameters in the tool itself varies from one kernel version to another. It looks like we might need to try an RL9-based setup.

@xiaoqiangwang
Collaborator

We have used NICs with Intel and Broadcom chipsets, and RHEL supports them well.

@AbdallaDalleh
Author

We are still investigating the issue on our side, but I have a question: the driver won't acquire anything if the MTU is set to 9000 on the camera through the features GUI. This only happens when the camera is connected to a switch; with P2P, an MTU of 9000 works fine.

@xiaoqiangwang
Collaborator

xiaoqiangwang commented Jan 22, 2024

> this will only happen if it is connected to any switch

Is the network switch configured for jumbo frames?
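
One common way to probe this from the IOC host is a ping with fragmentation forbidden and a payload sized to fill the MTU exactly. The sketch below only builds and prints such a command for the Linux iputils `ping` (`-M do` forbids fragmentation, `-s` sets the payload size); the helper names are made up for illustration, the address is the camera IP from the log in this thread, and whether the camera answers echo requests of that size is an assumption. Another host behind the same switch can be used as the target instead.

```python
import subprocess

IPV4_HEADER = 20  # bytes, without IP options
ICMP_HEADER = 8   # bytes


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def jumbo_ping_command(host, mtu=9000, count=3):
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), host]


def path_passes_mtu(host, mtu=9000):
    """Return True if full-size, unfragmented pings to host get replies."""
    result = subprocess.run(jumbo_ping_command(host, mtu),
                            capture_output=True, text=True)
    return result.returncode == 0


# Print the command rather than run it, so it can be reviewed first.
print(" ".join(jumbo_ping_command("10.2.4.70")))
```

If the large ping fails through the switch but a ping sized for an MTU of 1500 succeeds, that would point to a hop that does not pass jumbo frames.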

@AbdallaDalleh
Author

Hi Wang, sorry, I forgot about this issue. As we agreed, it seems to be more of a networking issue: we moved the camera's Ethernet cable to another switch and now get much better performance, with millions of frames captured continuously at 10 FPS before any disconnection. For the time being this is acceptable, and we can live with it given how busy operations are. We will soon test a few cameras on a 10G switch that supports an MTU of up to 10000, I think, and provide you with feedback. Thanks!
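
For a sense of scale, frame counts at 10 FPS convert to running time as frames / FPS. A small sketch, with the frame counts chosen only as examples of "millions of frames" and the function name `run_time_hours` made up for illustration:

```python
def run_time_hours(frames, fps=10.0):
    """Hours of continuous acquisition needed to capture `frames` at `fps`."""
    return frames / fps / 3600.0


# Example frame counts; only the 10 FPS rate comes from the comment above.
for frames in (1_000_000, 2_000_000, 5_000_000):
    hours = run_time_hours(frames)
    print(f"{frames:>9,} frames at 10 FPS = {hours:.1f} h ({hours / 24:.1f} days)")
```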
