BeamNGpy blocks indefinitely while communicating with the simulator #249

Open
leonardo-panseri opened this issue Feb 20, 2024 · 4 comments

Comments

@leonardo-panseri

Dear BeamNG team,

I am encountering network issues that prevent me from continuing a case study that uses BeamNG for my research project.
During long experiments that involve loading and simulating various scenarios, the BeamNGpy library often blocks indefinitely while sending messages to the simulator or waiting for its acks.

I have looked at the source code and I found out that the library communicates with the simulator through the PrefixedLengthSocket class. When the OS socket that is managed by this class is initialized, the timeout is set to None, putting the socket in blocking mode.
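
To illustrate the difference between the two modes, here is a minimal sketch that uses only Python's standard `socket` module. It does not touch BeamNGpy or its `PrefixedLengthSocket` class; the helper name `recv_with_timeout` is hypothetical. A socket whose timeout is `None` blocks in `recv` until data arrives, while a socket with a numeric timeout raises `socket.timeout` when the peer stays silent.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout):
    """Receive up to nbytes, or return None if nothing arrives in `timeout` seconds.

    Hypothetical helper for illustration; not part of BeamNGpy.
    """
    previous = sock.gettimeout()      # None means blocking mode
    sock.settimeout(timeout)          # switch to timeout mode for this call
    try:
        return sock.recv(nbytes)
    except socket.timeout:            # raised when the peer sends nothing in time
        return None
    finally:
        sock.settimeout(previous)     # restore the original mode


if __name__ == "__main__":
    # A connected pair of sockets stands in for library and simulator.
    a, b = socket.socketpair()
    print(a.gettimeout())                  # blocking mode by default
    print(recv_with_timeout(a, 16, 0.1))   # peer is silent
    b.sendall(b"ok")
    print(recv_with_timeout(a, 16, 0.1))   # peer has replied
    a.close()
    b.close()
```

With the timeout left at `None`, the second call would never return, which matches the hang described above.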

I think it would be good practice to let the user of the library handle the timeout for socket operations. It would make sense if I could choose a timeout value while creating the BeamNGpy instance. Then, I would be able to properly handle cases in which the simulator is not responding or some other unforeseen event happens during network communication.

Please let me know if there is any other way to handle this that I have missed.

Thank you for your help.

@aivora-beamng
Contributor

Hi, the current BeamNGpy protocol is designed as a synchronous request-response protocol, and exposing the timeout would not make the communication strictly better. In the end, a single TCP connection is maintained between BeamNG and BeamNGpy, and all data must pass through it anyway.

I'm curious whether setting a socket timeout on your side fixed the issues you had? Do you have a concrete example where the timeout can improve the experience?

What you could try is to wrap BeamNGpy in Python's built-in process-based parallelism library if you need to do other operations while waiting for the response from BeamNG.

@leonardo-panseri
Author

Here is an example of when having a convenient way to set the timeout is useful:

I am running one scenario after another in a single instance of the simulator. It usually works fine, but sometimes I hit a bug: after the CreateScenario message, the scenario start overlay goes away, but the timer does not start and the simulator never replies to the Python library with the "everything ok" message. In a case like this, my program waits forever for the simulator's answer.

I could wrap every call to beamngpy in an async function and implement a timeout myself, but that seems like a bad solution.
For now I am modifying the timeout value directly in the code of BeamNGpy and I am restarting the instance of the simulator whenever I get a timeout.
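
For reference, a per-call wrapper of the kind mentioned above could look roughly like the sketch below. It uses a worker thread from `concurrent.futures` rather than `asyncio`, which is a substitution made here for brevity; `call_with_timeout` is a hypothetical name, and no BeamNGpy call is used. The sketch also shows why this approach is unattractive: the caller gets control back after the timeout, but the worker thread stays blocked on the original call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and wait at most `timeout` seconds.

    Raises concurrent.futures.TimeoutError if the call does not finish in time.
    The worker thread is NOT cancelled; it keeps running in the background.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    finally:
        # Do not wait for a stuck worker, otherwise the caller would block too.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    print(call_with_timeout(lambda: "fast", 1.0))
    try:
        # time.sleep stands in for a blocking library call.
        call_with_timeout(time.sleep, 0.1, 1.0)
    except FutureTimeout:
        print("call timed out; worker thread is still blocked")
```

Because the blocked thread still owns the socket, every library call would need this wrapper and the connection would have to be discarded after a timeout, so a timeout on the socket itself is the simpler option.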

@leonardo-panseri
Author

Update on this problem

I now understand what is happening. It is not a problem with the library or my algorithm; it is most likely a bug in BeamNG.tech. Sometimes, unpredictably, the simulator module that handles communication with the library breaks. It always happens when sending the StartScenario message. The server socket inside the simulator is still open, can accept connections, and sends TCP ACKs, but it no longer replies to messages, so the library waits forever.
I will describe my setup so it may be easier to reproduce this: I am running 11 parallel instances of BeamNG, and running quick scenarios (~30s) where a single car drives from start to end of a road generated at runtime on a flat map. There is a camera sensor mounted on the car and it is in shared memory mode. If the team needs more details I am available to provide them.
For now I have implemented a workaround by modifying BeamNGpy such that it sets a timeout on the socket before sending the StartScenario message and it does not attempt to reconnect if the timeout expires, raising the exception to the caller instead. In my code I catch the timeout exception and restart the BeamNG instance that raised it.
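
The shape of this workaround can be sketched with the standard library alone. Everything below is a stand-in: `start_silent_server` imitates a peer that accepts connections and reads requests but never answers, and `request` and `run_with_restart` are hypothetical helpers, not BeamNGpy functions. The payload is an arbitrary byte string, not the real wire format.

```python
import socket
import threading


def start_silent_server():
    """Listen on localhost, accept one connection, read requests, never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    server.listen(1)

    def serve():
        conn, _ = server.accept()
        with conn:
            while conn.recv(1024):  # consume requests until the client closes
                pass
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()


def request(address, payload, timeout):
    """Send payload and wait for a reply; raise TimeoutError if none arrives."""
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(1024)
        except socket.timeout as exc:
            # No reconnect attempt: surface the timeout to the caller.
            raise TimeoutError("no reply within %.1f s" % timeout) from exc


def run_with_restart(address, payload, timeout, restart):
    """Issue one request; on timeout call restart() and return None."""
    try:
        return request(address, payload, timeout)
    except TimeoutError:
        restart()                   # e.g. kill and relaunch the simulator
        return None


if __name__ == "__main__":
    addr = start_silent_server()
    result = run_with_restart(addr, b"StartScenario", 0.5,
                              lambda: print("restarting instance"))
    print(result)
```

The connection succeeds and the send completes, so only the missing reply reveals the fault; that is why the timeout has to cover the receive step.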

@aivora-beamng
Contributor

Hi,
to hunt this bug down, it would help greatly if you could give us a minimal code example that reproduces it, since we do not have a reproducible case of this behaviour. Also, BeamNGpy has not been designed with parallelism in mind, so there may be issues that have not been explored before.
