Integrating Pylot in an RL training process #293

Open · Morphlng opened this issue May 8, 2023 · 1 comment

Morphlng commented May 8, 2023

Hi! I want to train NPC vehicles in an autonomous driving scenario, using RL algorithms to discover the potential weaknesses of the Ego vehicle. I'm currently trying to use Pylot as the "system under test" to control the Ego and integrate it into the entire RL training process. I have some questions about Pylot's "reusability":

  1. As an initial test, I currently restart Pylot for each episode of RL training: I shut the Pylot process down and launch it again so that Pylot reconnects to the whole workflow. This is quite inefficient; timing statistics show that a complete restart of Pylot takes about 20 seconds on average. (A minimal sketch of this setup follows the list.)

  2. In reinforcement learning, each episode is reset by teleporting the vehicle to a new position (set_transform). If Pylot is not restarted, it is "at a loss" after the teleport; I suspect this is because its route planning errors out. Is there a way to re-initialize a specific module in Pylot independently?

  3. Since the RL environment already records the necessary information about the Ego vehicle, Pylot should in theory not need to perform redundant perception tasks. If we target only the simulator environment, what data (content and format) does Pylot require to complete path planning?
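For context, here is a minimal sketch of the reset logic behind points 1 and 2, written against the CARLA Python API. The `pylot.py` entry point and the flag-file path are assumptions for illustration only; the actual launch command depends on your Pylot configuration.

```python
import subprocess
import time

import carla  # CARLA Python API


class PylotEpisodeRunner:
    """Hypothetical helper illustrating the per-episode reset options."""

    def __init__(self, host="localhost", port=2000):
        self.client = carla.Client(host, port)
        self.client.set_timeout(10.0)
        self.world = self.client.get_world()
        self.pylot_proc = None

    def hard_reset(self, ego_vehicle, spawn_transform):
        # Current approach (point 1): kill and relaunch Pylot each episode.
        if self.pylot_proc is not None:
            self.pylot_proc.terminate()
            self.pylot_proc.wait()
        ego_vehicle.set_transform(spawn_transform)
        # Assumed launch command; takes ~20 s until Pylot is ready again.
        self.pylot_proc = subprocess.Popen(
            ["python3", "pylot.py", "--flagfile=configs/e2e.conf"])
        time.sleep(20.0)

    def soft_reset(self, ego_vehicle, spawn_transform):
        # Desired approach (point 2): keep Pylot alive and only teleport
        # the ego vehicle; today this leaves Pylot "at a loss".
        ego_vehicle.set_transform(spawn_transform)


# Example usage (coordinates are arbitrary):
# spawn = carla.Transform(carla.Location(x=230.0, y=195.0, z=0.5),
#                         carla.Rotation(yaw=90.0))
# runner.hard_reset(ego_vehicle, spawn)
```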

pschafhalter (Member) commented

Hi, I've looked into modifying Pylot for RL work in the past, and I'm happy to share my findings. Pylot wasn't designed with RL in mind, so some of these changes might be complex:

  1. Pylot was designed to run as a real-world AV pipeline. To achieve real-time execution, its components run in separate processes that exploit parallelism (Python offers limited intra-process parallelism due to the global interpreter lock). As such, starting up this collection of processes takes some time. Setup is slowed further because some components initialize complex libraries like TensorFlow.
  2. It might be possible to "soft-reset" Pylot, but this would require modifying Pylot's operators: some mechanism would have to notify the operators to reset their state, and the planning operator would also need to change its destination waypoint. (See the sketch after this list.)
  3. You can execute Pylot without perception, using ground-truth information extracted from the simulator, by setting the following flags (a usage example follows the list):

     --simulator_obstacle_detection
     --simulator_traffic_light_detection
     --perfect_obstacle_tracking
     --perfect_localization
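For example, point 3's flags could be appended to the assumed launch command from the sketch above (the `pylot.py` entry point and the `configs/detection.conf` flag file remain assumptions):

```python
import subprocess

# Hypothetical launch; only the four ground-truth flags come from the
# answer above, the rest is illustrative.
pylot_proc = subprocess.Popen([
    "python3", "pylot.py",
    "--flagfile=configs/detection.conf",
    "--simulator_obstacle_detection",
    "--simulator_traffic_light_detection",
    "--perfect_obstacle_tracking",
    "--perfect_localization",
])
```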

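To make point 2 concrete, here is a minimal sketch of the notification mechanism a soft reset would need. None of this is Pylot or ERDOS API, and all names are hypothetical stand-ins; in the real pipeline the reset message would have to travel over the dataflow streams between processes, so this only illustrates the pattern.

```python
import threading
from typing import Callable, List


class ResetBus:
    """Hypothetical control channel (not Pylot/ERDOS API)."""

    def __init__(self):
        self._callbacks: List[Callable[[], None]] = []
        self._lock = threading.Lock()

    def register(self, on_reset: Callable[[], None]) -> None:
        # Each operator registers a callback that clears its state.
        with self._lock:
            self._callbacks.append(on_reset)

    def broadcast_reset(self) -> None:
        # Called by the RL environment after set_transform, instead of
        # killing and relaunching the Pylot processes.
        with self._lock:
            for callback in self._callbacks:
                callback()


class PlanningOperatorStub:
    """Toy stand-in for Pylot's planning operator."""

    def __init__(self, bus: ResetBus):
        self.route = []
        self.destination = None
        bus.register(self.on_reset)

    def on_reset(self) -> None:
        # Drop the stale route so planning restarts from the new pose;
        # the destination waypoint would also need to be updated here.
        self.route.clear()
        self.destination = None


bus = ResetBus()
planner = PlanningOperatorStub(bus)
bus.broadcast_reset()
```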