
Very Slow Chunking #861

Open
dhensle opened this issue Apr 30, 2024 · 1 comment
Labels
Bug (Something isn't working)

Comments

@dhensle
Contributor

dhensle commented Apr 30, 2024

Describe the bug
Chunk training takes a VERY long time.

The run was performed on a SANDAG server with 1 TB of RAM (chunk_size was set to 450 GB), using only 64k households (~5% sample) and 5 cores. Run time was 66.85 hours, i.e. 2.78 days!

To Reproduce
Run the SANDAG ABM3 model in chunk training mode. This was performed with the BayDAG_estimation branch, which is based on ActivitySim version 1.2.
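For reference, the run described above was configured roughly as follows. This is a sketch, not the actual config file from the run; the keys follow ActivitySim's settings.yaml conventions, and the values are simply the ones reported in this issue:

```yaml
# settings.yaml (illustrative sketch; values as reported in this issue)
chunk_training_mode: training    # record chunking statistics for later production runs
chunk_size: 450000000000         # 450 GB of the server's 1 TB of RAM, in bytes
num_processes: 5                 # 5 cores used for the 64k-household run
households_sample_size: 64000    # ~5% household sample
```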

Expected behavior
Chunk training shouldn't take much longer than actually running the model, and we have not seen chunk training take this long before. Is there something about the SANDAG model (e.g., the two-zone setup) that slows it down? Or was a dependency updated in a way that badly hurt performance?

Additional context
Log files can be seen here: training_log.zip

Running in production mode also took an extremely long time (again > 2.5 days!). Part of the problem may be that num_processes was set to 40 while the machine only has 32 cores, but that mismatch shouldn't cause a slowdown of this magnitude.

Looking at the production logs shows that about 700 minutes(!) of run time was spent in the parking location choice model. This appears to be because ActivitySim creates a chunk for every single chooser in that model (hence log statements like "Running chunk 10450 of 10456 with 1 of 10456 choosers"). The chunk_cache.csv (found in the training_log above) certainly shows that more than one chooser per chunk should be allowed when chunk_size is set to 450 GB.
production_log_subset.zip
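A back-of-the-envelope check of the figures quoted above suggests how much of that runtime is pure per-chunk overhead. The numbers (10,456 single-chooser chunks, ~700 minutes in parking location choice) come from this issue; the "overhead" interpretation is my assumption:

```python
# If the ~700 minutes in parking location choice were spread across the
# 10,456 single-chooser chunks reported in the log, each chunk would cost
# about 4 seconds -- consistent with fixed per-chunk setup overhead
# dominating the actual model evaluation.
total_minutes = 700
num_chunks = 10_456

seconds_per_chunk = total_minutes * 60 / num_chunks
print(f"~{seconds_per_chunk:.1f} s per chunk")
```

With reasonably sized chunks (hundreds or thousands of choosers each), that overhead would be paid a handful of times instead of 10,456 times.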

Is this behavior related to #860?

(Currently working on reproducing with the main branch, but run is not yet complete. I will update once complete...)

@dhensle dhensle added the Bug label Apr 30, 2024
@dhensle dhensle changed the title Very Slow Chunk Trianing Very Slow Chunking Apr 30, 2024
@dhensle
Contributor Author

dhensle commented May 1, 2024

As mentioned above, I tested with the current main branch of the code and the sandag-abm3-example. The results were very similar.

I ran with 100k households in chunk training mode, without sharrow, on 10 cores. The chunk training run took about 24 hours!

Log files are attached:
log_abm3_chunk_train_100k.zip
