Reporting progress of get_significant_points_gdf #361

Open
marklit opened this issue Jan 25, 2024 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@marklit

marklit commented Jan 25, 2024

If I feed up to 100K points into MovingPandas (MP), get_significant_points_gdf finishes in a few minutes. I'm keen to potentially feed upwards of 40M points. Is there some way that get_significant_points_gdf could be passed a callback function through which it reports its progress?

I usually use https://rich.readthedocs.io/en/stable/progress.html for tracking progress in long-running Python scripts.

@marklit marklit added the enhancement New feature or request label Jan 25, 2024
@bamacgabhann
Contributor

Use tqdm?

https://github.com/tqdm/tqdm

@marklit
Author

marklit commented Jan 26, 2024

tqdm's API looks much like rich's. The issue is how to tie this into MP's aggregation calls. There is no iterator exposed that I could use to keep track of what's happening and I can't pass an object into the aggregation call to report on its progress either.

It took all night for my 2020 MBP to generate the generalised route maps for the top 25 airlines in ADSB.lol's dataset. It would be great to know roughly how long is left on jobs like this. https://tech.marksblogg.com/global-flight-tracking-adsb.html#generalising-routes
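
Since no iterator or hook is exposed, one stopgap is to run the blocking call in a worker thread and report elapsed time from the main thread. The following is a minimal sketch using only the standard library. The helper name `run_with_elapsed` and its parameters are hypothetical and not part of MP, and it shows only time spent so far, not time remaining.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_elapsed(func, *args, poll_seconds=1.0, report=print, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and report elapsed time.

    Hypothetical helper: it knows nothing about the work being done, so it
    can only say how long the call has been running.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func, *args, **kwargs)
        while True:
            # wait() returns once the future is done or the timeout expires.
            done, _ = wait([future], timeout=poll_seconds)
            if done:
                # result() re-raises any exception raised inside func.
                return future.result()
            report(f"still running, {time.monotonic() - start:.0f}s elapsed")
```

The `report` argument can be any callable that accepts a string, so the message could be routed to a rich or tqdm display instead of `print`.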

@anitagraser
Collaborator

Nice blog post, @marklit.

I'm keen to potentially feed upwards of 40M points.

Heads up: The TrajectoryCollection from these 40M points has to fit into your RAM. The script will crash otherwise.

Looking at the current implementation, a meaningful progress indicator will be challenging. The first 50% could show the progress of

self.significant_points = self._extract_significant_points()

and the remaining 50% would be

self.flows = self._compute_flows_between_clusters()

Both of these happen on init.
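
The two-stage split described above could be sketched roughly as follows. This is a stand-in class for illustration only, not the MovingPandas implementation: the class name `AggregatorSketch`, the `progress_callback` parameter, and the placeholder method bodies are all assumptions. Only the two method names are taken from the lines quoted above.

```python
from typing import Callable, Optional

# Hypothetical callback signature: (stage description, fraction complete).
ProgressCallback = Callable[[str, float], None]


class AggregatorSketch:
    """Illustrative stand-in; NOT the MovingPandas aggregator class."""

    def __init__(self, points, progress_callback: Optional[ProgressCallback] = None):
        self._points = list(points)
        # Fall back to a no-op so callers are not forced to pass a callback.
        self._report = progress_callback or (lambda stage, fraction: None)

        self._report("start", 0.0)
        self.significant_points = self._extract_significant_points()
        self._report("significant points extracted", 0.5)
        self.flows = self._compute_flows_between_clusters()
        self._report("flows computed", 1.0)

    def _extract_significant_points(self):
        # Placeholder logic: keep every second point.
        return self._points[::2]

    def _compute_flows_between_clusters(self):
        # Placeholder logic: pair consecutive significant points.
        return list(zip(self.significant_points, self.significant_points[1:]))
```

A caller could then pass something like `lambda stage, fraction: print(f"{fraction:.0%} {stage}")`. The 50/50 split is nominal: it says which stage has finished, not how the runtime is actually divided between the two.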

@marklit
Author

marklit commented Jan 26, 2024

Is there anywhere deeper where records are iterated over one at a time? That could be a place to add a hook for a progress counter.

@anitagraser
Collaborator

Records are iterated for each trajectory individually. It would be hard to keep track of the overall progress for the whole trajectory collection.
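
One coarse option, given that iteration happens per trajectory, is to report after each trajectory finishes and weight the count by the number of records in it. The sketch below assumes plain Python lists as stand-ins for trajectories. The function `process_with_progress` and its `size_of` and `on_progress` parameters are hypothetical, and nothing here is existing MovingPandas API.

```python
def process_with_progress(trajectories, process_one, on_progress, size_of=len):
    """Apply process_one to each trajectory, reporting record-weighted progress.

    on_progress(done_records, total_records) is called once per finished
    trajectory, so the granularity is one trajectory, not one record.
    """
    trajectories = list(trajectories)
    total_records = sum(size_of(t) for t in trajectories)
    done_records = 0
    results = []
    for trajectory in trajectories:
        results.append(process_one(trajectory))
        done_records += size_of(trajectory)
        on_progress(done_records, total_records)
    return results
```

Weighting by record count keeps one very long trajectory from looking like the same amount of work as a short one, but the total has to be computed up front, which costs one extra pass over the collection.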

@anitagraser anitagraser added the help wanted Extra attention is needed label Apr 1, 2024

3 participants