Point access speedup #10

richardt94
Contributor

This addresses the issue raised in #8. The problem there is that netCDF handles "fancy indexing" poorly (see e.g. here), i.e. any attempt to access many separate small areas of the file, such as a list of scattered points. This becomes more of a problem when the file is served by a remote THREDDS server, as reported in the issue. If max_bytes is set large enough in the call to get_value_at_coords, the server times out attempting the fancy indexing; the only alternative is to set a small maximum request size that fetches a few points per request, which is also quite slow.

This PR instead indexes the dataset with contiguous slices where max_bytes allows, which is processed much faster. I tested with the notebook in examples/2_geophys_netcdf_grid_utils_demo.ipynb (which uses the same dataset referenced in #8) and was able to retrieve 466 points at 10 km spacing in less than 2 seconds with a fast connection to NCI and max_bytes=50000000 (50 MB). The current implementation takes at least 6.9 seconds, using the minimum request size of max_bytes=1.
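To illustrate the idea, here is a minimal sketch of reading one contiguous block and then picking the points out of it in memory. The function name `read_points_via_slice` is hypothetical, and a NumPy array stands in for the netCDF variable; this is not the code in the PR, only the general technique under those assumptions.

```python
import numpy as np


def read_points_via_slice(variable, rows, cols):
    """Fetch scattered points with one contiguous read (hypothetical helper).

    `variable` is anything that supports 2D slice indexing; a NumPy array
    stands in for a netCDF variable here.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # Bounding box of all requested points (stop indices are exclusive).
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    # A single contiguous slice: this is the only access to the data source.
    block = np.asarray(variable[r0:r1, c0:c1])
    # Fancy indexing now happens on the in-memory block, which is cheap.
    return block[rows - r0, cols - c0]


# Stand-in grid and three scattered points.
grid = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)
rows = [10, 12, 15]
cols = [40, 41, 47]
values = read_points_via_slice(grid, rows, cols)
```

The trade-off is that the block may contain far more cells than points, which is why the size of the bounding box has to be checked against max_bytes.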

The changes aren't quite ready yet: the computation of the slice indices assumes that the list of points is "sorted" such that the rectangle bounded by the ith and jth points is entirely contained in the rectangle bounded by the ith and (j+1)th points. This will probably hold for lists of points that lie along an almost straight line, but not otherwise.
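The ordering assumption can be checked directly on a list of index pairs. The sketch below is my reading of the condition, with hypothetical helper names (`box`, `contains`, `rectangles_are_nested`); it is not part of the PR.

```python
def box(p, q):
    """Bounding rectangle of two (row, col) points: (r_min, r_max, c_min, c_max)."""
    return (min(p[0], q[0]), max(p[0], q[0]), min(p[1], q[1]), max(p[1], q[1]))


def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def rectangles_are_nested(points):
    """Check that box(i, j) is inside box(i, j + 1) for every i < j."""
    for i, start in enumerate(points):
        boxes = [box(start, p) for p in points[i + 1:]]
        for current, following in zip(boxes, boxes[1:]):
            if not contains(following, current):
                return False
    return True


line = [(0, 0), (1, 2), (2, 4), (3, 6)]      # points along a straight line
zigzag = [(0, 0), (5, 5), (1, 9), (3, 2)]    # points that double back
line_ok = rectangles_are_nested(line)
zigzag_ok = rectangles_are_nested(zigzag)
```

Here `line_ok` is true and `zigzag_ok` is false, matching the straight-line case described above.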

@richardt94
Contributor Author

This should now work for any list of points, though it will be faster for lists where successive points are close to each other. I also improved the handling of a single point passed to get_value_at_coords: this case is now checked explicitly instead of waiting for an error to be thrown by the logic that handles lists of points. Finally, I added a couple more tests for the function using different max_bytes values.
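One way such behaviour could arise is by greedily grouping successive points while their bounding box still fits in the byte budget. The sketch below is an assumption about the general approach, not the merged implementation; `group_points` and the `itemsize` parameter are hypothetical.

```python
def group_points(points, max_bytes, itemsize=4):
    """Split (row, col) points into runs whose bounding box fits in max_bytes.

    Hypothetical helper: `itemsize` is the assumed size in bytes of one grid
    cell. A single point always forms a valid group, even if max_bytes is
    smaller than itemsize.
    """
    if not points:
        return []
    groups = []
    current = [points[0]]
    r_lo = r_hi = points[0][0]
    c_lo = c_hi = points[0][1]
    for r, c in points[1:]:
        # Bounding box if this point joined the current group.
        new_r_lo, new_r_hi = min(r_lo, r), max(r_hi, r)
        new_c_lo, new_c_hi = min(c_lo, c), max(c_hi, c)
        size = (new_r_hi - new_r_lo + 1) * (new_c_hi - new_c_lo + 1) * itemsize
        if size <= max_bytes:
            current.append((r, c))
            r_lo, r_hi, c_lo, c_hi = new_r_lo, new_r_hi, new_c_lo, new_c_hi
        else:
            # Box too large: close this group and start a new one here.
            groups.append(current)
            current = [(r, c)]
            r_lo = r_hi = r
            c_lo = c_hi = c
    groups.append(current)
    return groups


points = [(0, 0), (1, 1), (50, 50), (51, 51)]
groups = group_points(points, max_bytes=64)
single = group_points([(3, 4)], max_bytes=1)
```

With nearby successive points, each group covers many points per request; with scattered points, the grouping degrades towards one request per point, which is consistent with the speed difference noted above.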

@richardt94 richardt94 changed the title [WIP] Point access speedup Point access speedup Apr 7, 2022
@richardt94 richardt94 changed the base branch from master to develop April 12, 2022 01:36
@andrew-j-turner-000 andrew-j-turner-000 merged commit 14cb960 into GeoscienceAustralia:develop Apr 12, 2022
2 participants