Performance testing of different partition options #17

Open
cholmes opened this issue Aug 31, 2023 · 0 comments
Labels
data Issues related to creating or updating data, usually on source.coop get_buildings Issues related to the get_buildings operations help wanted Extra attention is needed

Comments

Collaborator

cholmes commented Aug 31, 2023

In making the get-buildings command I went through a couple of iterations of different file layouts, and it became clear that using more row groups than gpq makes by default is better. With the latest scripts I have a way to set the 'max number of rows' per file and also the row group settings. But I have no idea whether things could be a lot faster if we increased or decreased the row group size, and/or increased or decreased the number of files. The 'defaults' I used were a maximum of 10 million rows per file and 20,000 rows per row group. It'd be great to try out some variations on that.

Ideally we'd also experiment with the tradeoff between 'legibility for download' (e.g. partition by country and then admin level 1, like the Google buildings data does) and 'balance of spatial size' (e.g. use the quadkey max size algorithm entirely, instead of country then quadkey). The latter would give us far fewer files overall, but each file would be meaningless to users, so they'd need to use the 'tool' to download.
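To make the knobs concrete, here is a rough sketch of the arithmetic behind those defaults. The function name `partition_layout` is hypothetical (it is not part of the project's scripts), and it assumes rows simply fill files in order, which will not hold exactly once files are split by country or quadkey.

```python
import math


def partition_layout(total_rows, max_rows_per_file=10_000_000, rows_per_group=20_000):
    """Estimate (files, row groups per full file, total row groups).

    Assumption: rows fill files sequentially, so every file except
    possibly the last one holds exactly max_rows_per_file rows.
    """
    files = math.ceil(total_rows / max_rows_per_file)
    groups_per_full_file = math.ceil(max_rows_per_file / rows_per_group)
    # Full files contribute a fixed number of groups; the last,
    # partially filled file contributes whatever its remainder needs.
    full_files, remainder = divmod(total_rows, max_rows_per_file)
    total_groups = full_files * groups_per_full_file + math.ceil(remainder / rows_per_group)
    return files, groups_per_full_file, total_groups


if __name__ == "__main__":
    # With the defaults, a full file holds 500 row groups.
    print(partition_layout(25_000_000))  # (3, 500, 1250)
```

Sweeping `max_rows_per_file` and `rows_per_group` through a helper like this would give the list of layouts worth generating and timing.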

The performance I got to was roughly 20-30 seconds to download a small number of buildings, but that was based on just a handful of tests.

Ideally we'd have a command that runs a 'benchmark' over 20-30 locations globally, measures the performance for each of them and reports that out, so we can easily compare how tweaks to the data affect download times.
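A minimal sketch of what such a benchmark harness could look like, using only the standard library. Everything here is an assumption for illustration: the `fetch` callable stands in for whatever actually downloads buildings for a location, and the three sample locations are placeholders for the real list of 20-30.

```python
import statistics
import time

# Hypothetical sample locations as name -> (lon, lat); a real run would use 20-30.
LOCATIONS = {
    "nairobi": (36.82, -1.29),
    "jakarta": (106.85, -6.21),
    "lima": (-77.04, -12.05),
}


def run_benchmark(fetch, locations, repeats=3):
    """Time fetch(lon, lat) at each location and summarise the timings."""
    results = {}
    for name, (lon, lat) in locations.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(lon, lat)
            timings.append(time.perf_counter() - start)
        results[name] = {
            "min": min(timings),
            "median": statistics.median(timings),
            "max": max(timings),
        }
    return results


def report(results):
    """Format the results as a plain-text table, one row per location."""
    lines = [f"{'location':<12}{'min (s)':>10}{'median (s)':>12}{'max (s)':>10}"]
    for name, stats in sorted(results.items()):
        lines.append(
            f"{name:<12}{stats['min']:>10.3f}{stats['median']:>12.3f}{stats['max']:>10.3f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Stub fetch so the sketch runs on its own; swap in the real download call.
    print(report(run_benchmark(lambda lon, lat: time.sleep(0.01), LOCATIONS)))
```

Running the same harness against each data layout and comparing the median column would show whether a change to rows per file or row group size helps or hurts.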

@cholmes cholmes added data Issues related to creating or updating data, usually on source.coop get_buildings Issues related to the get_buildings operations help wanted Extra attention is needed labels Oct 9, 2023