Smaller timeseries partitions #247

Merged
nmerket merged 2 commits into develop from smaller_partitions on Sep 13, 2021

@nmerket nmerket commented Sep 8, 2021

Pull Request Description

The 4 GB output partition size was making downstream data processing difficult: both Spark and Dask clusters were failing with out-of-memory errors. I'm changing it back to 1 GB, which will produce more files, but each one will be more manageable.
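To illustrate the trade-off, here is a minimal sketch of how the target partition size drives the number of output files. The helper `partition_count` and the 40 GiB example size are hypothetical and are not taken from this repository's code; the sketch only assumes that the data is split so that no partition exceeds the target size.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def partition_count(total_bytes: int, target_partition_bytes: int = GIB) -> int:
    """Return how many partitions are needed so none exceeds the target size.

    Hypothetical helper for illustration only; not the project's actual logic.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_partition_bytes <= 0:
        raise ValueError("target_partition_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    n_partitions = -(-total_bytes // target_partition_bytes)
    # Always write at least one partition, even for an empty dataset.
    return max(1, n_partitions)


if __name__ == "__main__":
    total = 40 * GIB  # assumed example dataset size
    print(partition_count(total, 4 * GIB))  # 4 GiB target
    print(partition_count(total, 1 * GIB))  # 1 GiB target
```

Under these assumptions, dropping the target from 4 GiB to 1 GiB quadruples the file count while capping the amount of data a single worker has to hold for one partition.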

Checklist

Not all may apply

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on CircleCI build -> Artifacts)
  • All other unit tests passing
  • Update validation for project config yaml file changes
  • ~~Update existing documentation~~
  • Run a small batch run to make sure it all works (local is fine, unless an Eagle specific feature)
  • Add to the changelog_dev.rst file and propose migration text in the pull request

@nmerket nmerket requested a review from rajeee September 8, 2021 19:36
@nmerket nmerket self-assigned this Sep 8, 2021

rajeee commented Sep 8, 2021

Looks like another case of "unintended consequences" :)
This will probably make the upload time a little worse, but nothing too bad.

image

@nmerket nmerket merged commit 1f03a84 into develop Sep 13, 2021
@nmerket nmerket deleted the smaller_partitions branch September 13, 2021 15:19