New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply correct schema for all upgrades #271
Conversation
Minimum allowed coverage is Generated by 🐒 cobertura-action against 6659280 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One minor suggestion.
logger.info(f"Got {len(all_schema_dict)} columns") | ||
all_results_cols = list(all_schema_dict.keys()) | ||
all_schema_dict = {to_camelcase(key): value for key, value in all_schema_dict.items()} | ||
logger.info(f"Got this schema: {all_schema_dict}\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary to print this out. Maybe helpful for debugging, but now that it's working you're probably good to remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated keeping it or removing it. But I kept it because over information (upto a limit!) is better than under information for debugging, and postprocessing.out is kinda used mostly for debugging, so may it's fine to leave? Most users probably don't need to read the postprocessing.out unless they can't find their table in Athena.
logger.info(f"Got {len(all_schema_dict)} columns") | ||
all_results_cols = list(all_schema_dict.keys()) | ||
all_schema_dict = {to_camelcase(key): value for key, value in all_schema_dict.items()} | ||
logger.info(f"Got this schema: {all_schema_dict}\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary to print this out. Maybe helpful for debugging, but now that it's working you're probably good to remove this.
logger.info(f"Got this schema: {all_schema_dict}\n") |
Pull Request Description
When different upgrades have different set of columns with some some columns having all empty values, the datatype for the columns will not be properly set. This PR fixes the issue by scanning all the results_jobX.json files in the beginning and determines the datatype for all columns.
Checklist
Not all may apply
minimum_coverage
in.github/workflows/ci.yml
as necessary.