Schedules are no longer being processed into timeseries output #352

Closed
nmerket opened this issue Feb 23, 2023 · 2 comments
Labels
bug Something isn't working
Milestone
v2023.04.0

Comments

@nmerket
Member

nmerket commented Feb 23, 2023

Describe the bug

The timeseries output doesn't include the schedules, even though it should. I've tracked the problem down to this line of code:

if file.endswith('schedules.csv'):
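For illustration, a check that accepts both the old fixed name and the newer timestamped names (such as schedules20230221-10641-yoga5e.csv, mentioned below) could use a regular expression instead of endswith. This is only a sketch: the helper is_schedule_file and its pattern are hypothetical, inferred from that single example name, and not necessarily what the eventual fix in #355 does.

```python
import os
import re

# Hypothetical pattern: "schedules", then any (possibly empty) suffix, then ".csv".
# It matches the old fixed name "schedules.csv" as well as timestamped names such
# as "schedules20230221-10641-yoga5e.csv". The suffix format is an assumption
# inferred from a single example file name.
SCHEDULE_FILE_RE = re.compile(r"schedules.*\.csv$")


def is_schedule_file(file: str) -> bool:
    """Return True if the file's base name looks like a schedule CSV."""
    name = os.path.basename(file)
    return SCHEDULE_FILE_RE.search(name) is not None


if __name__ == "__main__":
    for f in ["schedules.csv", "run1/schedules20230221-10641-yoga5e.csv", "results_timeseries.csv"]:
        print(f, is_schedule_file(f))
```

Like the original endswith check, the pattern is not anchored at the start of the name, so any file whose name ends in schedules*.csv is picked up.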

The schedule files are now named like schedules20230221-10641-yoga5e.csv, so that check no longer matches them. When I changed the line to accept the new schedule file name format, I got schema inconsistencies between the output parquet files during the combining step in postprocessing:

INFO:2023-02-23 13:19:13:buildstockbatch.postprocessing:Gathering all the parquet files in /Users/nmerket/projects/resstock/resstock/project_national/national_upgrades/parquet/timeseries/up*/*.parquet
INFO:2023-02-23 13:19:13:buildstockbatch.postprocessing:Gathered 14 files. Now writing _metadata
2023-02-23 13:19:13,814 - distributed.worker - WARNING - Compute Failed
Key:       gen-metadata-2ffd34d5b9f25259ded750abaa6a4533
Function:  aggregate_metadata
args:      ([<pyarrow._parquet.FileMetaData object at 0x11abafbf0>
  created_by: parquet-cpp-arrow version 11.0.0
  num_columns: 66
  num_rows: 2864520
  num_row_groups: 2
  format_version: 2.6
  serialized_size: 46791, <pyarrow._parquet.FileMetaData object at 0x118352ac0>
  created_by: parquet-cpp-arrow version 11.0.0
  num_columns: 66
  num_rows: 1427880
  num_row_groups: 1
  format_version: 2.6
  serialized_size: 45075, <pyarrow._parquet.FileMetaData object at 0x11abae610>
  created_by: parquet-cpp-arrow version 11.0.0
  num_columns: 66
  num_rows: 1427880
  num_row_groups: 1
  format_version: 2.6
  serialized_size: 46617, <pyarrow._parquet.FileMetaData object at 0x11a54f240>
  created_by: parquet-cpp-arrow version 11.0.0
  num_columns: 66
  num_rows: 1419120
  num_row_groups: 1
  format_version: 2.6
  serialized_size: 45034, <pyarrow._parquet.FileMetaData object at 0x11abaf420>
  created_by: parquet-cpp-arrow version 11.0.0
  num_columns: 66
  num_rows: 1410360
  num_row_groups: 1
  format_ve
kwargs:    {}
Exception: 'RuntimeError(\'Schemas are inconsistent, try using `to_parquet(..., schema="infer")`, or pass an explicit pyarrow schema. Such as `to_parquet(..., schema={"column1": pa.string()})`\')'

Traceback (most recent call last):
  File "/Users/nmerket/mambaforge/envs/buildstock/lib/python3.11/site-packages/dask/dataframe/io/parquet/arrow.py", line 76, in _append_row_groups
    metadata.append_row_groups(md)
  ^^^^^^^^^^^^^^^^^
  File "pyarrow/_parquet.pyx", line 799, in pyarrow._parquet.FileMetaData.append_row_groups
RuntimeError: AppendRowGroups requires equal schemas.
The two columns with index 64 differ.
column descriptor = {
  name: schedules_vacancy,
  path: schedules_vacancy,
  physical_type: DOUBLE,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
column descriptor = {
  name: schedules_vacancy,
  path: schedules_vacancy,
  physical_type: INT64,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
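The column descriptors above show that schedules_vacancy was written as DOUBLE in one file and INT64 in another, which is why the row groups cannot be appended into a single _metadata. One way to avoid this, sketched here under the assumption that every schedule column starts with the prefix schedules_ and that float64 is acceptable for all of them, is to cast those columns explicitly before each file is written instead of relying on per-file type inference (pandas.read_csv, for example, infers int64 for a column that contains only whole numbers). The helper normalize_schedule_dtypes is hypothetical and not part of buildstockbatch.

```python
import pandas as pd


def normalize_schedule_dtypes(df: pd.DataFrame, prefix: str = "schedules_") -> pd.DataFrame:
    """Return a DataFrame with every schedule column cast to float64.

    Hypothetical helper: forcing one dtype means every parquet file gets the
    same physical type (DOUBLE) for columns like schedules_vacancy, regardless
    of whether a particular file's values happened to be whole numbers.
    """
    schedule_cols = [c for c in df.columns if str(c).startswith(prefix)]
    return df.astype({c: "float64" for c in schedule_cols})


if __name__ == "__main__":
    # One building's schedule column inferred as an integer type, another's as float64.
    a = pd.DataFrame({"schedules_vacancy": [0, 1], "energy": [1.5, 2.0]})
    b = pd.DataFrame({"schedules_vacancy": [0.0, 0.5], "energy": [3.0, 4.0]})
    print(a.dtypes["schedules_vacancy"], b.dtypes["schedules_vacancy"])
    a, b = normalize_schedule_dtypes(a), normalize_schedule_dtypes(b)
    print(a.dtypes["schedules_vacancy"], b.dtypes["schedules_vacancy"])
```

The error message itself suggests the alternative of passing an explicit pyarrow schema to to_parquet, which enforces the same consistency at write time rather than on the DataFrame.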

@rajeee

nmerket added the bug (Something isn't working) label Feb 23, 2023
@nmerket
Member Author

nmerket commented Feb 23, 2023

@rajeee It seems I was not using any partitioning on those outputs.

@rajeee
Contributor

rajeee commented Apr 5, 2023

Addressed by #355

rajeee closed this as completed Apr 5, 2023
nmerket added this to the v2023.04.0 milestone Apr 21, 2023