The schedule files have now changed and are named like schedules20230221-10641-yoga5e.csv, so the file-name check on line 199 of buildstockbatch/base.py doesn't pick them up. When I changed that line to accept the new file-name format, I got inconsistent schemas in the output parquet files during the combining step in postprocessing:
INFO:2023-02-23 13:19:13:buildstockbatch.postprocessing:Gathering all the parquet files in /Users/nmerket/projects/resstock/resstock/project_national/national_upgrades/parquet/timeseries/up*/*.parquet
INFO:2023-02-23 13:19:13:buildstockbatch.postprocessing:Gathered 14 files. Now writing _metadata
2023-02-23 13:19:13,814 - distributed.worker - WARNING - Compute Failed
Key: gen-metadata-2ffd34d5b9f25259ded750abaa6a4533
Function: aggregate_metadata
args: ([<pyarrow._parquet.FileMetaData object at 0x11abafbf0>
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 66
num_rows: 2864520
num_row_groups: 2
format_version: 2.6
serialized_size: 46791, <pyarrow._parquet.FileMetaData object at 0x118352ac0>
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 66
num_rows: 1427880
num_row_groups: 1
format_version: 2.6
serialized_size: 45075, <pyarrow._parquet.FileMetaData object at 0x11abae610>
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 66
num_rows: 1427880
num_row_groups: 1
format_version: 2.6
serialized_size: 46617, <pyarrow._parquet.FileMetaData object at 0x11a54f240>
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 66
num_rows: 1419120
num_row_groups: 1
format_version: 2.6
serialized_size: 45034, <pyarrow._parquet.FileMetaData object at 0x11abaf420>
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 66
num_rows: 1410360
num_row_groups: 1
format_ve
kwargs: {}
Exception: 'RuntimeError(\'Schemas are inconsistent, try using `to_parquet(..., schema="infer")`, or pass an explicit pyarrow schema. Such as `to_parquet(..., schema={"column1": pa.string()})`\')'
Traceback (most recent call last):
  File "/Users/nmerket/mambaforge/envs/buildstock/lib/python3.11/site-packages/dask/dataframe/io/parquet/arrow.py", line 76, in _append_row_groups
    metadata.append_row_groups(md)
    ^^^^^^^^^^^^^^^^^
  File "pyarrow/_parquet.pyx", line 799, in pyarrow._parquet.FileMetaData.append_row_groups
RuntimeError: AppendRowGroups requires equal schemas.
The two columns with index 64 differ.
column descriptor = {
  name: schedules_vacancy,
  path: schedules_vacancy,
  physical_type: DOUBLE,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
column descriptor = {
  name: schedules_vacancy,
  path: schedules_vacancy,
  physical_type: INT64,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
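The two column descriptors show `schedules_vacancy` stored as DOUBLE in one file and as INT64 in another, which is why `append_row_groups` refuses to merge them into a single `_metadata`. A plausible cause, not confirmed in this issue, is dtype inference when the schedule CSVs are read: a column whose values are all whole numbers is inferred as int64, while any fractional value makes it float64. The sketch below uses two hypothetical helpers (`find_dtype_conflicts` and `normalize_schedule_dtypes`, not part of buildstockbatch) and assumes the schedule columns share a `schedules_` prefix, as `schedules_vacancy` does. It reports columns whose dtypes disagree across frames and casts the schedule columns to float64 before writing, so every file would get a DOUBLE column.

```python
from io import StringIO

import pandas as pd


def find_dtype_conflicts(frames: dict[str, pd.DataFrame]) -> dict[str, set[str]]:
    """Return {column: {dtype names}} for columns whose dtype differs between frames."""
    seen: dict[str, set[str]] = {}
    for df in frames.values():
        for col, dtype in df.dtypes.items():
            seen.setdefault(col, set()).add(str(dtype))
    return {col: dtypes for col, dtypes in seen.items() if len(dtypes) > 1}


def normalize_schedule_dtypes(df: pd.DataFrame, prefix: str = "schedules_") -> pd.DataFrame:
    """Cast every column whose name starts with `prefix` (assumed naming) to float64."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df.astype({c: "float64" for c in cols})


# Two hypothetical schedule extracts: one column holds only whole numbers,
# the other contains a fractional value, so pandas infers different dtypes.
frames = {
    "bldg_a": pd.read_csv(StringIO("schedules_vacancy\n0\n1\n")),
    "bldg_b": pd.read_csv(StringIO("schedules_vacancy\n0.0\n0.5\n")),
}
conflicts = find_dtype_conflicts(frames)
print({col: sorted(d) for col, d in conflicts.items()})  # {'schedules_vacancy': ['float64', 'int64']}

fixed = {name: normalize_schedule_dtypes(df) for name, df in frames.items()}
print(find_dtype_conflicts(fixed))  # {}
```

Casting to float64 rather than int64 is the safer direction, since it never loses fractional values. The error message's own suggestion, passing an explicit pyarrow schema to `to_parquet`, is another way to pin the column type.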
Describe the bug
The timeseries output isn't including the schedules even though it should. I've tracked it down to this line of code:
buildstockbatch/buildstockbatch/base.py, line 199 (commit acff9e6)
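The code on line 199 isn't reproduced in the issue, so the following is only a guess at what an updated file-name check could look like. The hypothetical `is_schedule_file` helper accepts names of the new form `schedules20230221-10641-yoga5e.csv` (an 8-digit date, a number, and a lowercase alphanumeric suffix) and also a bare `schedules.csv`; that second case is an assumption about the older naming, not something stated here.

```python
import re
from pathlib import Path

# Hypothetical pattern: "schedules", optionally followed by
# "<8-digit date>-<digits>-<lowercase alphanumerics>", then ".csv".
# The bare "schedules.csv" branch is an assumption about the older naming.
SCHEDULE_FILE_RE = re.compile(r"^schedules(\d{8}-\d+-[0-9a-z]+)?\.csv$")


def is_schedule_file(path: str) -> bool:
    """Return True if the file name (ignoring directories) looks like a schedule CSV."""
    return SCHEDULE_FILE_RE.match(Path(path).name) is not None


def find_schedule_files(directory: str) -> list[Path]:
    """List schedule CSVs directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and is_schedule_file(p.name)
    )


for name in ["schedules20230221-10641-yoga5e.csv", "schedules.csv", "results_timeseries.csv"]:
    print(name, is_schedule_file(name))
```

Anchoring the pattern with `^` and `$` keeps look-alikes such as a compressed `schedules....csv.gz` or unrelated CSVs from matching.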
@rajeee