CI issues with newer version of pandas and existing parquet files in repo #385

Closed
nmerket opened this issue Aug 9, 2023 · 0 comments · Fixed by #387
Labels
bug Something isn't working

Comments

@nmerket
Member

nmerket commented Aug 9, 2023

Describe the bug

The CI fails this test on every run with Python > 3.8. The reference parquet files committed to the repo store text columns such as completed_status with the object dtype, while the results the test generates now have the string[python] dtype.
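The mismatch can be reproduced in memory, without the repository's parquet files. The sketch below is a minimal example with hypothetical values (only the column name `completed_status` comes from the log): one frame holds the column as pandas' `string[python]` extension dtype, the other as NumPy `object`, and `pd.testing.assert_frame_equal` rejects the pair on dtype alone even though the values are identical. `frames_match` is a hypothetical helper, not part of pandas or BuildStockBatch.

```python
import pandas as pd

values = ["Success", "Success", "Fail"]  # hypothetical status values

# Column as the newly generated results are read back: pandas string extension dtype.
new_style = pd.DataFrame({"completed_status": pd.array(values, dtype="string[python]")})
# Column as stored in the older reference files: plain object dtype.
old_style = pd.DataFrame({"completed_status": pd.Series(values, dtype=object)})


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if assert_frame_equal accepts the pair with its default settings."""
    try:
        pd.testing.assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


print(new_style["completed_status"].dtype, old_style["completed_status"].dtype)
print(frames_match(new_style, old_style))  # dtypes differ, so this is False
# Casting the reference column to the same dtype makes the frames compare equal.
print(frames_match(new_style, old_style.astype({"completed_status": "string[python]"})))
```

Because `check_dtype` defaults to `True`, identical values are not enough; the column dtypes must match as well.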

To Reproduce
Steps to reproduce the behavior:

  1. Happens on any CI run.

Expected behavior

Tests pass

Logs

From the CI logs:

        # results parquet
        test_pq = pd.read_parquet(os.path.join(test_path, 'baseline', 'results_up00.parquet')).sort_values('building_id')\
            .reset_index().drop(columns=['index'])
        reference_pq = pd.read_parquet(os.path.join(reference_path, 'baseline', 'results_up00.parquet'))\
            .sort_values('building_id').reset_index().drop(columns=['index'])
>       pd.testing.assert_frame_equal(test_pq, reference_pq)
E       AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="completed_status") are different
E       
E       Attribute "dtype" are different
E       [left]:  string[python]
E       [right]: object
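The log only names `completed_status` because `assert_frame_equal` raises at the first column whose attributes differ; other text columns may have the same problem. A small helper like the hypothetical `dtype_differences` below (not part of pandas or BuildStockBatch) lists every shared column whose dtype differs between the generated and reference frames, which shows how many columns a fix would need to touch. It is a sketch that assumes both frames have already been loaded, e.g. as `test_pq` and `reference_pq` in the test.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def _describe(dtype) -> str:
    # Extension dtypes such as StringDtype carry details like the storage backend
    # in their repr, which str() may omit; NumPy dtypes read fine via str().
    return repr(dtype) if isinstance(dtype, ExtensionDtype) else str(dtype)


def dtype_differences(test_df: pd.DataFrame, reference_df: pd.DataFrame) -> dict:
    """Hypothetical helper: {column: (test dtype, reference dtype)} for shared columns that differ."""
    diffs = {}
    for col in test_df.columns:
        if col not in reference_df.columns:
            continue  # missing columns are a separate failure, reported elsewhere
        left, right = test_df[col].dtype, reference_df[col].dtype
        # Compare descriptions instead of the dtype objects, which avoids mixing
        # NumPy and pandas extension dtypes in a single == comparison.
        if _describe(left) != _describe(right):
            diffs[col] = (_describe(left), _describe(right))
    return diffs
```

Calling `dtype_differences(test_pq, reference_pq)` just before the `assert_frame_equal` line would print the full list of affected columns in one CI run.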

Platform (please complete the following information):

  • Simulation platform: ubuntu-latest on GitHub Actions
  • BuildStockBatch version, branch, or sha: develop
  • resstock or comstock repo version, branch, or sha: develop
  • Local Desktop OS: [e.g. Windows, Mac, Linux, especially important if running locally]

Additional context

Two ideas for how to address this:

  1. (easy but could break again) Open the test parquet files in the repo with a newer version of pandas, convert the affected columns to string, and save them back. This should fix the error, but a similar dtype change could break the tests again.
  2. (harder but more maintainable in the long run) Instead of comparing an expected parquet file to a generated one, change these tests to use the newer integration test framework, which actually runs ResStock and generates results. The tests would then check that those results have the expected columns and similar properties, without comparing two DataFrames directly.
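Both ideas can be sketched in a few lines of pandas. For idea 1, the hypothetical helpers `cast_text_columns` and `upgrade_reference_parquet` cast every `object` column to `string[python]` and write the file back in place; this needs a parquet engine such as pyarrow, and it assumes the object columns hold only text or missing values. For idea 2, `check_results_schema` is a hypothetical stand-in for property checks on freshly generated results; the required columns and the allowed `completed_status` values (`Success`, `Fail`) are assumptions, not BuildStockBatch's actual schema.

```python
from pathlib import Path
from typing import Iterable

import pandas as pd


# Idea 1: upgrade the stored reference files once.
def cast_text_columns(df: pd.DataFrame, dtype: str = "string[python]") -> pd.DataFrame:
    """Return a copy of df with every object-dtype column cast to a pandas string dtype.

    Assumes object columns hold only text or missing values; any other Python
    objects would be converted to their string form.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col].dtype):
            out[col] = out[col].astype(dtype)
    return out


def upgrade_reference_parquet(path: Path) -> None:
    """Rewrite one reference parquet file in place (requires pyarrow or fastparquet)."""
    cast_text_columns(pd.read_parquet(path)).to_parquet(path)


# Idea 2: check properties of generated results instead of comparing whole frames.
def check_results_schema(
    df: pd.DataFrame,
    required_columns: Iterable[str],
    allowed_status: Iterable[str] = ("Success", "Fail"),  # hypothetical status values
) -> None:
    """Raise AssertionError if df lacks expected columns or contains unexpected values.

    Behaves the same whether text columns are object or string dtype, because it
    compares Python values rather than frame attributes.
    """
    required = {"building_id", "completed_status", *required_columns}
    missing = required - set(df.columns)
    if missing:
        raise AssertionError(f"missing columns: {sorted(missing)}")
    if not df["building_id"].is_unique:
        raise AssertionError("duplicate building_id values")
    unexpected = set(df["completed_status"].dropna()) - set(allowed_status)
    if unexpected:
        raise AssertionError(f"unexpected completed_status values: {sorted(unexpected)}")
```

The first approach keeps the existing exact comparison but ties the fixtures to one pandas behavior; the second gives up exact equality in exchange for checks that do not depend on how a given pandas version types text columns.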