
Test against pandas 2.0 #10893

Draft · wants to merge 1 commit into main
Conversation

crusaderky
Collaborator

@crusaderky crusaderky commented Feb 5, 2024

Now that the 3.12 environment is complete with all its dependencies, we can use the 3.11 one to pin numpy, pandas, and pyarrow in the

@crusaderky crusaderky self-assigned this Feb 5, 2024
Contributor

github-actions bot commented Feb 5, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ±0       15 suites  ±0   3h 23m 21s ⏱️ +4m 34s
 13,104 tests ±0   12,173 ✅  −2       929 💤  ±0    2 ❌ +2
162,287 runs  ±0  145,793 ✅ +34   16,490 💤 −38    4 ❌ +4

For more details on these failures, see this check.

Results for commit 9ea2a49. ± Comparison against base commit 9ba7a97.

♻️ This comment has been updated with latest results.

@crusaderky
Collaborator Author

crusaderky commented Feb 5, 2024

There seem to be two genuine failures where tests are green on pandas 1.5/pyarrow 10 and on pandas 2.2/pyarrow 14, but fail on pandas 2.0/pyarrow 12.

https://github.com/dask/dask/actions/runs/7973508824/job/21767509851?pr=10893

FAILED dask/dataframe/io/tests/test_parquet.py::test_arrow_to_pandas[pyarrow] - AssertionError: assert string[pyarrow] == dtype('O')
 +  where string[pyarrow] = Dask Series Structure:\nnpartitions=1\n    string\n       ...\nName: A, dtype: string\nDask Name: to_pyarrow_string, 3 graph layers.dtype
 +    where Dask Series Structure:\nnpartitions=1\n    string\n       ...\nName: A, dtype: string\nDask Name: to_pyarrow_string, 3 graph layers = Dask DataFrame Structure:\n                    A\nnpartitions=1        \n               object\n                  ...\nDask Name: read-parquet, 1 graph layer.A
 +  and   dtype('O') = 0    2000-01-01 00:00:00\nName: A, dtype: object.dtype
 +    where 0    2000-01-01 00:00:00\nName: A, dtype: object =                      A\n0  2000-01-01 00:00:00.A
 +      where                      A\n0  2000-01-01 00:00:00 = <bound method DaskMethodsMixin.compute of Dask DataFrame Structure:\n                    A\nnpartitions=1        \n               object\n                  ...\nDask Name: read-parquet, 1 graph layer>()
 +        where <bound method DaskMethodsMixin.compute of Dask DataFrame Structure:\n                    A\nnpartitions=1        \n               object\n                  ...\nDask Name: read-parquet, 1 graph layer> = Dask DataFrame Structure:\n                    A\nnpartitions=1        \n               object\n                  ...\nDask Name: read-parquet, 1 graph layer.compute

FAILED dask/dataframe/io/tests/test_parquet.py::test_roundtrip_date_dtype - ValueError: Failed to convert partition to expected pyarrow schema:
    `ArrowNotImplementedError('Unsupported cast from string to date32 using function cast_date32')`

Expected partition schema:
    ts: timestamp[ns, tz=UTC]
    col1: date32[day]
    __null_dask_index__: int64

Received partition schema:
    ts: timestamp[ns, tz=UTC]
    col1: string
    __null_dask_index__: int64

This error *may* be resolved by passing in schema information for
the mismatched column(s) using the `schema` keyword in `to_parquet`.

@crusaderky
Collaborator Author

@phofl how much do we care about these failures?

@phofl
Copy link
Collaborator

phofl commented Feb 20, 2024

I think I care at least a little, but it's not totally clear what is off here, so I wouldn't spend that much time investigating.
