Add tests for writing-then-reading randomly-generated dataframes #16121

Open
DeflateAwning opened this issue May 8, 2024 · 3 comments
Labels
A-io (Area: reading and writing data), accepted (Ready for implementation), test (Related to the test suite)

Comments

@DeflateAwning
Contributor

Description

Related to Issue #16109 (very broken parquet files).

Can we please add "unit" tests (or rather, integration tests) like the one below for every reader/writer pair (read/write_parquet, read/write_ndjson, and so on)? Ideally each would run more than 10 times with more than 10 different random generations, across a few different structures (some with datetimes, for example).

The non-deterministic failures in write_parquet could have been caught by a test like this. It is simple to implement and checks that the entire write-then-read path works properly.

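import tempfile
from pathlib import Path

import polars as pl

# Write to a path inside a temporary directory; reopening a
# NamedTemporaryFile by name does not work on every platform.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "roundtrip.parquet"
    for n in range(10):
        # 50,000 rows sampled with replacement from three strings.
        df = pl.DataFrame({
            "a": pl.Series(["123", "abc", "xyz"]).sample(50_000, with_replacement=True)
        }).with_row_index()

        df.write_parquet(path)

        # The frame read back must equal the frame that was written.
        assert df.equals(pl.read_parquet(path)), f"Run #{n + 1} failed"
        print(f"Run #{n + 1}: OK")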
@DeflateAwning added the enhancement (New feature or an improvement of an existing feature) label on May 8, 2024
@ritchie46
Member

Yes, I think we need a Hypothesis test for this one: generate different data types, nested types, and file formats, and see if we can round-trip them.

Pinging @stinodego as he is just working on this.

@stinodego
Member

I'll add these when #16062 is merged.

@stinodego self-assigned this on May 9, 2024
@stinodego added the test (Related to the test suite) and A-io (Area: reading and writing data) labels and removed the enhancement (New feature or an improvement of an existing feature) label on May 9, 2024
@DeflateAwning
Contributor Author

Looks like that one's merged now! Curious whether there's been any progress on this since?

@stinodego added the accepted (Ready for implementation) label on May 25, 2024
Projects
Status: Next
Development

No branches or pull requests

3 participants