Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run python tests in Pyodide build #11914

Merged
merged 6 commits into from May 17, 2024
Merged

Conversation

cpcloud
Copy link
Contributor

@cpcloud cpcloud commented May 2, 2024

This PR now runs as much of the Python test suite under Pyodide as possible, to give some confidence about its functionality.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 2, 2024 22:17
@cpcloud cpcloud marked this pull request as ready for review May 2, 2024 22:21
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 2, 2024 23:56
@cpcloud cpcloud marked this pull request as ready for review May 3, 2024 01:19
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 3, 2024 10:13
@cpcloud cpcloud marked this pull request as ready for review May 3, 2024 10:34
@cpcloud cpcloud changed the title Run python fast test suite in Pyodide build Run python tests in Pyodide build May 3, 2024
pytestmark = pytest.mark.skipif(
platform.system() == "Emscripten",
reason="Not supported on Emscripten",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is skipped even though technically it shouldn't be. Without the skipping this actually crashes the interpreter with a funky WASM stack trace about BigInt. I can dig around in a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is a clear improvement on current status, tests are run and failure are mapped.

@@ -64,6 +64,7 @@ def test_timestamp_timedelta(self):
df_from_duck = duckdb.from_df(df).df()
assert df_from_duck.equals(df)

@pytest.mark.xfail(condition=platform.system() == "Emscripten", reason="time zones not working")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should pass, happy to dig into it in a follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like timezones other than UTC don't seem to work, not sure why yet.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 3, 2024 11:13
@cpcloud cpcloud force-pushed the run-pyodide-tests branch 2 times, most recently from 9e81771 to 99acc87 Compare May 3, 2024 11:48
@cpcloud cpcloud marked this pull request as ready for review May 3, 2024 11:50
@cpcloud
Copy link
Contributor Author

cpcloud commented May 3, 2024

@Tishj Were there some spark-shim-api union tests added recently? Not sure why they are suddenly failing.

@cpcloud
Copy link
Contributor Author

cpcloud commented May 3, 2024

Nope, looks like the last changes there were 8 months ago

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 3, 2024 12:51
@cpcloud cpcloud force-pushed the run-pyodide-tests branch 5 times, most recently from a7fed18 to 4161683 Compare May 3, 2024 14:03
@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

@carlopi Would you mind giving this a review?

@cpcloud cpcloud marked this pull request as ready for review May 7, 2024 16:53
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 7, 2024 17:16
@cpcloud cpcloud marked this pull request as ready for review May 7, 2024 17:17
@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Not sure why some multithread tests are failing, I only changed whether the test skips instead of checking first.

@carlopi
Copy link
Contributor

carlopi commented May 7, 2024

Not sure why some multithread tests are failing, I only changed whether the test skips instead of checking first.

Failure might be related to the fact that import pyarrow as pa is now simply imported as pyarrow. I will check whether that's it, but seems plausible.

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Oh, I can look into it

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Hm, I don't see where pa is in use such that it would cause that failure

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Ah, found it

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Probably time someone added some basic static checks (other than formatting) to the python codebase 😅

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 7, 2024 20:29
@cpcloud cpcloud marked this pull request as ready for review May 7, 2024 20:29
Copy link
Contributor

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main concern is non changing behaviour of Python regular tests, if those pass, I would mostly check with @Tishj on what he prefers for skipping the tests.

@@ -47,6 +47,7 @@ def test_timedelta_negative(self, duckdb_cursor):
@pytest.mark.parametrize('minutes', [0, 60])
@pytest.mark.parametrize('hours', [0, 24])
@pytest.mark.parametrize('weeks', [0, 51])
@pytest.mark.skipif(platform.system() == "Emscripten", reason="Bind parameters are broken when running on Pyodide")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on this?

It seems to be working on https://duckdb.github.io/duckdb-pyodide/console by doing:

import duckdb
duckdb.execute("SELECT $2::date - $1::date", ['2024-01-02', '2024-03-04'])

unsure if the problem is in the testing (possibly due to handling of timezones / else) or this is actually broken.

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

Main concern is non changing behaviour of Python regular tests,

Which tests are you referring to?

I don't think I changed any actual material test function bodies, I'm just skipping based some tests known to not work on Emscripten builds, unless you consider the pyarrow imports, which @Tishj had already said was desirable (replacing the top level import with an importorskip call).

@carlopi
Copy link
Contributor

carlopi commented May 7, 2024

I meant that once regular Python CI is green (unsure if there might be other unforeseen surprises like pyarrow -> pa, it's basically good to go.

@cpcloud
Copy link
Contributor Author

cpcloud commented May 7, 2024

I meant that once regular Python CI is green (unsure if there might be other unforeseen surprises like pyarrow -> pa, it's basically good to go.

Ah ok, great!

Copy link
Contributor

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Thanks

@carlopi
Copy link
Contributor

carlopi commented May 8, 2024

I would say if @Tishj can give a second look to the tests, but if it makes sense to him this looks great.

@@ -469,10 +469,12 @@ def test_read_csv_combined(self, duckdb_cursor):
assert rel.columns == rel2.columns
assert rel.types == rel2.types

def test_read_csv_names(self):
def test_read_csv_names(self, tmp_path):
Copy link
Contributor

@Tishj Tishj May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely confident this will be unique for every test

I have more faith in this fixture, do you mind using this?

@pytest.fixture(scope="function")
def temp_file_name(request, tmp_path_factory):
    return str(tmp_path_factory.mktemp(request.function.__name__, numbered=True) / 'file.csv')

tmp_path_factory creates unique paths, and numbers them, + for good measure I included the name of the test as another unique token

We are creating more than one files in certain places, perhaps this fixture should return a generator then

Copy link
Contributor Author

@cpcloud cpcloud May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely confident this will be unique for every test

The pytest documentation for tmp_path says almost verbatim exactly that, which is why I chose it :)

image

Copy link
Contributor Author

@cpcloud cpcloud May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of tmp_path is very similar to your proposed implementation (I removed the docstring and clean up code that follows the yield):

def _mk_tmp(request: FixtureRequest, factory: TempPathFactory) -> Path:
    name = request.node.name
    name = re.sub(r"[\W]", "_", name)
    MAXVAL = 30
    name = name[:MAXVAL]
    return factory.mktemp(name, numbered=True)


@fixture
def tmp_path(
    request: FixtureRequest, tmp_path_factory: TempPathFactory
) -> Generator[Path, None, None]:
    path = _mk_tmp(request, tmp_path_factory)
    yield path

@cpcloud cpcloud requested a review from Tishj May 8, 2024 16:40
Copy link
Contributor

@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes and making it as less invasive as possible, looks good to me 👍

@Mytherin Mytherin merged commit 8e757df into duckdb:main May 17, 2024
23 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 17, 2024
Merge pull request duckdb/duckdb#12081 from Maxxen/type-metadata-redux
Merge pull request duckdb/duckdb#11914 from cpcloud/run-pyodide-tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants