Run python tests in Pyodide build #11914

cpcloud · 2024-05-02T22:02:58Z

This PR now runs as much of the Python test suite under Pyodide as possible, to give some confidence about its functionality.

.github/workflows/Pyodide.yml

tools/pythonpkg/tests/fast/api/test_read_csv.py

tools/pythonpkg/tests/fast/pandas/test_df_object_resolution.py

cpcloud · 2024-05-03T11:00:00Z

tools/pythonpkg/tests/fast/pandas/test_timedelta.py

+pytestmark = pytest.mark.skipif(
+    platform.system() == "Emscripten",
+    reason="Not supported on Emscripten",
+)


This is skipped even though technically it shouldn't be. Without the skipping this actually crashes the interpreter with a funky WASM stack trace about BigInt. I can dig around in a follow-up.

Sure, this is a clear improvement on the current status: tests are run and failures are mapped.
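
For readers unfamiliar with module-level markers, here is a minimal sketch of how the `pytestmark` assignment in the diff above behaves. The constant `ON_EMSCRIPTEN` and the placeholder test are illustrative names, not taken from the PR.

```python
import platform

import pytest

# True only when the interpreter runs on an Emscripten build such as Pyodide.
ON_EMSCRIPTEN = platform.system() == "Emscripten"

# Assigning a marker to the module-level name `pytestmark` applies it to
# every test collected from this module.
pytestmark = pytest.mark.skipif(
    ON_EMSCRIPTEN,
    reason="Not supported on Emscripten",
)


def test_placeholder():  # hypothetical test, not from the PR
    assert 1 + 1 == 2
```

Because the whole module is skipped, none of its test bodies execute on Emscripten, which is what avoids the interpreter crash described above.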

cpcloud · 2024-05-03T11:00:24Z

tools/pythonpkg/tests/fast/pandas/test_timestamp.py

@@ -64,6 +64,7 @@ def test_timestamp_timedelta(self):
        df_from_duck = duckdb.from_df(df).df()
        assert df_from_duck.equals(df)

+    @pytest.mark.xfail(condition=platform.system() == "Emscripten", reason="time zones not working")


This should pass, happy to dig into it in a follow up.

Looks like time zones other than UTC don't work, not sure why yet.
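
A minimal sketch of the conditional `xfail` used in the diff above, assuming a hypothetical test body built only on the standard library. Unlike `skipif`, an `xfail` test still runs, so pytest reports it as XPASS once the underlying problem is fixed.

```python
import platform
from datetime import datetime, timedelta, timezone

import pytest

ON_EMSCRIPTEN = platform.system() == "Emscripten"


# The marker only takes effect when the condition is true; elsewhere the
# test is an ordinary test that must pass.
@pytest.mark.xfail(condition=ON_EMSCRIPTEN, reason="time zones not working")
def test_non_utc_offset():  # hypothetical test, not from the PR
    tz = timezone(timedelta(hours=2))
    ts = datetime(2024, 5, 3, 12, 0, tzinfo=tz)
    assert ts.utcoffset() == timedelta(hours=2)
```

This makes `xfail` a reasonable choice for a test that "should pass": the suite stays green now and flags the fix later.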

tools/pythonpkg/tests/fast/relational_api/test_rapi_query.py

tools/pythonpkg/tests/fast/test_alex_multithread.py

tools/pythonpkg/tests/fast/test_multithread.py

tools/pythonpkg/tests/stubs/test_stubs.py

cpcloud · 2024-05-03T12:43:17Z

@Tishj Were there some spark-shim-api union tests added recently? Not sure why they are suddenly failing.

cpcloud · 2024-05-03T12:49:19Z

Nope, looks like the last changes there were 8 months ago

cpcloud · 2024-05-07T16:53:27Z

@carlopi Would you mind giving this a review?

cpcloud · 2024-05-07T18:19:23Z

Not sure why some multithread tests are failing, I only changed whether the test skips instead of checking first.

carlopi · 2024-05-07T20:21:26Z

> Not sure why some multithread tests are failing, I only changed whether the test skips instead of checking first.

The failure might be related to the fact that pyarrow, previously imported with import pyarrow as pa, is now simply imported as pyarrow. I will check whether that's it, but it seems plausible.

cpcloud · 2024-05-07T20:24:11Z

Oh, I can look into it

cpcloud · 2024-05-07T20:25:32Z

Hm, I don't see where pa is in use such that it would cause that failure

cpcloud · 2024-05-07T20:28:15Z

Ah, found it

cpcloud · 2024-05-07T20:29:03Z

Probably time someone added some basic static checks (other than formatting) to the Python codebase 😅
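
As an illustration of the kind of check meant here, the sketch below uses the standard-library `ast` module to flag names that are read but never bound, which is the shape of the `pyarrow` versus `pa` mistake discussed above. The function `undefined_names` and the `SAMPLE` source are hypothetical; the check ignores scoping and is deliberately crude. Linters such as pyflakes or ruff report this class of error far more thoroughly.

```python
import ast
import builtins


def undefined_names(source: str) -> set[str]:
    """Return names that are read somewhere in `source` but never bound."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`.
                bound.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                bound.add(node.id)
    return used - bound


# Hypothetical module: the import is bound to `pyarrow`, the test uses `pa`.
SAMPLE = '''
import pytest
pyarrow = pytest.importorskip("pyarrow")

def test_table():
    return pa.table({"a": [1]})
'''

print(undefined_names(SAMPLE))  # → {'pa'}
```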

carlopi

Main concern is not changing the behaviour of the regular Python tests; if those pass, I would mostly check with @Tishj on what he prefers for skipping the tests.

carlopi · 2024-05-07T20:36:43Z

tools/pythonpkg/tests/fast/pandas/test_timedelta.py

@@ -47,6 +47,7 @@ def test_timedelta_negative(self, duckdb_cursor):
    @pytest.mark.parametrize('minutes', [0, 60])
    @pytest.mark.parametrize('hours', [0, 24])
    @pytest.mark.parametrize('weeks', [0, 51])
+    @pytest.mark.skipif(platform.system() == "Emscripten", reason="Bind parameters are broken when running on Pyodide")


Can you expand on this?

It seems to be working on https://duckdb.github.io/duckdb-pyodide/console by doing:

import duckdb
duckdb.execute("SELECT $2::date - $1::date", ['2024-01-02', '2024-03-04'])

I'm unsure whether the problem is in the testing (possibly due to the handling of time zones or something else) or whether this is actually broken.

cpcloud · 2024-05-07T20:45:26Z

> Main concern is not changing the behaviour of the regular Python tests,

Which tests are you referring to?

I don't think I changed any actual material test function bodies. I'm just skipping some tests known not to work on Emscripten builds, unless you count the pyarrow imports, which @Tishj had already said was desirable (replacing the top-level import with an importorskip call).
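
A minimal sketch of what `pytest.importorskip` does, for context on the import change mentioned above. The helper `import_or_none` and the module names are illustrative stand-ins so the snippet runs without pyarrow installed; in a test module the call normally sits at top level.

```python
import pytest


def import_or_none(name):
    """Hypothetical helper: show importorskip's two outcomes without aborting."""
    try:
        # Returns the imported module when it is available.
        return pytest.importorskip(name)
    except pytest.skip.Exception:
        # Inside a test module, this exception marks the module as skipped.
        return None


# In a real test module the alias must match what the test bodies use:
#     pa = pytest.importorskip("pyarrow")
json_mod = import_or_none("json")                    # stdlib stand-in, always present
missing = import_or_none("no_such_module_for_demo")  # assumed not to be installed
```

Binding the return value to a different name than the one the tests reference only surfaces as a `NameError` when a test body runs, which matches the multithread failures discussed earlier.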

carlopi · 2024-05-07T20:49:15Z

I meant that once regular Python CI is green (unsure if there might be other unforeseen surprises like pyarrow -> pa), it's basically good to go.

cpcloud · 2024-05-07T20:49:31Z

> I meant that once regular Python CI is green (unsure if there might be other unforeseen surprises like pyarrow -> pa), it's basically good to go.

Ah ok, great!

carlopi

This looks good to me! Thanks

carlopi · 2024-05-08T07:20:25Z

I would ask @Tishj to give the tests a second look, but if they make sense to him this looks great.

Tishj · 2024-05-08T07:29:17Z

tools/pythonpkg/tests/fast/api/test_read_csv.py

@@ -469,10 +469,12 @@ def test_read_csv_combined(self, duckdb_cursor):
        assert rel.columns == rel2.columns
        assert rel.types == rel2.types

-    def test_read_csv_names(self):
+    def test_read_csv_names(self, tmp_path):


I am not entirely confident this will be unique for every test

I have more faith in this fixture, do you mind using this?

@pytest.fixture(scope="function")
def temp_file_name(request, tmp_path_factory):
    return str(tmp_path_factory.mktemp(request.function.__name__, numbered=True) / 'file.csv')

tmp_path_factory creates unique paths, and numbers them, + for good measure I included the name of the test as another unique token

We are creating more than one file in certain places, so perhaps this fixture should return a generator then

> I am not entirely confident this will be unique for every test

The pytest documentation for tmp_path says almost exactly that, which is why I chose it :)

The implementation of tmp_path is very similar to your proposed implementation (I removed the docstring and the clean-up code that follows the yield):

def _mk_tmp(request: FixtureRequest, factory: TempPathFactory) -> Path:
    name = request.node.name
    name = re.sub(r"[\W]", "_", name)
    MAXVAL = 30
    name = name[:MAXVAL]
    return factory.mktemp(name, numbered=True)


@fixture
def tmp_path(
    request: FixtureRequest, tmp_path_factory: TempPathFactory
) -> Generator[Path, None, None]:
    path = _mk_tmp(request, tmp_path_factory)
    yield path
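
To make the uniqueness argument concrete, here is a standard-library sketch of the two steps in the excerpt above: sanitising the test name and creating a numbered directory. `make_numbered_dir` is a hypothetical stand-in for `factory.mktemp(name, numbered=True)` and is not pytest's actual implementation.

```python
import re
import tempfile
from pathlib import Path

MAXVAL = 30  # same truncation length as in the quoted excerpt


def sanitized_name(node_name: str) -> str:
    """Mirror the name clean-up shown in `_mk_tmp`."""
    return re.sub(r"[\W]", "_", node_name)[:MAXVAL]


def make_numbered_dir(root: Path, name: str) -> Path:
    """Hypothetical stand-in: append the lowest unused integer suffix."""
    n = 0
    while True:
        candidate = root / f"{name}{n}"
        try:
            candidate.mkdir()
            return candidate
        except FileExistsError:
            n += 1


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    name = sanitized_name("test_read_csv_names[param-1]")
    first = make_numbered_dir(root, name)
    second = make_numbered_dir(root, name)
    # Same test name, two distinct directories.
    print(first.name, second.name)
```

Even when two tests sanitise to the same name, the numeric suffix keeps their directories apart, which is the property both fixtures rely on.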

Tishj

Thanks for the changes and for making them as non-invasive as possible, looks good to me 👍

Mytherin · 2024-05-17T13:03:44Z

Thanks!

Merge pull request duckdb/duckdb#12081 from Maxxen/type-metadata-redux

Merge pull request duckdb/duckdb#11914 from cpcloud/run-pyodide-tests

cpcloud changed the title ~~Run python fast test suite in Pyodide build~~ Run python tests in Pyodide build May 3, 2024