Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix plotting tests in python covidcast #617

Merged
merged 3 commits into from Mar 29, 2023

Conversation

krivard
Copy link
Contributor

@krivard krivard commented Mar 3, 2023

Fixes #596

Restores passing status to the plotting tests in in covidcast-py, which is needed before we can update the package for API keys.

Changelog:

  • Switch to geopandas.testing.assert_geodataframe_equals for test__join_county_geo_df (see plot samples below)
  • Match R behavior merged in covidcast 0.4.5 #610 by not rotating Puerto Rico (this is unrelated to the failing test but is still a good idea)
  • Refresh reference plots for an un-rotated Puerto Rico (see plot samples below)

Details

This turns out to have been a change in how the MultiPolygon for North Carolina was notated. The geopandas assert_geodataframe_equal correctly identifies the two multipolygons as identical where pandas' assert_frame_equal does not.

If you would like to check their identicalness for yourself:

Original North Carolina multipolygon:
expected__join_county_geo_df_mega expected

New North Carolina multipolygon:
expected__join_county_geo_df_mega actual

I wasn't sure whether it would be appropriate to change the other gpkg tests to use assert_geodataframe_equal, so I did not. We can always change them later if they, too, break.

Along the way, we thought the failure might have something to do with the same Puerto Rico rotation problem we saw in the R package. This turned out to be a red herring, but having the two packages match in their presentation of Puerto Rico seems polite, so I left it in.

New reference plots with unrotated Puerto Rico:

Old state:
state expected

New state:
state actual


Old no-mega-2:
no_mega_2 expected

New no-mega-2:
no_mega_2 actual


Old no-mega-1:
no_mega_1 expected

New no-mega-1:
no_mega_1 actual


Old msa:
msa expected

New msa:
msa actual


Old msa-bubble:
msa_bubble expected

New msa-bubble:
msa_bubble actual


Old mega:
mega expected

New mega:
mega actual

Development history

(for future reference only; probably not needed for review)

Original test failure:

FAILED tests/test_plotting.py::test__join_county_geo_df - AssertionError: ExtensionArray are different
Test failure message
        # test w/ megacounty combine
        mega = plotting._join_county_geo_df(test_input, "county_code", geo_info, "left", True)
        expected_mega = gpd.read_file(
            os.path.join(CURRENT_PATH,
                         "reference_data/expected__join_county_geo_df_mega.gpkg"),
            dtype={"geo_value": str})
>       pd.testing.assert_frame_equal(expected_mega, mega)

tests/test_plotting.py:214:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
env/lib/python3.8/site-packages/pandas/_testing/asserters.py:855: in assert_extension_array_equal
    _testing.assert_almost_equal(
pandas/_libs/testing.pyx:52: in pandas._libs.testing.assert_almost_equal
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   AssertionError: ExtensionArray are different
E   
E   ExtensionArray values are different (33.33333 %)
E   [index]: [0, 1, 2]
E   [left]:  [POLYGON ((-76.71131299999999 39.371933, ... -75.726807 35.935843999999996)))]
E   [right]: [POLYGON ((-76.71131299999999 39.371933, ... -75.662938 35.916166)))]

pandas/_libs/testing.pyx:167: AssertionError
------------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------------
ERROR    fiona._env:geodataframe.py:630 Pointer 'hGeom' is NULL in 'OGR_G_GetGeometryType'.

Hypothesis: We fixed a problem caused by rotations on machines with obscure BLAS configurations in 4835b27 for the R package, but didn't apply the same change to the python package. Try that.

Result: Same failure plus an additional one, though it's probably just that we need to update the reference dfs.

FAILED tests/test_plotting.py::test_plot - assert False
FAILED tests/test_plotting.py::test__join_county_geo_df - AssertionError: ExtensionArray are different
Test failure message for the `assert False`
        # w/o megacounties
        no_mega_fig1 = plotting.plot(test_county,
                                     time_value=date(2020, 8, 4),
                                     combine_megacounties=False)
        # give margin of +-2 for floating point errors and weird variations (1 isn't consistent)
>       assert np.allclose(_convert_to_array(no_mega_fig1), expected["no_mega_1"], atol=2, rtol=0)
E       assert False
E        +  where False = <function allclose at 0x7efd1a7de9d0>(array([255, 255, 255, ..., 255, 255, 255], dtype=uint8), array([255, 255, 255, ..., 255, 255, 255], dtype=uint8), atol=2, rtol=0)
E        +    where <function allclose at 0x7efd1a7de9d0> = np.allclose
E        +    and   array([255, 255, 255, ..., 255, 255, 255], dtype=uint8) = _convert_to_array(<Figure size 1280x960 with 2 Axes>)

tests/test_plotting.py:56: AssertionError

Next steps:

  1. Try fixing the reference dfs
  2. Dmitry tracked down the original error to a specific MultiPolygon which has the megacounty code 37000, which is North Carolina, so maybe there's some transformation there too.

Python version of 4835b27 attempting to fix #605. Still fails tests though.
@krivard krivard changed the title [wip] Fix plotting tests in python covidcast Fix plotting tests in python covidcast Mar 28, 2023
@krivard krivard marked this pull request as ready for review March 28, 2023 17:47
Copy link
Contributor

@capnrefsmmat capnrefsmmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks legit to me. My guess is that the polygons are semantically equivalent but represented differently; assert_frame_equal wants them to be identically represented, but assert_geodataframe_equal is smart enough to test if the polygons are equivalent, even if they're represented differently.

Seems like a good idea to use it elsewhere when we're comparing geodataframes; maybe file a follow-up issue?

@krivard krivard merged commit 781704c into main Mar 29, 2023
2 of 3 checks passed
@krivard krivard deleted the krivard/fix-covidcast-py-plot-tests branch March 29, 2023 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python covidcast client test_plotting is failing
2 participants