Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use pandas function to check for NaN #750

Merged
merged 4 commits into from Jul 12, 2021
Merged

fix: use pandas function to check for NaN #750

merged 4 commits into from Jul 12, 2021

Conversation

LinuxChristian
Copy link
Contributor

Starting with pandas 1.0, an experimental pandas.NA value (singleton) is available to represent scalar missing values as
opposed to numpy.nan. Comparing the variable with itself results in a pandas.NA value that doesn't support type-casting
to boolean. Using the build-in pandas.isna function handles all pandas supported NaN values (None, pd.NA, np.nan, NaT).

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #729

Starting with pandas 1.0, an experimental pandas.NA value (singleton) is available to represent scalar missing values as
opposed to numpy.nan. Comparing the variable with itself results in a pandas.NA value that doesn't support type-casting
to boolean. Using the build-in pandas.isna function handles all pandas supported NaN values.
@LinuxChristian LinuxChristian requested a review from a team July 11, 2021 12:12
@LinuxChristian LinuxChristian requested a review from a team as a code owner July 11, 2021 12:12
@LinuxChristian LinuxChristian requested review from loferris and removed request for a team July 11, 2021 12:12
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 11, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 11, 2021
@plamut plamut added the kokoro:run Add this label to force Kokoro to re-run the tests. label Jul 12, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Jul 12, 2021
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix and the test, it generally looks good! Just made two improvement suggestions.

tests/unit/test__pandas_helpers.py Show resolved Hide resolved
tests/unit/test__pandas_helpers.py Outdated Show resolved Hide resolved
@plamut
Copy link
Contributor

plamut commented Jul 12, 2021

FWIW, the prerelease-deps failure is flakiness, while Python 3.6 unit tests failure is legitimate - we use the oldest supported depenency versions there, and the new test failed with an attribute error:

 AttributeError: module 'pandas' has no attribute 'NA'

@LinuxChristian
Copy link
Contributor Author

I added one commit per comment so it's easier to review. If you would like me to squash a final version let me know.

@pytest.mark.skipif(pandas is None, reason="Requires `pandas`")
@pytest.mark.skipIf(
pandas is None or PANDAS_INSTALLED_VERSION < PANDAS_MINIUM_VERSION,
reason="Requires `pandas version >= 1.0.0` which introduces pandas.NA",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat to also include the reason why a minimum version is needed. 👍

Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the quick update!

Nothing to worry about squashing the commits, we do that when the PR is merged.

@plamut plamut added kokoro:run Add this label to force Kokoro to re-run the tests. automerge Merge the pull request once unit tests and other checks pass. labels Jul 12, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Jul 12, 2021
@plamut
Copy link
Contributor

plamut commented Jul 12, 2021

It appears that the test was not skipped?

    def test_dataframe_to_json_generator(module_under_test):
        utcnow = datetime.datetime.utcnow()
        df_data = collections.OrderedDict(
            [
>               ("a_series", [pandas.NA, 2, 3, 4]),
                ("b_series", [0.1, float("NaN"), 0.3, 0.4]),
                ("c_series", ["a", "b", pandas.NA, "d"]),
                ("d_series", [utcnow, utcnow, utcnow, pandas.NaT]),
                ("e_series", [True, False, True, None]),
            ]
        )
E       AttributeError: 'NoneType' object has no attribute 'NA'

Also reproducible locally.

Edit: Ah, typo in the decorator name, I actually remember seeing that before. Fixed it myself, should be fine now.

@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 12, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 12, 2021
@plamut plamut merged commit 67bc5fb into googleapis:master Jul 12, 2021
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Jul 12, 2021
gcf-merge-on-green bot pushed a commit that referenced this pull request Jul 13, 2021
Supersedes #711.


## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-13)


### Features

* Add max_results parameter to some of the `QueryJob` methods. ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* Add support for decimal target types. ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* Add support for table snapshots. ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* Enable unsetting policy tags on schema fields. ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* Make it easier to disable best-effort deduplication with streaming inserts. ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API. ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* Inserting non-finite floats with `insert_rows()`. ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* Use `pandas` function to check for `NaN`. ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* Add docs for all enums in module. ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* Omit mention of Python 2.7 in `CONTRIBUTING.rst`. ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
gcf-merge-on-green bot pushed a commit that referenced this pull request Jul 14, 2021
🤖 I have created a release \*beep\* \*boop\*
---
## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-14)


### Features

* add always_use_jwt_access ([#714](https://www.github.com/googleapis/python-bigquery/issues/714)) ([92fbd4a](https://www.github.com/googleapis/python-bigquery/commit/92fbd4ade37e0be49dc278080ef73c83eafeea18))
* add max_results parameter to some of the QueryJob methods ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* add support for decimal target types ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* add support for table snapshots ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* enable unsetting policy tags on schema fields ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* make it easier to disable best-effort deduplication with streaming inserts ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* inserting non-finite floats with insert_rows() ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* use pandas function to check for NaN ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* add docs for all enums in module ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* omit mention of Python 2.7 in `CONTRIBUTING.rst` ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: dataframe_to_json_generator doesn't support pandas.NA type
3 participants