Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery): unit and system test for dataframe with int column with Nan values #39

Merged
merged 12 commits into from May 13, 2020

Conversation

HemangChothani
Copy link
Contributor

Fixes #22

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 18, 2020
@HemangChothani HemangChothani changed the title Bigquery issue 22 feat(bigquery): unit and system test for dataframe with int column with Nan values Feb 18, 2020
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good, just have two suggestion on improving the skipIf decorators.

tests/system.py Outdated
@@ -736,6 +736,65 @@ def test_load_table_from_dataframe_w_automatic_schema(self):
)
self.assertEqual(table.num_rows, 3)

@unittest.skipIf(
pandas is None or pandas.__version__ < "1.0.1",
"Only `pandas version >=1.0.1` supports",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Only `pandas version >=1.0.1` supports",
"Only `pandas version >=1.0.1` supported",

Seems like more correct? (disclaimer: not a native speaker)

tests/system.py Outdated
@@ -736,6 +736,65 @@ def test_load_table_from_dataframe_w_automatic_schema(self):
)
self.assertEqual(table.num_rows, 3)

@unittest.skipIf(
pandas is None or pandas.__version__ < "1.0.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought - I think comparing plain version strings will actually work in this particular case, but cannot be generalized (e.g. somebody copies this pattern over to another test, or if we have to change the version for some reason).

Let's instead add a library like packaging as a test dependency, and use that for parsing and semantically comparing versions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, why pandas version 1.0.1+, as the code sample from the issue description also seems to works in pandas 1.0.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but need suggestion on version comparison like should i use parse(pandas.__version__) < parse("1.0.0") or should i change the pandas required minimum version in setup.py file and skip that if PY2 or something else?

Copy link
Contributor

@plamut plamut Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the issue was due to a bug in Pandas, it's probably best to directly compare the version of Pandas rather than a proxy value such as the Python version.

Re: bumping the version - if the new Pandas version is still compatible with Python 2.7, we can bump it (we have tests...), otherwise we will have to wait until April or so when we officially drop Python 2 support (IIRC).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New version of pandas isn't compatible with Python 2.7, it Requires: Python >=3.6.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We could still conditionally bump the pandas version for Python 3.5+ in setup.py, though, to at least guarantee correct behavior in Python 3, even though we cannot fix the bug in Python 2.7 due to pandas incompatibility.

@tswast What do you think?

P.S.: If there is an efficient way of detecting whether a column contains a NaN value, we could also emit a user warning if too old a pandas version is installed... just a thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast bump

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have setuptools as a required dependency, so why aren't you using pkg_resources?

https://github.com/pydata/pandas-gbq/blob/97e9a9ee3a8f65ede29f25954b1af3f3bc606f58/pandas_gbq/gbq.py#L46

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of places (including Google) are pretty slow to update pandas, so I'd hesitate to require 1.0+ just yet.

tests/system.py Outdated
"Only `pandas version >=1.0.1` is supported",
)
@unittest.skipIf(pyarrow is None, "Requires `pyarrow`")
def test_load_table_from_dataframe_w_none_value(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nullable Int64 data type is the real reason for the test. There are plenty of existing tests & samples with loading None values object dtype.

@plamut
Copy link
Contributor

plamut commented Apr 2, 2020

@shollyman Friendly ping. :)

@HemangChothani Please just bring the branch up to date in the meanwhile, thanks!

@HemangChothani
Copy link
Contributor Author

@shollyman PTAL!

@HemangChothani
Copy link
Contributor Author

@shollyman Friendly ping. :)

tests/system.py Outdated
@@ -125,6 +126,9 @@
(TooManyRequests, InternalServerError, ServiceUnavailable)
)

PANDAS_MINIUM_VERSION = pkg_resources.parse_version("1.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/MINIUM/MINIMUM/ and all the references

@HemangChothani HemangChothani merged commit 5fd840e into googleapis:master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery : ArrowTypeError when trying to push DataFrame with int columns with NaN values
5 participants