New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(bigquery): unit and system test for dataframe with int column with Nan values #39
Merged
HemangChothani
merged 12 commits into
googleapis:master
from
MaxxleLLC:bigquery_issue_22
May 13, 2020
Merged
Changes from 2 commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
b620f85
feat(bigquery): add unit and system tests for int columns
HemangChothani b906717
feat(bigquery): cosmetic changes
HemangChothani 626cc8c
feat(bigquery): use pkg_resources for comparison
HemangChothani a1be872
Merge remote-tracking branch 'upstream/master' into bigquery_issue_22
HemangChothani b3a9781
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani a3547de
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani 62eb1ad
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani db5fdc4
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani 95ddeb8
feat(bigquery): nit
HemangChothani 734f895
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani d9cd2f1
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani 66cf1fe
Merge branch 'master' of https://github.com/googleapis/python-bigquer…
HemangChothani File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -736,6 +736,66 @@ def test_load_table_from_dataframe_w_automatic_schema(self): | |
) | ||
self.assertEqual(table.num_rows, 3) | ||
|
||
@unittest.skipIf( | ||
pandas is None or pandas.__version__ < "1.0.1", | ||
"Only `pandas version >=1.0.1` is supported", | ||
) | ||
@unittest.skipIf(pyarrow is None, "Requires `pyarrow`") | ||
def test_load_table_from_dataframe_w_none_value(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The nullable |
||
"""Test that a DataFrame containing None-type values can be uploaded | ||
if a BigQuery schema is specified. | ||
|
||
https://github.com/googleapis/python-bigquery/issues/22 | ||
""" | ||
|
||
dataset_id = _make_dataset_id("bq_load_test") | ||
self.temp_dataset(dataset_id) | ||
table_id = "{}.{}.load_table_from_dataframe_w_none_value".format( | ||
Config.CLIENT.project, dataset_id | ||
) | ||
table_schema = (bigquery.SchemaField("x", "INTEGER", mode="NULLABLE"),) | ||
table = retry_403(Config.CLIENT.create_table)( | ||
Table(table_id, schema=table_schema) | ||
) | ||
self.to_delete.insert(0, table) | ||
|
||
df_data = collections.OrderedDict( | ||
[("x", pandas.Series([1, 2, None, 4], dtype="Int64"))] | ||
) | ||
dataframe = pandas.DataFrame(df_data, columns=df_data.keys()) | ||
load_job = Config.CLIENT.load_table_from_dataframe(dataframe, table_id) | ||
load_job.result() | ||
table = Config.CLIENT.get_table(table_id) | ||
self.assertEqual(tuple(table.schema), (bigquery.SchemaField("x", "INTEGER"),)) | ||
self.assertEqual(table.num_rows, 4) | ||
|
||
@unittest.skipIf( | ||
pandas is None or pandas.__version__ < "1.0.1", | ||
"Only `pandas version >=1.0.1` is supported", | ||
) | ||
@unittest.skipIf(pyarrow is None, "Requires `pyarrow`") | ||
def test_load_table_from_dataframe_w_automatic_schema_w_none_value(self): | ||
"""Test that a DataFrame containing None-type values can be uploaded | ||
without specifying a schema. | ||
|
||
https://github.com/googleapis/python-bigquery/issues/22 | ||
""" | ||
|
||
dataset_id = _make_dataset_id("bq_load_test") | ||
self.temp_dataset(dataset_id) | ||
table_id = "{}.{}.load_table_from_dataframe_w_automatic_schema_w_none_value".format( | ||
Config.CLIENT.project, dataset_id | ||
) | ||
df_data = collections.OrderedDict( | ||
[("x", pandas.Series([1, 2, None, 4], dtype="Int64"))] | ||
) | ||
dataframe = pandas.DataFrame(df_data, columns=df_data.keys()) | ||
load_job = Config.CLIENT.load_table_from_dataframe(dataframe, table_id) | ||
load_job.result() | ||
table = Config.CLIENT.get_table(table_id) | ||
self.assertEqual(tuple(table.schema), (bigquery.SchemaField("x", "INTEGER"),)) | ||
self.assertEqual(table.num_rows, 4) | ||
|
||
@unittest.skipIf(pandas is None, "Requires `pandas`") | ||
@unittest.skipIf(pyarrow is None, "Requires `pyarrow`") | ||
def test_load_table_from_dataframe_w_nulls(self): | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought - I think comparing plain version strings will actually work in this particular case, but cannot be generalized (e.g. somebody copies this pattern over to another test, or if we have to change the version for some reason).
Let's instead add a library like packaging as a test dependency, and use that for parsing and semantically comparing versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, why pandas version
1.0.1+
, as the code sample from the issue description also seems to works in pandas 1.0.0?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but need suggestion on version comparison like should i use
parse(pandas.__version__) < parse("1.0.0")
or should i change the pandas required minimum version in setup.py file and skip that if PY2 or something else?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the issue was due to a bug in Pandas, it's probably best to directly compare the version of Pandas rather than a proxy value such as the Python version.
Re: bumping the version - if the new Pandas version is still compatible with Python 2.7, we can bump it (we have tests...), otherwise we will have to wait until April or so when we officially drop Python 2 support (IIRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version of pandas isn't compatible with Python 2.7, it Requires: Python >=3.6.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We could still conditionally bump the pandas version for Python 3.5+ in
setup.py
, though, to at least guarantee correct behavior in Python 3, even though we cannot fix the bug in Python 2.7 due to pandas incompatibility.@tswast What do you think?
P.S.: If there is an efficient way of detecting whether a column contains a
NaN
value, we could also emit a user warning if too old a pandas version is installed... just a thought.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tswast bump
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have setuptools as a required dependency, so why aren't you using
pkg_resources
?https://github.com/pydata/pandas-gbq/blob/97e9a9ee3a8f65ede29f25954b1af3f3bc606f58/pandas_gbq/gbq.py#L46
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of places (including Google) are pretty slow to update pandas, so I'd hesitate to require 1.0+ just yet.