Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: inserting non-finite floats with insert_rows() #728

Merged
merged 1 commit into from Jul 1, 2021

Conversation

plamut
Copy link
Contributor

@plamut plamut commented Jun 29, 2021

Fixes #696.

This PR fixes inserting Nan and non-finite floats with the client.insert_rows() method.

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut plamut requested a review from a team June 29, 2021 13:21
@plamut plamut requested a review from a team as a code owner June 29, 2021 13:21
@plamut plamut requested review from stephaniewang526 and removed request for a team June 29, 2021 13:21
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 29, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jun 29, 2021
@plamut
Copy link
Contributor Author

plamut commented Jun 29, 2021

test_insert_rows_from_dataframe() fails, will investigate how the changes affected Nan-handling there.

@stephaniewang526 stephaniewang526 requested review from tswast and removed request for stephaniewang526 June 29, 2021 16:27
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 30, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 30, 2021
@plamut
Copy link
Contributor Author

plamut commented Jun 30, 2021

I could not reproduce the failed test even after multiple trials, running it both in isolation and in the entire test suite. It seems weird why there would be no rows in the query results, NaN conversions didn't seem to play the role here.

        rows = list(
            bigquery_client.query(
                "SELECT * FROM `{}.{}.{}`".format(
                    table.project, table.dataset_id, table.table_id
                )
            )
        )

        sorted_rows = sorted(rows, key=operator.attrgetter("int_col"))
        row_tuples = [r.values() for r in sorted_rows]
        expected = [...]

>       assert len(row_tuples) == len(expected)
E       AssertionError: assert 0 == 6
E        +  where 0 = len([])
E        +  and   6 = len([(1.11, True, 'my string', 10), (2.22, False, 'another string', 20), (3.33, False, 'another string', 30), (4.44, True, 'another string', 40), (5.55, False, 'another string', 50), (6.66, True, None, 60)])

In any case, re-running the CI checks made them pass, and the same result was consistently reproduced locally.

@jimfulton
Copy link
Contributor

I could not reproduce the failed test even after multiple trials, running it both in isolation and in the entire test suite. It seems weird why there would be no rows in the query results, NaN conversions didn't seem to play the role here.

        rows = list(
            bigquery_client.query(
                "SELECT * FROM `{}.{}.{}`".format(
                    table.project, table.dataset_id, table.table_id
                )
            )
        )

        sorted_rows = sorted(rows, key=operator.attrgetter("int_col"))
        row_tuples = [r.values() for r in sorted_rows]
        expected = [...]

>       assert len(row_tuples) == len(expected)
E       AssertionError: assert 0 == 6
E        +  where 0 = len([])
E        +  and   6 = len([(1.11, True, 'my string', 10), (2.22, False, 'another string', 20), (3.33, False, 'another string', 30), (4.44, True, 'another string', 40), (5.55, False, 'another string', 50), (6.66, True, None, 60)])

In any case, re-running the CI checks made them pass, and the same result was consistently reproduced locally.

That failed for me too yesterday. I'll create a flaky-test issue.

Eventual consistency?

@plamut
Copy link
Contributor Author

plamut commented Jun 30, 2021

@jimfulton Can't say, it's the first time I'm seeing this failure in 2+ years. Something on the backend might have changed related to when the streamed data is available. It should be as soon as after a few seconds, but even this might be too late in certain circumstances, just enough for the test to fail.

@tseaver
Copy link
Contributor

tseaver commented Jun 30, 2021

@plamut, @jimfulton For the flaky test, the test_utils.retry.RetryResult wrapper might help. Something like:

from test_utils.retry import RetryResult

def _list_query(client, query):
    return list(client.query(query)

def _check_result(result, expected_length)
     return len(result) == expected_length

retry = RetryResult(functools.partial(_check_result, expected_length=len(expected))
retry(_list_query)(bigquery_client, query)

@plamut plamut merged commit d047419 into googleapis:master Jul 1, 2021
@plamut plamut deleted the iss-696 branch July 1, 2021 07:24
gcf-merge-on-green bot pushed a commit that referenced this pull request Jul 13, 2021
Supersedes #711.


## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-13)


### Features

* Add max_results parameter to some of the `QueryJob` methods. ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* Add support for decimal target types. ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* Add support for table snapshots. ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* Enable unsetting policy tags on schema fields. ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* Make it easier to disable best-effort deduplication with streaming inserts. ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API. ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* Inserting non-finite floats with `insert_rows()`. ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* Use `pandas` function to check for `NaN`. ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* Add docs for all enums in module. ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* Omit mention of Python 2.7 in `CONTRIBUTING.rst`. ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
gcf-merge-on-green bot pushed a commit that referenced this pull request Jul 14, 2021
🤖 I have created a release \*beep\* \*boop\*
---
## [2.21.0](https://www.github.com/googleapis/python-bigquery/compare/v2.20.0...v2.21.0) (2021-07-14)


### Features

* add always_use_jwt_access ([#714](https://www.github.com/googleapis/python-bigquery/issues/714)) ([92fbd4a](https://www.github.com/googleapis/python-bigquery/commit/92fbd4ade37e0be49dc278080ef73c83eafeea18))
* add max_results parameter to some of the QueryJob methods ([#698](https://www.github.com/googleapis/python-bigquery/issues/698)) ([2a9618f](https://www.github.com/googleapis/python-bigquery/commit/2a9618f4daaa4a014161e1a2f7376844eec9e8da))
* add support for decimal target types ([#735](https://www.github.com/googleapis/python-bigquery/issues/735)) ([7d2d3e9](https://www.github.com/googleapis/python-bigquery/commit/7d2d3e906a9eb161911a198fb925ad79de5df934))
* add support for table snapshots ([#740](https://www.github.com/googleapis/python-bigquery/issues/740)) ([ba86b2a](https://www.github.com/googleapis/python-bigquery/commit/ba86b2a6300ae5a9f3c803beeb42bda4c522e34c))
* enable unsetting policy tags on schema fields ([#703](https://www.github.com/googleapis/python-bigquery/issues/703)) ([18bb443](https://www.github.com/googleapis/python-bigquery/commit/18bb443c7acd0a75dcb57d9aebe38b2d734ff8c7))
* make it easier to disable best-effort deduplication with streaming inserts ([#734](https://www.github.com/googleapis/python-bigquery/issues/734)) ([1246da8](https://www.github.com/googleapis/python-bigquery/commit/1246da86b78b03ca1aa2c45ec71649e294cfb2f1))
* Support passing struct data to the DB API ([#718](https://www.github.com/googleapis/python-bigquery/issues/718)) ([38b3ef9](https://www.github.com/googleapis/python-bigquery/commit/38b3ef96c3dedc139b84f0ff06885141ae7ce78c))


### Bug Fixes

* inserting non-finite floats with insert_rows() ([#728](https://www.github.com/googleapis/python-bigquery/issues/728)) ([d047419](https://www.github.com/googleapis/python-bigquery/commit/d047419879e807e123296da2eee89a5253050166))
* use pandas function to check for NaN ([#750](https://www.github.com/googleapis/python-bigquery/issues/750)) ([67bc5fb](https://www.github.com/googleapis/python-bigquery/commit/67bc5fbd306be7cdffd216f3791d4024acfa95b3))


### Documentation

* add docs for all enums in module ([#745](https://www.github.com/googleapis/python-bigquery/issues/745)) ([145944f](https://www.github.com/googleapis/python-bigquery/commit/145944f24fedc4d739687399a8309f9d51d43dfd))
* omit mention of Python 2.7 in `CONTRIBUTING.rst` ([#706](https://www.github.com/googleapis/python-bigquery/issues/706)) ([27d6839](https://www.github.com/googleapis/python-bigquery/commit/27d6839ee8a40909e4199cfa0da8b6b64705b2e9))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot insert NaN and non-finite float values using Client.insert_rows()
5 participants