
bug: add missing latency check #763

Merged
merged 11 commits into from
May 8, 2024
13 changes: 10 additions & 3 deletions noxfile.py
@@ -256,6 +256,7 @@ def system(session):

install_systemtest_dependencies(session, "-c", constraints_path)

# Print out package versions.
session.run("python", "-m", "pip", "freeze")

# Run py.test against the system tests.
@@ -347,17 +348,21 @@ def prerelease(session):
session.run("python", "-m", "pip", "freeze")

# Run all tests, except a few samples tests which require extra dependencies.

Contributor: nit: remove the added empty line?

Collaborator Author: In this case, if it passes linting and black, I am fine with leaving it as it is.

Contributor: I'm not sure about this. If there isn't a good reason to change it, I don't think we should.

chalmerlowe marked this conversation as resolved.
session.run(
"py.test",
"--quiet",
f"--junitxml=prerelease_unit_{session.python}_sponge_log.xml",
os.path.join("tests", "unit"),
# os.path.join("tests", "unit"),
chalmerlowe marked this conversation as resolved.
*session.posargs,
chalmerlowe marked this conversation as resolved.
)

session.run(
"py.test",
"--quiet",
f"--junitxml=prerelease_system_{session.python}_sponge_log.xml",
os.path.join("tests", "system"),
# os.path.join("tests", "system"),
chalmerlowe marked this conversation as resolved.
*session.posargs,
)


@@ -515,7 +520,9 @@ def prerelease_deps(session):
session.install(*other_deps)
session.run("python", "-m", "pip", "freeze")

# Print out prerelease package versions
# Print out package versions.
session.run("python", "-m", "pip", "freeze")

session.run(
"python", "-c", "import google.protobuf; print(google.protobuf.__version__)"
)
15 changes: 14 additions & 1 deletion pandas_gbq/gbq.py
@@ -411,7 +411,20 @@ def run_query(self, query, max_results=None, progress_bar_type=None, **kwargs):
timeout_ms = job_config_dict.get("jobTimeoutMs") or job_config_dict[
"query"
].get("timeoutMs")
timeout_ms = int(timeout_ms) if timeout_ms else None

if timeout_ms:
timeout_ms = int(timeout_ms)
# Having too small a timeout_ms results in individual
# API calls timing out before they can finish.
# ~300 milliseconds is a rule of thumb for the bare minimum
# latency from the BigQuery API.
minimum_latency = 400
Contributor: Maybe we can define minimum_latency as a constant at the top of the file, for easy access and reuse in the future (see example).

Collaborator Author: I would like to leave this as is.

Contributor: Could you explain why? Thank you.

Collaborator Author: @Linchin

I don't see this particular variable being used widely throughout the code (now or in the future).

I prefer keeping the variable and its explanatory comment together, close to where they are used. I feel this improves the readability and maintainability of the code, especially since this edit responds to a failing flakybot test. If the test continues to fail, whoever troubleshoots it is likely to see the comment in situ and may then tweak the duration to better match real-world conditions.

if timeout_ms < minimum_latency:
raise QueryTimeout(
f"Query timeout must be at least 400 milliseconds: timeout_ms equals {timeout_ms}."
)
else:
timeout_ms = None

self._start_timer()
job_config = bigquery.QueryJobConfig.from_api_repr(job_config_dict)
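The new check in run_query resolves the effective timeout (a truthy top-level jobTimeoutMs is used, otherwise query.timeoutMs) and rejects anything below 400 ms before the query job is created. The sketch below isolates just that logic so it runs without pandas-gbq or BigQuery; QueryTimeout is a local stand-in class, and resolve_timeout_ms and MINIMUM_LATENCY_MS are hypothetical names, not pandas-gbq's API.

```python
from typing import Optional


class QueryTimeout(Exception):
    """Hypothetical local stand-in for pandas-gbq's QueryTimeout exception."""


# Same threshold the PR hard-codes as minimum_latency inside run_query.
MINIMUM_LATENCY_MS = 400


def resolve_timeout_ms(job_config_dict: dict) -> Optional[int]:
    """Return the effective timeout in milliseconds, or None if none is set.

    Hypothetical helper; like the code in the diff, it assumes a "query" key
    is present whenever "jobTimeoutMs" is absent or falsy.
    """
    # A truthy top-level jobTimeoutMs wins; otherwise use query.timeoutMs.
    timeout_ms = job_config_dict.get("jobTimeoutMs") or job_config_dict[
        "query"
    ].get("timeoutMs")

    if not timeout_ms:
        return None

    # Values may be given as ints or numeric strings; normalize to int.
    timeout_ms = int(timeout_ms)
    # Reject timeouts too short for individual API calls to complete.
    if timeout_ms < MINIMUM_LATENCY_MS:
        raise QueryTimeout(
            f"Query timeout must be at least {MINIMUM_LATENCY_MS} milliseconds: "
            f"timeout_ms equals {timeout_ms}."
        )
    return timeout_ms
```

For example, resolve_timeout_ms({"query": {"timeoutMs": "401"}}) returns 401, while a 1 ms timeout raises QueryTimeout before any job is started. The sketch uses a module-level constant, as the reviewer suggested; the merged code keeps minimum_latency local to run_query, as the author preferred. Because a 1 ms timeout now fails this check on its own, the system test configs were raised to 401 ms, presumably so the test still exercises a real query timeout rather than only the validation.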
2 changes: 1 addition & 1 deletion setup.py
@@ -36,7 +36,7 @@
"google-auth-oauthlib >=0.7.0",
# Please also update the minimum version in pandas_gbq/features.py to
# allow pandas-gbq to detect invalid package versions at runtime.
"google-cloud-bigquery >=3.3.5,<4.0.0dev",
"google-cloud-bigquery >=3.3.5,!=3.21.0,<4.0.0dev",
chalmerlowe marked this conversation as resolved.
"packaging >=20.0.0",
]
extras = {
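The setup.py change keeps the existing >=3.3.5 and <4.0.0dev bounds and adds !=3.21.0, so pip will not select that one release of google-cloud-bigquery. As a rough illustration of what the specifier accepts, here is a stdlib-only sketch that assumes plain numeric X.Y.Z versions (it does not implement full PEP 440, e.g. pre-releases or local versions). MINIMUM, EXCLUDED, UPPER_BOUND, parse_release and is_supported_bigquery_version are hypothetical names, not pandas-gbq code.

```python
# Hypothetical runtime check mirroring the setup.py constraint
# "google-cloud-bigquery >=3.3.5,!=3.21.0,<4.0.0dev".
# Assumption: versions are plain numeric "X.Y.Z" strings.

MINIMUM = (3, 3, 5)       # inclusive lower bound
EXCLUDED = {(3, 21, 0)}   # individual releases ruled out with "!="
UPPER_BOUND = (4, 0, 0)   # exclusive; "<4.0.0dev" also excludes 4.0.0 pre-releases


def parse_release(version: str) -> tuple:
    """Turn '3.21.0' into (3, 21, 0), padding short versions like '3.21'."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (3 - len(parts))


def is_supported_bigquery_version(version: str) -> bool:
    """True if the version satisfies the bounds and is not excluded."""
    release = parse_release(version)
    return MINIMUM <= release < UPPER_BOUND and release not in EXCLUDED
```

Comparing integer tuples rather than raw strings matters here: as strings, "3.10.0" sorts before "3.3.5". For real code, packaging (already listed as a dependency in this setup.py) handles the full PEP 440 rules.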
7 changes: 5 additions & 2 deletions tests/system/test_gbq.py
@@ -474,11 +474,14 @@ def test_timeout_configuration(self, project_id):
select count(*) from unnest(generate_array(1,1000000)), unnest(generate_array(1, 10000))
"""
configs = [
# timeout_ms has a minimum limit of 400 milliseconds;
# see GbqConnector.run_query in pandas_gbq/gbq.py
# for more details.
# pandas-gbq timeout configuration. Transformed to REST API compatible version.
{"query": {"useQueryCache": False, "timeoutMs": 1}},
{"query": {"useQueryCache": False, "timeoutMs": 401}},
# REST API job timeout. See:
# https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration.FIELDS.job_timeout_ms
{"query": {"useQueryCache": False}, "jobTimeoutMs": 1},
{"query": {"useQueryCache": False}, "jobTimeoutMs": 401},
chalmerlowe marked this conversation as resolved.
]
for config in configs:
with pytest.raises(gbq.QueryTimeout):