Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking Change] Don't remove destinationTable attribute in create_job #483

Closed
dinigo opened this issue Jan 24, 2021 · 9 comments
Closed
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. semver: major Hint for users that this is an API breaking change. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@dinigo
Copy link

dinigo commented Jan 24, 2021

What happened:

I run a job with a query and a destination table and the table is not generated from the file.

bq_client.create_job({
    'query': {
        'query': rendered_query,
        'destinationTable': {
            'projectId': GCP_PROJECT,
            'datasetId': DATASET_ID,
            'tableId': TRANSFORMED_TABLE_ID
        },
        'writeDisposition': 'WRITE_TRUNCATE'
})

My code executes successfully and produces the following job in the console
image

How to fix it

There is a line where the destinationTable is removed from the job config. Removing the line that deletes the property in the job configuration will do.

_del_sub_prop(copy_config, ["query", "destinationTable"])

Can anyone explain why is the destinationTable property removed?

In the meanwhile I'm opening a PR

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jan 24, 2021
dinigo added a commit to dinigo/python-bigquery that referenced this issue Jan 24, 2021
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 24, 2021
@plamut
Copy link
Contributor

plamut commented Jan 25, 2021

From the description of the ticket when this was added:

For example, BigQuery automatically populates a destination table if not set. In this case, the destination table should not be part of the retried request. In my opinion, this kind of logic is outside the scope of the client libraries, as it's not clear when destination table needs to be cleared just from the job resource.

While the client library always removes the destination table, that might not be the correct in all cases, but then again, it might be difficult to determine in every case what action to take.

@tswast Do you remember the details and what the backend expects? Is it possible/feasible for the client to be in sync with the backend without duplicating the latter's logic?

@plamut plamut added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Jan 25, 2021
@tswast
Copy link
Contributor

tswast commented Jan 25, 2021

@tswast Do you remember the details and what the backend expects? Is it possible/feasible for the client to be in sync with the backend without duplicating the latter's logic?

The main problem is permissions. When the destination table is automatically populated, it creates a table in a special dataset where data automatically expires and the customer is not charged for storage. The customer does not have permission to write directly to such a table.

For example, I just ran a query and it wrote the results to the temporary table swast-scratch:_63cfa399614a54153cc386c27d6c0c6fdb249f9e.anonb14b7884b8147f85ae0731c3a6f8d0026af97ab

In every case I've seen, the temporary table dataset_id begins with _ and the table name begins with anon.

We could possibly only strip the destination table in these cases. There's still a risk of false positives, but it should be much less likely.

@tswast tswast added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: question Request for information or clarification. Not an issue. labels Jan 25, 2021
@plamut
Copy link
Contributor

plamut commented Jan 25, 2021

OK thanks, this sounds like a better heuristic. If it turns out that it doesn't work well enough in practice, it should be easy to modify/remove, if necessary.

@dinigo
Copy link
Author

dinigo commented Jan 26, 2021

I might be failing to fully understand why the python API/library works like it does because I lack this context you are talking about.

What I want to achieve is the same functionality as expected and provided by other ways I interact with bigquery:

@tswast can you help me understand this use-case with the temp table? It looks to me like a client logic. But again, I think I'm failing to fully grasp the example.

I can seamlessly migrate mi code between several languages simply moving the job configuration (dictionary/json/struct) and I don't understand why the python library is an exception.

I don't want to be rude by any means. And I respect your work, which is fantastic.

@tswast
Copy link
Contributor

tswast commented Jan 26, 2021

@dinigo Full context is here: googleapis/google-cloud-python#5555 and here: #14

You're probably right that the correct choice back then would have been to keep create_job simple and perhaps had a helper to "reset" any server-side job configuration so that it can retried. Unfortunately we did not make that choice, so we need to avoid breaking that "retry this job, please" use-case.

@dinigo
Copy link
Author

dinigo commented Jan 26, 2021

Thanks @tswast ! I get it now. Also You have to keep the API backwards compatible ;)

No problem. We will use a fork and rebase regularly since we need and expect this funcionality.

I would like though to have this feature integrated at some point in the future. If you could tag it as such. Thanks again

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Apr 27, 2021
@meredithslota meredithslota added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 10, 2021
@tswast tswast changed the title Query jobs destinationTable attribute is explicitly removed [Breaking Change] Don't remove destinationTable attribute in create_job Jul 26, 2021
@tswast
Copy link
Contributor

tswast commented Jul 26, 2021

@jimfulton I think if we get retryable queries working in #539, we should be able to remove this odd behavior of create_job.

Also, perhaps if create_job is called with any of our *Job classes, it uses the original request that we cache rather than trying to figure out which properties on the response resource we can send back?

@tswast tswast added the semver: major Hint for users that this is an API breaking change. label Jul 27, 2021
@jimfulton jimfulton added the status: blocked Resolving the issue is dependent on other work. label Jul 30, 2021
@jimfulton
Copy link
Contributor

Some notes:

@jimfulton jimfulton removed the status: blocked Resolving the issue is dependent on other work. label Aug 20, 2021
@tswast
Copy link
Contributor

tswast commented Oct 6, 2021

Closed by #891

Will be released in 3.0.0

@tswast tswast closed this as completed Oct 6, 2021
google-cloud-bigquery 3.0.0 automation moved this from To do to Done Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. semver: major Hint for users that this is an API breaking change. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

6 participants