Make GCSToBQLoadRunnable aware of which target BQ tables it should ma… #292

Open
wants to merge 1 commit into base: master


anderseriksson
Contributor

For every connector that uses batch loading into BigQuery, the GCSToBQLoadRunnable class is responsible for looking through the blobs in Google Cloud Storage and triggering jobs that load them into BigQuery.
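As a rough illustration of that responsibility, the Java sketch below shows one plausible shape for the work: it groups blobs by their destination table, and each group corresponds to one load job. `BlobInfo`, `groupByTable`, and the table names are hypothetical. The sketch assumes every blob's destination table is already known and does not call the real Google Cloud Storage or BigQuery client libraries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoadJobGrouping {
    // Hypothetical stand-in for a GCS blob: its object name plus the BigQuery
    // table it is destined for (assumed to be known for every blob).
    static final class BlobInfo {
        final String name;
        final String targetTable;

        BlobInfo(String name, String targetTable) {
            this.name = name;
            this.targetTable = targetTable;
        }
    }

    // Groups blob names by destination table, preserving first-seen order;
    // each map entry stands for one load job to trigger.
    static Map<String, List<String>> groupByTable(List<BlobInfo> blobs) {
        Map<String, List<String>> jobs = new LinkedHashMap<>();
        for (BlobInfo blob : blobs) {
            jobs.computeIfAbsent(blob.targetTable, table -> new ArrayList<>()).add(blob.name);
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<BlobInfo> blobs = List.of(
                new BlobInfo("orders-0001", "dataset.orders"),
                new BlobInfo("users-0001", "dataset.users"),
                new BlobInfo("orders-0002", "dataset.orders"));
        // Prints:
        // load job -> dataset.orders: [orders-0001, orders-0002]
        // load job -> dataset.users: [users-0001]
        groupByTable(blobs).forEach((table, names) ->
                System.out.println("load job -> " + table + ": " + names));
    }
}
```

Grouping per table keeps the number of load jobs proportional to the number of tables rather than the number of blobs.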

Until now, GCSToBQLoadRunnable tried to load every blob it could find in the bucket. If a bucket was shared between connectors, each connector's GCSToBQLoadRunnable would still try to load every blob into that blob's target table. As a result, blobs were loaded into their target table several times whenever several batch-enabled BigQuery connectors were running.

This PR proposes that GCSToBQLoadRunnable only trigger load jobs for blobs destined for the target tables of its own connector.
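A minimal Java sketch of the proposed filtering, assuming each blob's destination table can be read and each connector knows the set of tables it writes to. `BlobInfo`, `blobsToLoad`, and the table names are hypothetical and are not the connector's actual code. Two connectors share one bucket, and each selects only the blobs headed for its own tables, so no blob is picked up twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TargetTableFilter {
    // Hypothetical stand-in for a blob in a bucket shared by several connectors.
    static final class BlobInfo {
        final String name;
        final String targetTable;

        BlobInfo(String name, String targetTable) {
            this.name = name;
            this.targetTable = targetTable;
        }
    }

    // Returns the names of blobs this connector should load: only those whose
    // destination table is one of the connector's own target tables.
    static List<String> blobsToLoad(List<BlobInfo> bucket, Set<String> ownTables) {
        List<String> selected = new ArrayList<>();
        for (BlobInfo blob : bucket) {
            if (ownTables.contains(blob.targetTable)) {
                selected.add(blob.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<BlobInfo> sharedBucket = List.of(
                new BlobInfo("orders-0001", "dataset.orders"),
                new BlobInfo("users-0001", "dataset.users"));
        // Each connector now selects only its own blobs.
        System.out.println("connector A: " + blobsToLoad(sharedBucket, Set.of("dataset.orders"))); // connector A: [orders-0001]
        System.out.println("connector B: " + blobsToLoad(sharedBucket, Set.of("dataset.users")));  // connector B: [users-0001]
    }
}
```

Without the `ownTables` check, both connectors would return both blob names, which is the duplicate-load behaviour described above.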

…ke jobs to write to

(cherry picked from commit 3e3d7ff)
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov-commenter

codecov-commenter commented Aug 20, 2020

Codecov Report

Merging #292 into master will decrease coverage by 0.18%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##             master     #292      +/-   ##
============================================
- Coverage     70.87%   70.68%   -0.19%     
  Complexity      301      301              
============================================
  Files            32       32              
  Lines          1538     1542       +4     
  Branches        164      167       +3     
============================================
  Hits           1090     1090              
- Misses          390      394       +4     
  Partials         58       58              
Impacted Files | Coverage Δ | Complexity Δ
...wepay/kafka/connect/bigquery/BigQuerySinkTask.java | 58.94% <0.00%> (ø) | 27.00 <0.00> (ø)
...ay/kafka/connect/bigquery/GCSToBQLoadRunnable.java | 15.06% <0.00%> (-0.43%) | 5.00 <0.00> (ø)
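The headline figure (0.18%) and the diff table's delta (-0.19%) differ slightly. Coverage here is hits divided by tracked lines, where tracked lines = hits + misses + partials (1090 + 390 + 58 = 1538 on master, 1090 + 394 + 58 = 1542 on #292). The Java sketch below reproduces both numbers with integer arithmetic, assuming percentages are rounded down to two decimals. That rounding rule is an assumption that happens to match the figures shown, and `CoverageArithmetic` is a hypothetical helper, not part of Codecov.

```java
public class CoverageArithmetic {
    // Coverage percentage multiplied by `scale` and rounded down:
    // scale 100 gives hundredths of a percent, scale 10_000 gives ten-thousandths.
    static long scaledPercent(long hits, long lines, long scale) {
        return hits * 100 * scale / lines;
    }

    // Formats a non-negative value in hundredths of a percent, e.g. 7087 -> "70.87%".
    static String pct(long hundredths) {
        return String.format("%d.%02d%%", hundredths / 100, hundredths % 100);
    }

    public static void main(String[] args) {
        long base = scaledPercent(1090, 1538, 100); // 7087
        long head = scaledPercent(1090, 1542, 100); // 7068
        System.out.println(pct(base) + " -> " + pct(head)); // 70.87% -> 70.68%
        // Subtracting the already-rounded values gives the table's delta.
        System.out.println("delta of displayed values: " + pct(base - head)); // 0.19%
        // Subtracting with more precision first gives the headline figure.
        long fine = scaledPercent(1090, 1538, 10_000) - scaledPercent(1090, 1542, 10_000); // 1838
        System.out.println("delta of unrounded values: " + pct(fine / 100)); // 0.18%
    }
}
```

The 4 added lines are all misses (hits stay at 1090), which is also why the diff coverage is 0.00%.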


3 participants