README improvement for chapters 2 and 3 regarding upload to BQ #159

Open
jgammerman opened this issue Dec 17, 2022 · 4 comments

@jgammerman

Hello,

Excellent book so far, but one problem I've had is uploading the 2015 CSVs from my Cloud Storage bucket to BigQuery.

Both the ch2 and ch3 READMEs just tell you to run:

cd data-science-on-gcp/02_ingest
./ingest_from_crsbucket.sh bucketname

But this only copies the CSVs from the book's bucket to the user's bucket. It doesn't cover the next stage, i.e., uploading to BQ.

The alternative route, ingesting from the original data source, also didn't work for me: my Google Cloud Shell kept disconnecting halfway through the upload.
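If the Cloud Shell connection drops mid-run, one common workaround is to start the long-running script with `nohup` so that it is not killed by the hang-up signal when the terminal session closes, and to follow its progress in a log file. This is a minimal sketch: the helper name `run_detached` and the log file name are hypothetical, and it only guards against the terminal hang-up, not against the Cloud Shell VM itself being shut down (for example after a period of inactivity).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command immune to terminal hang-ups (SIGHUP),
# sending its stdout and stderr to a log file. Prints the background PID.
run_detached() {
  local log=$1
  shift
  nohup "$@" > "$log" 2>&1 &
  echo "$!"
}

# Example (bucket name is a placeholder):
#   run_detached bqload.log bash bqload.sh bucketname 2015
#   tail -f bqload.log
```

After reconnecting, `tail -f` on the log shows whether the script is still running or has finished.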

I'd therefore recommend adding the following instruction to both READMEs, to show explicitly how to upload to BQ:

bash bqload.sh bucketname 2015
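The two steps discussed here (copy the CSVs, then load them into BigQuery) can be chained so that one command does both. A minimal sketch, assuming it is run from `data-science-on-gcp/02_ingest` and that both scripts take the arguments shown in this thread; the function names `run` and `ingest_and_load` and the `DRY_RUN` switch are hypothetical additions for illustration, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: copy the CSVs into your bucket, then load them into BigQuery.
# ASSUMPTION: run from data-science-on-gcp/02_ingest, where both scripts live.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ingest_and_load() {
  local bucket=$1 year=${2:-2015}
  if [ -z "$bucket" ]; then
    echo "usage: ingest_and_load BUCKET [YEAR]" >&2
    return 2
  fi
  # Step 1: copy the CSVs from the book's bucket into yours (what the README covers).
  run ./ingest_from_crsbucket.sh "$bucket" || return 1
  # Step 2: load the copied CSVs into BigQuery (the step the README omits).
  run bash bqload.sh "$bucket" "$year"
}

# Example: DRY_RUN=1 ingest_and_load bucketname 2015
```

Running it once with `DRY_RUN=1` shows the exact commands before anything touches your bucket or dataset.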

@lakshmanok
Contributor

Thanks, I've put in a pull request to make the change. Instead of running ./ingest_from_crsbucket.sh, simply running ./ingest.sh will do the trick, as it also uploads to BigQuery.

@jgammerman
Author

That approach didn't work for me either: my Cloud Shell would disconnect halfway through the upload to BQ, so I ended up with an incomplete table. The solution was simply to run bash bqload.sh bucketname 2015.

Other people may not be so unfortunate though!

@softjobs

softjobs commented Mar 23, 2023

I've been struggling for almost a day now trying to load into BigQuery, without luck. I ran bqload.sh with the correct parameters but keep getting the "Not found: URI gs://srini-laks-gcp1-dsongcp" error.

I enjoyed reading the two chapters but was surprised by the "user-unfriendliness" of the GCP platform. It shouldn't have to take all this time, given the data available through a Google search, but it does! Frustrating, to say the least.
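A "Not found: URI" error from a BigQuery load typically means the `gs://` path it was given does not match any object, so one way to narrow it down is to confirm the monthly CSVs are actually in the bucket before running `bqload.sh`. A minimal sketch: the object layout `flights/raw/YYYYMM.csv` and the helper names `csv_uri` and `check_csvs` are assumptions, so adjust them to whatever paths your copy of `bqload.sh` reads.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking that monthly CSVs exist in a bucket.
# ASSUMPTION: objects are named gs://BUCKET/flights/raw/YYYYMM.csv;
# change csv_uri if your copy of bqload.sh reads a different path.
csv_uri() {
  local bucket=$1 year=$2 month=$3
  # 10# forces base 10 so months like "08" are not parsed as octal.
  printf 'gs://%s/flights/raw/%s%02d.csv\n' "$bucket" "$year" "$((10#$month))"
}

# Prints each missing object to stderr and returns the number of missing months.
check_csvs() {
  local bucket=$1 year=$2 missing=0 month uri
  for month in $(seq 1 12); do
    uri=$(csv_uri "$bucket" "$year" "$month")
    # gsutil -q stat exits non-zero when the object does not exist.
    if ! gsutil -q stat "$uri"; then
      echo "missing: $uri" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: check_csvs bucketname 2015 && bash bqload.sh bucketname 2015
```

If every month is reported missing, the files were never copied or downloaded into the bucket, and the load step cannot succeed until they are.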

@softjobs

softjobs commented Mar 24, 2023

Finally got it to work. Page 49 changes:

  • Navigate into the flights 02_ingest folder:
      cd data-science-on-gcp/02_ingest
  • Run the code to download the files:
      for MONTH....
      bash ../download.sh 2015 $MONTH
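The month loop in the steps above is elided. As a sketch only, assuming it simply calls `download.sh` once for each month 1 through 12 of 2015 from the directory those steps use, it could look like the function below. The name `download_year` and the `DRY_RUN` switch are hypothetical, and the exact month format `download.sh` expects is not stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the elided loop: call download.sh once per month.
# ASSUMPTION: download.sh takes YEAR MONTH and sits one directory up (../download.sh).
# Set DRY_RUN=1 to print the commands instead of running them.
download_year() {
  local year=$1 month
  for month in $(seq 1 12); do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "bash ../download.sh $year $month"
    else
      # Stop at the first failed month so a partial download is easy to spot.
      bash ../download.sh "$year" "$month" || return 1
    fi
  done
}

# Example: download_year 2015
```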
