Download an entire bucket to a virtual machine #3225

Closed
dachosen1 opened this issue Mar 30, 2020 · 1 comment
Assignees
Labels
api: storage Issues related to the Cloud Storage API. type: question Request for information or clarification. Not an issue.

Comments

dachosen1 commented Mar 30, 2020

I have a storage bucket that contains MP3 and WAV files, and I have a few questions.

  1. Is there a way to read data directly from a bucket without downloading it? I've browsed through the files, and it seems that you can only download a file and then read it.
  2. Is there a way to download an entire bucket to a virtual machine?
  3. Can you migrate all the data from a bucket to SQL?

If it helps, my goal is to train a machine learning model and deploy it using Google Cloud. I'm at the stage where I'm experimenting with different feature engineering methods and saving the data for training.

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Mar 30, 2020
@JustinBeckwith JustinBeckwith added type: question Request for information or clarification. Not an issue. api: storage Issues related to the Cloud Storage API. labels Apr 2, 2020
@JustinBeckwith JustinBeckwith assigned kurtisvg and unassigned crwilcox Apr 2, 2020
@yoshi-automation yoshi-automation removed the triage me I really want to be triaged. label Apr 2, 2020
Contributor

kurtisvg commented Apr 2, 2020

Hi @dachosen1,

This repo is limited in scope to our samples and issues with them. In the future, this type of question would be better suited to something like StackOverflow.

However, here are some quick pointers on your questions:

  1. It doesn't look like the python-storage library supports streaming from a bucket.
  2. You can use gsutil to do this, or mount the bucket to the filesystem.
  3. There is no automatic feature to sync a GCS bucket with SQL. Since the data structure varies so widely, you'd need to write your own pipeline that understands how to read your bucket content and convert it into the correct SQL format.
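
For points 1 and 2, here is a rough sketch of what this could look like from Python, assuming the `google-cloud-storage` package is installed and credentials are already configured on the VM. The helper names `download_bucket` and `read_object_bytes` are made up for illustration. `download_as_bytes` is available in recent releases of the library (older releases expose `download_as_string` instead); it still transfers the whole object, it just keeps it in memory rather than writing a file. On the command line, `gsutil -m cp -r gs://YOUR_BUCKET ./local_dir` is the usual way to copy a whole bucket.

```python
import os


def download_bucket(bucket_name, dest_dir, client=None):
    """Download every object in a bucket into dest_dir, keeping object paths."""
    if client is None:
        # Imported lazily so the helper can be tried with a stand-in client.
        from google.cloud import storage
        client = storage.Client()

    saved = []
    for blob in client.list_blobs(bucket_name):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        # Object names use "/" separators; map them onto local directories.
        local_path = os.path.join(dest_dir, *blob.name.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)
        saved.append(local_path)
    return saved


def read_object_bytes(bucket_name, object_name, client=None):
    """Fetch one object into memory without writing it to disk."""
    if client is None:
        from google.cloud import storage
        client = storage.Client()
    return client.bucket(bucket_name).blob(object_name).download_as_bytes()
```

Calling `download_bucket("YOUR_BUCKET", "/data/audio")` would then mirror the bucket under `/data/audio`, and the bytes returned by `read_object_bytes` can be wrapped in `io.BytesIO` for libraries that expect a file-like object.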

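For point 3, a pipeline along these lines is one possibility. This is only a sketch under several assumptions: SQLite stands in for whatever SQL database you actually use, the `audio_files` table and its columns are invented for the example, and only WAV files are handled because the standard library can parse them (MP3 would need a third-party decoder). It takes `(object_name, bytes)` pairs, so the bytes can come from the bucket however you choose to read them.

```python
import io
import sqlite3
import wave


def wav_metadata(data):
    """Return (channels, sample_rate_hz, duration_s) for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate


def load_wav_index(conn, objects):
    """Insert one row per (object_name, wav_bytes) pair; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_files ("
        "object_name TEXT PRIMARY KEY, channels INTEGER, "
        "sample_rate_hz INTEGER, duration_s REAL)"
    )
    rows = [(name, *wav_metadata(data)) for name, data in objects]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO audio_files VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)
```

Storing metadata or extracted features in SQL and leaving the raw audio in the bucket is a design choice made here for brevity; whether the audio itself belongs in a database depends on your training setup.
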
@kurtisvg kurtisvg closed this as completed Apr 2, 2020

5 participants