
README improvements #132

Open · 12 tasks
pm0kjp opened this issue Dec 26, 2021 · 1 comment

pm0kjp commented Dec 26, 2021

Joy Payton here. A few suggestions / notes on the edition2 branch:

  • include .sh extension (copy how you do it in Ch 2 README?)
  • make clear that "bucketname" should be replaced (copy how you do it in Ch 2 README?)
  • In Ch. 3 README (and propagate forward), indicate that one has to run ./bqload.sh csv-bucket-name YEAR to populate BigQuery before executing the step that runs ./create_views.sh (see the first sketch after this list)
  • In Ch. 3 README, consider adding horizontal rules or another visual break to mark where the optional part begins and ends
  • In Ch. 4 README, warn users that the first step of the batch-processing transformation in Dataflow can fail, and document the venv workaround (see the venv sketch after this list). Explain that if they wait too long to execute everything and their venv is recycled, they'll have to rerun the step.
  • In Ch. 4 README, after the "catch up" section, guide the user to navigate into ~/data-science-on-gcp/04_streaming
  • In Ch. 4 README, indicate that the placeholder text (bucket, project) should be replaced in the Read/write to Cloud step
  • In Ch. 4 README, note that the Dataflow API must be enabled at https://console.developers.google.com/apis/api/dataflow.googleapis.com/overview before running df07.py (see the gcloud sketch after this list)
  • In Ch. 4 README, Add "in the cloud dataflow section" to after df07 and/or link https://console.cloud.google.com/dataflow/jobs
  • In Ch. 4 README, add a link to BigQuery (https://console.cloud.google.com/bigquery) before the query itself
  • In Ch. 4 README, in the Simulate Event Stream step, the learner has just been in transform, so change the cd command to cd ../simulate or a path from ~ (see the path sketch after this list)
  • In Ch. 4 README, in the Real-time Stream Processing step:
      • explain how to open a new Cloud Shell tab
      • again, provide the path from the home directory for the cd command
      • make the project and bucket-name substitution more obvious
      • describe the expected output and note that the simulation runs as a foreground process
      • specify using Ctrl-C in the correct terminal window
      • make explicit that "you need to run this in Dataflow" will be handled by the next script
      • add a step to enable the Pub/Sub API at https://console.developers.google.com/apis/api/pubsub.googleapis.com/ (see the gcloud sketch after this list)
      • suggest a view name for ATL delays (see the view sketch after this list)
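For the Ch. 3 item, a minimal sketch of the intended command order, using the README's own placeholders (csv-bucket-name and YEAR are to be replaced by the learner):

```sh
# Populate BigQuery from the CSV files first; replace the
# placeholders with your own bucket name and year.
./bqload.sh csv-bucket-name YEAR

# Create the views only after the load above has finished.
./create_views.sh
```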
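For the venv warning, one possible recovery sketch if the Cloud Shell session is recycled and the virtual environment is lost; the env path and the requirements file location are assumptions, not the repo's confirmed layout:

```sh
# Recreate the virtual environment after a Cloud Shell timeout.
python3 -m venv ~/env             # env path is an assumption
source ~/env/bin/activate
pip install -r requirements.txt   # assumes a requirements.txt in the chapter directory
```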
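For the two API-enablement items (Dataflow before df07.py, Pub/Sub before the streaming step), an equivalent to clicking Enable on the linked console pages is the gcloud CLI:

```sh
# One-time, per-project API enablement.
gcloud services enable dataflow.googleapis.com
gcloud services enable pubsub.googleapis.com
```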
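For the items asking for paths from the home directory, an absolute path makes the cd command work regardless of where the previous step left the learner, e.g.:

```sh
# Works from any current directory, including a fresh Cloud Shell tab.
cd ~/data-science-on-gcp/04_streaming/simulate
```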
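For the last sub-item, one possible name and shape for the suggested view, sketched with the bq CLI; the dsongcp dataset and the streaming_delays table and airport column are illustrative assumptions, not the repo's confirmed identifiers:

```sh
# Hypothetical view definition for ATL delays; adjust dataset,
# table, and column names to match what the pipeline actually writes.
bq query --use_legacy_sql=false '
CREATE OR REPLACE VIEW dsongcp.atl_delays AS
SELECT * FROM dsongcp.streaming_delays
WHERE airport = "ATL"'
```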
vfrankswood commented

@pm0kjp thanks a lot for these comments. Finally, the code works :)
