Python scripts that capture streaming Reddit comments, streaming Reddit submissions, or streaming stock data and store them in BigQuery.

Build the Docker image:

docker build -t reddit_stream_capture:1.0 .
Streaming comments
docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_comments.py \
--subreddit subreddit_name \
--project project_name \
--dataset dataset_name \
--table table_name
Streaming submissions
docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_submissions.py \
--subreddit subreddit_name \
--project project_name \
--dataset dataset_name \
--table table_name
Streaming stock data
docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_stock_data.py \
--ticker stock_ticker \
--project project_name \
--dataset dataset_name \
--table table_name
Optional flags:
- --market_hours_only
  If this flag is given, data is only streamed during market hours.
- --data_frequency
  The number of seconds between data pulls, i.e. the time to sleep after each data point. Default: 30 seconds.
  Usage: --data_frequency 60
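The two flags above can be sketched as a simple polling loop. The market-hours check below is a simplifying assumption (NYSE regular hours, 9:30-16:00 US Eastern on weekdays, ignoring market holidays), and the function names are illustrative rather than the repo's actual code:

```python
import time
import datetime
from zoneinfo import ZoneInfo

def is_market_hours(now=None):
    # Rough check for NYSE regular trading hours; ignores holidays.
    now = now or datetime.datetime.now(ZoneInfo("America/New_York"))
    if now.weekday() >= 5:  # Saturday or Sunday
        return False
    open_t = now.replace(hour=9, minute=30, second=0, microsecond=0)
    close_t = now.replace(hour=16, minute=0, second=0, microsecond=0)
    return open_t <= now <= close_t

def poll_loop(fetch_quote, data_frequency=30, market_hours_only=False):
    # Pull one data point, then sleep data_frequency seconds, forever.
    while True:
        if not market_hours_only or is_market_hours():
            fetch_quote()  # e.g. a yfinance lookup for the given ticker
        time.sleep(data_frequency)
```

With `--market_hours_only`, the loop keeps ticking but skips the fetch outside trading hours; `--data_frequency` simply sets the sleep interval.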
Requirements:
- Reddit API key (see the signup link below)
- Google Cloud account
- Service account with a private JSON key, referenced by GOOGLE_APPLICATION_CREDENTIALS
PRAW Documentation: https://praw.readthedocs.io/en/stable/index.html
Reddit API Key Signup for Application Developers: https://ssl.reddit.com/prefs/apps/
yfinance Documentation: https://github.com/ranaroussi/yfinance
BigQuery Storage Write API: https://cloud.google.com/bigquery/docs/write-api
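As an illustration of the BigQuery side, rows can be appended with the client library's legacy streaming insert, `insert_rows_json`; the Storage Write API linked above is the newer, higher-throughput alternative. The helper below is a hedged sketch under that assumption, not the repo's actual code:

```python
def full_table_id(project, dataset, table):
    # BigQuery identifies tables as "project.dataset.table".
    return f"{project}.{dataset}.{table}"

def insert_rows(rows, project, dataset, table):
    # Requires the google-cloud-bigquery package and the
    # GOOGLE_APPLICATION_CREDENTIALS service-account key.
    from google.cloud import bigquery
    client = bigquery.Client(project=project)
    errors = client.insert_rows_json(full_table_id(project, dataset, table), rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

The `--project`, `--dataset`, and `--table` arguments shown in the docker run commands map directly onto this table identifier.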