dkfreitag/streaming_reddit_stock_data

Stream Reddit Posts, Comments, and Stock Ticker Data into BigQuery

Python scripts that capture streaming Reddit comments, streaming Reddit posts, or streaming stock data and store them in BigQuery.

Build the Docker image:

docker build -t reddit_stream_capture:1.0 .

Run a script from Docker:

Streaming comments

docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_comments.py \
--subreddit subreddit_name \
--project project_name \
--dataset dataset_name \
--table table_name

Streaming submissions

docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_submissions.py \
--subreddit subreddit_name \
--project project_name \
--dataset dataset_name \
--table table_name
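The two Reddit commands above take the same environment variables and arguments. The following is a minimal sketch of how a script could read them, not the repository's actual code: the function names `parse_args`, `reddit_credentials`, and `table_id` are hypothetical, and it assumes the destination table is addressed in BigQuery's usual `project.dataset.table` form.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the four arguments shown in the docker run commands."""
    parser = argparse.ArgumentParser(description="Stream Reddit data into BigQuery")
    parser.add_argument("--subreddit", required=True)
    parser.add_argument("--project", required=True)
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--table", required=True)
    return parser.parse_args(argv)


def reddit_credentials(env=os.environ):
    """Read the Reddit API credentials passed with `docker run -e` (hypothetical helper)."""
    missing = [name for name in ("CLIENT_ID", "CLIENT_SECRET") if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of failing later at the API call.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["CLIENT_ID"], env["CLIENT_SECRET"]


def table_id(args):
    """Build a fully qualified table name, assuming project.dataset.table addressing."""
    return f"{args.project}.{args.dataset}.{args.table}"
```

Checking the environment variables up front means a missing `-e` flag surfaces immediately when the container starts rather than partway through a stream.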

Streaming stock data

docker run -t \
-e CLIENT_ID=client_id_goes_here \
-e CLIENT_SECRET=client_secret_goes_here \
-e GOOGLE_APPLICATION_CREDENTIALS=google_app_cred_file.json \
reddit_stream_capture:1.0 \
src/stream_stock_data.py \
--ticker stock_ticker \
--project project_name \
--dataset dataset_name \
--table table_name

Optional arguments for stream_stock_data.py

--market_hours_only If this flag is given, data is only streamed during market hours.

--data_frequency The number of seconds between data pulls, i.e. time to sleep after each data point. Default = 30 seconds.

  • Usage: --data_frequency 60
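The two optional arguments above could be wired up roughly as follows. This is a sketch under stated assumptions, not the code in stream_stock_data.py: `should_pull` is a hypothetical helper, "market hours" is assumed to mean 09:30 to 16:00 US Eastern on weekdays, exchange holidays are ignored, and the caller is assumed to pass a datetime already expressed in US Eastern time.

```python
import argparse
from datetime import datetime, time

# Assumed regular trading session in US Eastern time; holidays are not handled.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def parse_args(argv=None):
    """Parse the ticker plus the two optional arguments described above."""
    parser = argparse.ArgumentParser(description="Stream stock data into BigQuery")
    parser.add_argument("--ticker", required=True)
    # A bare flag: present means True, absent means False.
    parser.add_argument("--market_hours_only", action="store_true")
    # Seconds to sleep after each data point; 30 is the documented default.
    parser.add_argument("--data_frequency", type=int, default=30)
    return parser.parse_args(argv)


def should_pull(now_eastern, market_hours_only):
    """Return True if a data point should be pulled at `now_eastern` (hypothetical helper)."""
    if not market_hours_only:
        return True
    is_weekday = now_eastern.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and MARKET_OPEN <= now_eastern.time() < MARKET_CLOSE
```

A streaming loop would then call `should_pull` before each request and sleep for `args.data_frequency` seconds afterwards.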

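Authentication Needed

Reddit API credentials (passed as CLIENT_ID and CLIENT_SECRET) and a Google Cloud credentials file (passed as GOOGLE_APPLICATION_CREDENTIALS), as shown in the docker run commands above.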

Documentation Referenced

PRAW Documentation: https://praw.readthedocs.io/en/stable/index.html

Reddit API Key Signup for Application Developers: https://ssl.reddit.com/prefs/apps/

yfinance Documentation: https://github.com/ranaroussi/yfinance

BigQuery Storage Write API: https://cloud.google.com/bigquery/docs/write-api
