Download all posts from a subreddit
- Get all posts ids from pushshift.io
- Query ids from reddit api in the batches of 100
- Merge by id
Get your own reddit api keys and replace:
reddit_client_id = ''
reddit_client_secret = ''
If you want to run steps 1 and 2, set them True
:
do_step_1_now = False
do_step_2_now = False
Otherwise, it will use the sample data, which is ChatGPT subreddit on Jan 26th.
python3 reddit-crawler.py
The first step uses a script from Watchful1/Sketchpad.
Pull requests are welcome