ArmanJR/subreddit-scraper
SubReddit Scraper

Download all posts from a subreddit

Steps

  1. Get all post IDs from pushshift.io
  2. Query those IDs from the Reddit API in batches of 100
  3. Merge the results by ID
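
The batching and merging in steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `reddit-crawler.py`: the helper names `batches` and `merge_by_id`, the dict shape of a post, and the choice to let the Reddit API values overwrite the archived ones are all assumptions.

```python
def batches(ids, size=100):
    """Yield consecutive slices of at most `size` IDs (hypothetical helper)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def merge_by_id(archived, live):
    """Combine two lists of post dicts keyed on "id" (hypothetical helper).

    Assumption: when both sources have a field, the live value wins.
    """
    merged = {post["id"]: dict(post) for post in archived}
    for post in live:
        merged.setdefault(post["id"], {}).update(post)
    return merged


# Step 2: 250 IDs split into batches of 100.
ids = [f"id{n}" for n in range(250)]
chunks = list(batches(ids))
print([len(c) for c in chunks])  # [100, 100, 50]

# Step 3: one archived record merged with its fresher counterpart.
archived = [{"id": "a1", "title": "old title", "score": 1}]
live = [{"id": "a1", "score": 42}]
print(merge_by_id(archived, live))
# {'a1': {'id': 'a1', 'title': 'old title', 'score': 42}}
```

Each batch would be sent to the Reddit API as one request; the network call itself is left out here so the sketch runs offline.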

Running

Get your own Reddit API keys and fill them in:

reddit_client_id = ''
reddit_client_secret = ''

To run steps 1 and 2, set these flags to True:

do_step_1_now = False
do_step_2_now = False

Otherwise, the script uses the sample data, which comes from the ChatGPT subreddit on Jan 26th.

python3 reddit-crawler.py
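
One way to picture how the two flags above select between live fetching and the bundled sample data is sketched below. Only `do_step_1_now` and `do_step_2_now` come from this README; `choose_data`, `fetch_ids`, `fetch_posts`, and the sample values are hypothetical stand-ins, and the real script may be organized differently.

```python
do_step_1_now = False
do_step_2_now = False


def choose_data(run_step, step_fn, sample):
    """Run step_fn when the flag is set, otherwise fall back to sample data."""
    return step_fn() if run_step else sample


# Hypothetical stand-ins for the network steps and the sample data.
def fetch_ids():
    raise RuntimeError("step 1 needs network access; not part of this sketch")


def fetch_posts(ids):
    raise RuntimeError("step 2 needs Reddit API keys; not part of this sketch")


sample_ids = ["abc", "def"]
sample_posts = [{"id": "abc"}, {"id": "def"}]

ids = choose_data(do_step_1_now, fetch_ids, sample_ids)
posts = choose_data(do_step_2_now, lambda: fetch_posts(ids), sample_posts)
print(len(ids), len(posts))  # 2 2
```

With both flags left at False, neither fetch function is called, so the sketch runs without credentials.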

Acknowledgement

The first step uses a script from Watchful1/Sketchpad.

Contributing

Pull requests are welcome.
