
elaaf/redscrap

RedScrap

A Reddit scraper built around the PushShift API.

Features

  • Retrieve old Reddit submissions (and, optionally, their comments)
  • Save results to CSV files for easy loading and reuse
  • Complex search queries, optionally restricted to specific subreddits
  • Resume from the last saved state

[Image: Resume State]
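This README does not document how RedScrap stores its resume state. The sketch below illustrates the general technique and is not RedScrap's code. It assumes, hypothetically, that results are appended to a CSV file with a `created_utc` column of Unix timestamps. On restart, the newest saved timestamp becomes the new starting point instead of `start_date`. The file layout and the helper names `load_resume_point` and `append_rows` are illustrative assumptions.

```python
import csv
import os


def load_resume_point(csv_path, default_after):
    """Return the newest `created_utc` saved in csv_path, or default_after.

    Hypothetical helper: assumes a header row with a `created_utc`
    column holding integer Unix timestamps.
    """
    if not os.path.exists(csv_path):
        return default_after  # nothing saved yet: start from the beginning
    latest = default_after
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            latest = max(latest, int(row["created_utc"]))
    return latest


def append_rows(csv_path, rows, fieldnames):
    """Append rows to csv_path, writing the header only for a new/empty file."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

Appending after each batch means an interrupted run loses at most one batch. Several posts can share a timestamp, so a real implementation would probably also de-duplicate by submission `id` when resuming.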

Requirements

# Create a Python 3.6+ virtual environment and run
pip install -r requirements.txt

How To Run This Code

Clone this repo.

git clone https://github.com/elaaf/redscrap.git

Usage

Either import the RedScrap class into your own code, or edit and run main.py.

The RedScrap Class

from redscrap.scrapper import RedScrap
# Search between dates...
start_date = "2020-10-01"
end_date = "2020-10-02"

# Search terms ("|" acts as OR: COVID or Corona)...
search_terms = [ "COVID|Corona" ]

# Subreddits to search in (empty list = all subreddits)
subreddits = []

# Creating a RedScrap object
scrapper = RedScrap(start_date=start_date,
                    end_date=end_date,
                    search_terms=search_terms,
                    subreddits=subreddits)

# Retrieve submissions...
scrapper.retrieve_submissions(retrieve_comments=False)
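The parameters above map naturally onto a PushShift submission search. The sketch below shows one way such a request could be built. It is an assumption, not RedScrap's internal code. It converts the date strings to UTC Unix timestamps and encodes them, together with each search term and the subreddit list, as the `q`, `after`, `before`, `subreddit` and `size` parameters of the `https://api.pushshift.io/reddit/search/submission/` endpoint. The helper names and the page size of 100 are hypothetical. Public access to PushShift has also been restricted since 2023, so the generated URLs may not work as-is.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed PushShift submission-search endpoint.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def to_epoch(date_str):
    """Convert a 'YYYY-MM-DD' string to a UTC Unix timestamp at midnight."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_search_urls(start_date, end_date, search_terms, subreddits, size=100):
    """Return one submission-search URL per search term (hypothetical helper)."""
    urls = []
    for term in search_terms:
        params = {
            "q": term,                    # "|" acts as OR, e.g. "COVID|Corona"
            "after": to_epoch(start_date),
            "before": to_epoch(end_date),
            "size": size,                 # assumed page size
        }
        if subreddits:                    # empty list: no subreddit filter
            params["subreddit"] = ",".join(subreddits)
        urls.append(PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params))
    return urls
```

With the example values above (`"2020-10-01"`, `"2020-10-02"`, `["COVID|Corona"]`, `[]`), this returns a single URL ending in `?q=COVID%7CCorona&after=1601510400&before=1601596800&size=100`.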

Run main.py

Edit main.py to set your search parameters (it can also take command-line arguments), then run:

python main.py

To Do

  • Add support for gathering comments only
  • Improve code documentation
  • Make RedScrap installable with pip install
