Joseph Lai edited this page May 20, 2021 · 21 revisions
 __  __  _ __   ____  
/\ \/\ \/\`'__\/',__\ 
\ \ \_\ \ \ \//\__, `\
 \ \____/\ \_\\/\____/
  \/___/  \/_/ \/___/... Universal Reddit Scraper


Introduction

This is a comprehensive Reddit scraping tool that integrates multiple features:

  • Scrape Reddit via PRAW (the official Python Reddit API Wrapper)
    • Scrape Subreddits
    • Scrape Redditors
    • Scrape submission comments
  • Livestream Reddit via PRAW
    • Livestream submissions submitted within Subreddits or by Redditors
    • Livestream comments submitted within Subreddits or by Redditors
    • Livestream trending submissions within Subreddits
  • Scrape Reddit via the Pushshift API
    • Search for keywords in all publicly available submissions
    • Search for keywords in all publicly available comments
  • Analytical tools for scraped data
    • Generate frequencies for words that are found in submission titles, bodies, and/or comments
    • Generate a wordcloud from scrape results

You can scrape Reddit with or without API credentials. However, I strongly advise taking the time to get your credentials so that you can use the full suite of tools available within URS.

Here is a table describing which tools do or do not require API credentials:

Requires Credentials (PRAW)                         Does Not Require Credentials (Pushshift)
-------------------------------------------------   ----------------------------------------
Scrape Subreddits                                   Search for keywords in submissions
Scrape Redditors                                    Search for keywords in comments
Scrape submission comments
Livestream Subreddits
Livestream Redditors
Livestream trending submissions within Subreddits

See the Getting Started section to get your API credentials.

Installation

NOTE: Requires Python 3.7+

git clone --depth=1 https://github.com/JosephLai241/URS.git
cd URS
pip3 install . -r requirements.txt
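
If you are unsure whether your interpreter meets the Python 3.7+ requirement noted above, a quick check along these lines may help before installing. This is only a minimal sketch: the names MINIMUM_VERSION and meets_minimum are illustrative and are not part of URS.

```python
import sys

# Minimum version taken from the installation note above.
MINIMUM_VERSION = (3, 7)


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (major, minor, ...) version is at least `minimum`.

    Defaults to the version of the interpreter running this script.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print("Python " + found + " is new enough.")
    else:
        print("Python " + found + " is too old; 3.7 or newer is required.")
```

Run it with the same python3 that pip3 belongs to, since a machine can have several interpreters installed side by side.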

Troubleshooting

ModuleNotFoundError

You may run into an error that looks like this:

Traceback (most recent call last):
  File "/home/joseph/URS/urs/./Urs.py", line 30, in <module>
    from urs.utils.Logger import LogMain
ModuleNotFoundError: No module named 'urs'

This means Python cannot find the urs package, so you will need to add the URS directory (/home/joseph/URS in the traceback above) to your PYTHONPATH. Here is a link that explains how to do so for each operating system.
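
To see whether the PYTHONPATH change took effect, a small diagnostic like the following can be run with the same interpreter. It is a sketch under stated assumptions, not part of URS: the helper names is_importable and add_to_sys_path are made up for illustration, and URS_ROOT is a hypothetical environment variable used here only to point at your clone. Note that changing sys.path affects the current process only; the lasting fix is still to set PYTHONPATH as described above.

```python
import importlib.util
import os
import sys


def is_importable(module_name):
    """Return True if Python can locate the top-level module on sys.path."""
    return importlib.util.find_spec(module_name) is not None


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path for this process only."""
    resolved = os.path.abspath(os.path.expanduser(directory))
    if resolved in sys.path:
        sys.path.remove(resolved)
    sys.path.insert(0, resolved)
    # Make sure the import system notices the newly added directory.
    importlib.invalidate_caches()
    return resolved


if __name__ == "__main__":
    # URS_ROOT is a hypothetical variable; it falls back to the current directory.
    repo_root = os.environ.get("URS_ROOT", os.getcwd())
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(not set)"))
    print("urs importable before:", is_importable("urs"))
    add_to_sys_path(repo_root)
    print("urs importable after:", is_importable("urs"))
```

If the first check already reports that urs is importable, PYTHONPATH is set correctly and the error has another cause.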
