qxf2/wisdomofreddit


# Wisdom of reddit setup

Code to get Wisdom of reddit up and running locally


  1. PYTHON SETUP

a. Install Python 2.x

b. Add Python to your PATH environment variable

c. If you do not have it already, get pip

d. Install Flask ('pip install flask')

e. Install Whoosh ('pip install whoosh')

f. The csv library is part of Python's standard library, so it needs no separate install
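A quick way to confirm the setup is to check that the modules named in the steps above can be imported. The snippet below is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and the module list (`flask`, `whoosh`, `csv`) is taken from the steps above.

```python
import importlib

# Modules named in the setup steps above.
REQUIRED = ["flask", "whoosh", "csv"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED)` should return an empty list once everything is installed; any name it returns still needs a `pip install`.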


  2. DATA SETUP

a. Sign up for BigQuery

b. Run this query:

SELECT link_id,id,score,body,name,created_utc,subreddit,parent_id,gilded FROM [fh-bigquery:reddit_comments.2015_01], [fh-bigquery:reddit_comments.2015_02], [fh-bigquery:reddit_comments.2015_03], [fh-bigquery:reddit_comments.2015_04], [fh-bigquery:reddit_comments.2015_05], [fh-bigquery:reddit_comments.2015_06], [fh-bigquery:reddit_comments.2015_07], [fh-bigquery:reddit_comments.2015_08], [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_10], [fh-bigquery:reddit_comments.2010], [fh-bigquery:reddit_comments.2011], [fh-bigquery:reddit_comments.2012], [fh-bigquery:reddit_comments.2013], [fh-bigquery:reddit_comments.2014], [fh-bigquery:reddit_comments.2007], [fh-bigquery:reddit_comments.2008], [fh-bigquery:reddit_comments.2009] where score>35 and (length(body) - length(replace(body,' ','')) + 1) > 150

NOTE: This query will process 450 GB when run.
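The `where` clause keeps comments with a score above 35 and more than 150 words, where the word count is approximated as the number of spaces plus one. As an illustration only, the same filter can be sketched in Python; the function names `approx_word_count` and `keep_comment` are hypothetical and do not appear in the repository.

```python
def approx_word_count(body):
    """Mirror the query's expression: length(body) - length(replace(body,' ','')) + 1."""
    return len(body) - len(body.replace(" ", "")) + 1


def keep_comment(score, body):
    """Apply the same thresholds as the query: score > 35 and more than 150 words."""
    return score > 35 and approx_word_count(body) > 150
```

Because the count is based on spaces alone, consecutive spaces inflate it and words separated only by newlines are not counted separately, so it is an estimate rather than an exact word count.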

c. Export the table to csv format. Since the table is big, the export will produce multiple csv files

d. Store the csv files in ./data/

e. Run python index_comments.py -d ./data -n wor -c True (this step takes hours)

f. NOTE: use -c True only if you want to create the index from scratch

g. If things go well, you should see a ./indexdir directory containing a bunch of wor_*.seg files
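Since indexing takes hours, it can help to check the inputs before starting and the outputs afterwards. The sketch below is not part of the repository; the helper names `list_csvs` and `list_segments` are hypothetical, and the paths and the `wor` prefix are the ones used in the steps above.

```python
import glob
import os


def list_csvs(data_dir="./data"):
    """Return the exported csv files found in the data directory."""
    return sorted(glob.glob(os.path.join(data_dir, "*.csv")))


def list_segments(index_dir="./indexdir", name="wor"):
    """Return the index segment files matching <name>_*.seg."""
    return sorted(glob.glob(os.path.join(index_dir, name + "_*.seg")))
```

An empty result from `list_csvs()` before indexing suggests the export was not copied to ./data, and an empty result from `list_segments()` afterwards suggests the index was not created.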


  3. RUN

a. Run python wisdomofreddit.py (this serves the app on port 6464 of localhost)

b. If things go well, you should see the Wisdom of reddit homepage at http://localhost:6464 and be able to search
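If the homepage does not load, a first check is whether anything is listening on port 6464. The snippet below is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and the default host and port are taken from the step above.

```python
import socket


def is_port_open(host="localhost", port=6464, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True
```

A `False` result while wisdomofreddit.py is supposedly running points to the server not having started, or to it being bound to a different port.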


  4. ISSUES?

a. Contact mak@qxf2.com
