A Python project for downloading news articles from a collection of news sources, written with Scrapy.
Also included is a script that performs LDA (Latent Dirichlet Allocation) topic modelling on the downloaded articles.
Spiders provided:
- Soya (Soyacincau.com)
- paultan (Paultan.org)
- foxnews (Foxnews.com)
Usage:
scrapy crawl SPIDER_NAME [-o OUTPUT_FILE] [-t OUTPUT_FORMAT]
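If you want to crawl several sources in one go, the command above can be assembled and launched from Python. The following is a minimal sketch that assumes it is run from inside the Scrapy project directory. The helper names `build_crawl_command` and `run_spiders` are hypothetical and are not part of this repository.

```python
import subprocess


def build_crawl_command(spider, output_file=None, output_format=None):
    """Assemble the `scrapy crawl` argument list (hypothetical helper)."""
    cmd = ["scrapy", "crawl", spider]
    if output_file:
        cmd += ["-o", output_file]  # where the scraped items are written
    if output_format:
        cmd += ["-t", output_format]  # explicit output format, e.g. csv
    return cmd


def run_spiders(spiders, fmt="csv"):
    """Run each spider in turn, writing <spider>.<fmt> (hypothetical helper).

    Assumes the current working directory is the Scrapy project root.
    """
    for name in spiders:
        subprocess.run(build_crawl_command(name, f"{name}.{fmt}", fmt), check=True)
```

For example, `run_spiders(["paultan", "foxnews"])` would produce `paultan.csv` and `foxnews.csv`, which are suitable as input for the LDA step.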
Then, using the CSV output from the previous step, run:
simple_lda.py -i FILENAME [-s {none,porter,porter2}] [-ni NUM_ITER] [-twc TOPWORDS_COUNT]
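The internals of `simple_lda.py` are not shown here. As a rough illustration of what LDA does with the downloaded articles, here is a minimal pure-Python sketch of LDA fitted by collapsed Gibbs sampling. It is not the script's actual implementation. The sketch assumes the CSV has a `text` column, which is a hypothetical name. `num_iter` and `count` loosely mirror `-ni` and `-twc`. No stemming is applied, which roughly corresponds to `-s none`. All function names are hypothetical.

```python
import csv
import random
import re
from collections import Counter


def load_texts(path, column="text"):
    """Read one document per CSV row; the column name is a hypothetical assumption."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def tokenize(text):
    # Lowercase, keep alphabetic tokens of length >= 3; no stemming (like "-s none").
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3]


def lda_gibbs(docs, num_topics=2, num_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns per-topic word counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    nkw = [Counter() for _ in range(num_topics)]  # topic -> word counts
    ndk = [[0] * num_topics for _ in docs]  # document -> topic counts
    nk = [0] * num_topics  # total words assigned to each topic
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            zd.append(k)
            nkw[k][w] += 1
            ndk[d][k] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(num_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                nkw[k][w] -= 1
                ndk[d][k] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                nkw[k][w] += 1
                ndk[d][k] += 1
                nk[k] += 1
    return nkw


def top_words(nkw, count=10):
    """Return the `count` most frequent words per topic (cf. -twc)."""
    return [[w for w, n in c.most_common() if n > 0][:count] for c in nkw]
```

A typical call would be `top_words(lda_gibbs([tokenize(t) for t in load_texts("articles.csv")], num_topics=10, num_iter=500), count=10)`. A real implementation would also remove stop words and could apply a Porter or Porter2 stemmer in `tokenize`.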