HiiYL/News-Scraper

News-Scraper

A Python script to download news from a collection of news sources. Written with Scrapy.

Also included is a script to perform LDA on the downloaded articles.

USAGE

Scraping:

Spiders Provided:

  1. Soya (Soyacincau.com)
  2. paultan (Paultan.org)
  3. foxnews (Foxnews.com)

To Use:

    scrapy crawl SPIDER_NAME [-o OUTPUT_FILE] [-t OUTPUT_FILE_TYPE]
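
As an illustration of the command form above, here is a minimal sketch that assembles a `scrapy crawl` invocation from Python. The helper name `build_crawl_command` and the output file name `paultan.csv` are assumptions made for this example and are not part of the repository.

```python
import shlex


def build_crawl_command(spider, output_file=None, output_type=None):
    """Return the argument list for a `scrapy crawl` invocation.

    Hypothetical helper: it only mirrors the usage line in this README.
    """
    command = ["scrapy", "crawl", spider]
    if output_file is not None:
        command += ["-o", output_file]   # where the scraped items are written
    if output_type is not None:
        command += ["-t", output_type]   # output format, e.g. csv
    return command


# Example: crawl the paultan spider and write CSV output.
command = build_crawl_command("paultan", output_file="paultan.csv", output_type="csv")
print(shlex.join(command))  # scrapy crawl paultan -o paultan.csv -t csv

# To actually run it from the project directory (requires Scrapy installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Depending on the installed Scrapy version, `-t` may be unnecessary, since the output format can usually be inferred from the extension of the output file.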

Running LDA:
  1. Using the CSV output from the previous step, run:

    simple_lda.py -i FILENAME [-s {none,porter,porter2}] [-ni NUM_ITER] [-twc TOPWORDS_COUNT]
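
To make the option syntax above concrete, the sketch below rebuilds the same command-line interface with `argparse`. It is not the actual source of `simple_lda.py`. The destination names, help texts, and default values are assumptions, as is the reading of `-s` as a stemmer choice (inferred from the `porter` and `porter2` values).

```python
import argparse


def build_parser():
    """Build a parser mirroring the simple_lda.py usage line (a sketch only)."""
    parser = argparse.ArgumentParser(prog="simple_lda.py")
    parser.add_argument("-i", dest="filename", metavar="FILENAME", required=True,
                        help="CSV file produced by the scraping step")
    parser.add_argument("-s", dest="stemmer", default="none",  # assumed default
                        choices=["none", "porter", "porter2"],
                        help="stemmer to apply (assumed meaning)")
    parser.add_argument("-ni", dest="num_iter", metavar="NUM_ITER", type=int,
                        default=100,  # assumed default
                        help="number of LDA iterations")
    parser.add_argument("-twc", dest="topwords_count", metavar="TOPWORDS_COUNT",
                        type=int, default=10,  # assumed default
                        help="number of top words to show per topic")
    return parser


# Example: parse the equivalent of
#   simple_lda.py -i news.csv -s porter -ni 500 -twc 8
args = build_parser().parse_args(
    ["-i", "news.csv", "-s", "porter", "-ni", "500", "-twc", "8"]
)
print(args.filename, args.stemmer, args.num_iter, args.topwords_count)
```

Only `-i` is required in this sketch, matching the usage line, where the other three options appear in square brackets.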
