demon386/dmcrawler

dmcrawler (Douban Movie Crawler)

Author: Muchenxuan Tong demon386@gmail.com

A Clojure-based crawler for fetching short comments from Douban movie pages (e.g. http://movie.douban.com/subject/11529526/comments?sort=time).

Features:

  • Saves its context when an error occurs or the program shuts down.
  • Parallelized crawling on a single machine.
  • Data are stored in the database, with the storage time attached.
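
The features above can be illustrated with a minimal sketch. This is not the project's actual code: the namespace dmcrawler.sketch and every function name below are hypothetical, and the checkpoint format (a vector of pending movie ids written as EDN) is an assumption. Only the URL pattern comes from the example above. The page fetcher is passed in as an argument so the sketch can be tried without network access.

```clojure
(ns dmcrawler.sketch
  (:require [clojure.edn :as edn])
  (:import (java.util Date)))

;; Builds the comments URL for a movie id (pattern taken from the example URL).
(defn comments-url [movie-id]
  (str "http://movie.douban.com/subject/" movie-id "/comments?sort=time"))

;; Hypothetical: fetches one page and attaches the time it was retrieved.
;; fetch-fn takes a URL string and returns the page body.
(defn fetch-comments [fetch-fn movie-id]
  {:movie-id  movie-id
   :body      (fetch-fn (comments-url movie-id))
   :stored-at (Date.)})

;; pmap evaluates fetch-comments for several ids concurrently on one machine;
;; doall forces the lazy sequence so all fetches have finished on return.
(defn crawl-movies [fetch-fn movie-ids]
  (doall (pmap #(fetch-comments fetch-fn %) movie-ids)))

;; Hypothetical checkpoint: write the ids still to be crawled as EDN ...
(defn save-context! [path pending-ids]
  (spit path (pr-str (vec pending-ids))))

;; ... and read them back after a restart.
(defn load-context [path]
  (edn/read-string (slurp path)))
```

For a real run, `slurp` can serve as the fetcher, e.g. `(crawl-movies slurp [11529526])`. A production crawler would likely also need rate limiting and error handling, which this sketch leaves out.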

Usage

  • Set up the DB according to the comments in src/dbcrawler/db.clj.
  • Set the variable starting-movie-id in src/dbcrawler/config.clj. (Optional; you can keep the default.)
  • With Leiningen 2 installed, run the program with lein run.
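
As a rough idea of the second step, the configuration might look like the fragment below. The namespace name, the use of a plain def, and the numeric type of the id are all assumptions, so check the actual config.clj before editing. The id shown is the one from the example URL above, not necessarily the project's default.

```clojure
;; Hypothetical layout of the config file; the real namespace may differ.
(ns dmcrawler.config)

;; Movie id the crawler starts from (taken from the example URL, as an illustration).
(def starting-movie-id 11529526)
```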

License

Copyright © 2013 Muchenxuan Tong

Distributed under the Eclipse Public License.
