tad-lispy/node-simplecrawler-queue-mongo

Mongo powered queue for SimpleCrawler

⚠️ I made this project to scratch my own itch. I don't actively use it anymore, and therefore I don't maintain it either. I'm happy if it works for you, but I don't promise any support. If you want to become a maintainer, please open an issue.

Good luck :)

Install

npm install simplecrawler-queue-mongo

Use

Crawler  = require "simplecrawler"
Queue    = require "simplecrawler-queue-mongo"
mongoose = require "mongoose"

# Open the Mongoose connection that the queue will use.
mongoose.connect "mongodb://localhost/test"

crawler       = Crawler.crawl "http://radzimy.co/"
crawler.name  = 'radzimy-co' # You don't need this if you only run one crawler.
crawler.queue = new Queue mongoose.connections[0], crawler

which compiles to:

var Crawler, Queue, crawler, mongoose;

Crawler   = require("simplecrawler");
Queue     = require("simplecrawler-queue-mongo");
mongoose  = require("mongoose");

// Open the Mongoose connection that the queue will use.
mongoose.connect("mongodb://localhost/test");

crawler       = Crawler.crawl("http://radzimy.co/");
crawler.name  = 'radzimy-co'; // Only needed when running more than one crawler.
crawler.queue = new Queue(
  mongoose.connections[0],
  crawler
);

Notes

At the moment it relies on a Mongoose connection that the application provides. In the future I'd like to decouple it, so that the application could provide a native MongoDB connection or a connection string.

If you want to use multiple crawlers with one database (e.g. for crawling multiple domains), set a unique name property on each crawler (as in the example). It will be used to distinguish the queues in a collection.
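
To illustrate why the name matters, here is a rough sketch of several queues sharing one collection. It is not this module's code: the `collection` array stands in for a MongoDB collection, and the `NamedQueue` class, the `crawler` field and the second crawler name `other-crawler` are all hypothetical names made up for the example.

```javascript
// Hypothetical in-memory stand-in for one shared MongoDB collection.
// The `crawler` field name is an assumption, not the module's real schema.
const collection = [];

class NamedQueue {
  constructor(name) {
    this.name = name; // plays the role of crawler.name
  }

  // Add a URL to this crawler's queue; skip it if this crawler already has it.
  add(url) {
    const exists = collection.some(
      (item) => item.crawler === this.name && item.url === url
    );
    if (exists) return false;
    collection.push({ crawler: this.name, url: url, status: "queued" });
    return true;
  }

  // Return only the items that belong to this crawler.
  items() {
    return collection.filter((item) => item.crawler === this.name);
  }
}

const first = new NamedQueue("radzimy-co");
const second = new NamedQueue("other-crawler");

const addedOnce = first.add("http://radzimy.co/");   // new for "radzimy-co"
const addedTwice = first.add("http://radzimy.co/");  // duplicate within the same queue
const addedOther = second.add("http://radzimy.co/"); // same URL, different queue

console.log(addedOnce, addedTwice, addedOther);
console.log(first.items().length, second.items().length, collection.length);
```

Because every stored item carries the crawler's name, the same URL can sit in two queues without one crawler seeing the other's items. With two crawlers that share a name, their items would be indistinguishable.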

Contributing

Contributions are very welcome :)

Licence (GPL 3)

Copyright (C) 2014 Tadeusz Łazurski

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
