Web_Crawler

A scalable, open-source web crawler that writes each page's data to a file as it crawls

Installation

Clone this repository:

$ git clone https://github.com/Boomslet/Web_Crawler

Usage

1. Install the package with setup.py

$ python setup.py install

2. Run controller.py

%Run controller.py

3. Call crawl(*urls) with your desired URL(s):

>>> crawl('https://github.com/')

4. Crawl!

Successfully crawled https://github.com/
Successfully crawled https://github.com/#start-of-content
Successfully crawled https://github.com/features
Successfully crawled https://github.com/business
Successfully crawled https://github.com/pricing
Successfully crawled https://github.com/dashboard
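
The steps above can be illustrated with a minimal sketch of a breadth-first crawl loop. This is not the code in controller.py: the names crawl_sketch, fetch, LinkParser, fetch_page, max_pages and out_path are hypothetical and chosen only for illustration, and the sketch uses nothing but the Python standard library. It assumes that "website data" means one line per page (URL and page length); the real project may record something different.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_sketch(*urls, fetch_page=fetch, max_pages=10, out_path=None):
    """Breadth-first crawl; returns the crawled URLs in visiting order."""
    queue = deque(urls)
    seen = set(urls)
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception as error:  # network errors, missing pages, ...
            print(f"Failed to crawl {url}: {error}")
            continue
        if out_path is not None:
            # Assumption: one tab-separated line per page (URL, length).
            with open(out_path, "a", encoding="utf-8") as out:
                out.write(f"{url}\t{len(html)}\n")
        crawled.append(url)
        print(f"Successfully crawled {url}")
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled
```

Calling crawl_sketch('https://github.com/', out_path='pages.tsv') would crawl at most ten pages and append one line per page to pages.tsv. Unlike the sample output above, this sketch strips URL fragments, so https://github.com/#start-of-content would be treated as the same page as https://github.com/. That is a design choice of the sketch, not a statement about how Web_Crawler behaves.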


