Crawlers for various websites, mostly news providers.

The basic idea: fetch the content of a web page and examine the text present, extracting matching keywords or text, e.g. by file extension or by domain.
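The extraction step described above could be sketched roughly as follows. This is a minimal illustration written for Python 3 using only the standard library, not the code in this repository: `LinkCollector`, `extract_links`, and the `extensions`/`domains` parameters are hypothetical names chosen for the example, and the sketch takes the page as a string so that fetching is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url, extensions=(), domains=()):
    """Return absolute links matching a file extension or a domain.

    `extensions` holds lowercase extensions without the dot, e.g. ("pdf",).
    `domains` holds host names; subdomains of a listed domain also match.
    Both filters are assumptions made for this sketch.
    """
    collector = LinkCollector()
    collector.feed(html)

    matches = []
    for raw in collector.links:
        url = urljoin(base_url, raw)  # resolve relative links
        parsed = urlparse(url)
        ext = posixpath.splitext(parsed.path)[1].lower().lstrip(".")
        host = parsed.netloc.lower()
        domain_hit = any(host == d or host.endswith("." + d) for d in domains)
        if (ext in extensions or domain_hit) and url not in matches:
            matches.append(url)
    return matches
```

A caller would pass the fetched page text together with the URL it came from, for example `extract_links(page, "http://news.example.com/", extensions=("pdf", "mp3"))`.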

Once links are extracted, those that point to files are either downloaded directly or queued up on the cloud for workers to perform the actual downloads.
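The queue-and-workers arrangement can be illustrated with a small local stand-in. The sketch below is an assumption-laden example for Python 3, not the repository's cloud queue: it uses an in-process `queue.Queue` and threads in place of a remote job queue, and `run_workers` and its `download` callback are hypothetical names introduced here.

```python
import queue
import threading


def run_workers(urls, download, worker_count=2):
    """Queue `urls` and let `worker_count` threads call `download(url)`.

    Returns a dict mapping each URL to the value returned by `download`,
    or to the exception it raised. A local stand-in for a cloud job queue.
    """
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = jobs.get()
            if url is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            try:
                outcome = download(url)
            except Exception as err:  # record the failure, keep working
                outcome = err
            with lock:
                results[url] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker

    jobs.join()
    for thread in threads:
        thread.join()
    return results
```

The `download` argument can be any callable that takes a URL, such as a thin wrapper around `urllib.request.urlretrieve`. Passing it in keeps the queueing logic separate from the network code and makes the sketch easy to try with a fake downloader.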

To use the local downloader:

++ Works on any version of Python >= 2.X

    python fileDownloader.py

To use the cloud-based job queuer:

++ So far built for Python 3.X

    python3 targetForCloud.py

Folders and files:

classifier
resty @ b0769d8
routing @ 277248c
solos
.gitignore
.gitmodules
README.md
RobotParser.py
acmDl.py
fileDownloader.py
oxy
routeUtils.py
shardy.py
utils.py

odeke-em/crawlers
