Web Crawler

A web crawler written in Python.

How it works

After you define the starting URL, the script uses the urllib and re libraries to find more URLs in that page's source code, while also searching the page for the word or phrase you entered.
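The process described above can be sketched roughly as follows. This is a minimal illustration, not the code in Core.py: it assumes Python 3, and the names `HREF_RE`, `extract_links`, `fetch`, and `crawl` are hypothetical. The regular expression only handles simple quoted `href` attributes.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical pattern: captures the value of href="..." or href='...'.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute http(s) URLs found in href attributes, without duplicates."""
    links = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


def fetch(url, timeout=10):
    """Download a page and return its source as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, phrase, max_pages, fetch_page=fetch):
    """Visit up to max_pages URLs breadth-first; return the URLs whose source contains phrase."""
    queue = deque([start_url])
    seen = {start_url}
    matches = []
    processed = 0
    while queue and processed < max_pages:
        url = queue.popleft()
        processed += 1
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be downloaded
        if phrase in html:
            matches.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches
```

Passing `fetch_page` as a parameter is a design choice made for this sketch so the loop can be tried against canned pages without touching the network.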

How to use it

(Optional) Set proxies: edit the proxies list (line 18) and add your own proxies, each with its port. Example:

proxies = [  # Place your proxy list here with port, example: exampleproxy.com:8080
    {'http': 'http://exampleproxy.com:80'},
    {'http': 'http://0.0.0.0:3127'}
]
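For context, here is one way a list in this format could be used with urllib. It is a sketch under assumptions (Python 3, hypothetical helper names `pick_proxy` and `opener_for`) and may differ from how Core.py actually applies its proxies.

```python
import random
from urllib.request import ProxyHandler, build_opener

proxies = [
    {'http': 'http://exampleproxy.com:80'},
    {'http': 'http://0.0.0.0:3127'},
]


def pick_proxy(proxy_list):
    """Choose one proxy mapping at random, or None if the list is empty."""
    return random.choice(proxy_list) if proxy_list else None


def opener_for(proxy):
    """Build a urllib opener that routes requests through the given proxy mapping."""
    if proxy is None:
        return build_opener()  # default behaviour, no explicit proxy
    return build_opener(ProxyHandler(proxy))


# Example (not executed here): opener_for(pick_proxy(proxies)).open("http://example.com")
```

Note that a mapping with only an `'http'` key applies to http URLs; an `'https'` key would be needed for https requests to go through a proxy as well.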

Run the Python script and enter the following information when prompted:

- The number of pages you want to search
- The address where the crawler should start looking
- The word or text you're looking for

You can also start the script with the following parameters and jump straight into execution:

python Core.py "URL with http or https here" "Exact word or phrase you want to search for" "Max number of URLs you want to process"

Example:

python Core.py "https://www.stackoverflow.com" "You" "50"
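The parameter order shown above (URL, phrase, maximum number of URLs) could be read from the command line along these lines. This is an illustrative sketch assuming Python 3; `parse_args` is a hypothetical helper, not a function taken from Core.py.

```python
import sys


def parse_args(argv):
    """Return (url, phrase, max_urls) from argv, or None if the parameters were not given."""
    if len(argv) != 4:
        return None  # caller can fall back to asking for the values interactively
    _, url, phrase, max_urls = argv
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL must start with http:// or https://")
    return url, phrase, int(max_urls)


if __name__ == "__main__":
    print(parse_args(sys.argv))
```

The third parameter arrives as a string (`"50"` in the example), so it is converted to an integer before use.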

Additional info and recommendations

Matches are written to matches.txt and errors to errorLog.txt. The script deletes both files each time you run it, so be sure to save the results before running it again.
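The file handling described above amounts to something like the following sketch. It assumes Python 3, and the helper names `reset_outputs` and `append_line` are hypothetical; only the two file names come from this README.

```python
import os

MATCH_FILE = "matches.txt"
ERROR_FILE = "errorLog.txt"


def reset_outputs(directory="."):
    """Delete the output files left over from a previous run, if they exist."""
    for name in (MATCH_FILE, ERROR_FILE):
        path = os.path.join(directory, name)
        if os.path.exists(path):
            os.remove(path)


def append_line(directory, name, text):
    """Append one line of text to the named output file, creating it if needed."""
    with open(os.path.join(directory, name), "a", encoding="utf-8") as handle:
        handle.write(text + "\n")
```

Because the files are removed at startup, copying or renaming matches.txt and errorLog.txt after a run is the simplest way to keep earlier results.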

arthurgeron/webCrawler