
Abhishek-Deshmukh/WEC


WEC (Web crawling Email Collector)

Open Source Love MIT Licence made-with-python GitHub issues GitHub stars

A Web Crawling email collector

This Python script scans a web page for the words in a supplied list and, if those words are found, stores the email addresses on that page in a database. It then repeats the same process for every link on the page.

No, it does not run forever; it only follows links a few layers deep.
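
The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names `crawl`, `LinkParser`, `fetch` and `max_depth` are hypothetical, the depth limit of 2 is an assumed value, and the sketch assumes a page qualifies when it contains any one of the words. It returns a dictionary instead of writing to a database.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simple pattern for email-like strings; real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, words, fetch, max_depth=2):
    """Return {email: page_url} for pages that mention any of `words`.

    `fetch` is a hypothetical callable mapping a URL to its HTML text
    (or None on failure); `max_depth` is an assumed layer limit.
    """
    found = {}
    seen = set()
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen or depth > max_depth:
            continue  # skip repeats and pages beyond the layer limit
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        text = html.lower()
        if any(word.lower() in text for word in words):
            for email in EMAIL_RE.findall(html):
                found.setdefault(email, url)
        # Queue every link on the page, one layer deeper.
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            queue.append((urljoin(url, href), depth + 1))
    return found
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real run would supply a function that downloads the page.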

Use cases

  • looking for professors who do research in a particular field
  • marketing
  • etc...

How to use

  • connect to Wi-Fi (you do not want to do this on mobile data).
  • clone or download the repository.
  • install the requirements (pip install -r requirements.txt).
  • go into the WEC folder (cd WEC).
  • write the list of all websites to search through in a .txt file.
  • write the list of all words to search for in a .txt file.
  • on the command line:
    • python3 main.py <websites file path> <words file path>
    • wait (this can take a while; leaving it running overnight is recommended)
  • when done, it will have created an SQLite database, which can be opened with various tools, e.g. an SQLite browser.
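
The resulting database can also be inspected from Python with the standard sqlite3 module. The sketch below does not assume a table layout, since the schema is not documented here: it simply lists every table and its rows. The function name `dump_database` and the file name `emails.db` in the usage line are hypothetical.

```python
import sqlite3


def dump_database(path):
    """Return {table_name: [rows]} for every table in the SQLite file."""
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalogue of tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        return {
            table: conn.execute(f'SELECT * FROM "{table}"').fetchall()
            for table in tables
        }
    finally:
        conn.close()
```

For example, `print(dump_database("emails.db"))` would print the contents of a database file with that (assumed) name.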

Requirements

List of institutes

A list of institutes can be selected from the Nature Index and many other places.

Future prospects

  • write tests
  • make institute selection a part of the script
  • fine-tune the algorithm a little
  • G.U.I.
  • M.L.

Note

The developers of this code are not responsible if you send an email to someone who was not supposed to receive it. It is therefore recommended that you look at the page from which an email address was collected before sending the email.
