Email-Phone scraping

This project lets you crawl the source of websites to collect emails and phone numbers in bulk, which are then written to a .csv file in an organized way.

The main aim of this 'Advanced' email and phone scraper, written in Python 3, is to provide a tool for gathering contact data (emails and phone numbers) neatly and quickly.

Applications:

Generally used by marketers to collect contact data for several organizations.
Used in business and eCommerce for market analysis.

Getting Started

These instructions will help you set up this project on your local system for development and testing purposes. Follow the steps below in order.

Pre-requisites

What needs to be installed on your system?

This project was built using Python version 3.7.

Libraries to install (pinned to the versions used):

pip install regex==2020.7.14
pip install google-search==1.0.2
pip install requests==2.24.0
pip install beautifulsoup4==4.9.1
pip install tld==0.12.2

Deployment

Now you are good to go :)

Clone the repository or download the zip file.
Extract the files into a directory of your choice.
Erase the contents of the .csv file, leaving the header row intact.
Run the script.
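
The CSV clean-up step above can be done by hand, or with a few lines of Python. The following is only a sketch: the helper name `reset_csv_keep_header` is hypothetical, and it assumes the header occupies exactly the first line of the file.

```python
from pathlib import Path


def reset_csv_keep_header(csv_path):
    """Truncate a CSV file so that only its first line (the header) remains.

    Assumption: the header is exactly one physical line.
    """
    path = Path(csv_path)
    # newline="" keeps the original line ending untouched while reading.
    with path.open("r", encoding="utf-8", newline="") as f:
        header = f.readline()
    # Make sure rows appended later start on a fresh line.
    if header and not header.endswith(("\n", "\r")):
        header += "\n"
    with path.open("w", encoding="utf-8", newline="") as f:
        f.write(header)
    return header
```

For example, `reset_csv_keep_header("organization_info.csv")` would clear the output file, assuming that is the .csv file shipped with the project.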

Execution

Enter the organization name, along with the location if necessary. Example: Deloitte Hyderabad
The links associated with it are stored in 'web_urls.txt'.
Enjoy Harvesting Emails and Phone numbers :)
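
To illustrate how search results might end up in 'web_urls.txt', here is a rough sketch. The exact call signature of the search function the script uses is not shown in this README, so the sketch accepts any callable that returns an iterable of URL strings. The names `save_search_results`, `fake_search` and `limit` are hypothetical, and overwriting the file on each run is an assumption.

```python
def fake_search(query):
    # Stand-in for the real search call; returns fixed example URLs.
    return [
        "https://example.com/",
        "https://example.com/",
        "https://example.org/contact",
    ]


def save_search_results(query, search_fn, out_path="web_urls.txt", limit=5):
    """Run search_fn(query) and write up to `limit` unique URLs, one per line."""
    urls = []
    for url in search_fn(query):
        if url not in urls:  # skip duplicates, keep first-seen order
            urls.append(url)
        if len(urls) >= limit:
            break
    # Assumption: the file is overwritten on each run rather than appended to.
    with open(out_path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
    return urls
```

Calling `save_search_results("Deloitte Hyderabad", fake_search)` writes the two distinct example URLs to 'web_urls.txt'. In the real script the search callable would come from the search library instead.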

How does it Work?

First, it generates links for the input provided. It does this using 'search' from the google-search library and stores the first and all subsequent URLs in 'web_urls.txt'.
Second, each URL is processed by sending an HTTP request to the website.
The page returned for each URL is parsed into HTML text using bs4.
With the page content extracted, all emails and phone numbers present on the home page are scraped.
All of the scraping is done with regular expressions.
The regex used in this project is a generalized one that detects and returns emails and phone numbers from most websites. Nevertheless, it may not work well for some.
If no data is detected on the home page, it locates the contact page and collects the data there if present, since most websites keep their contact details on the contact-us page.
The home page data and contact page data are then merged into a single data structure.
Finally, everything is written to a .csv file, so that the data is organized and can be inspected.
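
The steps above, from page source to CSV row, can be sketched roughly as follows. This is not the project's actual code. The two regex patterns, all function names, and the column order (organization, emails, phones) are assumptions. To keep the sketch runnable without network access or third-party packages, it uses the standard library's `html.parser` in place of bs4 and takes a `fetch` callable in place of a direct requests call.

```python
import csv
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately loose patterns (assumptions, not the project's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"[+(]?\d[\d\s().-]{7,}\d")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_contacts(html):
    """Return (emails, phones) found in the page source, de-duplicated and sorted."""
    return sorted(set(EMAIL_RE.findall(html))), sorted(set(PHONE_RE.findall(html)))


def find_contact_url(html, base_url):
    """Return the absolute URL of the first link whose href mentions 'contact'."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        if "contact" in href.lower():
            return urljoin(base_url, href)
    return None


def merge_contacts(*results):
    """Merge several (emails, phones) pairs into one pair of sorted lists."""
    emails, phones = set(), set()
    for found_emails, found_phones in results:
        emails.update(found_emails)
        phones.update(found_phones)
    return sorted(emails), sorted(phones)


def collect(home_html, base_url, fetch):
    """Try the home page first; fall back to the contact page if nothing was found."""
    found = extract_contacts(home_html)
    if not any(found):
        contact_url = find_contact_url(home_html, base_url)
        if contact_url:
            found = merge_contacts(found, extract_contacts(fetch(contact_url)))
    return found


def append_row(csv_path, organization, emails, phones):
    """Append one row per organization; multiple values are joined with '; '."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([organization, "; ".join(emails), "; ".join(phones)])
```

In real use, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`. A loose phone pattern like this one will also match other long digit runs in the page source, which is in line with the caveat above that the regex does not work well on every site.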

Built with

Python 3.x - A Programming Language

Contributing

Open to contributions from the public.

Author

K Sai Chaitanya
