Faculty Scraper is a Python web scraping tool designed to extract data from a faculty directory website. It retrieves information such as faculty names, colleges, email addresses, subjects taught, and research topics. The scraped data is stored in a list of dictionaries and can be exported to a CSV file for further analysis. For those interested in a more in-depth understanding, I highly recommend reading my article: Medium Link. It covers the code implementation, step-by-step explanations, and the benefits of utilizing concurrent features for efficient data extraction.
The following Python packages are required to run the scraper:
- `bs4` (BeautifulSoup): Used for HTML parsing.
- `requests`: Used for sending HTTP requests.
- `concurrent.futures`: Used for concurrent execution of scraping tasks.
- `pandas`: Used for data manipulation and CSV export.
- `re`: Used for email address validation.
- `logging`: Used for error handling and logging.
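Since the list notes that `re` handles email validation, here is a minimal sketch of what such a check might look like. The pattern below is an illustrative assumption, not the one FacultyScraper actually uses:

```python
import re

# Illustrative pattern only; the scraper's real validation may differ.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def is_valid_email(text: str) -> bool:
    """Return True if text looks like a plausible email address."""
    return bool(EMAIL_RE.match(text.strip()))

print(is_valid_email("jdoe@buffalo.edu"))  # True
print(is_valid_email("no-at-sign.edu"))    # False
```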
- Import the `FacultyScraper` class from the `faculty_scraper.FacultyScraper` module.

  ```python
  from faculty_scraper.FacultyScraper import FacultyScraper
  ```
- Create an instance of the `FacultyScraper` class with the URL of the faculty directory website.

  ```python
  url = "https://example.com/faculty-directory"
  scraper = FacultyScraper(url)
  ```
- Scrape the data from the faculty directory website.

  ```python
  data = scraper.scrape_data()
  ```
- Dump the scraped data into a CSV file.

  ```python
  scraper.dump_to_csv("faculty_data.csv")
  ```
- Retrieve the scraped data as a Pandas DataFrame.

  ```python
  df = scraper.return_df()
  ```
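The steps above fit together conceptually: `scrape_data()` returns a list of dictionaries, and `return_df()` presumably builds a DataFrame from it. A minimal sketch with hypothetical field names (the actual keys produced by the scraper may differ):

```python
import pandas as pd

# Hypothetical records mimicking what scrape_data() returns;
# the real column names may differ.
data = [
    {"name": "Jane Doe", "college": "Engineering",
     "email": "jdoe@example.edu", "subjects": "Algorithms",
     "research": "Graph mining"},
    {"name": "John Roe", "college": "Engineering",
     "email": "jroe@example.edu", "subjects": "Databases",
     "research": "Query optimization"},
]

# A list of dicts converts directly to a DataFrame; calling
# df.to_csv() on it would mirror what dump_to_csv() does.
df = pd.DataFrame(data)
print(df.shape)  # (2, 5)
```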
Contributions are welcome! If you would like to contribute to Faculty Scraper, follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes in the branch.
- Commit your changes with descriptive commit messages.
- Push your branch to your forked repository on GitHub.
- Open a pull request from your branch to the main repository.
- Provide a clear and descriptive title for your pull request, along with a detailed description of the changes you have made.
- Wait for the project maintainers to review your pull request. They may provide feedback or ask for additional changes.
- Once your pull request is approved and merged, your changes will become a part of the project.
Please note that by contributing to this project, you agree to abide by the Code of Conduct.
This project is licensed under the MIT License.
Here's an example that demonstrates the usage of the `FacultyScraper` class:

```python
from faculty_scraper.FacultyScraper import FacultyScraper

url = "https://engineering.buffalo.edu/computer-science-engineering/people/faculty-directory/full-time.html"
scraper = FacultyScraper(url)
data = scraper.scrape_data()
scraper.dump_to_csv("Department of Computer Science and Engineering Faculty Data.csv")
df = scraper.return_df()
```
In this example, the `FacultyScraper` is initialized with the URL of the faculty directory website. The `scrape_data()` method is called to extract the faculty information, which is then dumped into a CSV file named `Department of Computer Science and Engineering Faculty Data.csv`. The scraped data is also returned as a Pandas DataFrame for further analysis.
Note: The current implementation of the scraper is specifically designed for the URL "https://engineering.buffalo.edu/computer-science-engineering/people/faculty-directory/full-time.html". If you want to scrape a different faculty directory website, you will need to modify the code accordingly, following the steps in Contributing.
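As the introduction mentions, concurrent features speed up extraction. A minimal sketch of the likely pattern, using `concurrent.futures` to overlap I/O-bound page fetches; `fetch_profile` here is a hypothetical stand-in for a function that would call `requests.get()` and parse the page with BeautifulSoup:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_profile(url: str) -> dict:
    # Stand-in: a real implementation would download the page with
    # requests and extract faculty details with BeautifulSoup.
    return {"url": url, "status": "parsed"}

urls = [f"https://example.com/faculty/{i}" for i in range(5)]

# Threads overlap the network waits, so pages are fetched in
# parallel instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_profile, urls))

print(len(results))  # 5
```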