brinasab/web_scraping

web_scraping

Introduction to web scraping with requests and BeautifulSoup.

This workshop is designed for beginner to intermediate programmers who have little or no web-scraping experience. It focuses on using Python to scrape online data that are accessible without an API. Some familiarity with Python is preferred but not required.

Participants will learn how to read HTML pages and how to use the Python libraries "requests" and "BeautifulSoup" to scrape online data. We will cover some of the most common challenges (and their solutions) encountered in static web scraping.
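As a rough illustration of how the two libraries divide the work, the sketch below uses requests to download a page and BeautifulSoup to pull data out of its HTML. The sample HTML, the `course` class name, and the helper names `fetch_html` and `extract_courses` are all hypothetical and are not taken from the workshop notebook. The parsing step runs on an inline string so the example works offline.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (not from the workshop).
SAMPLE_HTML = """
<html><body>
  <h1>Course catalog</h1>
  <ul>
    <li class="course"><a href="/soc101">Intro to Sociology</a></li>
    <li class="course"><a href="/soc202">Research Methods</a></li>
  </ul>
</body></html>
"""


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (defined but not called here)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_courses(html):
    """Return (title, link) pairs for every <li class="course"> item."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    for item in soup.find_all("li", class_="course"):
        link = item.find("a")
        courses.append((link.get_text(strip=True), link["href"]))
    return courses


courses = extract_courses(SAMPLE_HTML)
print(courses)
```

With a real site, the string returned by `fetch_html(url)` would be passed to `extract_courses` in place of `SAMPLE_HTML`, with the tag and class names adjusted to match that page.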

By the end of this workshop, you will be able to...

  • Read an HTML page and evaluate it
  • Use the library “requests” to interact with websites
  • Use the library “BeautifulSoup” to parse and get data from websites
  • Understand some of the key tasks in static web scraping (handling missing data, handling errors, moving through multiple pages)
  • Conceptualize web scraping as a process that goes from the website to the cleaned data
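
The key tasks listed above (missing data, errors, turning pages) could be combined along the following lines. This is a minimal sketch under stated assumptions, not the workshop's own code: the `review` and `rating` class names, the `{page}` URL template, and the function names `fetch`, `parse_reviews`, and `scrape_pages` are all hypothetical.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch(url):
    """Return the page HTML, or None if the request fails (error handling)."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None
    return response.text


def parse_reviews(html):
    """Extract one dict per review; absent fields become None (missing data)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The "review" and "rating" class names are assumptions for illustration.
    for item in soup.find_all("div", class_="review"):
        title = item.find("h2")
        rating = item.find("span", class_="rating")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "rating": rating.get_text(strip=True) if rating else None,
        })
    return rows


def scrape_pages(url_template, max_pages, fetch_page=fetch, delay=1.0):
    """Visit pages 1..max_pages in order, skipping any that fail (turning pages)."""
    results = []
    for page in range(1, max_pages + 1):
        html = fetch_page(url_template.format(page=page))
        if html is None:
            continue  # this page could not be downloaded; move on to the next
        results.extend(parse_reviews(html))
        time.sleep(delay)  # pause between requests to avoid overloading the server
    return results
```

Passing the download function as the `fetch_page` parameter is a design choice that lets the pagination and parsing logic be tried on saved HTML before any real site is contacted.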

In the first part of the workshop, we will learn the basics of the requests and BeautifulSoup libraries. Then, we will apply them by scraping two websites: a University of Arizona website and the IMDb movie review website.

Click on Binder to launch the workshop and work through it interactively on MyBinder (no installation required; the build should take between 30 and 60 seconds), or download the Jupyter notebook contained in this repository.

This workshop is licensed under CC-BY-4.0 by Sabrina Nardin
