web_scraping

Introduction to web scraping with requests and BeautifulSoup.

This workshop is designed for beginner to intermediate programmers who have little or no web-scraping experience. It focuses on using Python to scrape online data that are accessible without an API. Some familiarity with Python is preferred but not required.

Participants will learn how to read HTML pages and how to use the Python libraries "requests" and "BeautifulSoup" to scrape online data. We will cover some of the most common challenges (and their solutions) encountered in static web scraping.
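
As a rough illustration of what "reading an HTML page" with BeautifulSoup can look like, here is a minimal sketch. The HTML snippet, the tag names, and the class name "person" are made up for this example and are not taken from the workshop notebook. With a live site, the string would typically come from requests.get(url).text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (not from the workshop).
html = """
<html>
  <body>
    <h1>Staff</h1>
    <ul id="people">
      <li class="person"><a href="/people/ada">Ada Lovelace</a></li>
      <li class="person"><a href="/people/alan">Alan Turing</a></li>
    </ul>
  </body>
</html>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Each <li class="person"> holds one link; collect its text and its href.
people = [
    {"name": li.a.get_text(strip=True), "url": li.a["href"]}
    for li in soup.find_all("li", class_="person")
]
print(people)
```

The same pattern (find the repeating tag, then pull text and attributes out of each match) should carry over to most static pages, although the tags and classes will differ from site to site.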

By the end of this workshop, you will be able to...

Read an HTML page and evaluate it
Use the library “requests” to interact with websites
Use the library “BeautifulSoup” to parse and get data from websites
Understand some of the key tasks in static web scraping (handling missing data, handling errors, and turning pages)
Conceptualize web scraping as a process that goes from the website to the cleaned data
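
The three tasks named above (missing data, errors, turning pages) can be sketched together as follows. This is one possible approach under stated assumptions, not the workshop's own code: the function names fetch_html, parse_reviews, and scrape_pages, the "review" and "rating" class names, and the "{page}" URL template are all hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        print(f"Skipping {url}: {exc}")
        return None
    return response.text


def parse_reviews(html):
    """Extract one dict per review; fields that are missing become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The "review" and "rating" class names are assumptions for this sketch.
    for item in soup.find_all("div", class_="review"):
        title = item.find("h2")
        rating = item.find("span", class_="rating")  # may be absent
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "rating": rating.get_text(strip=True) if rating else None,
        })
    return rows


def scrape_pages(url_template, max_pages, fetch=fetch_html):
    """Walk numbered pages until one fails or comes back empty."""
    results = []
    for page in range(1, max_pages + 1):
        html = fetch(url_template.format(page=page))
        if html is None:  # request error: stop rather than crash
            break
        rows = parse_reviews(html)
        if not rows:  # no reviews found: assume we ran past the last page
            break
        results.extend(rows)
    return results
```

A call such as scrape_pages("https://example.com/reviews?page={page}", max_pages=5) would then return a flat list of dictionaries ready for cleaning. Passing the fetch function as a parameter is a design choice that makes the loop easy to try out on saved HTML without touching the network.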

In the first part of the workshop, we will learn the basics of the libraries requests and BeautifulSoup. Then, we will use these libraries by scraping two websites: a University of Arizona website, and the IMDb movie review website.

Click the Binder badge to launch and run the workshop interactively on MyBinder (no installation required; the build should take between 30 and 60 seconds), or download the Jupyter notebook contained in this repository.

This workshop is licensed under CC-BY-4.0 by Sabrina Nardin.
