Lesson 1 - Introduction To Forging API Requests

This lesson is designed to teach you how data is sent between websites and servers and how we can exploit this to extract data.

Supporting The Project

Star the repo 😎
- Maybe share it with some people new to web-scraping?
Consider sponsoring me on GitHub
Send me an email or a LinkedIn message telling me what you enjoy in the course (and maybe what else you want to see in the future)
Submit PRs for suggestions/issues :)

Learning Objectives

Learners will understand how data is sent between a client and a server.
Learners will forge API requests to a mock website.

Lesson Video

Watch Here

Video Corrections

None so far

How Do Websites Get Data?

Watch this section on YouTube and/or pull up the slides

Popular Ways Websites Get Data

Server Side Rendering (SSR)
- Data is sent as part of the HTML response to the requester
- Each request for new data usually requires a page reload
AJAX
- Uses a client (e.g., a web browser) and server model
- When the client needs new data it requests it from the server
- This allows the client to update the data on the page without refreshing the page itself
  - Leads to a more fluid and responsive user experience
- This type is the focus of this lesson

Visualizations of how the data flows are available in the video and slides.
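To make the difference concrete, here is a minimal sketch of what a scraper receives in each case. Both response bodies, the product names, and the helper names (ProductParser, names_from_ssr, names_from_ajax) are made up for illustration and are not taken from the mock website.

```python
import json
from html.parser import HTMLParser

# Hypothetical SSR response: the data arrives embedded in markup.
SSR_RESPONSE = """
<html><body>
  <ul id="products">
    <li class="product">Widget</li>
    <li class="product">Gadget</li>
  </ul>
</body></html>
"""

# Hypothetical AJAX response: the server returns only the data.
AJAX_RESPONSE = '{"products": [{"name": "Widget"}, {"name": "Gadget"}]}'


class ProductParser(HTMLParser):
    """Collects the text of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product and data.strip():
            self.names.append(data.strip())


def names_from_ssr(html):
    """Dig the product names out of an HTML page."""
    parser = ProductParser()
    parser.feed(html)
    return parser.names


def names_from_ajax(body):
    """Read the product names straight from a JSON payload."""
    return [item["name"] for item in json.loads(body)["products"]]
```

The JSON path needs one line of parsing, while the HTML path needs a parser that knows the page structure. That gap is a large part of why the AJAX style is the focus here.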

How Do We Exploit This?

If we're able to emulate the requests that a legitimate client makes, then we can extract data from the server without ever interacting with the client itself. This technique is generally referred to as forging requests.
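As a rough sketch of what forging a request can look like, the snippet below builds a GET request with browser-like headers using only the standard library. The URL, query string, and header values are assumptions for illustration. The real ones come from whatever you observe the client sending.

```python
import json
import urllib.request

# Assumed values for illustration only; replace them with what the
# real client actually sends.
API_URL = "http://localhost:8000/api/items?page=1"  # hypothetical endpoint
BROWSER_HEADERS = {
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
}


def build_forged_request(url=API_URL, headers=BROWSER_HEADERS):
    """Build a GET request carrying the same headers a browser would send."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(build_forged_request()) would send the request, so it only works when a server is actually listening at the assumed URL.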

Advantages
- These APIs can be easier to scrape at scale than trying to do it through a client
- They may contain extra information you can't see in the HTML itself
  - Similar to Missouri accidentally exposing its teachers' SSNs (The Verge)
- Less data returned means quicker requests (and lower data transfer fees)
  - Excess HTML, CSS, etc. isn't usually returned from the server, just pure data
Disadvantages
- Some websites frequently update their APIs
  - Extra work has to be done to keep up with these changes compared to just scraping HTML
  - They might change endpoints, the schema of the returned data, etc.
- Can be hard to emulate human behavior to avoid CAPTCHAs and other blocking mechanisms
- Can be difficult to figure out how the website generates user sessions and other security parameters meant to prevent web scraping

Activity

In this activity you'll be looking at a mock website and writing a Python script to extract data from it. To get started, run docker-compose up in this directory. If you don't know what Docker is or are new to it, check out the Docker section of the readme.

Brief Description

Our goal is to extract as much data as possible from the mock website. To do that, visit it with the browser's network inspector tab open and watch the requests it sends. We want to make the same requests that the website (client) makes to the server.
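If the requests you observe turn out to be paginated (an assumption, since the mock API may work differently), one way to collect everything is to repeat the same request with an increasing page number until a page comes back empty. The sketch below takes the fetching function as a parameter so it runs without the mock server. The names collect_all_pages, fake_fetch_page, and FAKE_PAGES are hypothetical.

```python
def collect_all_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Stand-in for a real HTTP call so the sketch runs without the mock server.
FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch_page(page):
    """Return the fake records for a page, or an empty list past the end."""
    return FAKE_PAGES.get(page, [])


all_items = collect_all_pages(fake_fetch_page)
```

In a real script, fetch_page would be a function that sends the request seen in the network tab and returns the decoded list of records. The max_pages cap is there so a server that never returns an empty page cannot cause an endless loop.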

Open activity.py. You will be modifying the existing functions to do what their comments tell you to do. I recommend using the requests package, although feel free to use whatever you want.

Do not change the method names. You can, however, call those methods in the if __name__ == "__main__" section if you want to test them out.
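For anyone unfamiliar with that section, here is a minimal sketch of how it is typically used. The function name get_all_items and its placeholder return value are hypothetical; keep whatever names activity.py actually defines.

```python
def get_all_items():
    """Hypothetical stand-in for a function defined in activity.py."""
    # A real implementation would request data from the mock server here.
    return [{"id": 1, "name": "example"}]


if __name__ == "__main__":
    # Runs only when the file is executed directly (python activity.py),
    # not when another module such as test.py imports it.
    for item in get_all_items():
        print(item)
```

Because the guarded block is skipped on import, you can leave quick experiments there without affecting the tests.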

Testing

To check whether your implementation is correct, run python test.py. This imports the functions you wrote, reports which tests failed (if any), and shows a success message if all tests pass.

Solutions

You can find the solutions in the video, or use the timestamps here
