
Data Scraping with Python

Data extraction methods

There are three main data extraction methods in Python:

  • HTML parsing with the Requests and Beautiful Soup Python libraries.
  • Rendering JavaScript with Scrapy (a web-crawling Python framework) or Selenium (a browser automation testing tool).
  • API-based scraping using an API client (Insomnia, Postman, etc.) and Python.

My four mini projects

I needed some small-scale data from LinkedIn but didn't want to get blocked, so I decided to start by parsing locally: I used Beautiful Soup to extract data from offline HTML files and stored the results in a CSV file.
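
A minimal sketch of that first workflow, assuming a page saved to disk and placeholder CSS classes (the real LinkedIn markup differs):

```python
import csv

from bs4 import BeautifulSoup

# Parse a page saved to disk; the file name and class names below are
# illustrative assumptions, not the actual LinkedIn markup.
with open("saved_page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

rows = []
for card in soup.find_all("div", class_="job-card"):
    title = card.find("h3").get_text(strip=True)
    company = card.find("span", class_="company").get_text(strip=True)
    rows.append([title, company])

with open("jobs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "company"])
    writer.writerows(rows)
```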

For the second project I scaled up a bit and scraped 3,173 vacancies (66k records) from a job listings website through its API. Using Chrome developer tools I inspected the site's requests and responses and found that it loads data from an API. With the Insomnia client I examined the API calls, modified them, and generated initial Python code. I then extended that code with delayed loops to keep the number of requests limited, and used Pandas to normalize the scraped JSON data and store it in a CSV file.
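
The core of that approach might look like this sketch; the endpoint, paging parameters, and the "results" key are assumptions standing in for the real API call discovered in DevTools:

```python
import time

import pandas as pd
import requests

# Hypothetical endpoint; the real call was found with Chrome DevTools
# and refined in Insomnia.
BASE_URL = "https://example-jobs.com/api/vacancies"

frames = []
for page in range(1, 6):  # limited number of requests
    resp = requests.get(BASE_URL, params={"page": page}, timeout=10)
    resp.raise_for_status()
    # Flatten the nested JSON payload into a tabular DataFrame.
    frames.append(pd.json_normalize(resp.json()["results"]))
    time.sleep(2)  # polite delay between requests

pd.concat(frames, ignore_index=True).to_csv("vacancies.csv", index=False)
```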

For the third project I scraped product prices from three different sites and stored the data in a database, using the Requests, Beautiful Soup, and SQLite Python libraries. I wrote a single function that extracts the name, price, and date and places them into a list, then ran that function with variables for the three different websites. Finally I created an SQLite database and loaded all the data from the nested lists into it.
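
A sketch of that single-function design, with hypothetical URLs and selectors for the three sites (sqlite3 ships with Python's standard library):

```python
import sqlite3
from datetime import date

import requests
from bs4 import BeautifulSoup

def extract(url, name_selector, price_selector):
    """Fetch a page and return [name, price, date] using per-site selectors."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    name = soup.select_one(name_selector).get_text(strip=True)
    price = soup.select_one(price_selector).get_text(strip=True)
    return [name, price, date.today().isoformat()]

# Hypothetical site configs; each site needs its own selectors.
sites = [
    ("https://shop-a.example/item", "h1.title", "span.price"),
    ("https://shop-b.example/item", "h1", "div.cost"),
    ("https://shop-c.example/item", ".product-name", ".product-price"),
]
records = [extract(url, n, p) for url, n, p in sites]

conn = sqlite3.connect("prices.db")
conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, price TEXT, date TEXT)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", records)
conn.commit()
conn.close()
```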

For the fourth project I coded a solution for pagination, using a loop and URLs with variables in them.
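
A pagination sketch along those lines, with a hypothetical URL pattern where the page number is the variable:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical paginated listing; the page number is a URL variable.
for page in range(1, 11):
    url = f"https://example.com/listings?page={page}"
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    items = soup.select("div.listing")
    # Stop early if the site returns an empty page.
    if not items:
        break
    for item in items:
        print(item.get_text(strip=True))
```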

Future projects

For the next project it would be interesting to extract data by rendering JavaScript. I would also like to practice using headless browsers (and dealing with browser fingerprinting), rotating proxies to work around IP rate limiting, residential IPs instead of data-center IPs, forging and rotating TLS fingerprints, and CAPTCHA-solving tools and services.
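
As a starting point for the JavaScript-rendering idea, a minimal headless-Chrome sketch with Selenium (the URL and selector are placeholders):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome headless so the page's JavaScript executes before parsing.
options = Options()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/js-rendered-page")
    # These elements exist only after the page's JavaScript has run.
    for el in driver.find_elements(By.CSS_SELECTOR, "div.item"):
        print(el.text)
finally:
    driver.quit()
```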
