# scraping

Here are 5,695 public repositories matching this topic...

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python

Updated May 12, 2024
Python

code4craft / webmagic

A scalable web crawler framework for Java.

java crawler framework scraping

Updated Apr 23, 2024
Java

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

go golang crawler scraper framework spider scraping crawling

Updated Apr 19, 2024
Go

ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)

testing chrome automation webdriver browser captcha scraping selenium navigator python3 cloudflare chromedriver anti-bot bot-detection cloudflare-bypass distil anti-detection

Updated Apr 30, 2024
Python

khuyentran1401 / Data-science

Collection of useful data science topics along with articles, videos, and code

python data-science machine-learning natural-language-processing time-series scraping data-visualization artificial-intelligence data-analysis articles

Updated Mar 21, 2024
Jupyter Notebook

psf / requests-html

Pythonic HTML Parsing for Humans™

python html http scraping requests kennethreitz beautifulsoup lxml css-selectors pyquery

Updated Apr 16, 2024
Python

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

crawler spider scraping crawling web-scraping captcha-recaptcha webscraping crawling-framework scraping-framework captcha-bypass scraping-tool crawling-tool scraping-python crawling-python

Updated May 9, 2024
Makefile

NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.

python search-engine crawler scraping search-engines search-engine-optimization

Updated Jul 3, 2021
HTML

tabulapdf / tabula

Tabula is a tool for liberating data tables trapped inside PDF files

pdf csv excel scraping tables

Updated Apr 10, 2024
CSS

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

python crawler machine-learning scraper automation ai scraping artificial-intelligence web-scraping scrape webscraping webautomation

Updated Apr 30, 2024
Python

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

facebook scraping hacktoberfest facebook-scraper facebook-scraping

Updated Feb 22, 2024
Python

MorvanZhou / easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

crawler regex scraping crawling requests asyncio scrapy beautifulsoup distributed-scraper urllib

Updated Apr 7, 2024
Jupyter Notebook

aapatre / Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

python scraper scraping selenium python3

Updated May 10, 2024
Python

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated May 12, 2024
TypeScript

VinciGit00 / Scrapegraph-ai

Python scraper based on AI

machine-learning scraping sc automated-scraper scraping-python gpt-3 gpt-4 llm scrapingweb llama3

Updated May 12, 2024
Python

fake-useragent / fake-useragent

Up-to-date simple useragent faker with real world database

python agent user-agent scraping fake faker python3 user useragent user-agent-spoofer useragent-scraper

Updated May 6, 2024
Python

sparklemotion / mechanize

Mechanize is a ruby library that makes automated web interaction easy.

ruby web scraping

Updated Apr 18, 2024
Ruby

AmmeySaini / Edu-Mail-Generator

Generate Free Edu Mail(s) within minutes

python mail scraping selenium python3 scraping-websites edu selenium-python student-mail edumail install-webdriver edu-account edu-generator auto-install-webdriver

Updated Dec 14, 2022
Python

yujiosaka / headless-chrome-crawler

Distributed crawler powered by Headless Chrome

jquery crawler chrome scraper promise scraping crawling chromium headless-chrome puppeteer

Updated Apr 29, 2023
JavaScript

okfn-brasil / querido-diario

📰 Brazilian government gazettes, accessible to everyone.

data-science machine-learning spider politics scraping artificial-intelligence open-data civic-tech hacktoberfest govtech governments-gazettes hacktoberfest2023

Updated May 11, 2024
Python
