🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl, search and extract with a single API.
Updated May 24, 2024 - TypeScript
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
Script that fetches page content from a URL and turns it into Markdown
🛏 An HTML to Markdown converter written in JavaScript
Website scraper for text, with conversion to Markdown (.md) and directory structuring
A CLI tool that converts exported Medium posts (html) to Jekyll/Hugo compatible markdown with front matter.
Clients to use with the hosted spider service - spider.cloud
Transform your HTML into clean, easy-to-read markdown with html2md.
Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
A simple Swift package that converts HTML into Markdown
Table/List to Markdown - A simple GM userscript to extract tables and lists from any website and save them as Markdown.
reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI.
Article title, authors, date and body extraction dataset.
Firefox add-on to copy selection as Markdown
Multiple Markdown web services hosted in a single desktop application.
CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
Converts HTML, Excel, CSV and more to markdown tables. Smart selection mode dynamically finds cells.