Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 24, 2024 - Python
Article title, authors, date and body extraction dataset.
DeepSpam milter v2
An extremely configurable markdown reverser for Python3.
Golang HTML to plaintext conversion library
AI chat app that returns responses in Markdown format with text and images. Tutorial from: https://youtu.be/qKtM2AlDTs8
Go package that cleans an HTML page for better readability.
A very simple (but efficient) "HTML to plain text" converter ✍️
A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract, convert to Markdown, and display the content of every page linked from a given URL. It provides an interactive table of contents of the visited URLs and displays the extracted content in an easy-to-read format.
📝 Html2Text - Convert HTML to formatted plain text, e.g. for text mails.
A PHP package to convert HTML into a plain text format
A CLI tool to fetch a webpage's main content and print it as Markdown
Scrapes the web with an automated Python script that extracts content from Wikipedia pages and builds a clean dataset from it.
Receive Packt Publishing Ltd. Free Learning updates in Telegram every day