Website to Markdown

Overview

This script is a simple web crawler that extracts the content of a given website and converts it to Markdown. It crawls with multiple threads to improve speed, and parameters such as the maximum number of workers are configurable.
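To make the conversion step concrete, here is a minimal sketch of turning HTML into Markdown using only the standard library. It is not the code in main.py, which may rely on a dedicated library; the class name MarkdownSketch and the function html_to_markdown are hypothetical, and only headings, paragraphs, and links are handled.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, and links only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []   # Markdown fragments collected so far
        self.href = ""    # target of the <a> tag currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            # Block elements end with a blank line in Markdown.
            self.parts.append("\n\n")
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = ""

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML fragment to Markdown (hypothetical helper)."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(html_to_markdown('<h1>Title</h1><p>See <a href="/docs">docs</a>.</p>'))
```

A real page also needs main-content detection, lists, code blocks, and tables, which is why an established conversion library is usually the better choice.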

Features

Extracts the main content from web pages and converts it to Markdown.
Crawls internal links.
Uses multi-threading for faster execution.
Saves the extracted content to a Markdown file.
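The crawling features above can be sketched as a breadth-first loop over a thread pool. This is an illustration under stated assumptions, not the logic in main.py: the names is_internal and crawl are hypothetical, "internal" is assumed to mean "same host as the start URL", and the fetch callable (which would download a page and return its Markdown plus the links found on it) is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urldefrag, urljoin, urlparse


def is_internal(url: str, start_url: str) -> bool:
    """Assumption: a link is internal when it shares the start URL's host."""
    return urlparse(url).netloc == urlparse(start_url).netloc


def crawl(start_url, fetch, max_workers=5):
    """Breadth-first crawl. fetch(url) must return (markdown, list_of_hrefs)."""
    seen = {start_url}
    frontier = [start_url]
    pages = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # Fetch the whole frontier concurrently; map() keeps input order.
            results = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, (markdown, hrefs) in zip(frontier, results):
                pages[url] = markdown
                for href in hrefs:
                    # Resolve relative links and drop "#fragment" suffixes.
                    link, _ = urldefrag(urljoin(url, href))
                    if is_internal(link, start_url) and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return pages


if __name__ == "__main__":
    # A fake two-page site stands in for real HTTP requests.
    site = {
        "https://example.com/docs": ("# Docs", ["/docs/a", "https://other.org/x", "#top"]),
        "https://example.com/docs/a": ("# A", ["/docs"]),
    }
    pages = crawl("https://example.com/docs", site.__getitem__, max_workers=2)
    print("\n\n".join(pages.values()))
```

Keeping the seen set on the main thread avoids locking: worker threads only fetch, and link bookkeeping happens between batches.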

Installation

Ensure you have Python installed, then install the required dependencies:

pip install -r requirements.txt

Usage

Run the script with the following command:

python main.py <start_url> --output_file <filename> --max_workers <number>

Or with Docker:

docker run --rm -v ./:/tmp/out cuongnb14/web2md:1.0.1 <start_url> --output_file /tmp/out/out.md

Arguments:

<start_url>: The starting URL for crawling.
--output_file: The output file to store the extracted content (default: crawled_content.md).
--max_workers: The number of concurrent workers (default: 5).
--rendering: Use Playwright to render pages.
--only-main: Use Playwright only for the first page.
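The arguments listed above map naturally onto argparse. The sketch below is one way such a parser might be declared, not a copy of main.py: the function name build_parser is hypothetical, and treating --rendering and --only-main as boolean switches is an assumption. Argument names and defaults are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Crawl a website and save its content as Markdown."
    )
    parser.add_argument("start_url", help="The starting URL for crawling.")
    parser.add_argument(
        "--output_file",
        default="crawled_content.md",
        help="The output file to store the extracted content.",
    )
    parser.add_argument(
        "--max_workers",
        type=int,
        default=5,
        help="The number of concurrent workers.",
    )
    # Assumption: both Playwright options are on/off flags without a value.
    parser.add_argument(
        "--rendering", action="store_true", help="Use Playwright to render pages."
    )
    parser.add_argument(
        "--only-main",
        action="store_true",
        help="Use Playwright only for the first page.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that argparse exposes --only-main as args.only_main, since hyphens in option names become underscores in attribute names.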

Example:

python main.py "https://example.com/docs" --output_file output.md --max_workers 10
