From 4b5ebb04f8fcfed33b4b64a1e9361d5ed5cbc78d Mon Sep 17 00:00:00 2001
From: Henry Wilkinson
Date: Wed, 20 Mar 2024 12:34:29 -0400
Subject: [PATCH 1/3] Fixes docs edit link

---
 docs/mkdocs.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 709bb781f..de2fe9436 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -1,7 +1,7 @@
 site_name: Browsertrix Crawler Docs
 repo_url: https://github.com/webrecorder/browsertrix-crawler/
 repo_name: Browsertrix Crawler
-edit_uri: edit/main/docs/
+edit_uri: edit/main/docs/docs/
 extra_css:
   - stylesheets/extra.css
 theme:

From 0d26cf2619e17fed9d3ae498b904bb44d1583061 Mon Sep 17 00:00:00 2001
From: Henry Wilkinson
Date: Wed, 20 Mar 2024 12:41:29 -0400
Subject: [PATCH 2/3] =?UTF-8?q?Adds=20note=20about=20where=20to=20find=20B?=
 =?UTF-8?q?rowsertrix=20=E2=80=94=20the=20cloud=20service?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/docs/index.md            | 3 +--
 docs/docs/user-guide/index.md | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index fef21c649..5a0b1b0e3 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -10,15 +10,14 @@ Welcome to the Browsertrix Crawler official documentation.
 
 Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses [Puppeteer](https://github.com/puppeteer/puppeteer) to control one or more [Brave Browser](https://brave.com/) browser windows in parallel. Data is captured through the [Chrome Devtools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/) in the browser.
 
+Browsertrix Crawler is a command line application responsible for the core features of [Browsertrix](https://browsertrix.com), Webrecorder's cloud-based web archiving service. You can find the documentation for Browsertrix — the cloud platform — [here](https://docs.browsertrix.cloud)!
 
 !!! note
 
     This documentation applies to Browsertrix Crawler versions 1.0.0 and above. Documentation for earlier versions of the crawler is available in the [Browsertrix Crawler Github repository](https://github.com/webrecorder/browsertrix-crawler)'s README file in older commits.
 
-
 ## Features
 
-
 - Single-container, browser based crawling with a headless/headful browser running pages in multiple windows.
 - Support for custom browser behaviors, using [Browsertrix Behaviors](https://github.com/webrecorder/browsertrix-behaviors) including autoscroll, video autoplay, and site-specific behaviors.
 - YAML-based configuration, passed via file or via stdin.
diff --git a/docs/docs/user-guide/index.md b/docs/docs/user-guide/index.md
index ab5436972..4f4efca9a 100644
--- a/docs/docs/user-guide/index.md
+++ b/docs/docs/user-guide/index.md
@@ -1,6 +1,6 @@
 # Browsertrix Crawler User Guide
 
-Welcome to the Browsertrix User Guide. This page covers the basics of using Browsertrix Crawler, Webrecorder's browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container.
+Welcome to the Browsertrix Crawler User Guide. This page covers the basics of using Browsertrix Crawler, Webrecorder's browser-based high-fidelity crawling system, designed to run a complex, customizable, browser-based crawl in a single Docker container.
 
 ## Getting Started
 

From 3ec9d1b9e847d49e43bd4b2e096758ecd46a8d95 Mon Sep 17 00:00:00 2001
From: Henry Wilkinson
Date: Wed, 20 Mar 2024 13:03:16 -0400
Subject: [PATCH 3/3] Update docs/docs/index.md

Co-authored-by: Tessa Walsh
---
 docs/docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index 5a0b1b0e3..71bbe538a 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -10,7 +10,7 @@ Welcome to the Browsertrix Crawler official documentation.
 
 Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses [Puppeteer](https://github.com/puppeteer/puppeteer) to control one or more [Brave Browser](https://brave.com/) browser windows in parallel. Data is captured through the [Chrome Devtools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/) in the browser.
 
-Browsertrix Crawler is a command line application responsible for the core features of [Browsertrix](https://browsertrix.com), Webrecorder's cloud-based web archiving service. You can find the documentation for Browsertrix — the cloud platform — [here](https://docs.browsertrix.cloud)!
+Browsertrix Crawler is a command line application responsible for the core features of [Browsertrix](https://browsertrix.com), Webrecorder's cloud-based web archiving service. See the [Browsertrix documentation] for more information about Browsertrix, the cloud platform.
 
 !!! note
 