Skip to content

Rust program for tasks involving the English Wiktionary dump

License

Notifications You must be signed in to change notification settings

Erutuon/enwikt-dump-rs

Repository files navigation

enwikt-dump-rs

This program generates data from the pages-articles.xml file from the English Wiktionary dump.

Subcommands

add-template-redirects

Looks up the redirects to a set of templates in a file and generates a new file containing the redirects, suitable for the dump-parsed-templates or dump-templates subcommands.

all-headers

Counts how many times each header appears at each header level and outputs JSON.

dump-parsed-templates

Generates dumps of parsed templates containing CBOR-encoded objects with the title of a page and all the instances of a given template (with the template name, parsed parameters, and the template wikitext) found on that page. This makes it faster to search template instances with a script.

dump-templates

Dumps template instances in an ad-hoc format.

filter-headers

Gathers the titles of all pages that contain certain headers and outputs JSON.

Installation

Download the repository, ensure you have cargo installed, cd to the directory, and do cargo build --release.

About

Rust program for tasks involving the English Wiktionary dump

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published