A command-line toolkit to extract text content and category data from Wikipedia dump files
Updated May 13, 2023 - Ruby
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
Corpus creator for Chinese Wikipedia
A simple utility to index Wikipedia dumps using Lucene.
Reading the data from OPIEC - an Open Information Extraction corpus
Extracting useful metadata from Wikipedia dumps in any language.
Research for a master's degree, operation projizz-I/O
Framework for the extraction of features from Wikipedia XML dumps.
Contains code to build a search engine by creating an index and performing searches over Wikipedia data.
Python package for working with MediaWiki XML content dumps
Node.js module for parsing the content of Wikipedia articles into JavaScript objects
Collects a multimodal dataset of Wikipedia articles and their images
Wikicompiler is a fully extensible Python library that compiles and evaluates text from Wikipedia dumps. You can extract text, do text analysis, or even evaluate the AST (Abstract Syntax Tree) yourself.
Scripts to download the Wikipedia dumps (available at https://dumps.wikimedia.org/)
Implements a search engine over a 73.4 GB Wikipedia dump. To return relevant results quickly, it builds an index and ranks documents by TF-IDF score. Building the index takes around 14 hours on that dump; results are retrieved in less than 1 second.
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
Visualize/explore word2vec datasets with pygame
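One of the entries above describes a search engine that builds an index over a Wikipedia dump and ranks documents by TF-IDF score. As a rough illustration of that general technique, here is a minimal in-memory sketch in Python. It is not taken from any of the listed repositories: the function names (`tokenize`, `build_index`, `search`), the log-scaled term-frequency weighting, and the tiny example documents are all assumptions chosen for illustration, and a real index over a multi-gigabyte dump would need to be built on disk rather than in a dictionary.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term count}.

    `docs` is a hypothetical mapping of doc_id -> article text.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query, top_k=10):
    """Return up to top_k doc_ids ranked by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            # Log-scaled term frequency (an assumed weighting variant).
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


if __name__ == "__main__":
    # Made-up example documents, not real dump content.
    docs = {
        1: "Ruby is a programming language",
        2: "Ruby is a red gemstone",
        3: "Python is a programming language",
    }
    index = build_index(docs)
    print(search(index, len(docs), "ruby gemstone"))
```

Keeping per-document term counts in the postings lets the ranking step touch only documents that contain at least one query term, which is the main reason an inverted index makes lookups fast compared with scanning every article.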