Andrew Krizhanovsky edited this page Jan 20, 2020 · 32 revisions

Welcome to Wikokit - the open-source Wiktionary parser.

This wiki is the main source of documentation for developers working with (or contributing to) the Wikokit project.

Quick navigation

Setup

Getting started Wiktionary parser - how to convert the Wiktionary database into a machine-readable format (the parsed Wiktionary)

MySQL import - how to import the Wiktionary database into a local MySQL database.

File wikt_parsed_empty_sql - how to create, edit and load the empty parsed Wiktionary database into MySQL (./wikt_parser/doc).

Setup NetBeans for parsing - how to set up the NetBeans environment for parsing and run the parser.

Image.py postprocessing - get URLs of Wiktionary scaled images (Wikimedia thumbs) and write them to the local MySQL database.

Advanced setup

MySQL Workbench - how to create the empty SQL file for the parsed Wiktionary database

SQLite - how to convert the parsed Wiktionary database (MySQL) into an SQLite file

Database

Encoding - how to set up the database encoding correctly, with notes on character encoding.

Index wordlist, index_native - index wordlist for each language (tables index_native, index_de, index_fr, etc.)

Queries

SQL examples - SQL query examples showing how to extract information from the parsed Wiktionary database.

MRDQuote - the quote table (and tables related to quotations) in the machine-readable dictionary, with SPARQL and SQL queries for working with quotes.

d2rqMappingSPARQL - how to map the parsed Wiktionary database (MySQL) to an RDF database with D2RQ.

Developer

One more Wiktionary - how to parse one more Wiktionary language edition.

JUnit - unit test requirements.

Todo list - list of improvements and modifications to be done.

Done

Context labels workplan - coordination of work on extracting context labels from the English Wiktionary and the Russian Wiktionary (in Russian)

Links

  • New Wiktionary parsed databases from this page.

File mean_semrel_empty_sql - how to create, edit and load the empty wikt_sem_rel parsed Wiktionary database, which holds meanings and semantic relations, into MySQL (wikt_parser/doc/parsed/mean_semrel/).