cttsai/illinois-cross-lingual-wikifier

Illinois Cross-Lingual Wikifier

Given a piece of text in any language, a cross-lingual wikifier identifies mentions of named entities and grounds them to the corresponding entries in the English Wikipedia. This project implements the approaches proposed in the following two papers:

This demo gives some intuition about the project. It was presented at COLING 2016 (the paper and poster).

Setup

For CogComp members, resources for more than 40 languages are available on our servers. The paths are specified in config/xlwikifier-demo.config. You only need to create the following symbolic link in the root of this project:

ln -s /shared/preprocessed/ctsai12/multilingual/xlwikifier-data xlwikifier-data
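A quick sanity check after creating the link can save a confusing failure later. The sketch below is not part of the repository; `check_data_link` is a hypothetical helper that only reports whether the link exists and resolves to a directory.

```shell
# Hypothetical helper (not shipped with this project): report whether the
# data link exists and points at a real directory.
check_data_link() {
    link="${1:-xlwikifier-data}"
    if [ -L "$link" ] && [ -d "$link" ]; then
        # -d follows the link, so this branch means the target exists
        echo "ok: $link -> $(readlink "$link")"
    elif [ -L "$link" ]; then
        echo "broken: $link points to a missing target"
        return 1
    else
        echo "missing: $link is not a symbolic link"
        return 1
    fi
}
```

Calling `check_data_link` with no argument from the project root checks the default `xlwikifier-data` link.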

If you cannot access the CogComp servers, we currently release resources for only three languages: English, Spanish, and Chinese. Download this file, which contains MapDB indices of the Freebase dump and of the English, Spanish, and Chinese Wikipedias. Follow the README inside to extract the files and set the corresponding paths in the config file.

Run Benchmark

mvn dependency:copy-dependencies
mvn compile
./scripts/run-benchmark.sh es config/xlwikifier-tac.config

This script runs the system on the TAC-KBP 2016 EDL shared task and evaluates the output. The first argument is the language code (en: English, es: Spanish, zh: Chinese). You need to specify the paths to the evaluation documents and the gold annotations in the config file; see config/xlwikifier-tac.config for an example. The documents are expected in the original format provided by LDC. Using the official evaluation script, this package achieves the following performance on named entities:
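To evaluate all three released languages in one go, the command above can be wrapped in a loop. This is a minimal sketch, not part of the repository: `run_all_benchmarks`, `RUNNER`, and `CONFIG` are hypothetical names, and it assumes that a single config file covers the evaluation paths for every language, which may not match your setup.

```shell
# Hypothetical wrapper (not shipped with this project).
# RUNNER and CONFIG default to the paths used in the command above.
run_all_benchmarks() {
    runner="${RUNNER:-./scripts/run-benchmark.sh}"
    config="${CONFIG:-config/xlwikifier-tac.config}"
    for lang in en es zh; do
        echo "== $lang =="
        # Stop at the first language whose run fails
        "$runner" "$lang" "$config" || return 1
    done
}
```

Run `run_all_benchmarks` from the project root after `mvn compile`; set `CONFIG` first if your paths live in a different file.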

English
strong mention match:       Precision:93.4 Recall:83.7 F1:88.3
strong typed mention match: Precision:90.3 Recall:80.9 F1:85.4
strong typed all match:     Precision:80.9 Recall:72.6 F1:76.5

Spanish 
strong mention match:       Precision:88.4 Recall:81.8 F1:85.0
strong typed mention match: Precision:85.7 Recall:79.3 F1:82.3
strong typed all match:     Precision:78.1 Recall:72.3 F1:75.1

Chinese
strong mention match:       Precision:87.0 Recall:72.8 F1:79.3
strong typed mention match: Precision:83.2 Recall:69.6 F1:75.8
strong typed all match:     Precision:77.5 Recall:64.9 F1:70.6
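
The F1 values above are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). As a small sketch for checking your own runs against the table, the hypothetical `f1` helper below (not part of the official evaluation script) recomputes F1 from a precision and recall pair.

```shell
# Hypothetical helper: F1 as the harmonic mean of precision and recall,
# rounded to one decimal place like the table above.
f1() {
    awk -v p="$1" -v r="$2" 'BEGIN { printf "%.1f\n", 2 * p * r / (p + r) }'
}

# English, strong mention match
f1 93.4 83.7   # prints 88.3
```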

Contact

Chen-Tse Tsai (ctsai12@illinois.edu)