motazsaad/comparableWikiCoprus


Comparable Wikipedia Corpus

Comparable Wikipedia Corpus (aligned documents)

The corpus is extracted from the Wikipedia dumps of 20-01-2017 (20 January 2017).

License: CC BY-SA 4.0

The documents in this corpus were aligned with WikiDocsAligner.

Language pairs (20-01-2017 dumps):

  • Arabic-Egyptian

Other language pairs will be added in the future.

Corpus Information

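|                             | Arabic Wikipedia | Egyptian Wikipedia |
|-----------------------------|------------------|--------------------|
| Documents                   | 10,197           | 10,197             |
| Words                       | 8,397,154        | 1,543,516          |
| Vocabulary (distinct words) | 740,055          | 215,659            |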
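To make the three rows of the table concrete, here is a minimal sketch of how such statistics could be computed for one side of the aligned corpus. It assumes plain whitespace tokenization. The README does not say which tokenizer produced the published figures, so numbers from this sketch may differ from the table. The function names `corpus_stats` and `pair_stats` and the toy documents are hypothetical and not part of this repository.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def corpus_stats(documents: Iterable[str]) -> dict:
    """Count documents, word tokens, and distinct word types.

    Hypothetical helper: assumes whitespace tokenization, which may not
    match the tokenizer used for the published corpus statistics.
    """
    n_docs = 0
    counts: Counter = Counter()
    for text in documents:
        n_docs += 1
        counts.update(text.split())  # naive whitespace tokenization
    return {
        "documents": n_docs,
        "words": sum(counts.values()),  # total tokens
        "vocabulary": len(counts),      # distinct tokens
    }


def pair_stats(pairs: List[Tuple[str, str]]) -> Tuple[dict, dict]:
    """Compute statistics separately for each side of aligned document pairs.

    Because every pair contributes one document to each side, the
    document counts of the two sides are always equal.
    """
    source = [src for src, _ in pairs]
    target = [tgt for _, tgt in pairs]
    return corpus_stats(source), corpus_stats(target)


# Hypothetical toy example: two short "documents" on the Arabic side.
arabic_side = ["القاهرة عاصمة مصر", "مصر دولة عربية"]
print(corpus_stats(arabic_side))
# {'documents': 2, 'words': 6, 'vocabulary': 5}
```

The equal document counts in both columns of the table follow from the pairing. The word and vocabulary counts differ because the aligned articles on the two sides have different lengths and wording.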

To cite this resource:

Motaz Saad and Basem Alijla (2017). WikiDocsAligner: an off-the-shelf Wikipedia Documents Alignment Tool. In The Second Palestinian International Conference on Information and Communication Technology (PICICT 2017).