larsjsol/wcb

Wikipedia Corpus Builder

Wikipedia Corpus Builder is a toolkit for creating clean corpora from database snapshots of MediaWiki-powered wikis. "Clean" here means that most content that is usually of little use for NLP and IR tasks is removed.

It is currently being reworked to make it more usable for the public.

Documentation is emerging at http://moin.delph-in.net/WcbTop .
