tlu-dt-nlp/M2-preprocessing

 
 


Preprocessing scripts for the EstGEC-L2 corpus

Scripts for preprocessing the Estonian L2 Grammatical Error Correction Corpus (EstGEC-L2), which contains L2 learner texts with error annotation in the M2 format.

  • convert_conll_to_m2 – converts the earlier CoNLL-U-format error annotation to the M2 format and updates the error tags
  • check_m2_annotation – validates manual annotation to detect possible format errors
  • insert_noop_lines – adds a 'noop' annotation to sentences that none of the annotators corrected (the latest version of the converter also adds 'noop' annotations)
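
For orientation, the kind of work the last two scripts do can be sketched as below. This is a minimal illustration, not the code in this repository. It assumes the widely used M2 layout, in which an `S` line holds the tokenized sentence and each `A` line has the form `A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`. The function names `find_format_errors` and `add_missing_noops`, the sample sentences, and the `R:VERB` tag are hypothetical placeholders. The sketch also assumes a single annotator with ID 0 and sentence blocks separated by exactly one empty line.

```python
import re

# Conventional M2 "no correction" edit, here for annotator 0 (an assumption).
NOOP_LINE = "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0"

# Pattern for one edit line: offsets, type, correction, two fixed fields, annotator ID.
EDIT_RE = re.compile(
    r"^A (-?\d+) (-?\d+)\|\|\|[^|]+\|\|\|.*\|\|\|REQUIRED\|\|\|-NONE-\|\|\|\d+$"
)


def find_format_errors(m2_text):
    """Return (line_number, line) pairs that are neither S lines nor well-formed A lines."""
    errors = []
    for number, line in enumerate(m2_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines separate sentence blocks
        if line.startswith("S ") or EDIT_RE.match(line):
            continue
        errors.append((number, line))
    return errors


def add_missing_noops(m2_text):
    """Append a noop edit to every sentence block that has no A lines."""
    blocks = [b for b in m2_text.strip().split("\n\n") if b.strip()]
    fixed = []
    for block in blocks:
        lines = block.splitlines()
        if not any(line.startswith("A ") for line in lines[1:]):
            lines.append(NOOP_LINE)
        fixed.append("\n".join(lines))
    return "\n\n".join(fixed) + "\n"


# Illustrative input: the first sentence has no edits, the second has one.
sample = (
    "S Ma käisin eile poes .\n"
    "\n"
    "S Ta läks kooli .\n"
    "A 1 2|||R:VERB|||käis|||REQUIRED|||-NONE-|||0\n"
)

print(add_missing_noops(sample))
print(find_format_errors("S Ta läks kooli .\nA 1 2||R:VERB|||käis|||REQUIRED|||-NONE-|||0"))
```

A corpus with several annotators would need one 'noop' line per annotator ID that left the sentence unchanged. The actual scripts may handle this and other cases differently.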

Languages

  • Python 100.0%