pitrack/monolign

monolign

probably better named multilign, but too late now

Repository for the IJCNLP paper "Deriving Consensus for Multi-Parallel Corpora: an English Bible Study" [pdf][slides]

Execution

To run:

python monolign.py DATA_DIR ALIGNER

where DATA_DIR contains n texts whose i-th lines are mutually parallel (line i of every file expresses the same content). ALIGNER is the path to the fast_align executable; using a different aligner would require modifying aligner.py.
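To illustrate the expected DATA_DIR layout, here is a sketch that checks every file has the same number of lines and formats each pair of texts in fast_align's `source ||| target` input format. The helper names are hypothetical and do not come from monolign.py or aligner.py. Aligning the texts pairwise is an assumption made for this example, as is the assumption that the texts are already tokenized (fast_align splits on whitespace).

```python
import itertools
from pathlib import Path


def load_parallel_texts(data_dir):
    """Read every file in data_dir and check that all are line-aligned.

    Hypothetical helper: returns {filename: [line, ...]}.
    """
    texts = {}
    for path in sorted(p for p in Path(data_dir).iterdir() if p.is_file()):
        texts[path.name] = path.read_text(encoding="utf-8").splitlines()
    # Parallel texts must have the same number of lines (line i <-> line i).
    line_counts = {len(lines) for lines in texts.values()}
    if len(line_counts) > 1:
        raise ValueError(f"texts are not line-aligned: line counts {sorted(line_counts)}")
    return texts


def fast_align_input(src_lines, tgt_lines):
    """Format two line-aligned texts as fast_align 'source ||| target' lines."""
    return [f"{s.strip()} ||| {t.strip()}" for s, t in zip(src_lines, tgt_lines)]


def all_pairs(texts):
    """Yield ((name_a, name_b), fast_align lines) for each unordered pair of texts."""
    for a, b in itertools.combinations(sorted(texts), 2):
        yield (a, b), fast_align_input(texts[a], texts[b])
```

Each list of lines could be written to a file and aligned with a command such as `./fast_align -i pair.txt -d -o -v > pair.align`, as shown in fast_align's own documentation. How monolign.py actually invokes the aligner is determined by aligner.py.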

Analysis

These are largely similar one-off scripts that were used to generate the figures in the paper. Use with caution.

Output + Resources

The program writes its output to alignments.log for each iteration; the best output is in the folder for the last iteration. Each line of the input file corresponds to three sections: an alignment matrix (like the ones in the paper, but not sorted by index), a list of dependency arcs, and possible paraphrases/word pairs. The full output is available here (92M), as is a sample (45M).
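fast_align reports the alignment for each sentence pair as space-separated `i-j` links, meaning source token i is aligned to target token j. Assuming pairwise alignments in that format, the sketch below turns one such line into a 0/1 alignment matrix. The function name is hypothetical, and the matrices written to alignments.log may be laid out differently.

```python
def alignment_matrix(links, src_len, tgt_len):
    """Build a src_len x tgt_len 0/1 matrix from a fast_align line like '0-0 1-2'.

    Hypothetical helper: row i is source token i, column j is target token j,
    and a 1 marks an alignment link between them.
    """
    matrix = [[0] * tgt_len for _ in range(src_len)]
    for link in links.split():
        i, j = map(int, link.split("-"))
        matrix[i][j] = 1
    return matrix


# Example: three tokens on each side, with the last two crossing.
for row in alignment_matrix("0-0 1-2 2-1", 3, 3):
    print(row)
```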
