jweese edited this page Jan 18, 2012 · 9 revisions

Welcome to the Thrax wiki!

Thrax is a grammar extractor for machine translation. It takes word-aligned parallel sentences like this:

猫在桌子上。 ||| the cat is on the table . ||| 0-0 0-1 1-2 1-3 2-4 2-5 3-4 3-5 4-2 4-3 5-6

and turns them into context-free translation rules like this:

  • [X] ||| 猫 ||| the cat
  • [X] ||| 桌子 ||| the table
  • [X] ||| 。 ||| .
  • [X] ||| 在 [X] 上 ||| is on [X]

These rules encode the ideas that 猫 and 桌子 are translations of “the cat” and “the table”, that 。 corresponds to the English full stop, and that “is on X” can be written in Chinese as “在 X’ 上”, as long as X’ is a Chinese translation of X.
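
To make the last point concrete, here is a minimal Python sketch of how the `[X]` gap in a rule can be filled with another rule. It is only an illustration of the `|||`-separated formats shown above, not Thrax's own code: the names `AlignedPair`, `Rule`, `parse_aligned_line`, `parse_rule`, and `substitute` are hypothetical, and the sketch assumes each rule has at most one gap per side. The source side of the sentence pair is kept as a raw string, because the example above does not show how it is split into tokens.

```python
from typing import List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    """One input line (hypothetical container, not a Thrax type)."""
    source: str                       # kept raw; tokenization is not assumed
    target: List[str]
    alignment: List[Tuple[int, int]]  # (source index, target index) links


class Rule(NamedTuple):
    """One translation rule: left-hand side, source side, target side."""
    lhs: str
    source: List[str]
    target: List[str]


def parse_aligned_line(line: str) -> AlignedPair:
    """Split 'source ||| target ||| alignment' into its three fields."""
    source, target, links = (field.strip() for field in line.split("|||"))
    alignment = []
    for link in links.split():
        src_index, tgt_index = link.split("-")
        alignment.append((int(src_index), int(tgt_index)))
    return AlignedPair(source, target.split(), alignment)


def parse_rule(line: str) -> Rule:
    """Split '[X] ||| source side ||| target side' into a Rule."""
    lhs, source, target = (field.strip() for field in line.split("|||"))
    return Rule(lhs, source.split(), target.split())


def substitute(rule: Rule, filler: Rule) -> Rule:
    """Fill the [X] gap on both sides of `rule` with the sides of `filler`."""
    def fill(tokens: List[str], replacement: List[str]) -> List[str]:
        out: List[str] = []
        for token in tokens:
            if token == "[X]":
                out.extend(replacement)
            else:
                out.append(token)
        return out

    return Rule(rule.lhs,
                fill(rule.source, filler.source),
                fill(rule.target, filler.target))


pair = parse_aligned_line(
    "猫在桌子上。||| the cat is on the table . ||| "
    "0-0 0-1 1-2 1-3 2-4 2-5 3-4 3-5 4-2 4-3 5-6"
)
gap_rule = parse_rule("[X] ||| 在 [X] 上 ||| is on [X]")
table_rule = parse_rule("[X] ||| 桌子 ||| the table")

combined = substitute(gap_rule, table_rule)
print(" ".join(combined.source), "|||", " ".join(combined.target))
# prints: 在 桌子 上 ||| is on the table
```

Filling the gap with the 桌子 rule yields the pair “在 桌子 上” and “is on the table”, which is exactly the constraint described above: the same X must be translated consistently on both sides.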