jweese edited this page Jan 16, 2011 · 1 revision

The RuleWritable is probably the most important datatype in Thrax. It represents a single synchronous context-free grammar (SCFG) rule. It has these fields:

  • lhs, a Text (a Hadoop datatype for fast-comparison Strings) representing the left-hand-side nonterminal of the rule
  • source, a Text representing the source side of the rule
  • target, a Text representing the target side of the rule
  • e2f and f2e, two AlignmentArrays giving the target-to-source alignments and source-to-target alignments, respectively
  • features, a MapWritable holding the rule's feature values

Here are some notes:

An AlignmentArray is a two-dimensional array of Text. It contains one inner array for each terminal symbol on a given side of the rule, and the first item of each inner array is that terminal symbol. The remaining items are the terminals on the other side that it has been aligned to, or the single item "/UNALIGNED/" if the word is unaligned. For example, let's say we have a rule

[X] ||| foo [X] bar baz ||| a b [X] c |||

where foo is aligned to a and b, baz is aligned to c, and bar is unaligned. Then the AlignmentArrays would look like this (the nonterminal [X] does not appear, because only terminal symbols get an entry):

e2f: [ a | foo ] [ b | foo ] [ c | baz ]

f2e: [ foo | a | b ] [ bar | /UNALIGNED/ ] [ baz | c ]
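
To make the layout concrete, here is a minimal sketch that builds both arrays for the example rule from a list of alignment points. It is not Thrax code: it uses plain String where Thrax uses Hadoop's Text so that it runs without Hadoop on the classpath, and the class name AlignmentArraySketch and the methods build, flip and format are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: String stands in for Hadoop's Text, and every name
// in this class is hypothetical rather than part of Thrax.
public class AlignmentArraySketch {
    static final String UNALIGNED = "/UNALIGNED/";

    // Builds one alignment array for `side`. Each point is {indexInSide, indexInOther},
    // counting terminals only (nonterminals such as [X] are skipped).
    static String[][] build(String[] side, String[] other, int[][] points) {
        String[][] result = new String[side.length][];
        for (int i = 0; i < side.length; i++) {
            List<String> row = new ArrayList<>();
            row.add(side[i]);                      // first item: the terminal itself
            for (int[] p : points) {
                if (p[0] == i) {
                    row.add(other[p[1]]);          // remaining items: aligned terminals
                }
            }
            if (row.size() == 1) {
                row.add(UNALIGNED);                // no alignment points for this terminal
            }
            result[i] = row.toArray(new String[0]);
        }
        return result;
    }

    // Swaps the two indices of every point, turning source-to-target
    // points into target-to-source points.
    static int[][] flip(int[][] points) {
        int[][] flipped = new int[points.length][];
        for (int i = 0; i < points.length; i++) {
            flipped[i] = new int[] {points[i][1], points[i][0]};
        }
        return flipped;
    }

    // Renders an array in the bracketed notation used on this page.
    static String format(String[][] array) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : array) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("[ ").append(String.join(" | ", row)).append(" ]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] source = {"foo", "bar", "baz"};   // terminals of "foo [X] bar baz"
        String[] target = {"a", "b", "c"};         // terminals of "a b [X] c"
        int[][] f2ePoints = {{0, 0}, {0, 1}, {2, 2}}; // foo-a, foo-b, baz-c

        System.out.println("e2f: " + format(build(target, source, flip(f2ePoints))));
        System.out.println("f2e: " + format(build(source, target, f2ePoints)));
    }
}
```

Running main prints the e2f and f2e arrays for the example rule in the same bracketed notation as above. Note that the alignment indices count terminals only, which is why baz is at index 2 on the source side even though [X] comes before it in the rule.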

features is a MapWritable. This is the field you will want to modify to add new feature values to a rule: once you calculate a feature value, you simply put it into the map. Easy.
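
As a rough sketch of that workflow, the example below computes one feature from an alignment array and stores it in a map. Several things here are assumptions for illustration: a java.util.HashMap with String keys and Double values stands in for the MapWritable (Hadoop's MapWritable also implements java.util.Map, but its keys and values must be Writable types), and the feature name "UnalignedSourceCount", the class FeatureMapSketch and the method countUnaligned are hypothetical, not existing Thrax features or APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a HashMap stands in for Hadoop's MapWritable,
// and the feature computed here is a made-up example.
public class FeatureMapSketch {
    static final String UNALIGNED = "/UNALIGNED/";

    // Counts the terminals in an alignment array whose only entry
    // after the terminal itself is the unaligned marker.
    static int countUnaligned(String[][] alignment) {
        int count = 0;
        for (String[] row : alignment) {
            if (row.length == 2 && UNALIGNED.equals(row[1])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The f2e array from the example rule on this page.
        String[][] f2e = {
            {"foo", "a", "b"},
            {"bar", UNALIGNED},
            {"baz", "c"}
        };

        // Stand-in for the rule's features field.
        Map<String, Double> features = new HashMap<>();

        // Calculate the feature value, then put it into the map.
        features.put("UnalignedSourceCount", (double) countUnaligned(f2e));

        System.out.println(features);
    }
}
```

With a real MapWritable the put call has the same shape, but the key and value would need to be wrapped in Writable types rather than passed as a plain String and Double.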