jweese edited this page Jan 16, 2011 · 5 revisions

This page describes all currently implemented features in Thrax. For instructions on implementing your own, see feature function implementation. Each entry on this page has the following form:

Feature name

Label: feature name as shown in output "name=value"

Name: the value to add to the features key in thrax.conf to include this feature

Mathematical description.

Probability of source phrase given target phrase

Label: SourcePhraseGivenTarget

Name: e2fphrase

For a rule like ( X \to \langle \alpha, \beta \rangle ), let ( c(\cdot) ) be the number of times a particular phrase has been seen among all the extracted rules. Then we calculate ( p(\alpha | \beta) = \frac{c(\alpha,\beta)}{c(\beta)} ), and the value of this feature is ( -\log p(\alpha | \beta) ).
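The relative-frequency estimate above can be illustrated with a minimal sketch. This is not Thrax's implementation: the toy phrase pairs, the function name `source_given_target`, and the choice of the natural logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: one (source, target) pair per extracted rule instance.
extracted = [
    ("la maison", "the house"),
    ("la maison", "the house"),
    ("la maison", "the home"),
    ("maison", "the house"),
]

pair_counts = Counter(extracted)                         # c(alpha, beta)
target_counts = Counter(tgt for _, tgt in extracted)     # c(beta)


def source_given_target(source, target):
    """Return -log p(source | target), estimated by relative frequency.

    The natural log is an assumption; the page does not state a base.
    """
    p = pair_counts[(source, target)] / target_counts[target]
    return -math.log(p)


# c(alpha, beta) = 2 and c(beta) = 3, so p = 2/3.
print(source_given_target("la maison", "the house"))
```

A pair whose target phrase never appears with any other source phrase gets probability 1 and therefore a feature value of 0.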

Probability of target phrase given source phrase

Label: TargetPhraseGivenSource

Name: f2ephrase

Computed as in SourcePhraseGivenTarget above, except that the value is ( -\log p(\beta | \alpha) = -\log \frac{c(\alpha,\beta)}{c(\alpha)} ).
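As a rough sketch of this direction of the estimate, the count in the denominator is taken over the source phrase rather than the target phrase. The toy data, the function name `target_given_source`, and the natural logarithm are illustrative assumptions, not Thrax's own code.

```python
import math
from collections import Counter

# Hypothetical toy data: one (source, target) pair per extracted rule instance.
extracted = [
    ("la maison", "the house"),
    ("la maison", "the house"),
    ("la maison", "the home"),
    ("maison", "the house"),
]

pair_counts = Counter(extracted)                         # c(alpha, beta)
source_counts = Counter(src for src, _ in extracted)     # c(alpha)


def target_given_source(source, target):
    """Return -log p(target | source), estimated by relative frequency."""
    p = pair_counts[(source, target)] / source_counts[source]
    return -math.log(p)


# "maison" was only ever seen with "the house", so p = 1 and the value is 0,
# even though "the house" was also seen with another source phrase.
print(target_given_source("maison", "the house"))
print(target_given_source("la maison", "the home"))  # p = 1/3
```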

Lexical probabilities of source given target and target given source

Label: LexprobSourceGivenTarget, LexprobTargetGivenSource

Name: lexprob

Does the source side have adjacent nonterminal symbols?

Label: Adjacent

Name: adjacent

Is this rule purely lexical?

Label: Lexical

Name: lexical

Is this rule purely abstract?

Label: Abstract

Name: abstract

Does the rule contain an X nonterminal?

Label: ContainsX

Name: x-rule

Does the rule consume source terminal symbols without producing target output?

Label: SourceTerminalsButNoTarget

Name: source-terminals-without-target

Does the rule produce target output without consuming source terminals?

Label: TargetTerminalsButNoSource

Name: target-terminals-without-source

Is the rule monotonic, or is there reordering?

Label: Monotonic

Name: monotonic

Phrase penalty

Label: PhrasePenalty

Name: phrase-penalty

Number of terminals on target side

Label: TargetWords

Name: target-word-count

Rarity penalty

Label: RarityPenalty

Name: rarity

The value of this feature is ( \exp(1 - c(r)) ), where ( c(r) ) is the number of times the rule ( r ) has been seen.
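To make the shape of this penalty concrete, here is a small sketch. It assumes that c(r) is the number of times rule r was extracted (at least 1); the function name `rarity_penalty` is hypothetical.

```python
import math


def rarity_penalty(rule_count):
    """Return exp(1 - c(r)) for a rule seen rule_count times.

    Assumes rule_count >= 1, i.e. the rule was extracted at least once.
    """
    return math.exp(1 - rule_count)


# The penalty is largest for rules seen once and decays exponentially
# as the rule becomes more frequent.
for count in (1, 2, 5):
    print(count, rarity_penalty(count))
```

Under this reading, a rule seen exactly once receives the maximum value of 1, and frequently seen rules receive a value close to 0.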

Number of unaligned words in rule

Label: UnalignedSource, UnalignedTarget

Name: unaligned-count