Skip to content
Ranganath Atreya edited this page Aug 10, 2019 · 10 revisions

Engine Design

The old Lipika engine was designed to fully materialize the cross-product of the rule and mapping tries. This made transliteration simple as it just had to walk this large trie. However, the cross-product trie consumed more memory for scripts such as Hindi and Gurmukhi which had a large number of rules. Given that Custom Keyboards in iOS have very limited memory quotas, this method resulted in too many crashes. The current engine is designed to burn more CPU in exchange for much lower memory footprint. This is accomplished by keeping the mapping and rule tries separate and walking them in tandem as inputs flow into the engine.

The goal is to keep memory usage well under 30MB

Canonical Example

There are many problems that arise by keeping the mapping and rule trie separate. Most of these can be captured using the following example that we will use throughout the design.

Mapping:

Type Key Scheme Script
CONSONANT KA k
CONSONANT TA t
CONSONANT RRA R
VOWEL VOCALIC RR RRI
DEPENDENT VOCALIC RR RRI
SIGN VIRAMA .h

Rule:

Input Output
{CONSONANT} [CONSONANT][SIGN/VIRAMA]
{CONSONANT}{DEPENDENT/A} [CONSONANT]
{CONSONANT}{DEPENDENT} [CONSONANT][DEPENDENT]

Input String:

Now we consider the input string ktRRI. We expected the following outputs as the inputs flow in:

Input Output
k क्
kt क्त्
ktR क्त्ऱ्
ktRR क्त्RR
ktRRI क्तॄ

High-level Design

The algorithm depends on the following properties of the TrieWalker:

  • TrieWalker greedily consumes inputs to produce output as deep in the trie as possible
  • All intermediate outputs of a traversal from the root to leaf of a trie share the same epoch
  • TrieWalker bumps epoch when it hits a dead-end and replays part of the inputs from the root
  • Output from an ongoing epoch of the TrieWalker is considered finalized when the epoch ends

The crux of the design rests on this question: at what point can the output of the Engine be finalized - meaning that it won't change any further?

All output up to but excluding the finalized mapping before the last finalized mapping output, both of which also finalized the rules trie can be finalized. So in the example below, when the output at the red arrow is produced, everything above the dotted line can be finalized.

This means that we have to replay all non-finalized events every time they change. The implementation is mainly concerned with tracking all events and figuring out the finalization point.

In the example below, we type kaktRRI. Mapping epoch 3 is the last finalized mapping output that also finalized the rules trie. Even though Mapping epoch 4 finalized the rules trie, it is itself not yet finalized.

Example Output

  • k adds क् - overall we have क्
  • a changes it to क - overall we have क
  • k adds क् - overall we have कक्
  • t adds त् - overall we have कक्त् and everything up to but excluding mapping epoch 2 is finalized
  • R adds ऱ् - overall we have कक्त्ऱ् and everything up to but excluding mapping epoch 3 is finalized
  • R changes it to RR - overall we have कक्त्RR because everything from mapping epoch 3 is replayed
  • I changes it to ॄ - overall we have कक्तृ because everything from mapping epoch 3 is replayed

Engine Output

Essentially, the engine has to produce the following output for the example above:

k: (input: "k", output: "क्", isPreviousFinal: false)
a: (input: "k", output: "क्", isPreviousFinal: false)
   (input: "ka", output: "क", isPreviousFinal: false)
k: (input: "k", output: "क्", isPreviousFinal: false)
   (input: "ka", output: "क", isPreviousFinal: false)
   (input: "kak", output: "कक्", isPreviousFinal: false)
t: (input: "k", output: "क्", isPreviousFinal: false)
   (input: "ka", output: "क", isPreviousFinal: false)
   (input: "k", output: "क्", isPreviousFinal: true)
   (input: "kt", output: "क्त्", isPreviousFinal: false)
R: (input: "k", output: "क्", isPreviousFinal: false)
   (input: "kt", output: "त्", isPreviousFinal: true)
   (input: "ktR", output: "त्ऱ्", isPreviousFinal: false)
R: (input: "kt", output: "त्", isPreviousFinal: false)
   (input: "tRR", output: "त्RR", isPreviousFinal: false)
I: (input: "t", output: "त्", isPreviousFinal: false)
   (input: "tRRI", output: "तृ", isPreviousFinal: false)

Informal Reasoning

Ideally, one should be able to finalize output of a Rules epoch immediately after the epoch has passed. However, there are cases such as RRI in ITRANS which produce intermediate outputs that can spuriously bump the epoch of the Rules TrieWalker and hence need to be eventually backed out. Since only one output per Mapping epoch is executed on the Rules TrieWalker, it is clear that no matter how many intermediate outputs the Mapping TrieWalker produces, it can only spuriously bump the Rules TrieWalker epoch by one. Hence, if we finalize all output beyond the last but one Rules epoch and replay all non-finalized events on new input, then there will be no need to back out of spurious epochs.

Clone this wiki locally