Lexicon-Restricted CTC Decoder -- Design Document

Matt Murphy

CSC499: Deep Learning OCR

General Problem, Approach

The goal of this project was to augment the standard CTC decoder implemented in TensorFlow so that it produces output restricted to a pre-defined lexicon. Given such a restriction, nonsensical or "almost-correct" decodes are pruned from the network output.

Our approach to implementing such a dictionary restriction efficiently relies on storing the lexicon in a trie, which is then referenced at each step of the CTC decoder's beam search. Beam search is a path-based algorithm that considers only the n most likely moves on each ply. In this case, each move represents a state expansion to a specific label, based on that label's logit scores.
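To make the trie structure concrete, the following is a minimal sketch of a lexicon trie node in C++. This is illustrative only; TrieNode, children, and endWord are stand-ins for whatever names the actual implementation (ctc_trie_node.h, described below) uses.

// Minimal sketch of a lexicon trie node. Each node maps a child label to
// a subtree; endWord marks nodes at which a complete lexicon word ends.
#include <unordered_map>
#include <vector>

struct TrieNode {
  std::unordered_map<int, TrieNode*> children;  // label -> child subtree
  bool endWord = false;  // true if a lexicon word ends at this node

  // Insert a word, given as a sequence of integer labels.
  void Insert(const std::vector<int>& word, size_t i = 0) {
    if (i == word.size()) {
      endWord = true;
      return;
    }
    TrieNode*& child = children[word[i]];
    if (child == nullptr) child = new TrieNode;
    child->Insert(word, i + 1);
  }
};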

In the case of our lexicon-augmented decoder, during each state expansion, upon considering a move to the label in question, we check whether that label is a child of our current location within the trie. If the label is a child, then performing a state expansion to that label would extend our frontier within a prefix of some dictionary word, so the expansion is valid under our dictionary. If the label in question does not have a corresponding child at our current location within the trie, then we set the score for that expansion to the 0 log probability (-inf) and the state is not added to our beam, ending the beam search along that path through the input logits.

Upon reaching the end of the beam search, we determine whether our current location within the dictionary trie marks the end of a lexicon word. If it does not, we set the score for the decode at that state to the 0 log probability, so that decode path is not output by the CTC beam search decoder.

Concrete examples of interaction with trie

The beam search repeatedly interacts with the trie in three places within the TrieBeamScorer: InitializeState, ExpandState, and ExpandStateEnd. In InitializeState, the beam search decoder calls the scorer with the beam search root state as a parameter, and we set that state's trie node to the root of the dictionary trie. On each state expansion, if the to_label differs from the from_label and is not blank, we search for the to_label among the children of the from_state's current trie node to see whether the new label continues a prefix of some dictionary word. Finally, at the end of the beam search, ExpandStateEnd references the endWord flag of the current state's trie node to set that decode path's score appropriately. Each of these functions is called for any non-trivial decode, so the trie must be referenced within this scorer implementation.
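In outline, the three hooks behave as follows. This is a simplified sketch under the assumptions above, not the actual TrieBeamScorer: kLogZero stands for the 0 log probability, and TrieBeamState here is a pared-down stand-in for the state defined in ctc_beam_entry.h.

// Simplified sketch of the three scorer hooks described above.
#include <limits>

constexpr float kLogZero = -std::numeric_limits<float>::infinity();

struct TrieBeamState {
  TrieNode* node = nullptr;  // current frontier position within the trie
  float score = 0.0f;        // this state's expansion score contribution
};

class TrieBeamScorerSketch {
 public:
  TrieBeamScorerSketch(TrieNode* root, int blank_label)
      : root_(root), blank_label_(blank_label) {}

  void InitializeState(TrieBeamState* root_state) {
    root_state->node = root_;  // every beam starts at the trie root
  }

  void ExpandState(const TrieBeamState& from, int from_label,
                   TrieBeamState* to, int to_label) {
    *to = from;
    if (to_label == blank_label_ || to_label == from_label) return;
    auto it = from.node->children.find(to_label);
    if (it == from.node->children.end()) {
      to->score = kLogZero;   // label continues no lexicon prefix: prune
    } else {
      to->node = it->second;  // advance the frontier within the trie
    }
  }

  void ExpandStateEnd(TrieBeamState* state) {
    if (!state->node->endWord) state->score = kLogZero;  // incomplete word
  }

 private:
  TrieNode* root_;
  int blank_label_;
};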

So, if we have an input dictionary of

root
|
1
|\
1 3
| |
3 1

(that is, a lexicon containing the label sequences [1, 1, 3] and [1, 3, 1]) and an input logit sequence, given here as probabilities prior to log scoring, of

{{{0, 0.4, 0, 0.6, 0}},   // timestep 0
 {{0, 0.4, 0, 0.4, 0.2}}, // timestep 1
 {{0, 1,   0, 0,   0}},   // timestep 2
 {{0, 0.6, 0, 0.4, 0}},   // timestep 3
 {{0, 0.5, 0, 0.5, 0}}}   // timestep 4

On timestep 0, although label 1 has a lower log-score than label 3, we must select label 1, since it is the only child of the root within the trie. Within ExpandState, given a to_label of 3, we would set the score to log 0, and that state expansion would not be added to leaves_, the data structure that maintains the current frontiers of the beam search.

Suppose that at timestep 4 we have a current path of [1, 4, 1, 1], where 4 signifies the blank label. Our beam state must maintain a reference to the trie node with label 1 on the second level below the root of the trie. A to_label of 0, 1, or 2 would not be valid, as none of those labels are children of that node within the trie. Accordingly, the scores of each of those potential state expansions would be log 0, and none would be considered candidate path expansions to be added to leaves_. Only a state expansion to label 3 would be valid for timestep 4, and our new frontier state would maintain a reference to this node. Subsequently, in the call to ExpandStateEnd after this last timestep, we check whether this frontier node denotes the end of a word; it does, and thus the decode [1, 4, 1, 1, 3] is considered a potential output path. More cases like this are listed within ctc_trie_beam_search_test.cc.
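As a flavor of what such a test checks, the following sketch builds the example trie above using the TrieNode structure sketched earlier and verifies which expansions the lexicon permits. It is illustrative only; the real tests exercise the actual CTC utilities.

// Build the example trie and check which expansions the lexicon permits.
#include <cassert>

int main() {
  TrieNode root;
  root.Insert({1, 1, 3});  // the two lexicon words from the diagram
  root.Insert({1, 3, 1});
  assert(root.children.count(1) == 1);  // label 1 continues a prefix
  assert(root.children.count(3) == 0);  // label 3 is invalid at the root
  TrieNode* node = root.children[1]->children[1];  // after merged path [1, 1]
  assert(node->children.count(3) == 1);  // only label 3 remains valid
  assert(node->children[3]->endWord);    // ...and it completes a word
  return 0;
}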

Files Developed

This MAP involved creating the additional files ctc_trie_node.h and ctc_vocabulary.h, as well as making large additions to the files ctc_beam_entry.h and ctc_beam_scorer.h. For some small-scale testing purposes, a slight modification to ctc_beam_search.h was made. Tests were constructed within the files ctc_trie_node_test.cc and ctc_trie_beam_search_test.cc.

Additionally, an operation implementing these CTC additions is defined by an op registration, which serves as a header for the implementation constructed within the kernel.

Moreover, to call the back-end C++ op from Python code, I added a definition for this call within the Python operation definitions.

Lastly, I made a change to Professor Weinman's cnn_lstm_ctc_ocr repository, such that the lexicon-restricted decoder may be called with the flag --lexicon /path/to/lexicon. At the time of writing this report, this change lies within the branch update-to-tf-1.8.

Where they live

The core utilities for CTC live within the directory //tensorflow/core/util/ctc/.

The op definition lives within the file //tensorflow/core/ops/ctc_ops.cc and the kernel lives within the file //tensorflow/core/kernels/ctc_decoder_ops.cc.

Additionally, the Python definition that calls the back-end C++ op lives within the file //tensorflow/python/ops/ctc_ops.py.

Why they're there

The CTC decoding logic and all the supporting classes and functions exist as utilities, since they are simply used by operation kernels as 'helper functions'. Following the standard already in place from the base beam search decoder tests, I decided to include the tests for the additional logic within the same directory. Since these are simply additions that rely on the same dependencies as the built-in base beam search decoder, I decided it was appropriate to include them within the core directory, rather than some contrib directory.

Originally, I wanted to place the definition and kernel for ctc_beam_search_trie within a separate contrib directory, in order to adhere to the general TensorFlow contribution guidelines. However, after much headache trying to get everything to compile, I found through some research that two of the dependencies of the kernel implementation of the op, namely //tensorflow/core:lib and //tensorflow/core:framework, cannot be included within the build path of files under contrib.

So, I placed the new op definition and kernel within the core op files, as simple additions appended to the end of these libraries. Note that with this workaround, these additions do not currently follow the standard contribution guidelines. If a pull request to the TensorFlow master branch is to be made, this explanation should be referenced.

What they do

To implement the trie as a storage and navigation mechanism for the lexicon, I created the additional files ctc_trie_node.h and ctc_vocabulary.h. ctc_vocabulary.h provides helper functions for reading a lexicon and converting words into the label sequences used to build and navigate the trie.
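Although the actual helpers are not reproduced here, a plausible sketch of the lexicon-to-trie path, using the TrieNode sketch above, is the following. CharToLabel and BuildTrieFromLexicon are hypothetical names, and the sketch assumes a lowercase a-z alphabet.

// Illustrative sketch: read a lexicon file (one word per line), convert
// characters to integer labels, and insert each word into the trie.
#include <fstream>
#include <string>
#include <vector>

int CharToLabel(char c) { return c - 'a'; }  // assumes lowercase a-z labels

TrieNode* BuildTrieFromLexicon(const std::string& path) {
  TrieNode* root = new TrieNode;
  std::ifstream in(path);
  std::string word;
  while (in >> word) {
    std::vector<int> labels;
    for (char c : word) labels.push_back(CharToLabel(c));
    root->Insert(labels);  // each lexicon word becomes one trie path
  }
  return root;
}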

The core augmentation utilizing the trie is an additional beam scorer class within ctc_beam_scorer.h called TrieBeamScorer, which references the trie at each timestep within the path state expansion. If a character is not valid, as restricted by the trie, then the scorer assigns a log 0 score to that beam state; the contributing score of that state expansion therefore sets the score of the whole path to log 0.

To maintain information about the trie at each state within the beam search, a new state structure, TrieBeamState, maintains a reference to a node within the trie, as well as a vector containing the labels that compose the trie navigation path up to that point. This 'prefix' vector is primarily for debugging purposes: it is not used in beam search navigation decisions, but it is useful for tracking intermediate progress within the search. Note that the 'word' contained within this vector forms the post-merge-repeated sequence of labels, as the trie only contains complete words, without the label repeats that are generally present among adjacent timesteps within the logit input. All told, this new beam state within ctc_beam_entry.h serves as an interface between the beam search and a backing trie structure, generated from a lexicon before entering the beam search.

TensorFlow works by building relatively high-level operations, or ops, in the core back-end, and then invoking these operations through front-end APIs (in our case, in Python). So, if we want to allow someone building their model in Python to use lexicon-restricted CTC decoding, we must create such an operation. Every operation requires a definition within an 'ops' file, as well as a 'kernel' that serves as the implementation of that definition. The op definition, CTCBeamSearchDecoderTrie, is declared within ctc_ops.cc; it serves as a function signature for the operation, performs input checking to ensure correct input tensor rank and dimension equality, and sets up the output tensors. The kernel implementation, CTCBeamSearchDecoderTrieOp, within ctc_decoder_ops.cc, provides the compute function, which sets up the dictionary trie and uses the CTC utilities to decode an input logit sequence. This is the core operation called by the Python API function in order to perform CTC decoding.
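For reference, an op registration of this sort looks roughly like the following, paralleling the existing CTCBeamSearchDecoder registration. The attribute names here, in particular the lexicon attribute, are illustrative rather than the exact ones used.

// Simplified sketch of the op registration in ctc_ops.cc.
REGISTER_OP("CTCBeamSearchDecoderTrie")
    .Input("inputs: float")           // logits: [max_time, batch, num_classes]
    .Input("sequence_length: int32")  // per-batch sequence lengths
    .Attr("beam_width: int >= 1")
    .Attr("top_paths: int >= 1")
    .Attr("vocab_path: string")       // illustrative: path to the lexicon file
    .Output("decoded_indices: top_paths * int64")
    .Output("decoded_values: top_paths * int64")
    .Output("decoded_shape: top_paths * int64")
    .Output("log_probability: float");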

Lastly, from within the Python directory of TensorFlow, we implement a function that calls our back-end C++ operation and export it as a module under nn, so that the user may call this function from Python code.

How to make things work

Currently, all of this code lies within the master branch of my personal TensorFlow fork. First, to get each of these files, simply clone a copy of the repository using the command:

git clone https://github.com/murphymatt/tensorflow.git

to your machine and build from there. Additionally, to run Professor Weinman's model, which uses the trie-augmented decoder, clone the repository using the following command:

git clone https://github.com/weinman/cnn_lstm_ctc_ocr.git

TensorFlow also requires some third-party dependencies to run. Note that on the MathLAN, or on other machines, you may not have sudo access, so I like to use virtualenv to create a local, isolated environment that permits installing pip packages without sudo access. Installing the dependencies then proceeds as follows:

virtualenv ~/tf_custom_env
source ~/tf_custom_env/bin/activate
pip install six numpy wheel 

After that, simply build the custom TF package from within the repository clone and then proceed. Once the build process is complete, install the custom package from outside the TensorFlow repository.

Compiling

First, to build TensorFlow from source, you must install Bazel. System-specific Bazel installation instructions can be found in the Bazel documentation.

Subsequently, you simply configure and then build with Bazel. Call ./configure from the root of the repository clone. Several installation preferences will pop up, but all that we have really used for testing purposes this semester is CUDA support, so in configuration you can simply answer 'yes' to that option and 'no' to everything else. All default file paths may be used when prompted.

After configuration, building the entire package is simply

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

Depending on the machine you are using, this build process may take up to an hour or more, especially when building initially. Patience is a virtue.

Once the compilation is done, create the pip package with the following:

bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tmp/tensorflow_pkg

This will create a pip package within a user-level directory that you can install with pip. After that is completed, outside of the directory, and with a custom virtualenv activated, simply call

pip install ~/tmp/tensorflow_pkg/tensorflow-X.X.X-cpPYTHONVERSION-SYSTEMSPEC.whl

substituting the appropriate variables with what exists inside of the directory. Now, the custom TensorFlow installation is usable any time you are inside this virtualenv. To make incremental upgrades, simply repeat the compilation process. The build_pip_package usually does not take too long, following small changes.

Updating tests

Before using the custom TF build on the mjsynth data, the user should first run the tests of the dictionary-restricted components. Running these tests is simply

bazel test --config=dbg //tensorflow/core/util/ctc:ctc_trie_node_test
bazel test --config=dbg //tensorflow/core/util/ctc:ctc_trie_beam_search_test

Any additional test cases regarding the trie structure should go within the first test file, while tests of changes to the beam search scorer or entry that utilize the trie logic should go within ctc_trie_beam_search_test.cc.

Anything else to know

Currently, there is a small bug in which incomplete prefixes are sometimes emitted by the beam search decoder. This bug was observed when using the custom build over mjsynth data, on select words. For example, running the end-to-end model over 2210/3/72_FENCES_28542.jpg will output multiple 'FENCES' outputs, but will also output 'ENCE' as one of the lower-ranked top paths. While 'ENCE' is a strict prefix of a dictionary word, it is not a complete word and should not be emitted by the decoder. This is likely an issue within ExpandStateEnd within the TrieBeamScorer: the function should check whether the input TrieBeamState's current node has the endWord flag set, and if the flag is false, the scorer should set the state expansion score to log 0. Over my last week and a half of work on this project, I tried various changes and debugging techniques, but was unable to resolve the bug.

It is also worth noting that beam search extensions should be made within new beam scorers and beam entries, as opposed to adding to or modifying ctc_beam_search.h. However, I made two small changes to ctc_beam_search.h in order to get everything running. Originally, in the TopPaths function, if the number of top paths stored within the leaves_ structure (the structure that holds candidate paths, updated on each timestep of the beam search) was less than the number n of paths to return, the function returned an error and no top paths were returned. In my modified version, in order to get the small-scale test cases to pass, I set n equal to the minimum of n and the number of leaves stored, so that the beam search outputs at most n top paths, as sketched below.
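In sketch form, the change clamps n before gathering paths; the names here are simplified from the actual TopPaths signature.

// Sketch of the TopPaths modification: rather than returning an error when
// fewer than n candidate paths exist, clamp n and return what we have.
#include <algorithm>
#include <vector>

std::vector<std::vector<int>> TopPathsSketch(
    int n, const std::vector<std::vector<int>>& leaves) {
  n = std::min(n, static_cast<int>(leaves.size()));  // at most |leaves_| paths
  return std::vector<std::vector<int>>(leaves.begin(), leaves.begin() + n);
}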

Additionally, something I found that may be a bug within the beam search had to do with checking candidacy after performing the ExpandStateEnd function at the end of the beam search, before determining the top paths. In the base beam search, this function does not apply any changes to the beam state; in this modified search, however, it sets the scores of incomplete words to log 0. Originally, the beam search did not check whether a state was still a candidate after this function, so I put in a quick fix to re-check candidacy, sketched below. I will be submitting a written-up pull request to TensorFlow master soon.
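The re-check amounts to something like the following, run over the final leaves before TopPaths. This is a sketch with illustrative names, building on the scorer sketch above; the real beam entries track candidacy through their scores and the leaves_ structure.

// Sketch: after ExpandStateEnd, drop any leaf whose state was scored to
// log 0 (an incomplete word) so it cannot appear among the top paths.
#include <vector>

struct LeafSketch {
  TrieBeamState state;
  bool candidate = true;
};

void FinalizeLeaves(TrieBeamScorerSketch* scorer,
                    std::vector<LeafSketch>* leaves) {
  for (LeafSketch& leaf : *leaves) {
    scorer->ExpandStateEnd(&leaf.state);
    if (leaf.state.score == kLogZero) leaf.candidate = false;
  }
}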

Lastly, you can debug developments to the core framework using gdb. For example, if after developing additional test cases in ctc_trie_beam_search_test.cc some tests are failing, you can compile the test library with debugging flags as follows:

bazel build --config=dbg //tensorflow/core/util/ctc:ctc_trie_beam_search_test

and then run the test using

gdb bazel-bin/tensorflow/core/util/ctc/ctc_trie_beam_search_test

It is worth noting that I was only ever able to make this work when compiling on a machine without CUDA support. Additionally, if there are issues when running the mjsynth Python code, after compiling the pip package with debugging flags you can debug the Python code using IPython as follows:

ipython
!ps | grep -i ipython

to get the process id of the IPython process. Then attach to this process with a debugger, continue the process within the debugger, and run the Python code by hopping back to IPython and calling

run example_script.py

This was very helpful for debugging issues that resulted when calling validate.py over the mjsynth dataset.

How might this tool change as TF changes?

If you are working on a branch of my fork or your own fork, you should regularly keep up to date with the master TensorFlow repository. To do this, add the master repo as an upstream remote as follows:

git remote add upstream https://github.com/tensorflow/tensorflow.git

You should fetch and merge regularly as follows:

git fetch upstream
git merge upstream/master

It is unlikely that there will be merge conflicts, but if there are, they may be resolved appropriately. The CTC utility file most likely to incur merge conflicts is ctc_beam_search.h, since it is a utility that I modified but that should generally not change. It is also possible that the kernel or op specifications may change upstream; if so, the format of the new op and kernel should simply stay consistent with the others in their respective files.




