
MNIST digits stroke sequence data

Edwin D. de Jong edited this page Sep 15, 2016 · 23 revisions

This data set transforms the MNIST handwritten digit images into pen stroke sequences for sequence learning. Each sequence is derived from the corresponding original MNIST image.

Download

Sequences: sequences.tar.gz

Thinned images: digit-images-thinned.tar.gz

Examples

MNIST training image no. 25 (panels: input image, thresholded, thinned, sequence; in the sequence, the positive y direction is downward)

MNIST training image no. 12 (panels: input image, thresholded, thinned, sequence)

MNIST training image no. 2 (panels: input image, thresholded, thinned, sequence)

The data set aims to provide a consistent set of sequences: similar images should result in similar sequences. The choices made by the algorithm will sometimes differ from the choices a human writer would make; the figure '4' above is an example. After the downward and rightward stroke, the TSP (traveling salesman problem) algorithm prefers to continue with the downward stroke, and then to draw the remaining upper part of the rightmost line in the upward direction. Most humans would probably end the stroke there instead, and draw the rightmost line as a single stroke in one go. The difference reflects the criteria, or preferences, that are optimized when selecting between options. For the purpose of providing a consistent data set for sequence learning, any set of criteria could in principle be used to guide these choices, as long as the choices are made consistently and the complexity of the resulting sequences is minimized.

MNIST training image no. 13 (panels: input image, thresholded, thinned, sequence)

Sequence data file format

The files contain four columns:

  • The first two columns, dx and dy, give the movement in the horizontal and vertical direction relative to the previous point; the positive y direction is downward. The implicit starting point is (0, 0), the top-left corner. In the example for the figure '6' (training image 13, see the sequence below), the first line shows that the sequence starts at (0 + 18, 0 + 4), i.e. 18 steps to the right of the top-left corner and 4 steps down. The next point is 1 step to the left (-1) and 1 step down (1), which gets us to (17, 5).
  • The third column (end-of-stroke, eos) is 1 when a point is the last point of a stroke, i.e. the pen will be lifted from the paper after that point.
  • The fourth column (end-of-digit, eod) is 1 for the last point of the entire sequence.
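As an illustration of the column layout described above, the following Python sketch (not the R code from the project) turns rows of (dx, dy, eos, eod) into strokes of absolute coordinates. It assumes, as the description implies, that every row's movement is relative to the previous point even across a pen lift, so the row after an eos = 1 row gives the first point of the next stroke. The function name `rows_to_strokes` is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int, int, int]  # (dx, dy, eos, eod)
Point = Tuple[int, int]


def rows_to_strokes(rows: List[Row]) -> List[List[Point]]:
    """Convert relative (dx, dy, eos, eod) rows into strokes of absolute points.

    Positions accumulate from the implicit start (0, 0); x grows to the
    right and y grows downward. A stroke is closed after any row with
    eos == 1, and reading stops after the row with eod == 1.
    Assumption: dx/dy stay relative to the previous point across pen lifts.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    x, y = 0, 0
    for dx, dy, eos, eod in rows:
        x += dx
        y += dy
        current.append((x, y))
        if eos == 1 or eod == 1:
            strokes.append(current)  # pen is lifted after this point
            current = []
        if eod == 1:
            break  # last point of the whole digit
    if current:  # tolerate a sequence that lacks a final eos/eod flag
        strokes.append(current)
    return strokes
```

For the first two rows of the figure '6' example, (18, 4) and (-1, 1), the accumulated points are (18, 4) and (17, 5), matching the walkthrough in the first bullet.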

The corresponding code project contains R functions for visualizing the sequences.

Sequence for MNIST training image 13 (the figure '6' above; only the first ten steps are shown):

dx dy eos eod
18 4 0 0
-1 1 0 0
-1 0 0 0
-1 1 0 0
0 1 0 0
-1 0 0 0
0 1 0 0
-1 1 0 0
0 1 0 0
-1 0 0 0
. . . .
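The excerpt above can be read with a small parser that skips the header line and the elided ". . . ." row. This is a sketch that assumes whitespace-separated text with one step per line; whether the actual files carry a header line is not stated here, so the parser simply skips any line that is not four integers. The names `parse_sequence_text` and `absolute_points` are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int, int, int]  # (dx, dy, eos, eod)

# The excerpt shown above for MNIST training image 13 (first ten steps only).
SAMPLE = """dx dy eos eod
18 4 0 0
-1 1 0 0
-1 0 0 0
-1 1 0 0
0 1 0 0
-1 0 0 0
0 1 0 0
-1 1 0 0
0 1 0 0
-1 0 0 0
. . . .
"""


def parse_sequence_text(text: str) -> List[Row]:
    """Parse whitespace-separated rows, skipping lines that are not four integers."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank or malformed line
        try:
            dx, dy, eos, eod = (int(f) for f in fields)
        except ValueError:
            continue  # e.g. the "dx dy eos eod" header or the ". . . ." elision
        rows.append((dx, dy, eos, eod))
    return rows


def absolute_points(rows: List[Row]) -> List[Tuple[int, int]]:
    """Accumulate dx, dy from the implicit start (0, 0); y grows downward."""
    x = y = 0
    points = []
    for dx, dy, _eos, _eod in rows:
        x += dx
        y += dy
        points.append((x, y))
    return points


points = absolute_points(parse_sequence_text(SAMPLE))
print(points[0], points[1], points[-1])  # (18, 4) (17, 5) (12, 10)
```

The second point, (17, 5), agrees with the worked example in the file format description.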

Sequence learning benchmark

This sequence learning data set can be used for at least two different types of sequence learning challenges:

  1. Sequence prediction, i.e. predicting the stroke sequences themselves:

    • Given step k of a sequence, predict step k + 1. If the steps of a length n sequence are numbered from 1 to n, then steps 1 to n-1 are presented sequentially as inputs, and steps 2 to n must be predicted, feeding and predicting one step at a time.
    • Assuming the sequences are presented in random order, the first step of each sequence is fundamentally unpredictable from the preceding sequence, so it is not part of the prediction problem, as the description above implies. The RMSE over all predicted steps in the test data, i.e. all steps except the first step of each sequence, is one suitable error measure for reporting performance. Other loss functions may be more suitable for training the model.
  2. Sequence classification, i.e. predicting the digit class given the stroke sequence:

    • The sequence is received either step by step, or (for systems that can deal with variable length sequences as input) in one go.
    • Once the entire sequence has been received, the task is to predict the digit class (0 to 9). For this problem, the test error rate (% of test sequences classified incorrectly), as used for the regular MNIST benchmark, is a suitable error measure. For this problem type, target data files are provided that contain a binary (one-hot) representation of the digit class.
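The two evaluation set-ups above can be sketched as follows. This is a hedged Python illustration, not an official evaluation script: `prediction_pairs`, `pooled_rmse`, and `error_rate` are hypothetical names, the RMSE is assumed to be taken over every component of every predicted step (the text does not say which columns enter it), and the one-hot targets are assumed to be already loaded as lists of ten 0/1 values.

```python
import math
from typing import List, Sequence, Tuple

Step = Tuple[float, ...]  # one sequence step, e.g. (dx, dy, eos, eod)


def prediction_pairs(seq: Sequence[Step]) -> List[Tuple[Step, Step]]:
    """Pair input step k with target step k + 1, for k = 1 .. n-1."""
    return list(zip(seq[:-1], seq[1:]))


def pooled_rmse(predicted: List[List[Step]], sequences: List[List[Step]]) -> float:
    """RMSE pooled over all predicted steps of all test sequences.

    predicted[i] holds the model outputs for steps 2..n of sequences[i];
    the first step of each sequence is never scored.
    """
    sq_sum, count = 0.0, 0
    for preds, seq in zip(predicted, sequences):
        targets = seq[1:]
        if len(preds) != len(targets):
            raise ValueError("expected one prediction per step after the first")
        for pred, target in zip(preds, targets):
            for p, t in zip(pred, target):
                sq_sum += (p - t) ** 2
                count += 1
    return math.sqrt(sq_sum / count)


def error_rate(predicted_classes: List[int], one_hot_targets: List[Sequence[int]]) -> float:
    """Percentage of test sequences whose predicted digit differs from the target."""
    true_classes = [list(t).index(1) for t in one_hot_targets]  # one-hot -> 0..9
    wrong = sum(p != t for p, t in zip(predicted_classes, true_classes))
    return 100.0 * wrong / len(true_classes)
```

A length-n sequence yields n - 1 input/target pairs, and `pooled_rmse` divides by the total number of scored components across the whole test set rather than averaging per-sequence RMSEs, which is one reasonable reading of "RMSE over all predicted sequence steps".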

If you obtain results with this data set for either of these problems, or another variant, I'm happy to include them in an overview on this page; see contact info below.

Code

The code that was used to create this data set is available here.

Terms of Use

Edwin D. de Jong holds the copyright of the MNIST stroke sequence data set, which is a derivative work of the MNIST dataset. Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset, which is a derivative work from original NIST datasets. The MNIST stroke sequence dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Contact

Please feel free to contact me for any questions or comments. My email is the 3 parts of my name in reverse order: