CHANGELOG
Possible Future Features:
Pitman-Yor LMs
Latent Dirichlet Allocation
Linear regression for Good-Turing discounts
Linear interpolation for various models
Estimation of class-based models
Ver. 0.0.6 (5/21/2010)
Fixed a bug caused when manually setting a vocabulary with words that don't occur in the training corpus.
Ver. 0.0.5 (11/25/2009)
Fixed a bug in specifying the start and terminal symbols (thanks to Toru Taniguchi for pointing it out).
Ver. 0.0.4 (11/13/2009)
Support for class-based LMs added
Smoothing for unigrams
Added the ability to output separate beginning and terminal symbols
Added support for Modified Kneser-Ney smoothing (of Chen and Goodman)
Fixed a bug in the unknown model code that affected models with unknown characters
Ver. 0.0.3 (6/22/2009)
Fixed the creation of empty strings when multiple white spaces occur in a row
A variety of speed and memory improvements (removal of linked lists, indexing of the root node)
Added support for character-based modeling of unknown words
Fixed trimming so it works with Good-Turing smoothed models
Fixed a problem when piping data into CountNgrams
Fixed a problem with WFST output that was dropping beginning-of-sentence context
Ver. 0.0.2 (5/28/2009)
New Features:
It is now possible to trim n-grams by count
A set vocabulary list can be used to limit the vocabulary
Output in AT&T WFST format is possible
Documentation has been improved