scarletcho/subword-mikolov

subword-mikolov

This is an implementation of the English subword splitting method proposed by Mikolov et al. (2012).
Details that the paper does not describe explicitly are inferred from the example input-output pair it gives.

  • Usage:

      $ python subword.py <corpus-filename> <W-parameter> <S-parameter>
    
  • Example (Mikolov et al., 2012):

      INPUT:	new company dreamworks interactive
      OUTPUT:	new company dre+ am+ wo+ rks: in+ te+ ra+ cti+ ve:
    
  • Reference:

    Mikolov, T., Sutskever, I., Deoras, A., Le, H. S., Kombrink, S., & Cernocky, J. (2012b). Subword language modeling with neural networks. preprint. [pdf]
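
To make the output notation in the example above concrete, here is a minimal sketch that reverses the split and recovers the original words. It is not part of this repository: the function name `join_subwords` is hypothetical, and the reading of the markers (a trailing `+` means the word continues, a trailing `:` means the word ends, and an unmarked token is a whole word) is an assumption inferred only from the example pair.

```python
def join_subwords(line: str) -> str:
    """Rebuild whole words from a subword-split line.

    Assumed notation (inferred from the README example, not from the code):
      "xx+"  a non-final piece of a split word
      "xx:"  the final piece of a split word
      "xx"   a word that was kept whole
    """
    words = []
    pieces = []  # pieces of the split word currently being reassembled
    for token in line.split():
        if token.endswith("+"):
            pieces.append(token[:-1])
        elif token.endswith(":"):
            pieces.append(token[:-1])
            words.append("".join(pieces))
            pieces = []
        else:
            words.append(token)
    if pieces:  # input ended before a closing ":" piece was seen
        words.append("".join(pieces))
    return " ".join(words)


restored = join_subwords("new company dre+ am+ wo+ rks: in+ te+ ra+ cti+ ve:")
print(restored)  # new company dreamworks interactive
```

Under these assumptions the split is lossless: applying the function to the example OUTPUT line gives back the example INPUT line.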
