
Unsupervised morpheme segmentation using a non-parametric Bayesian model. The web app is deployed on AWS Elastic Beanstalk using GitHub Actions.


oya163/gibbs


Gibbs Sampling (MIT license)

This repository collects what I learned during my internship at NAAMII (June to August 2020). I wrote a couple of scripts for finite and infinite mixture models based on the Chinese Restaurant Process. They show the use of various conjugate priors as well as collapsed sampling methods.
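The Chinese Restaurant Process mentioned above is what lets the infinite mixture grow new components on demand. The sketch below shows that idea in minimal form; it is not code from this repository, and the function names and `alpha` (concentration) parameter are illustrative assumptions. With `n` customers already seated, a new customer joins table `k` with probability `n_k / (n + alpha)` and opens a new table with probability `alpha / (n + alpha)`.

```python
import random


def crp_seating_probs(table_counts, alpha):
    """Probabilities of joining each existing table, followed by a new table."""
    n = sum(table_counts)
    denom = n + alpha
    probs = [count / denom for count in table_counts]
    probs.append(alpha / denom)  # last entry: open a new table
    return probs


def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one by one under a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    counts = []       # number of customers at each table
    assignments = []  # table index chosen by each customer
    for _ in range(num_customers):
        probs = crp_seating_probs(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)  # new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    print(crp_seating_probs([3, 1], 1.0))
    print(sample_crp(20, 1.0)[1])
```

Larger `alpha` values make new tables (mixture components) more likely, so the number of occupied components is inferred rather than fixed in advance.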

I worked with Sushil Awale to implement unsupervised monolingual word segmentation based on goldwater-etal-2006-contextual and snyder-barzilay-2008-unsupervised. We developed a monolingual word segmentation model that places a beta-geometric conjugate prior over the length of a given word.
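The sketch below illustrates beta-geometric conjugacy in general. It assumes a length `l` follows Geometric(p) on {1, 2, ...}, with `P(l | p) = (1 - p)^(l - 1) * p`, and that `p ~ Beta(a, b)`. The parameterization and function names are assumptions and may differ from the repository's model. After observing lengths `l_1..l_n`, the posterior is `Beta(a + n, b + sum(l_i - 1))`. Integrating `p` out gives the predictive probability `B(a + 1, b + l - 1) / B(a, b)`.

```python
import math


def log_beta(x, y):
    """Log of the Beta function B(x, y), computed via log-gamma for stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def posterior_params(lengths, a, b):
    """Conjugate update of Beta(a, b) after observing geometric lengths (>= 1)."""
    n = len(lengths)                       # number of "stops" (successes)
    failures = sum(l - 1 for l in lengths)  # "continue" steps before each stop
    return a + n, b + failures


def predictive_length_prob(l, a, b):
    """Beta-geometric probability of length l with p integrated out (a, b > 0)."""
    return math.exp(log_beta(a + 1, b + l - 1) - log_beta(a, b))


if __name__ == "__main__":
    a_post, b_post = posterior_params([3, 4, 5, 2, 6], 1.0, 1.0)
    for length in range(1, 6):
        print(length, predictive_length_prob(length, a_post, b_post))
```

Because the prior is conjugate, a sampler only needs the two running counts `n` and `sum(l_i - 1)` to score a proposed length. It never has to sample `p` explicitly.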

Basic implementations

  • Finite Gaussian Mixture Model
  • Infinite Gaussian Mixture Model
  • Categorical/Multinomial Mixture Model with Dirichlet prior using MLE
  • Categorical/Multinomial Mixture Model with Dirichlet prior using collapsed sampling
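The sketch below shows the collapsed-sampling idea from the last item in minimal form. It is not the repository's script, and the names and defaults are hypothetical. It assumes each document is a bag of symbols in `range(V)`, each component's symbol distribution has a symmetric Dirichlet(`beta`) prior, and each component gets a symmetric Dirichlet(`alpha`) weight. With the component parameters integrated out, only the assignments `z` are resampled. Each document's conditional is proportional to `(docs_in_k + alpha)` times a ratio of rising factorials over its symbol counts.

```python
import math
import random
from collections import Counter


def collapsed_gibbs_multinomial_mixture(docs, K, V, alpha=1.0, beta=0.1,
                                        iters=100, seed=0):
    """Collapsed Gibbs sampler for a K-component multinomial mixture."""
    rng = random.Random(seed)
    counts = [Counter(d) for d in docs]   # symbol counts per document
    z = [rng.randrange(K) for _ in docs]  # random initial assignments
    docs_in = [0] * K                     # documents per component
    tokens_in = [0] * K                   # tokens per component
    sym_in = [[0] * V for _ in range(K)]  # symbol counts per component

    def update(i, k, sign):
        """Add (sign=+1) or remove (sign=-1) document i from component k."""
        docs_in[k] += sign
        tokens_in[k] += sign * len(docs[i])
        for v, c in counts[i].items():
            sym_in[k][v] += sign * c

    for i, k in enumerate(z):
        update(i, k, +1)

    for _ in range(iters):
        for i in range(len(docs)):
            update(i, z[i], -1)  # condition on all other documents
            m = len(docs[i])
            log_w = []
            for k in range(K):
                lw = math.log(docs_in[k] + alpha)
                # Dirichlet-multinomial predictive as rising factorials
                for v, c in counts[i].items():
                    for j in range(c):
                        lw += math.log(sym_in[k][v] + beta + j)
                for j in range(m):
                    lw -= math.log(tokens_in[k] + V * beta + j)
                log_w.append(lw)
            top = max(log_w)  # subtract the max before exp to avoid underflow
            weights = [math.exp(lw - top) for lw in log_w]
            z[i] = rng.choices(range(K), weights=weights)[0]
            update(i, z[i], +1)
    return z


if __name__ == "__main__":
    docs = [[0, 1, 0, 1, 1]] * 3 + [[2, 3, 3, 2, 2]] * 3
    print(collapsed_gibbs_multinomial_mixture(docs, K=2, V=4, iters=20))
```

Collapsing removes the component parameters from the state, so each step only updates integer counts. This is the usual motivation for collapsed sampling with conjugate priors.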

Tasks

  • Implement a beta-geometric conjugate prior for word segmentation
  • Create a Nepali segmentation dataset
  • Create a simple Flask-based web app
  • Set up a CI/CD pipeline
  • Deploy on AWS Elastic Beanstalk

Deployment

Please click here
