screddy1313/Language-modelling

Language-modelling

In this project we generate sentences using n-grams.

Dataset

We use the 20 Newsgroups dataset, a standard dataset for text-related tasks.
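
As a rough illustration of preparing this data, the sketch below reads every file of the chosen newsgroups and splits it into lowercase tokens. It assumes the dataset has been unpacked locally with one directory per newsgroup and one file per post. The function name `read_newsgroup_sentences`, the `root` path, and the choice to treat each non-empty line as one "sentence" are illustrative assumptions, not taken from the notebook.

```python
import re
from pathlib import Path


def read_newsgroup_sentences(root, groups):
    """Return a list of token lists, one per non-empty line of the selected groups.

    Assumption: root/<group>/ holds one plain-text file per post.
    """
    sentences = []
    for group in groups:
        for path in sorted(Path(root, group).iterdir()):
            if not path.is_file():
                continue
            # latin-1 maps every byte to a character, so decoding never fails.
            text = path.read_text(encoding="latin-1")
            for line in text.splitlines():
                # Simplification: a "sentence" is one line; tokens are
                # lowercase runs of letters, digits, and apostrophes.
                tokens = re.findall(r"[a-z0-9']+", line.lower())
                if tokens:
                    sentences.append(tokens)
    return sentences
```

A call such as `read_newsgroup_sentences("20_newsgroups", ["rec.sport.baseball", "rec.motorcycles"])` would then yield the training sentences, where the directory name is a placeholder for wherever the data was unpacked.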

Code

  • In this project we do the following tasks:

    • Train unigram, bigram, and trigram models on all files of rec.sport.baseball and rec.motorcycles.
    • Given a sentence, find its log probability under each of the above models.
    • Given a sentence, find its perplexity under each of the above models.
    • Given a sentence, find its log probability under each model using Good-Turing smoothing.
  • The code is self-documented in the Python notebook.
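
The tasks above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the notebook's implementation: sentences are already tokenized, `<s>` and `</s>` are assumed padding markers, the unsmoothed model uses maximum-likelihood estimates, and perplexity is normalized by the number of predicted tokens (including the end marker). The Good-Turing function shows one simple variant that estimates the joint probability of a single n-gram, falls back to the raw count when no n-gram has count c + 1, and needs an assumed vocabulary size to spread the unseen mass. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a sentence padded with n-1 start markers and one end marker."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(sentences, n):
    """Count n-grams and their (n-1)-token contexts over all sentences."""
    grams, contexts = Counter(), Counter()
    for tokens in sentences:
        for gram in ngrams(tokens, n):
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts


def log_prob_mle(tokens, grams, contexts, n):
    """Sum of log P(w | context) with unsmoothed (maximum-likelihood) estimates."""
    total = 0.0
    for gram in ngrams(tokens, n):
        if grams[gram] == 0:
            return float("-inf")  # an unseen n-gram has probability zero
        total += math.log(grams[gram] / contexts[gram[:-1]])
    return total


def perplexity(tokens, grams, contexts, n):
    """exp of the negative average log probability per predicted token."""
    log_prob = log_prob_mle(tokens, grams, contexts, n)
    return math.exp(-log_prob / (len(tokens) + 1))


def good_turing_prob(gram, grams, vocab_size):
    """Good-Turing estimate of the joint probability of one n-gram.

    Seen n-grams use the adjusted count c* = (c + 1) * N_{c+1} / N_c,
    where N_c is the number of distinct n-grams seen exactly c times.
    Unseen n-grams share the mass N_1 / N equally (assumes vocab_size ** n
    possible n-grams). The estimates are not renormalized.
    """
    total = sum(grams.values())
    freq_of_freq = Counter(grams.values())
    c = grams[gram]
    if c == 0:
        unseen_types = max(vocab_size ** len(gram) - len(grams), 1)
        return freq_of_freq[1] / (total * unseen_types)
    n_c, n_c1 = freq_of_freq[c], freq_of_freq[c + 1]
    c_star = (c + 1) * n_c1 / n_c if n_c1 > 0 else c
    return c_star / total


# Toy usage with a bigram model (n = 2).
grams, contexts = train([["a", "b"], ["a", "c"]], 2)
sentence_log_prob = log_prob_mle(["a", "b"], grams, contexts, 2)
sentence_perplexity = perplexity(["a", "b"], grams, contexts, 2)
smoothed = good_turing_prob(("b", "c"), grams, vocab_size=5)
```

Without smoothing, any sentence containing an unseen n-gram gets a log probability of negative infinity, which is why a smoothed estimate is useful for scoring new sentences. How smoothed n-gram estimates are combined into a sentence score is a further design choice that this sketch leaves open.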
