neeti098/Text-classification-using-different-probability-distributions

Used a dataset of emails labeled as spam or non-spam (ham) to perform text classification with different probability distributions. Used the NLTK library to tokenize the text and to remove stop words and non-alphabetic characters. Calculated the mean, variance, and other parameters of each word for each label (spam or ham). Calculated the likelihood of each test document belonging to the 'spam' and 'ham' classes under Gaussian, Poisson, Bernoulli, and multinoulli probability distributions, assigned each document to the class with the higher probability, and printed the classification accuracy of each model, i.e. the proportion of correctly classified test documents. The models reached accuracies of 86% (Gaussian), 85% (Poisson), 79% (Bernoulli), and 85% (multinoulli).
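
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it substitutes a small hard-coded stop-word set (`STOP_WORDS`) and a whitespace split for NLTK's stop-word corpus and tokenizer so that it runs without downloading NLTK data, and the helper names `preprocess`, `build_vocab`, and `count_vector` are hypothetical. It assumes each document is then represented as per-word counts over a shared vocabulary, which is the kind of input per-word mean and variance estimates work on.

```python
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stop-word list (nltk.corpus.stopwords);
# only a handful of words are hard-coded so the sketch runs without corpus downloads.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of",
              "on", "or", "the", "this", "to", "you", "your"}


def preprocess(text):
    """Lowercase, split on whitespace, strip non-alphabetic characters, drop stop words."""
    tokens = text.lower().split()  # stand-in for nltk.word_tokenize
    cleaned = (re.sub(r"[^a-z]", "", tok) for tok in tokens)
    return [tok for tok in cleaned if tok and tok not in STOP_WORDS]


def build_vocab(docs):
    """Sorted list of every distinct token across the training documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vector(tokens, vocab):
    """Per-word occurrence counts for one document, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


docs = [preprocess("Win a FREE prize now!!!"),
        preprocess("Meeting moved to 3pm, see the agenda.")]
vocab = build_vocab(docs)
print(docs)
print([count_vector(doc, vocab) for doc in docs])
```

With NLTK installed and its data downloaded, `nltk.corpus.stopwords.words("english")` and `nltk.word_tokenize` could take the place of the two stand-ins.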

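The comparison of the four distributions can be sketched as a naive Bayes classifier over word-count vectors. This sketch assumes that words are conditionally independent given the label and that each document is already a count vector over a shared vocabulary; the smoothing constants `LAPLACE` and `VAR_EPS`, the log class prior, the toy data, and the function names are assumptions for illustration rather than details taken from the repository. Each model scores a document with its per-class log-likelihood, predicts the higher-scoring label, and reports accuracy as the fraction of test documents predicted correctly.

```python
import math
from collections import defaultdict

LAPLACE = 1.0   # assumed smoothing pseudo-count (Poisson, Bernoulli, multinoulli)
VAR_EPS = 1e-2  # assumed variance floor so zero-variance words don't divide by zero


def fit(X, y):
    """Estimate per-class, per-word parameters from count vectors X and labels y."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    params = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        cols = list(zip(*rows))  # one tuple of counts per vocabulary word
        means = [sum(col) / n for col in cols]
        totals = [sum(col) for col in cols]
        params[label] = {
            "log_prior": math.log(n / len(X)),
            # Gaussian: per-word mean and (floored) population variance
            "mean": means,
            "var": [sum((v - m) ** 2 for v in col) / n + VAR_EPS
                    for col, m in zip(cols, means)],
            # Poisson: smoothed mean count per document
            "rate": [(t + LAPLACE) / (n + LAPLACE) for t in totals],
            # Bernoulli: smoothed fraction of documents that contain the word
            "presence": [(sum(v > 0 for v in col) + LAPLACE) / (n + 2 * LAPLACE)
                         for col in cols],
            # Multinoulli: smoothed share of all word occurrences in the class
            "theta": [(t + LAPLACE) / (sum(totals) + LAPLACE * len(cols))
                      for t in totals],
        }
    return params


def log_likelihood(x, p, dist):
    """Log-likelihood of count vector x under one class's parameters p."""
    if dist == "gaussian":
        return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, m, v in zip(x, p["mean"], p["var"]))
    if dist == "poisson":
        return sum(xi * math.log(lam) - lam - math.lgamma(xi + 1)
                   for xi, lam in zip(x, p["rate"]))
    if dist == "bernoulli":
        return sum(math.log(q) if xi > 0 else math.log(1 - q)
                   for xi, q in zip(x, p["presence"]))
    if dist == "multinoulli":
        # The multinomial coefficient is the same for every class, so it is dropped.
        return sum(xi * math.log(t) for xi, t in zip(x, p["theta"]))
    raise ValueError(f"unknown distribution: {dist}")


def predict(x, params, dist):
    """Label whose log prior plus log-likelihood is highest."""
    return max(params, key=lambda c: params[c]["log_prior"]
               + log_likelihood(x, params[c], dist))


def accuracy(X, y, params, dist):
    """Proportion of documents whose predicted label matches the true label."""
    return sum(predict(x, params, dist) == label for x, label in zip(X, y)) / len(y)


# Toy count vectors over a hypothetical vocabulary ["free", "win", "meeting", "agenda"].
X_train = [[2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 0, 1],
           [0, 0, 2, 1], [0, 0, 1, 2], [1, 0, 2, 1]]
y_train = ["spam"] * 3 + ["ham"] * 3
X_test, y_test = [[2, 2, 0, 0], [0, 0, 2, 2]], ["spam", "ham"]

params = fit(X_train, y_train)
for dist in ("gaussian", "poisson", "bernoulli", "multinoulli"):
    print(dist, accuracy(X_test, y_test, params, dist))
```

Working in log space avoids the underflow that multiplying many small per-word probabilities would cause. The Gaussian, Poisson, and multinoulli models use the raw counts, while the Bernoulli model only looks at whether each word occurs, which discards frequency information.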