Text classification of emails as spam or non-spam (ham) using different probability distributions. The NLTK library is used to tokenize the text and to remove stop words and non-alphabetic characters. For each word, the mean, variance, and other distribution parameters are estimated separately for each label (spam or ham). The likelihood of each test document belonging to the 'spam' and 'ham' classes is then computed under Gaussian, Poisson, Bernoulli, and multinoulli probability distributions. Each document is classified based on these probability values, and the classification accuracy (the proportion of correctly classified documents) is calculated and printed for each model. The resulting accuracies are 86%, 85%, 79%, and 85% for the Gaussian, Poisson, Bernoulli, and multinoulli models, respectively.
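The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it uses only the Python standard library, so a regular expression and a tiny hand-written `STOP_WORDS` set stand in for NLTK's tokenizer and stop word list. All function names, the toy emails, the smoothing constants (`1e-2`, Laplace `+1`), and the exact form of each likelihood are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stop word list.
STOP_WORDS = {"the", "a", "an", "to", "is", "and", "of", "for", "you", "at"}

def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def fit(docs, labels):
    """Estimate per-class, per-word parameters from word counts."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    params = {}
    for c in set(labels):
        counts = [Counter(tokenize(d)) for d, y in zip(docs, labels) if y == c]
        n = len(counts)
        total = sum(sum(k[w] for w in vocab) for k in counts)
        stats = {}
        for w in vocab:
            xs = [k[w] for k in counts]
            mean = sum(xs) / n
            stats[w] = {
                "mean": mean,
                # Variance floor (assumed) avoids division by zero.
                "var": sum((x - mean) ** 2 for x in xs) / n + 1e-2,
                # Poisson rate, shifted (assumed) to stay positive.
                "rate": mean + 1e-2,
                # Bernoulli presence probability with Laplace smoothing.
                "p": (sum(x > 0 for x in xs) + 1) / (n + 2),
                # Multinoulli word probability with Laplace smoothing.
                "theta": (sum(xs) + 1) / (total + len(vocab)),
            }
        params[c] = {"prior": n / len(docs), "stats": stats}
    return vocab, params

def log_likelihood(x, s, dist):
    """Log-likelihood of one word count x under the chosen distribution."""
    if dist == "gaussian":
        return (-0.5 * math.log(2 * math.pi * s["var"])
                - (x - s["mean"]) ** 2 / (2 * s["var"]))
    if dist == "poisson":
        return x * math.log(s["rate"]) - s["rate"] - math.lgamma(x + 1)
    if dist == "bernoulli":
        return math.log(s["p"]) if x > 0 else math.log(1 - s["p"])
    if dist == "multinoulli":
        # The count-dependent coefficient is the same for every class, so it is dropped.
        return x * math.log(s["theta"])
    raise ValueError(f"unknown distribution: {dist}")

def predict(text, vocab, params, dist):
    """Return the class with the highest log prior plus summed word log-likelihoods."""
    counts = Counter(tokenize(text))
    scores = {
        c: math.log(m["prior"])
        + sum(log_likelihood(counts[w], m["stats"][w], dist) for w in vocab)
        for c, m in params.items()
    }
    return max(scores, key=scores.get)

def accuracy(docs, labels, vocab, params, dist):
    """Proportion of documents whose predicted label matches the true label."""
    hits = sum(predict(d, vocab, params, dist) == y for d, y in zip(docs, labels))
    return hits / len(docs)

# Toy data, invented for illustration only.
train_docs = ["win free money now", "free prize claim money",
              "meeting schedule tomorrow", "project meeting notes attached"]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["claim free money", "meeting notes tomorrow"]
test_labels = ["spam", "ham"]

vocab, params = fit(train_docs, train_labels)
for dist in ("gaussian", "poisson", "bernoulli", "multinoulli"):
    print(dist, accuracy(test_docs, test_labels, vocab, params, dist))
```

Working in log space keeps the product of many small per-word probabilities from underflowing, and the smoothing terms keep words unseen in one class from producing a log of zero. On a real corpus the stand-in tokenizer and stop word set would be replaced by the NLTK equivalents mentioned above.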
neeti098/Text-classification-using-different-probability-distributions