neeti098/Text-classification-using-different-probability-distributions

Text classification of emails as spam or non-spam (ham) using different probability distributions. The NLTK library was used to tokenize the text and to remove stop words and non-alphabetic characters. For each word, the mean, variance, and other distribution parameters were calculated separately for each label (spam or ham). The likelihood of each test document belonging to the 'spam' and 'ham' classes was then calculated using the Gaussian, Poisson, Bernoulli, and multinoulli probability distributions. Each document was classified according to these probability values, and the classification accuracy (the proportion of correctly classified documents) was calculated and printed for each model. The accuracy scores were 86%, 85%, 79%, and 85% for the Gaussian, Poisson, Bernoulli, and multinoulli distributions, respectively.
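The per-class likelihood comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`fit`, `log_score`, `predict`), the toy count matrix, and the smoothing choices (variance floor, small rate offset, add-one smoothing) are all assumptions made for the example. It takes a document-term count matrix as input, so the NLTK preprocessing step is not shown, and the multinoulli case is treated here as a categorical distribution over words applied to each token.

```python
import numpy as np

def fit(X, y, var_floor=1e-2):
    """Estimate per-class, per-word parameters from a document-term count matrix."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[str(c)] = {
            "prior": Xc.shape[0] / X.shape[0],
            "mean": Xc.mean(axis=0),               # Gaussian mean, Poisson rate
            "var": Xc.var(axis=0) + var_floor,     # Gaussian variance (floored, assumption)
            # Fraction of class documents containing the word (add-one smoothed)
            "p_present": ((Xc > 0).sum(axis=0) + 1) / (Xc.shape[0] + 2),
            # Share of all class tokens that are this word (add-one smoothed)
            "p_word": (Xc.sum(axis=0) + 1) / (Xc.sum() + X.shape[1]),
        }
    return params

def log_score(x, p, dist):
    """Log prior plus summed per-word log-likelihood of one document under one class."""
    if dist == "gaussian":
        ll = -0.5 * (np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"])
    elif dist == "poisson":
        lam = p["mean"] + 1e-3                     # offset avoids log(0) (assumption)
        ll = x * np.log(lam) - lam                 # log(x!) omitted: identical for every class
    elif dist == "bernoulli":
        ll = np.where(x > 0, np.log(p["p_present"]), np.log(1 - p["p_present"]))
    elif dist == "multinoulli":
        ll = x * np.log(p["p_word"])
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return np.log(p["prior"]) + ll.sum()

def predict(X, params, dist):
    """Assign each document to the class with the highest log score."""
    return np.array([max(params, key=lambda c: log_score(x, params[c], dist)) for x in X])

def accuracy(y_true, y_pred):
    """Proportion of correctly classified documents."""
    return float(np.mean(y_true == y_pred))

# Toy data (hypothetical): columns are counts of "free", "win", "meeting", "report".
X_train = np.array([[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 2, 3], [0, 1, 3, 2]])
y_train = np.array(["spam", "spam", "ham", "ham"])
X_test = np.array([[2, 2, 0, 0], [0, 0, 3, 1]])
y_test = np.array(["spam", "ham"])

params = fit(X_train, y_train)
for dist in ("gaussian", "poisson", "bernoulli", "multinoulli"):
    print(dist, accuracy(y_test, predict(X_test, params, dist)))
```

Working in log space avoids numerical underflow when many small per-word probabilities are multiplied together. The toy data is deliberately easy to separate, so the accuracies it prints say nothing about the figures reported for the real email dataset.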

Files: README.md, text_classification.ipynb