futurikidis21/Spark-spam-message-machine-learning

Employing pySpark to classify emails and detect spam

Abstract: This notebook presents pySpark code that processes and classifies text data from a corpus of emails. The code applies data wrangling and a classification technique (logistic regression) to build a process that recognises whether a given email is spam. Alternative specifications of the classifier are explored to assess the performance and efficiency of the process.
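
The steps named in the abstract (turning email text into features, then fitting a logistic regression) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the notebook's code: the toy emails, the feature size `NUM_FEATURES`, and every function name here are hypothetical. In pySpark, the corresponding stages would typically be built from classes such as `Tokenizer` and `HashingTF` in `pyspark.ml.feature` and `LogisticRegression` in `pyspark.ml.classification`.

```python
import math
import re
import zlib

NUM_FEATURES = 64  # hypothetical vector size, chosen only for this illustration


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def featurize(text, num_features=NUM_FEATURES):
    """Hash each token into a fixed-size vector of term counts (the hashing trick)."""
    vec = [0.0] * num_features
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % num_features] += 1.0
    return vec


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow in exp()."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that feature vector x belongs to the spam class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, epochs=200, lr=0.1, l2=0.0):
    """Fit logistic regression by stochastic gradient descent on (text, label) pairs."""
    data = [(featurize(text), label) for text, label in samples]
    weights = [0.0] * NUM_FEATURES
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * (err * xi + l2 * w) for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def log_loss(weights, bias, samples):
    """Average negative log-likelihood; lower is better."""
    total = 0.0
    for text, y in samples:
        p = predict_proba(weights, bias, featurize(text))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


# Invented toy corpus (label 1 = spam, 0 = not spam); not from the notebook's data.
TOY_EMAILS = [
    ("win a free prize now", 1),
    ("claim your cash reward today", 1),
    ("cheap pills limited offer", 1),
    ("urgent lottery winner click here", 1),
    ("meeting moved to tuesday afternoon", 0),
    ("please review the attached report", 0),
    ("lunch with the project team", 0),
    ("notes from yesterday's seminar", 0),
]

MODEL = train(TOY_EMAILS)
```

Alternative specifications, in the spirit of the abstract, could be compared by refitting with different settings (for example `train(TOY_EMAILS, l2=0.01)`) and comparing `log_loss` on held-out emails rather than on the training set.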

See the IPython Notebook.
