Study on Spam Email Classification Algorithms

Description

This repository contains the practical assignment for the "Machine Learning" course. The project investigates how suitable various classification algorithms are for spam email detection, using the Ling-Spam dataset.

Requirements

1. Understanding the Dataset

Document the attributes and labels of the dataset, as well as the process of extracting them from the raw text. Point out the clue in the file names: the "spm" prefix marks a spam message.
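The extraction step described above could be sketched as follows. This is only an illustration, not the repository's actual code: the function names `label_from_filename` and `bag_of_words`, the example file names, and the choice of word counts as attributes are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path


def label_from_filename(path):
    """Return 1 (spam) if the file name starts with 'spm', else 0 (legitimate)."""
    return 1 if Path(path).name.startswith("spm") else 0


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens (assumed attribute set)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Hypothetical file names, used only to show the labeling rule.
print(label_from_filename("part1/spmsga1.txt"))   # spam
print(label_from_filename("part1/3-1msg1.txt"))   # legitimate
print(bag_of_words("Free money, FREE offer"))
```

Word counts are one simple attribute choice; binary word presence or TF-IDF weights would fit the same interface.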

2. Dataset Split

Use the nine folders part1 through part9 for training and hold out part10 for testing, within each category (bare, lemm, lemm_stop, stop).
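A minimal sketch of this split, assuming each message path has the shape `<category>/<partN>/<file>`. The helper name `split_by_part` and the example paths are hypothetical.

```python
from pathlib import Path

TEST_PART = "part10"  # folder held out for testing


def split_by_part(paths):
    """Split message paths into (train, test) by the name of their parent folder."""
    train, test = [], []
    for p in map(Path, paths):
        # Everything outside part10 (that is, part1..part9) goes to training.
        (test if p.parent.name == TEST_PART else train).append(p)
    return train, test


# Hypothetical paths, used only to show the rule.
example = ["cat/part1/a.txt", "cat/part9/b.txt", "cat/part10/spmsgc.txt"]
train_files, test_files = split_by_part(example)
print(len(train_files), len(test_files))
```

Comparing the whole folder name, rather than testing for a "part1" prefix, keeps part10 from being mistaken for part1.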

3. Algorithm Selection and Implementation

Choose and implement an algorithm, among those studied, that you consider suitable for solving the spam classification problem.
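The assignment leaves the choice of algorithm open. As one possible candidate, the sketch below shows a multinomial Naive Bayes classifier with Laplace smoothing, a common baseline for text classification. It is not claimed to be the algorithm this repository uses; the class name, the `alpha` parameter, and the token-list input format are assumptions.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)

        def log_posterior(c):
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # words unseen in training are skipped
                    num = self.word_counts[c][w] + self.alpha
                    score += math.log(num / (self.totals[c] + self.alpha * v))
            return score

        return max(self.classes, key=log_posterior)


docs = [["free", "money"], ["free", "offer"],
        ["meeting", "notes"], ["linguistics", "conference"]]
model = MultinomialNB().fit(docs, [1, 1, 0, 0])
print(model.predict(["free", "offer", "money"]))
```

Working in log space avoids numeric underflow when messages contain many words.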

4. LaTeX Report

Justify the algorithm choice in a LaTeX report, both theoretically and experimentally. Include a comparison with other candidate algorithms.

5. Leave-One-Out Cross-Validation

Implement the Leave-One-Out cross-validation strategy and present its results, including a graph of the statistics obtained.
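Leave-One-Out cross-validation trains on all samples but one, tests on the held-out sample, and repeats this once per sample. The sketch below is model-agnostic. The names `leave_one_out_accuracy`, `train_fn`, and `predict_fn` are hypothetical, and the majority-class model is there only to make the example runnable.

```python
from collections import Counter


def leave_one_out_accuracy(samples, labels, train_fn, predict_fn):
    """Hold out each sample once, train on the rest, and return the mean accuracy."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        correct += predict_fn(model, samples[i]) == labels[i]
    return correct / len(samples)


# Trivial stand-in model: always predict the most frequent training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


print(leave_one_out_accuracy(["a", "b", "c", "d"], [0, 0, 0, 1],
                             train_majority, predict_majority))
```

Because the model is retrained once per sample, the cost grows quickly with dataset size, so an implementation that can update its counts incrementally may be worth considering.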

6. Algorithm Performance on Test Set

Add to the report a graph illustrating the algorithm's accuracy on the test dataset. The accuracy should be significantly better than that of trivial strategies (random guessing or always predicting the same class). Include comparative graphs if you tested multiple algorithms.
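To check the "better than trivial" requirement, the classifier's accuracy can be compared with that of a constant-class predictor. A minimal sketch, with hypothetical function names and made-up labels:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


# Made-up labels, for illustration only (1 = spam, 0 = legitimate).
y_train = [0, 0, 1]
y_test = [0, 1, 0, 0]
y_pred = [0, 1, 0, 1]
print(accuracy(y_test, y_pred), majority_baseline_accuracy(y_train, y_test))
```

On an imbalanced dataset the constant-class baseline can already score high, so it is a more demanding reference point than random guessing.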

7. Additional Details

Explain any relevant experimental details, either in text or through graphs. Investigate and implement improved variants of the algorithm studied in the seminar in order to increase accuracy.

anaungurean/Spam-Email-Classification
