footnews-detection

Overview

The "footnews-detection-API" is a machine learning project that predicts misinformation in football news. It is built on DistilBERT, a distilled version of BERT. The primary goal is to fine-tune this encoder-only (autoencoding) model to perform the task accurately.

EDA + Data Cleaning

In this section, the data were cleaned using regular expressions, and the most common words for each label were visualized with matplotlib and wordcloud. The data were then split into three distinct sets (training, validation, and testing) and formatted into a DatasetDict structure.
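The cleaning and splitting steps can be sketched as follows. This is a minimal illustration, not the notebook's code: the regular expressions, the 80/10/10 ratios, and the names `clean_text` and `split_rows` are all assumptions, since the README does not state them.

```python
import random
import re

# Hypothetical cleaning rules; the notebook's actual patterns are not shown.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and punctuation, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def split_rows(rows, train=0.8, validation=0.1, seed=42):
    """Shuffle rows and split them into train/validation/test lists.

    The ratios are assumed; whatever remains after train and
    validation goes to the test split.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return {
        "train": rows[:n_train],
        "validation": rows[n_train:n_train + n_val],
        "test": rows[n_train + n_val:],
    }
```

If the rows are dictionaries, each list can then be wrapped with `datasets.Dataset.from_list` and the three results collected into a `DatasetDict`.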

Tokenization

The datasets were loaded as Hugging Face "Dataset" objects. The text was then tokenized with the tokenizer associated with DistilBERT, preparing the data for the model.
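A possible shape for the tokenization step is shown below. The checkpoint name `distilbert-base-uncased`, the `text` column, and the `max_length` of 128 are assumptions for illustration; the README does not say which values the notebook uses.

```python
def load_tokenizer(checkpoint="distilbert-base-uncased"):
    """Load the tokenizer paired with a DistilBERT checkpoint.

    Requires the `transformers` package; the checkpoint name is assumed.
    """
    from transformers import AutoTokenizer  # lazy import, downloads on first use
    return AutoTokenizer.from_pretrained(checkpoint)


def make_tokenize_fn(tokenizer, text_column="text", max_length=128):
    """Build a batch function suitable for Dataset.map(..., batched=True)."""
    def tokenize(batch):
        # Pad and truncate so every example has the same length.
        return tokenizer(
            batch[text_column],
            padding="max_length",
            truncation=True,
            max_length=max_length,
        )
    return tokenize
```

With a `DatasetDict` named `dataset`, usage would look like `dataset.map(make_tokenize_fn(load_tokenizer()), batched=True)`.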

Visualization

Here, the training texts were encoded and pooled by taking the hidden state of the [CLS] token, which gives one vector per example for dimensionality reduction. These vectors were standardized, and PCA was applied to project them to 2D and reveal potential clustering trends.
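The pooling, standardization, and projection can be sketched with NumPy alone. This is a stand-in for illustration, assuming the encoder output is an array of shape (batch, sequence, hidden); the notebook may well use library implementations of scaling and PCA instead, and the function names here are hypothetical.

```python
import numpy as np


def cls_pool(last_hidden_state):
    """Keep the first position ([CLS]) of each sequence."""
    return np.asarray(last_hidden_state)[:, 0, :]


def standardize(x):
    """Scale each feature to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant features
    return (x - x.mean(axis=0)) / std


def pca_2d(x):
    """Project rows onto the first two principal components via SVD."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

The two output columns can be passed to a matplotlib scatter plot, colored by label, to look for clusters.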

Training

DistilBERT was fine-tuned for 3 epochs, with accuracy and F1 score logged during training. This phase focused on optimizing the model so that its predictions are reliable.
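One way to compute the two logged metrics is a function that takes `(logits, labels)`, the form expected by the `compute_metrics` argument of the Hugging Face `Trainer`. The sketch below assumes a binary task with label 1 as the positive class and a plain binary F1; the README does not say which class is positive or which F1 averaging the notebook uses.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return accuracy and binary F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)

    # Counts for the positive class (assumed to be label 1).
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))

    denom = 2 * tp + fp + fn
    return {
        "accuracy": float(np.mean(preds == labels)),
        "f1": 2 * tp / denom if denom else 0.0,
    }
```

F1 is reported alongside accuracy because accuracy alone can look good on an imbalanced dataset even when one class is rarely predicted correctly.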

Error Analysis (EA)

Error analysis involved examining the confusion matrix and the loss on individual validation set examples. This step aimed to identify and analyze the phrases the model most often got wrong. Additionally, a practical example was run through the model to test its response in a real-world scenario.
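The two parts of this analysis, a confusion matrix and a ranking of validation examples by loss, can be sketched as below. It assumes the model's logits and the true labels are available as arrays and that the loss is cross-entropy; the function names are hypothetical and not taken from the notebook.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes=2):
    """Count examples; rows are true labels, columns are predictions."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(labels, preds):
        matrix[true, pred] += 1
    return matrix


def per_example_loss(logits, labels):
    """Cross-entropy of each example, using a stable log-softmax."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), np.asarray(labels)]


def hardest_examples(texts, logits, labels, k=5):
    """Return the k (loss, text) pairs with the highest loss."""
    losses = per_example_loss(logits, labels)
    order = np.argsort(losses)[::-1][:k]
    return [(float(losses[i]), texts[i]) for i in order]
```

Sorting by loss surfaces confident mistakes first, which is usually where mislabeled or ambiguous phrases show up.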
