I am building a model that predicts if a football tweet/news is real or not.

ilanaliouchouche/footnews-detection

footnews-detection

Overview

The "footnews-detection-API" is a machine learning project that detects misinformation in football news. It is built on DistilBERT, a distilled version of BERT. The goal is to fine-tune this encoder-only (autoencoding) Transformer model to classify a football tweet or news item as real or not.

EDA + Data Cleaning

The data were cleaned with regular expressions, and the most common words for each label were visualized with matplotlib and wordcloud. The data were then split into three distinct sets (training, validation, and testing) and stored in a DatasetDict structure.
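The cleaning and splitting steps could look roughly like the sketch below. The regular expressions (URLs, @mentions, whitespace), the 80/10/10 split, the seed, and the function names `clean_text`, `split_rows`, and `to_dataset_dict` are all assumptions made for illustration; the project's actual rules and proportions are not stated here.

```python
import random
import re

# Illustrative patterns only: the project's real cleaning rules are not documented here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs and @mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows and cut them into train/validation/test lists (assumed fractions)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    return {
        "validation": rows[:n_val],
        "test": rows[n_val:n_val + n_test],
        "train": rows[n_val + n_test:],
    }


def to_dataset_dict(splits):
    """Wrap the three lists of dicts in a Hugging Face DatasetDict."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import Dataset, DatasetDict

    return DatasetDict({name: Dataset.from_list(rows) for name, rows in splits.items()})
```

Each row is assumed to be a dict such as `{"text": ..., "label": ...}`; calling `to_dataset_dict(split_rows(rows))` then yields one `Dataset` per split.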

Tokenization

The datasets were loaded in the Hugging Face "Dataset" format and then tokenized with the tokenizer associated with DistilBERT, producing the inputs the model expects.
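A minimal sketch of this tokenization step is shown below. The checkpoint name `distilbert-base-uncased`, the `text` column name, the padding strategy, and the maximum length of 128 are assumptions, as are the helper names `make_tokenize_fn` and `load_tokenizer`; the project may use different settings.

```python
def make_tokenize_fn(tokenizer, text_column="text", max_length=128):
    """Build a batch-wise function suitable for `Dataset.map(..., batched=True)`."""

    def tokenize(batch):
        # Truncate long texts and pad short ones so every example has the same length.
        return tokenizer(
            batch[text_column],
            truncation=True,
            padding="max_length",
            max_length=max_length,
        )

    return tokenize


def load_tokenizer(checkpoint="distilbert-base-uncased"):
    """Load the tokenizer that matches the (assumed) DistilBERT checkpoint."""
    # Imported lazily; requires the `transformers` package and downloads the vocabulary.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(checkpoint)
```

With a `DatasetDict` named `dataset_dict` (a hypothetical variable), the call would be `dataset_dict.map(make_tokenize_fn(load_tokenizer()), batched=True)`, which adds `input_ids` and `attention_mask` columns to every split.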

Visualization

Here, the training texts were encoded with the model, and each sequence was pooled by keeping the hidden state of its [CLS] token, which gives one vector per example. These vectors were standardized and then projected to 2D with PCA to look for potential clusters.
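The pooling, standardization, and projection can be sketched with NumPy alone, as below. This is an illustration rather than the project's code: it assumes the encoder output is already available as an array of shape `(n_examples, seq_len, hidden_dim)`, and it computes PCA through an SVD instead of a library class. The function names are hypothetical.

```python
import numpy as np


def cls_pool(last_hidden_state):
    """Keep the vector at position 0 (the [CLS] token) of every sequence."""
    return np.asarray(last_hidden_state)[:, 0, :]


def standardize(x, eps=1e-12):
    """Give every feature zero mean and unit variance across the examples."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def pca_2d(x):
    """Project the rows of x onto their first two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

The resulting `(n_examples, 2)` array can be passed to a scatter plot colored by label to check whether real and fake items separate.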

Training

DistilBERT was fine-tuned for 3 epochs, with accuracy and F1 score logged to track how reliable the predictions were.
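The two logged metrics can be computed with a function like the one below. It is a sketch under assumptions: the task is treated as binary, label 1 is taken as the positive class for F1, and the `(logits, labels)` input follows the convention of the Hugging Face `Trainer`, which the source does not confirm is the training loop used.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Return accuracy and binary F1 from a (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(np.asarray(logits), axis=-1)
    labels = np.asarray(labels)

    # Counts for the positive class (assumed to be label 1).
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    accuracy = float(np.mean(preds == labels))
    return {"accuracy": accuracy, "f1": f1}
```

If the `Trainer` API is used, this function can be passed as `compute_metrics=compute_metrics`, with `num_train_epochs=3` set in `TrainingArguments` to match the three epochs mentioned above.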

Error Analysis (EA)

Error analysis examined the confusion matrix and the loss on individual validation set examples, with the aim of identifying and analyzing the phrases the model most often got wrong. The model was also tried on a practical example to see how it responds in a real-world scenario.
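Both parts of the error analysis can be approximated as in the sketch below: a confusion matrix built from predictions, and a cross-entropy loss computed per validation example so the worst cases can be listed. The function names and the choice of showing the top `k` examples are assumptions for illustration.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes=2):
    """Count examples per (true label, predicted label) pair; rows are true labels."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(labels, preds):
        cm[true, pred] += 1
    return cm


def per_example_loss(logits, labels):
    """Cross-entropy of each example, using a numerically stable log-softmax."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def hardest_examples(texts, logits, labels, k=5):
    """Return the k texts with the highest loss, paired with that loss."""
    losses = per_example_loss(logits, labels)
    order = np.argsort(losses)[::-1][:k]
    return [(texts[i], float(losses[i])) for i in order]
```

Sorting by loss rather than only by correctness also surfaces examples the model classified correctly but with low confidence.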
