COVID19 Fake News Detection in English 🔎 👀

This repository contains the code for the paper "A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection" (accepted at the CONSTRAINT Workshop, AAAI 2021).

Preprint: https://arxiv.org/abs/2101.03545

💡 Please also see our extended work, accepted at Neurocomputing: https://arxiv.org/pdf/2104.01791.pdf

Task Description

This is a subtask of the CONSTRAINT-2021 shared task on hostile post detection, and it focuses on detecting COVID-19-related fake news in English. The data come from various social-media platforms such as Twitter, Facebook and Instagram. Given a social media post, the objective is to classify it as either fake or real news.

For example, the following two posts belong to fake and real categories, respectively.

English Dataset: https://competitions.codalab.org/competitions/26655 or https://github.com/diptamath/covid_fake_news/tree/main/data

English dataset paper: https://arxiv.org/abs/2011.03327

Link to Competition: https://constraint-shared-task-2021.github.io/

Our Approach

Our basic approach involves trying out different language models. Such models have achieved state-of-the-art results on a variety of text classification tasks, which motivated us to use them. We experimented with XLNet, RoBERTa, XLM-RoBERTa, DeBERTa, ELECTRA and ERNIE2.0. The training files for each individual model can be obtained here.

To improve the performance of our classification model, we tried various ensemble techniques over different combinations of these models. The best-performing combination uses XLNet, RoBERTa, XLM-RoBERTa and DeBERTa. We created a new feature set from the predictions of the individual models and saved the resulting feature data. We tried two ensemble techniques, Hard Voting and Soft Voting; Soft Voting achieved superior results with the above model combination. The code files related to ensembling can be found at this link.
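The ensembling step is only described in outline above. The following is a minimal sketch, assuming each fine-tuned model outputs a probability for the "fake" class, of how per-model predictions can be stacked into a feature set and combined by hard voting (majority of thresholded labels) and soft voting (mean of probabilities). The function names, the 0.5 threshold, the tie rule and the probabilities are illustrative assumptions, not the repository's code.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_feature_set(model_probs: Dict[str, Sequence[float]]) -> List[List[float]]:
    """One row per post, one column per model: that model's P(fake)."""
    names = sorted(model_probs)  # fixed, reproducible column order
    n_posts = len(model_probs[names[0]])
    return [[model_probs[name][i] for name in names] for i in range(n_posts)]


def hard_vote(features: List[List[float]]) -> List[str]:
    """Each model votes with its label at a 0.5 threshold; majority wins (ties -> 'fake', an assumption)."""
    preds = []
    for row in features:
        votes = Counter("fake" if p >= 0.5 else "real" for p in row)
        preds.append("fake" if votes["fake"] >= votes["real"] else "real")
    return preds


def soft_vote(features: List[List[float]]) -> List[str]:
    """Average the models' P(fake) and threshold the mean at 0.5."""
    return ["fake" if sum(row) / len(row) >= 0.5 else "real" for row in features]


# Illustrative (made-up) probabilities for three posts from the four models.
probs = {
    "xlnet":   [0.90, 0.45, 0.20],
    "roberta": [0.80, 0.45, 0.10],
    "xlmr":    [0.70, 0.45, 0.30],
    "deberta": [0.60, 0.99, 0.40],
}
X = build_feature_set(probs)
print(hard_vote(X))  # ['fake', 'real', 'real']
print(soft_vote(X))  # ['fake', 'fake', 'real']
```

The second post shows the design difference: three hesitant models outvote one confident model under hard voting, whereas soft voting lets the confident prediction dominate the average.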

All our work related to Heuristic Post-Processing can be found in the Analysis folder. First, we extract username statistics and domain statistics from the training data and save them in the Statistical meta folder. We then merge these statistical features using this code. Finally, we create our datasets for post-processing and apply our post-processing algorithm to obtain the final classification result.
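As an illustration of the statistics step, here is a minimal sketch assuming each training post is raw text with a "fake"/"real" label, that username handles are @-mentions and that domains come from http(s) links in the text. The regexes, the helper names (extract_handles, extract_domains, label_statistics, merge_features) and the rule of keeping the best-supported known handle or domain per post are hypothetical and are not the code in the Analysis folder.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlparse

HANDLE_RE = re.compile(r"@(\w+)")      # @username handles
URL_RE = re.compile(r"https?://\S+")   # http(s) links up to the next whitespace


def extract_handles(text: str) -> List[str]:
    """Lower-cased username handles mentioned in a post."""
    return [h.lower() for h in HANDLE_RE.findall(text)]


def extract_domains(text: str) -> List[str]:
    """Lower-cased URL domains in a post, with a leading 'www.' stripped."""
    domains = []
    for url in URL_RE.findall(text):
        netloc = urlparse(url).netloc.lower()
        if netloc.startswith("www."):
            netloc = netloc[4:]
        if netloc:
            domains.append(netloc)
    return domains


def label_statistics(
    posts: Iterable[Tuple[str, str]], extractor: Callable[[str], List[str]]
) -> Dict[str, Tuple[float, int]]:
    """Map each handle/domain to (fraction of its training posts labelled fake, post count)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"fake": 0, "real": 0})
    for text, label in posts:
        for key in set(extractor(text)):  # count each key once per post
            counts[key][label] += 1
    return {
        key: (c["fake"] / (c["fake"] + c["real"]), c["fake"] + c["real"])
        for key, c in counts.items()
    }


def merge_features(
    text: str,
    handle_stats: Dict[str, Tuple[float, int]],
    domain_stats: Dict[str, Tuple[float, int]],
) -> Dict[str, Optional[Tuple[float, int]]]:
    """Attach the statistics of the post's best-supported known handle and domain (None if unseen)."""
    def best(keys: List[str], stats: Dict[str, Tuple[float, int]]):
        known = [stats[k] for k in keys if k in stats]
        return max(known, key=lambda s: s[1]) if known else None

    return {
        "handle": best(extract_handles(text), handle_stats),
        "domain": best(extract_domains(text), domain_stats),
    }


# Toy training data (made up).
train = [
    ("@WHO: wash your hands https://www.who.int/a", "real"),
    ("Garlic cures COVID! https://fakecures.example/b", "fake"),
    ("@WHO daily update https://who.int/c", "real"),
    ("Miracle tea kills the virus https://fakecures.example/d", "fake"),
]
handle_stats = label_statistics(train, extract_handles)  # {'who': (0.0, 2)}
domain_stats = label_statistics(train, extract_domains)
print(merge_features("New cure https://fakecures.example/e", handle_stats, domain_stats))
# {'handle': None, 'domain': (1.0, 2)}
```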

We also perform an ablation study on the priority given to username handles versus URL domains, and on the threshold parameter; it can be accessed here.
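To make the roles of priority and threshold concrete, here is a sketch of one plausible post-processing rule and a small ablation grid. It assumes each post carries (fake fraction, count) statistics for a handle and a domain (or None if unseen) plus the ensemble's P(fake): the first source in priority order whose fake or real fraction reaches the threshold overrides the ensemble, otherwise the ensemble label is kept. The rule, the min_count parameter and the use of weighted F1 as the metric are assumptions for illustration; the paper's exact algorithm may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (fraction of training posts labelled fake, number of training posts), or None if unseen.
Stat = Optional[Tuple[float, int]]


def heuristic_label(stat: Stat, threshold: float, min_count: int) -> Optional[str]:
    """A label if the training statistics are decisive enough, else None."""
    if stat is None or stat[1] < min_count:
        return None
    fake_ratio = stat[0]
    if fake_ratio >= threshold:
        return "fake"
    if 1.0 - fake_ratio >= threshold:
        return "real"
    return None


def post_process(
    ensemble_probs: Sequence[float],
    handle_stats: Sequence[Stat],
    domain_stats: Sequence[Stat],
    threshold: float = 0.8,
    min_count: int = 1,
    priority: Tuple[str, ...] = ("handle", "domain"),
) -> List[str]:
    """Override the ensemble with the first decisive heuristic in `priority` order."""
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    preds = []
    for i, p in enumerate(ensemble_probs):
        sources = {"handle": handle_stats[i], "domain": domain_stats[i]}
        label = None
        for name in priority:
            label = heuristic_label(sources[name], threshold, min_count)
            if label is not None:
                break
        preds.append(label if label is not None else ("fake" if p >= 0.5 else "real"))
    return preds


def weighted_f1(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in `gold`."""
    score = 0.0
    for c in ("fake", "real"):
        tp = sum(p == c and g == c for p, g in zip(preds, gold))
        fp = sum(p == c and g != c for p, g in zip(preds, gold))
        fn = sum(p != c and g == c for p, g in zip(preds, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (tp + fn) * f1
    return score / len(gold)


def ablation(ensemble_probs, handle_stats, domain_stats, gold, thresholds):
    """Score every (priority order, threshold) combination."""
    results: Dict[Tuple[Tuple[str, ...], float], float] = {}
    for priority in (("handle", "domain"), ("domain", "handle")):
        for t in thresholds:
            preds = post_process(ensemble_probs, handle_stats, domain_stats, t, 1, priority)
            results[(priority, t)] = weighted_f1(preds, gold)
    return results


# Toy data (made up): ensemble P(fake), per-post statistics, gold labels.
probs = [0.7, 0.4, 0.6, 0.3]
handles = [(0.0, 5), None, (0.9, 3), None]
domains = [None, (0.95, 10), (0.1, 4), (0.6, 2)]
gold = ["real", "fake", "fake", "real"]
for (order, t), f1 in ablation(probs, handles, domains, gold, [0.7, 0.8, 0.9]).items():
    print(order[0], "first, threshold", t, "-> weighted F1", round(f1, 4))
```

On this toy data the third post has conflicting evidence (its handle looks fake, its domain looks real), so the chosen priority order decides the outcome; that is the kind of effect the ablation measures.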

Results

Our initial ensembling approach achieved an F1-score of 98.31, against 98.69 for the top entry on the leaderboard.
After the evaluation phase, we improved our solution to an F1-score of 98.83 using Heuristic Post-Processing.

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:

@article{das2021heuristic,
title={A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection},
author={Das, Sourya Dipta and Basak, Ayan and Dutta, Saikat},
journal={arXiv preprint arXiv:2101.03545},
year={2021}
}
