FactChecks.br


Collection of Portuguese Fact-Checking Benchmarks.

Getting Started

| Dataset | Type | Domain | Annotation | Date range | Number of samples |
|---|---|---|---|---|---|
| Fake.br | Claim | News | Annotated | 01/2016 - 01/2018 | 7,200 |
| FakeRecogna | Source | News | Agency | 03/2017 - 05/2020 | 11,773 |
| Central de Fatos | Source | News | Agency | 01/2013 - 05/2021 | 10,461 |
| Fact-check_tweet (pt split) | Claim-source pair | Tweets-News | Auto-Agency | 2019 - 2021 | 656 - 656 |
| FakeNewsSet | Claim-source pair | Tweets-News | Auto-Agency | | 26,970 - 598 |

Usage 🤗

from datasets import load_dataset

data = load_dataset("fake-news-UFG/FactChecksbr")

We also provide the raw versions of Fake.br, FakeRecogna, Central de Fatos, and FakeNewsSet.

Review URLs were tagged using the review ID.
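After loading, it can help to check which splits and columns came back. The following is a minimal sketch, assuming the object returned by `load_dataset` behaves like a mapping from split names to tables that support `len()` and expose `column_names` (as `datasets.DatasetDict` does). `describe_splits` is a hypothetical helper, not part of this repository, and the stand-in class with its `text` and `label` columns is invented only so the snippet runs offline.

```python
def describe_splits(dataset_dict):
    """Return {split_name: (number_of_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (len(split), list(split.column_names))
        for name, split in dataset_dict.items()
    }


class _StandInSplit:
    """Offline stand-in that mimics the two Dataset features used above (assumption)."""

    def __init__(self, rows):
        self._rows = rows
        self.column_names = list(rows[0].keys()) if rows else []

    def __len__(self):
        return len(self._rows)


# Invented demo data; the real column names may differ.
demo = {"train": _StandInSplit([{"text": "a", "label": 0}, {"text": "b", "label": 1}])}
summary = describe_splits(demo)
print(summary)
```

With the real data, the same helper could be called as `describe_splits(data)` right after the `load_dataset` call above.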

Scripts

  • The generation script and exploratory data analysis (EDA) are in the notebook process.ipynb.
  • Builder scripts for Dataset Hub are located at builders/.

Data Analysis

Agency domains per dataset

[Figure: agency domains per dataset]

Duplication

There are 23,467 sources in total, of which 20,028 are unique. The biggest overlap is between "FakeRecogna" and "Central de Fatos". No source is common to all datasets.

Of the 3,303 duplicated sources, we excluded 130 contradictory examples, in which one dataset labels the source as "fake" while another labels it as "not fake".
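The duplication check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in process.ipynb: the `(dataset_name, source_url, label)` record layout, the example URLs, and the `split_duplicates` helper are all hypothetical, and a source is treated as duplicated when it appears in more than one dataset.

```python
from collections import defaultdict


def split_duplicates(records):
    """Find sources shared across datasets and those with conflicting labels.

    `records` is an iterable of (dataset_name, source_url, label) tuples.
    This layout is an assumption for illustration, not the repository's schema.
    """
    labels_by_source = defaultdict(set)
    datasets_by_source = defaultdict(set)
    for dataset_name, source_url, label in records:
        labels_by_source[source_url].add(label)
        datasets_by_source[source_url].add(dataset_name)

    # A source counts as duplicated when more than one dataset contains it.
    duplicated = {s for s, d in datasets_by_source.items() if len(d) > 1}
    # A duplicated source is contradictory when the datasets disagree on its label.
    contradictory = {s for s in duplicated if len(labels_by_source[s]) > 1}
    return duplicated, contradictory


# Invented toy records.
records = [
    ("FakeRecogna", "https://example.org/a", "fake"),
    ("Central de Fatos", "https://example.org/a", "fake"),
    ("FakeRecogna", "https://example.org/b", "fake"),
    ("Central de Fatos", "https://example.org/b", "not fake"),
    ("Fake.br", "https://example.org/c", "not fake"),
]
duplicated, contradictory = split_duplicates(records)
# Drop every record whose source has conflicting labels.
kept = [r for r in records if r[1] not in contradictory]
```

Here the source ending in `/b` is excluded because the two datasets disagree on its label, while `/a` is a consistent duplicate and is kept.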

[Figure: duplicated sources across datasets]

Samples per class

[Figure: samples per class]

Evaluation

If you have evaluated a model on any of these datasets, please feel free to open a pull request. 😄

| Dataset | Model | Accuracy | Precision | Recall | macro-F1 | URL |
|---|---|---|---|---|---|---|
| Fake.br | BERTimbau | 99.22% | - | - | - | repo |
| Fake.br | GloVe 100-600D - HAN | 97% | - | - | - | paper |
| Fake.br | BERTimbau + Logistic Regression | 96.14% | 96.40% | 95.49% | 96.13% | paper |
| Fake.br | BoW | 96% | - | - | - | paper |
| Fake.br | GloVe 100D + BiLSTM | 93.56% | - | - | - | repo |
| Fake.br | TfidfVectorizer | 92.85% | 92.19% | 93.36% | - | repo |
| Fake.br | BoW | 89% | 89% | 89% | 89% | paper |
| Fake.br | BoW + MLP | 88.65% | - | - | - | repo |
| FakeNewsSetGen | Detective | 97.93% | 97.93% | - | - | repo |
| Fact-check_tweet | XLM-R | 84.08% | - | - | 83.63% | paper |
| FakeRecogna | MLP + BoW | 93.1% | 93.1% | 93.1% | 93.0% | repo |
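For contributors reporting new results, the metrics in the table columns can be computed along these lines. This is a minimal sketch in plain Python, assuming two classes and macro averaging for precision, recall, and F1; the works listed above may average differently, and the `classification_metrics` helper and its label names are hypothetical.

```python
def classification_metrics(y_true, y_pred, labels=("fake", "true")):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    The label names are placeholders (assumption); pass the labels your data uses.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Macro averaging: unweighted mean of the per-class scores.
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Invented toy predictions.
y_true = ["fake", "fake", "true", "true"]
y_pred = ["fake", "true", "true", "true"]
metrics = classification_metrics(y_true, y_pred)
```

Macro-F1 averages the per-class F1 scores without weighting by class size, so it can differ from accuracy when the classes are imbalanced.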

Citing

@misc{FactChecksbr,
author = {R. S. Gomes, Juliana},
title = {FactChecks.br},
url = {https://github.com/fake-news-UFG/FactChecks.br},
doi = {10.57967/hf/1016},
}

Acknowledgments

This work was supported by FAPEG (Fundação de Amparo à Pesquisa do Estado de Goiás) and ANATEL (Agência Nacional de Telecomunicações).