
Predicting Propaganda using BERT

Medium article for the project

Guide to files:

  • EDA.ipynb
    • Exploratory data analysis on training data.
    • Leverage spaCy, intervaltree, and textstat for text analysis, and Matplotlib for visualizations (readability sketch below).
    • Features an approach for increasing training data by 160%!
  • generate_data.ipynb
    • Create train, dev, and test .tsv files for classifiers.
    • Generate negative spans from news articles, increasing the data by ~160% (sketched below).
  • dummy_classifier.ipynb
    • Establish a baseline model against which to evaluate more sophisticated models (sketched below).
    • Perform multi-class classification using training and dev data.
    • Evaluate performance using confusion matrix, classification report, and micro F1.
  • logistic_regression.ipynb
    • Create a logistic regression model (sketched below).
    • Perform a grid search to tune the model's hyperparameters.
    • Perform multi-class classification using training and dev data.
    • Evaluate performance using confusion matrix, classification report, and micro F1.
  • bert_train_validate.ipynb
    • Fine-tune a pre-trained BERT model on the training data (sketched below).
    • Perform hyperparameter tuning by evaluating variations of the model, number of epochs, learning rate, and batch size.
    • Evaluate average performance on validation sets using accuracy, precision, recall, and F1.
    • Save the best-performing version for evaluation on the dev set.
  • bert_dev.ipynb
    • Run the best-performing version on the dev data.
    • Evaluate performance using accuracy, precision, recall, and F1.
  • bert_test.ipynb
    • Generate predictions for the test data (sketched below).
    • Results are to be submitted to the SemEval-2020 Task 11 technique classification (TC) subtask.
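
The readability portion of EDA.ipynb can be illustrated with textstat. A minimal sketch, assuming the labeled spans live in a two-column train.tsv; the file name and column layout are assumptions, not the notebook's exact code:

```python
import pandas as pd
import textstat
import matplotlib.pyplot as plt

# Assumed layout: one labeled span per row, tab-separated, no header.
spans = pd.read_csv("train.tsv", sep="\t", names=["text", "label"])

# Per-span readability score; textstat offers several other indices as well.
spans["flesch"] = spans["text"].apply(textstat.flesch_reading_ease)

# Distribution of readability by propaganda technique.
spans.boxplot(column="flesch", by="label", rot=90, figsize=(10, 5))
plt.tight_layout()
plt.show()
```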
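
The negative-span generation in generate_data.ipynb boils down to keeping text that does not overlap any annotated propaganda fragment. A minimal sketch using intervaltree, assuming character-offset annotations per article; the sentence splitting, file handling, and the "no_propaganda" label are illustrative assumptions:

```python
from intervaltree import IntervalTree

def negative_spans(article_text, labeled_spans, min_len=20):
    """Yield sentence-like chunks that do not overlap any labeled propaganda span."""
    tree = IntervalTree()
    for start, end in labeled_spans:           # character offsets from the annotation files
        if end > start:
            tree[start:end] = True

    offset = 0
    for chunk in article_text.split("."):      # crude sentence split, for illustration only
        start, end = offset, offset + len(chunk)
        offset = end + 1                       # skip past the removed "."
        text = chunk.strip()
        if len(text) >= min_len and not tree.overlaps(start, end):
            yield text

# Hypothetical usage: pair each negative span with a catch-all label and append
# the rows to the positive spans before writing the train/dev/test .tsv files.
# rows += [(text, "no_propaganda") for text in negative_spans(article, spans_for_article)]
```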
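
The baseline in dummy_classifier.ipynb corresponds to scikit-learn's DummyClassifier. A sketch under the same assumed .tsv layout as above:

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report, confusion_matrix, f1_score

train = pd.read_csv("train.tsv", sep="\t", names=["text", "label"])
dev = pd.read_csv("dev.tsv", sep="\t", names=["text", "label"])

# Most-frequent-class baseline; the features are ignored entirely.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(train["text"], train["label"])
preds = baseline.predict(dev["text"])

print(confusion_matrix(dev["label"], preds))
print(classification_report(dev["label"], preds))
print("micro F1:", f1_score(dev["label"], preds, average="micro"))
```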
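
logistic_regression.ipynb combines a text featurizer, LogisticRegression, and a grid search. A hedged sketch using a TF-IDF pipeline; the actual features and search space in the notebook may differ:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, f1_score

train = pd.read_csv("train.tsv", sep="\t", names=["text", "label"])
dev = pd.read_csv("dev.tsv", sep="\t", names=["text", "label"])

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid; the notebook's search space may be different.
grid = GridSearchCV(
    pipe,
    {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]},
    scoring="f1_micro",
    cv=5,
)
grid.fit(train["text"], train["label"])

preds = grid.predict(dev["text"])
print(classification_report(dev["label"], preds))
print("micro F1:", f1_score(dev["label"], preds, average="micro"))
```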
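
bert_train_validate.ipynb fine-tunes a pre-trained BERT checkpoint for sequence classification over the propaganda techniques. A minimal sketch with the Hugging Face transformers Trainer; the checkpoint name, hyperparameters, and data handling are assumptions rather than the notebook's exact configuration:

```python
import pandas as pd
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

train = pd.read_csv("train.tsv", sep="\t", names=["text", "label"])
labels = sorted(train["label"].unique())
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

class SpanDataset(torch.utils.data.Dataset):
    """Tokenized spans with integer technique labels."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(list(texts), truncation=True, padding=True, max_length=128)
        self.labels = [label2id[l] for l in labels]
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels), id2label=id2label, label2id=label2id)

# Illustrative values; the notebook sweeps model, epochs, learning rate, and batch size
# and averages validation accuracy, precision, recall, and F1 across runs.
args = TrainingArguments(output_dir="bert_out", num_train_epochs=3,
                         learning_rate=2e-5, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=SpanDataset(train["text"], train["label"]))
trainer.train()

trainer.save_model("bert_best")          # best version, evaluated later on the dev set
tokenizer.save_pretrained("bert_best")
```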
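
bert_test.ipynb then loads the saved model and writes predictions for the unlabeled test spans; the assumed columns and the submission format required by the SemEval scorer are not reproduced here:

```python
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert_best")
model = AutoModelForSequenceClassification.from_pretrained("bert_best")
model.eval()

test = pd.read_csv("test.tsv", sep="\t", names=["text"])   # assumed layout
with torch.no_grad():
    enc = tokenizer(list(test["text"]), truncation=True, padding=True,
                    max_length=128, return_tensors="pt")
    pred_ids = model(**enc).logits.argmax(dim=-1).tolist()

# Map integer ids back to technique names via the config saved with the model.
test["prediction"] = [model.config.id2label[i] for i in pred_ids]
test.to_csv("test_predictions.tsv", sep="\t", index=False)
```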

Photo by Toa Heftiba on Unsplash