COVID-19-disinformation

Setup Environment

Create an experiment directory:

mkdir exp_covid19_disinfo

To keep things organized, copy the data and scripts into the exp_covid19_disinfo directory.

cd /your_path/exp_covid19_disinfo/
conda env create -f bin/covid_exp_env.yml
source activate $path_to_your_env/transformers

Dataset

See the README in the data/ directory.

How to Run Experiments

Experiments with transformers

All transformer scripts are in bin/transformers. Point HOME_DIR at that directory before running the experiment script:

cd /your_path/exp_covid19_disinfo/
export HOME_DIR="/your_path/exp_covid19_disinfo/bin/transformers"
bash bin/run_en_bert.sh

Experiments with FastText

First, download the pre-trained embeddings released by the FastText team.

Arabic: Common Crawl and Wikipedia CBOW
Bulgarian: Common Crawl and Wikipedia CBOW
Dutch: Common Crawl and Wikipedia CBOW
English: 2 million word vectors trained with subword information on Common Crawl (600B tokens).

After downloading and extracting the embeddings, two *.vec files will be available to be used in our experiments.

Run the following command to start an experiment for a specific question:

bash bin/run_en_fasttext.sh

pretrained-vectors/crawl-300d-2M-subword.vec is the path to the pretrained vectors (replace it with the language-specific vectors when running experiments for other languages).
--autotuneDuration sets how long the hyperparameter tuning runs, in seconds. A longer duration gives the tuner more time to search and generally yields a better final model.
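The two settings above map onto keyword arguments of `fasttext.train_supervised` in the fastText Python package. The following is a minimal sketch of how they might be combined, not the contents of `run_en_fasttext.sh` or `fasttext_exp.py`; the helper names, the file paths in the usage comment, and the 600-second default are assumptions made for illustration.

```python
def build_autotune_kwargs(train_file, dev_file, vectors_file, duration_s=600, dim=300):
    """Assemble keyword arguments for fasttext.train_supervised with autotuning.

    All argument names of this helper are hypothetical; the dictionary keys
    are parameters of the fastText Python API.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be a positive number of seconds")
    return {
        "input": train_file,                   # training file in fastText label format
        "autotuneValidationFile": dev_file,    # held-out file the tuner scores against
        "autotuneDuration": int(duration_s),   # tuning budget in seconds
        "pretrainedVectors": vectors_file,     # e.g. the English .vec file
        "dim": dim,                            # should match the .vec dimensionality
    }


def train(kwargs):
    """Train a supervised model; fasttext is imported only when training runs."""
    import fasttext

    return fasttext.train_supervised(**kwargs)


# Usage (paths are placeholders, adjust them to your data layout):
# kwargs = build_autotune_kwargs(
#     "train.txt", "dev.txt", "pretrained-vectors/crawl-300d-2M-subword.vec"
# )
# model = train(kwargs)
```

Swapping in another language only changes the `vectors_file` argument, provided the replacement vectors have the same dimensionality as `dim`.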

Experiments with Social features

First, install the following packages:

pip install requests
pip install feature-engine

Then, work through the notebook twitter-v2.ipynb. The notebook reads the social features from data/english/covid19_infodemic_english_data_multiclass_final_all.jsonl and converts them into a format suitable for machine learning: categorical features are one-hot encoded, numerical features are log scaled, and boolean features are converted to 0s and 1s. The last cell of the notebook saves the output in the data/ folder as feature_english.tsv.
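The three conversions can be illustrated with a small standard-library sketch. It is not the notebook's code: the feature names, the list of category levels, and the use of `log1p` (chosen so that zero counts stay finite) are all assumptions for the example.

```python
import math


def encode_features(row, categorical_levels):
    """Convert one raw record into a flat dict of numeric features."""
    encoded = {}
    for name, value in row.items():
        if isinstance(value, bool):
            # Booleans become 0/1 (checked first because bool is an int subclass).
            encoded[name] = int(value)
        elif isinstance(value, (int, float)):
            # Numerical features are log scaled; log1p is an assumption here.
            encoded[name] = math.log1p(value)
        else:
            # Categorical features become one indicator column per known level.
            for level in categorical_levels[name]:
                encoded[f"{name}={level}"] = 1 if value == level else 0
    return encoded


# Hypothetical record; the real feature names come from the JSONL file.
row = {"verified": True, "followers_count": 0, "retweet_count": 9, "lang": "en"}
levels = {"lang": ["ar", "en", "nl"]}
features = encode_features(row, levels)
```

Listing the category levels up front keeps the output columns identical across records, which matters when the rows are later written to a single TSV file.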

Script to get the results from the generated json files

Run the following script to collect all results within a base experimental directory:

python bin/collect_results.py --set test --metrics "accuracy, micro-f1, weighted-f1" experiments/exp_bert_arabic/

The script supports nested experimental setups: it automatically finds all experiments within this directory, for all the questions, and outputs one row per experiment.
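The nested discovery described above can be sketched with a recursive directory walk. This is an illustration of the idea rather than the logic of `collect_results.py`; in particular, the assumed JSON layout (one object per file, keyed by split name, holding a dict of metric scores) is hypothetical.

```python
import json
import os


def collect_results(base_dir, split="test", metrics=("accuracy", "micro-f1", "weighted-f1")):
    """Walk base_dir recursively and return one row per result file found."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for filename in sorted(filenames):
            if not filename.endswith(".json"):
                continue
            with open(os.path.join(dirpath, filename), encoding="utf-8") as handle:
                results = json.load(handle)
            # Assumed layout: {"test": {"accuracy": 0.8, ...}, ...}
            scores = results.get(split) if isinstance(results, dict) else None
            if not isinstance(scores, dict):
                continue  # not a result file for the requested split
            row = {"experiment": os.path.relpath(dirpath, base_dir)}
            # Metrics missing from a file are reported as None.
            row.update({metric: scores.get(metric) for metric in metrics})
            rows.append(row)
    return sorted(rows, key=lambda r: r["experiment"])
```

Identifying each experiment by its path relative to the base directory keeps rows distinguishable when several questions share the same result file name.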
