GitHub - ayushbits/robust-aggregate-lfs: Source code of our ACL 2022 paper 'Learning to robustly aggregate labeling functions for semi-supervised data programming'

How to reproduce results

CUDA_LAUNCH_BLOCKING=0 python3 gpu_rewt_ss_generic.py /tmp l1 0 l3 l4 0 l6 qg 5 <dataset_path> <num_class> nn 0 <batch_size> <lr_learning_rate> <gm_learning_rate> normal f1

<dataset_path> is the path to the directory where the LFs are stored
<num_class> is the number of classes in the dataset (e.g., TREC has 6 classes and SMS has 2 classes)
<batch_size> is kept as 32 in all our experiments
<lr_learning_rate> is set to 0.0003
<gm_learning_rate> is set to 0.01
The last argument can be either f1 or accuracy, where f1 refers to macro-F1.
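
Because the command above is long and purely positional, it can help to assemble it programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_train_command` and the example path `path/to/lfs` are hypothetical. The fixed tokens (`/tmp`, `l1`, `0`, ...) are copied verbatim from the command above, and the defaults are the values listed in the argument descriptions.

```python
import shlex


def build_train_command(dataset_path, num_class, batch_size=32,
                        lr_learning_rate=0.0003, gm_learning_rate=0.01,
                        metric="f1"):
    """Return the argument list for gpu_rewt_ss_generic.py (hypothetical helper)."""
    if metric not in ("f1", "accuracy"):
        raise ValueError("metric must be 'f1' or 'accuracy'")
    # The fixed positional tokens below are copied verbatim from the README
    # command. Their meaning is not documented there, so they are kept as-is.
    return [
        "python3", "gpu_rewt_ss_generic.py",
        "/tmp", "l1", "0", "l3", "l4", "0", "l6", "qg", "5",
        str(dataset_path), str(num_class),
        "nn", "0",
        str(batch_size), str(lr_learning_rate), str(gm_learning_rate),
        "normal", metric,
    ]


if __name__ == "__main__":
    # "path/to/lfs" is a placeholder; TREC has 6 classes.
    argv = build_train_command("path/to/lfs", 6)
    print("CUDA_LAUNCH_BLOCKING=0 " + shlex.join(argv))
```

The printed line can be pasted into a shell. To launch it from Python instead, the list can be passed to `subprocess.run` with `CUDA_LAUNCH_BLOCKING` set through the `env` argument.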

cd reef/
python generate_human_lfs.py dataset(imdb/trec/sms/youtube) count/lemma savetype(dict/lemma)

cd reef/
python generic_generate_labels.py youtube normal dt 1 26 yt_val2.5_sup5_dt1 count

1st argument is dataset name (i.e., imdb/trec/sms/youtube/sst5/twitter)
2nd argument is prefix of generated pkl files
3rd argument is number of LFs per step
4th argument is number of epochs
5th argument is storage path (LFs/data/youtube/<storage_path>) where pkl files will be stored
6th argument is type of features
