This repo contains our submission for SemEval 2024 Task 4.
Feel free to take a deep dive 🦦🌊
Install the requirements using pip and Python 3.10.0:
pip install -r requirements.txt
- Get the data from the task site
- Place the data in the `data` directory with this layout:
data
└── subtask1
    ├── dev.json
    ├── dev_unlabeled.json
    ├── train.json
    └── validation.json
- Run the preprocessing:
python3 -m src.classes.preprocess
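If preprocessing fails, a quick check like the one below can confirm that the files sit where the layout above expects them. It is a minimal sketch rather than part of the repo: the function name `check_data_dir` is hypothetical, it assumes it is run from the repository root, and it only verifies that each file exists and parses as JSON, not that its contents are correct.

```python
import json
from pathlib import Path

# Expected files, taken from the directory layout above.
EXPECTED_FILES = ["dev.json", "dev_unlabeled.json", "train.json", "validation.json"]


def check_data_dir(root: str = "data") -> list[str]:
    """Return the problems found under <root>/subtask1 (an empty list if none).

    Hypothetical helper: checks only that each expected file exists and is valid JSON.
    """
    subtask_dir = Path(root) / "subtask1"
    problems = []
    for name in EXPECTED_FILES:
        path = subtask_dir / name
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)
        except json.JSONDecodeError as err:
            problems.append(f"invalid JSON in {path}: {err}")
    return problems


if __name__ == "__main__":
    issues = check_data_dir()
    if issues:
        print("\n".join(issues))
    else:
        print("data/subtask1 looks complete")
```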
Execute the notebook: src/tune_classification_model.ipynb
✨Parameters✨ can be changed in the run-config:
- `dataset_style`: Either `cleaned` or `all_lower`
- `model_name`: Hugging Face identifier for the model (e.g. `bert-base-cased`)
- `use_custom_head`: Whether to use the custom head we developed (`True` or `False`)
- `use_hierarchy`: Whether to use the full label hierarchy instead of only the leaf labels (`True` or `False`)
- `extra_lazers`: Whether to add additional linear layers to the custom head (`True` or `False`)
- `weight_loss`: Whether to weight each class by its inverse frequency in the cross-entropy loss (`True` or `False`)
- `epochs`: Number of training epochs
- `lr`: Learning rate
- `batch_size`: Batch size on the GPU (per forward pass)
- `acc_steps`: Number of gradient accumulation steps
- `seed`: Random seed
- `limit`: Only train on a subset of the data (an `int`), or `None` to use the full dataset
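The run-config lives in the notebook and is not reproduced here. Purely as an illustration, the parameters above could be gathered and sanity-checked in a dataclass like the one below. The class name `RunConfig`, the `validate` method, the `effective_batch_size` property and every default value are hypothetical placeholders, not the settings we used; only the field names mirror the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfig:
    # Hypothetical container; field names follow the run-config list,
    # default values are placeholders only.
    dataset_style: str = "cleaned"        # "cleaned" or "all_lower"
    model_name: str = "bert-base-cased"   # Hugging Face model identifier
    use_custom_head: bool = True
    use_hierarchy: bool = True
    extra_lazers: bool = False            # extra linear layers in the custom head
    weight_loss: bool = True              # inverse-frequency class weights
    epochs: int = 10
    lr: float = 2e-5
    batch_size: int = 8
    acc_steps: int = 4
    seed: int = 42
    limit: Optional[int] = None           # None means use the full dataset

    def validate(self) -> None:
        """Raise ValueError for settings that cannot be valid."""
        if self.dataset_style not in ("cleaned", "all_lower"):
            raise ValueError(f"unknown dataset_style: {self.dataset_style!r}")
        if self.limit is not None and self.limit <= 0:
            raise ValueError("limit must be a positive int or None")
        if self.batch_size < 1 or self.acc_steps < 1:
            raise ValueError("batch_size and acc_steps must be >= 1")

    @property
    def effective_batch_size(self) -> int:
        # With gradient accumulation, gradients from acc_steps forward/backward
        # passes are combined before each optimizer step.
        return self.batch_size * self.acc_steps
```

For example, `RunConfig(batch_size=8, acc_steps=4).effective_batch_size` is 32: gradient accumulation lets a small per-pass batch that fits on the GPU act, for each optimizer step, like a larger one.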
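To make the `weight_loss` option concrete, here is a minimal sketch of inverse-frequency class weighting in plain Python. It illustrates the general technique under stated assumptions and is not our exact code: the function name `inverse_frequency_weights` is hypothetical, the weights are rescaled so that the classes seen in training average to 1, and unseen classes get 0.0. The actual implementation may normalize differently.

```python
from collections import Counter
from typing import Iterable, Sequence


def inverse_frequency_weights(
    samples: Iterable[Sequence[str]], classes: Sequence[str]
) -> list[float]:
    """Return one weight per class, proportional to 1 / (class frequency).

    `samples` holds the labels of each training example (one or several per
    example). Weights are rescaled so that the classes that occur average to 1;
    classes that never occur get 0.0 (a choice made for this sketch).
    """
    # Count how often each label occurs across all examples.
    counts = Counter(label for labels in samples for label in labels)
    total = sum(counts.values())
    # Rare classes get large raw weights, frequent ones small weights.
    raw = [total / counts[c] if counts[c] else 0.0 for c in classes]
    present = [w for w in raw if w > 0]
    if not present:
        raise ValueError("none of the given classes occur in samples")
    mean = sum(present) / len(present)
    return [w / mean for w in raw]
```

For example, for the label lists `[["A"], ["A"], ["A", "B"]]` and classes `["A", "B", "C"]` the weights come out at about `[0.5, 1.5, 0.0]`. In PyTorch, such a list can be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` (e.g. `weight=torch.tensor(weights)`), which scales each class's contribution to the loss.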