HatefulMemes

Intro

This is the source code of the first-place solution to the Facebook AI Hateful Memes Challenge. In this competition, we extract multiple types of annotations from the hateful-memes dataset and feed them into multi-modal transformers to achieve high accuracy. You can read about the details of our approach in:

Dependency

NOTE: Make sure you follow this guide to configure Docker so it can be invoked by the shell scripts without sudo.

System spec

The original experiments were conducted on a GCP n1-highmem-16 instance initialized with the TensorFlow 2.3 / Keras, CUDA 11.0 GPU GCE image:

  • OS: Ubuntu 18.04.5 LTS
  • CPU: 16 Core Intel CPU
  • Memory: 104 GB
  • GPU: 4 Nvidia T4
  • Disk: 500GB HDD

Most of the data preprocessing and model training can be done with a single T4 GPU, except VL-BERT, which needs 4 GPUs to reach a large enough batch size when fine-tuning Faster R-CNN and BERT together.
NOTE: All models in this project use fp16 (mixed-precision) acceleration during training. Please use a GPU that supports NVIDIA AMP.
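
For reference, the snippet below is a minimal, hypothetical sketch of fp16 training with PyTorch's automatic mixed precision (torch.cuda.amp). The model, optimizer, and data are placeholders and do not come from this repository; the real training loops live in the VL-BERT, UNITER, and ERNIE-VIL sub-projects.

    import torch
    from torch.cuda.amp import autocast, GradScaler

    # Dummy stand-ins for a real multi-modal model and data pipeline.
    model = torch.nn.Linear(768, 2).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

    for step in range(10):
        features = torch.randn(32, 768, device="cuda")
        labels = torch.randint(0, 2, (32,), device="cuda")
        optimizer.zero_grad()
        with autocast():  # forward pass runs in mixed precision (fp16 where safe)
            loss = torch.nn.functional.cross_entropy(model(features), labels)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # unscales gradients, then steps
        scaler.update()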

Steps

  1. Preprocess the data and extract additional features. See the detailed instructions in data_utils/README.

  2. Train the modified VL-BERT models (2 large and 1 base). See the detailed instructions in VL-BERT/README.

  3. Train UNITER-ITM (1 large and 1 base) and VILLA-ITM (1 large and 1 base). See the detailed instructions in UNITER/README.

  4. Train ERNIE-Vil (1 large and 1 base). See the detailed instructions in ERNIE-VIL/README.

  5. Ensemble by averaging the predictions of all models, then apply a simple rule-based racism detector on top (see the sketch after this list).

    bash run_ensemble.sh

    This script lets you select which models' predictions to include in the ensemble. It outputs ROOT/test_set_ensemble.csv as the final result and copies all the csv files used in the ensemble to the ROOT/test_set_csvs folder.
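
For illustration, here is a minimal sketch of the averaging part of the ensemble, assuming each submission csv follows the Hateful Memes format with id, proba, and label columns. The file paths are placeholders; the actual logic, including the rule-based racism detector, is implemented by run_ensemble.sh and the scripts it calls.

    import glob
    import pandas as pd

    # Hypothetical location; the real script collects csvs into ROOT/test_set_csvs.
    csv_paths = glob.glob("test_set_csvs/*.csv")

    # Align rows by id, then take a simple unweighted mean of the probabilities.
    frames = [pd.read_csv(p).sort_values("id").reset_index(drop=True) for p in csv_paths]
    ensemble = frames[0][["id"]].copy()
    ensemble["proba"] = sum(f["proba"] for f in frames) / len(frames)
    ensemble["label"] = (ensemble["proba"] >= 0.5).astype(int)

    ensemble.to_csv("test_set_ensemble.csv", index=False)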
