low accuracy on VQAv2 test-std when reproducing prompt tuning experiments #418

huahuaxiaomuzhu · 2023-09-23T06:31:09Z

Hi OFA team, thank you for the great work!
Recently I've been trying to reproduce the experiments in the OFA-prompt paper, and I got a base-size model with an accuracy of 73.08 on the test-dev split.
However, the accuracy dropped significantly on the test-std set, where I only got around 22.38 (11189 out of 50000).
Below is my training script:

for total_num_updates in 40000; do
  echo "total_num_updates "${total_num_updates}
  for warmup_updates in 1000; do
    echo "warmup_updates "${warmup_updates}  
    for lr in 0.03; do
      echo "lr "${lr}
      for patch_image_size in 480; do
        echo "patch_image_size "${patch_image_size}

        log_file=${log_dir}/${total_num_updates}"_"${warmup_updates}"_"${lr}"_"${patch_image_size}"_rank"${RANK}".log"
        save_path=${save_dir}/${total_num_updates}"_"${warmup_updates}"_"${lr}"_"${patch_image_size}
        mkdir -p $save_path

        CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=${MASTER_PORT} ../../train.py \
            ${data} \
            --selected-cols=${selected_cols} \
            --bpe-dir=${bpe_dir} \
            --user-dir=${user_dir} \
            --restore-file=${restore_file} \
            --reset-optimizer --reset-dataloader --reset-meters \
            --save-dir=${save_path} \
            --task=${task} \
            --arch=${arch} \
            --criterion=${criterion} \
            --label-smoothing=${label_smoothing} \
            --batch-size=${batch_size} \
            --update-freq=${update_freq} \
            --encoder-normalize-before \
            --decoder-normalize-before \
            --share-decoder-input-output-embed \
            --share-all-embeddings \
            --layernorm-embedding \
            --patch-layernorm-embedding \
            --code-layernorm-embedding \
            --resnet-drop-path-rate=${resnet_drop_path_rate} \
            --encoder-drop-path-rate=${encoder_drop_path_rate} \
            --decoder-drop-path-rate=${decoder_drop_path_rate} \
            --dropout=${dropout} \
            --attention-dropout=${attention_dropout} \
            --weight-decay=0.01 \
            --optimizer=adam \
            --adam-betas="(0.9,0.999)" \
            --adam-eps=1e-08 \
            --clip-norm=1.0 \
            --lr-scheduler=polynomial_decay \
            --lr=${lr} \
            --total-num-update=${total_num_updates} \
            --warmup-updates=${warmup_updates} \
            --log-format=simple \
            --log-interval=10 \
            --fixed-validation-seed=7 \
            --keep-last-epochs=15 \
            --save-interval=1 --validate-interval=1 \
            --max-update=${total_num_updates} \
            --best-checkpoint-metric=vqa_score --maximize-best-checkpoint-metric \
            --max-src-length=${max_src_length} \
            --max-object-length=${max_object_length} \
            --max-tgt-length=${max_tgt_length} \
            --find-unused-parameters \
            --freeze-encoder-embedding \
            --freeze-decoder-embedding \
            ${unconstrained_training_flag} \
            --ans2label-file=${ans2label_file} \
            --valid-batch-size=20 \
            --add-type-embedding \
            --scale-attn \
            --scale-fc \
            --encoder-prompt \
            --decoder-prompt \
            --encoder-prompt-type=${prompt_type_method} \
            --decoder-prompt-type=${prompt_type_method} \
            --encoder-prompt-length=${encoder_prompt_length} \
            --decoder-prompt-length=${decoder_prompt_length} \
            --scale-heads \
            --disable-entangle \
            --num-bins=${num_bins} \
            --patch-image-size=${patch_image_size} \
            --prompt-type='none' \
            --fp16 \
            --fp16-scale-window=512 \
            --add-object \
            ${uses_ema} \
            ${store_ema} \
            ${ema_fp32} \
            --ema-decay=${ema_decay} \
            --ema-start-update=${ema_start_update} \
            --val-inference-type=${val_inference_type} \
            --num-workers=0 > ${log_file} 2>&1
      done
    done
  done
done
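
The script above relies on many variables (`data`, `task`, `arch`, `restore_file`, and so on) whose definitions are not shown in this excerpt. As a minimal sketch, assuming they are plain shell variables set earlier in the same script, a pre-flight check like the following could flag any that are unset or empty before launching. `missing_vars` is a hypothetical helper name, not part of OFA or fairseq.

```bash
# Hypothetical helper (not part of OFA): print the names of the given
# shell variables that are unset or empty, separated by spaces.
# The names are passed through eval, so only pass trusted identifiers.
missing_vars() {
  _missing=""
  for _name in "$@"; do
    # Look up the value of the variable whose name is stored in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      _missing="${_missing:+$_missing }$_name"
    fi
  done
  printf '%s\n' "$_missing"
}

# Variable names taken from the training script above.
unset_names=$(missing_vars data selected_cols bpe_dir user_dir restore_file \
  task arch criterion batch_size update_freq ans2label_file)
if [ -n "$unset_names" ]; then
  echo "unset or empty: $unset_names" >&2
fi
```

An empty variable would otherwise expand to nothing and silently drop or corrupt the corresponding argument on the `train.py` command line.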

During evaluation, I changed the batch size to 80 and ran on a single chunked file as described in #68. I wonder whether the batch size or the size of the test set affects the result. Thanks for your attention @JustinLin610
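
For reference, the test-std figure quoted above follows directly from the raw counts. A minimal sketch of that arithmetic, where `accuracy_pct` is a hypothetical helper name and not part of OFA:

```bash
# Hypothetical helper: percentage of correct answers, two decimal places.
# $1 = number of correct answers, $2 = total number of questions scored.
accuracy_pct() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * c / t }'
}

accuracy_pct 11189 50000   # prints 22.38
```

The helper only reproduces the reported percentage from the two counts given in the post.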


huahuaxiaomuzhu · 2023-09-23T09:16:38Z

Sorry, I had not read README.md carefully. I would still appreciate a review of the training script.
