
The training procedure fails in phoenix2014.py #6

DorinK opened this issue May 1, 2021 · 6 comments

@DorinK commented May 1, 2021

Dear authors,

I've been trying to use the code for the Phoenix dataset by running:

```
python main.py --bsl1k_mouthing_prob_thres 0.8 --checkpoint checkpoint/phoenix2014t_i3d_pkinetics --datasetname phoenix2014 --phoenix_path /home/nlp/dorink/project/bsl1k/data_phoenix --gpu_collation 0 --num-classes 1233 --num_figs 0 --pretrained misc/pretrained_models/model.pth.tar --schedule 5 10 12 --snapshot 1 --test-batch 1 --train-batch 1 --workers 0 --num_gpu 4
```

following your instructions in run.

  • Technical question: I would like to train the model from scratch on the phoenix2014T dataset with the I3D architecture. Should I use the pretrained model you provided in model? I also hope the command above matches this training intent; please correct me if not.

First of all, I think there may be an error in line 72 of datasets/phoenix2014.py:

```python
self.frame_level_glosses = data["videos"]["alignments"]["gloss_id"]
```

because the dictionary created by misc/phoenix2014T/gather_frames.py has no "alignments" property. So I'm currently using the following patch:

```python
self.frame_level_glosses = data["videos"]["gloss_ids"]
```

Please let me know if this should be something else.
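For clarity, this is roughly how I load the gloss ids now, tolerating either key layout (the path and the fallback logic here are only illustrative):

```python
import pickle

# Sketch only: load the frame-level gloss ids, accepting both key layouts --
# "alignments" as expected by datasets/phoenix2014.py, and the flat
# "gloss_ids" key that misc/phoenix2014T/gather_frames.py actually writes.
with open("data_phoenix/info.pkl", "rb") as f:  # path is illustrative
    data = pickle.load(f)

videos = data["videos"]
if "alignments" in videos:  # layout the original code expects
    frame_level_glosses = videos["alignments"]["gloss_id"]
else:  # layout produced by gather_frames.py
    frame_level_glosses = videos["gloss_ids"]
```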

My main problem is that when I train with the command above, the code fails in the `_get_class` function of datasets/phoenix2014.py, because the variable `clip_glosses` is an empty list `[]` about 98% of the time. The error:

```
File "/home/nlp/dorink/project/bsl1k/datasets/phoenix2014.py", line 124, in _get_class
    max_indices = np.where(cnts == cnts.max())[0]
ValueError: zero-size array to reduction operation maximum which has no identity
```
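As a stopgap, I can avoid the crash with a guard like the following (only a sketch; the background-id fallback is my own guess, not something from the repository):

```python
import numpy as np

# Stopgap sketch: fall back to a background id instead of calling .max()
# on a zero-size array when a clip has no aligned glosses.
def majority_gloss(clip_glosses, background_id=0):
    if len(clip_glosses) == 0:  # the failing case: clip_glosses == []
        return background_id    # hypothetical fallback choice
    ids, cnts = np.unique(clip_glosses, return_counts=True)
    max_indices = np.where(cnts == cnts.max())[0]  # the original line 124
    return ids[np.random.choice(max_indices)]      # break ties arbitrarily
```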

Beyond this stopgap, what is the proper way to solve the problem?
Thanks in advance.

@gulvarol (Owner) commented May 7, 2021

Thanks for reporting this. I realized that I haven't done a long-overdue update of the phoenix part of the code. The current version does not really correspond to the released model training, because phoenix2014T was trained with the CTC loss, for which I removed support to simplify the code. On the other hand, `self.assign_labels == "auto"` in the dataloader is only applicable to phoenix2014 (without T), for which automatic frame-level gloss alignments were provided, but I see that I didn't include the info.pkl file that contains these alignments. I will need some time to check these edits and update the code in the next week or two. In the meantime, you could try setting `self.assign_labels = "uniform"`, but this gave worse performance.
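In case it helps, "uniform" assignment roughly amounts to spreading the sentence-level gloss sequence evenly across the frames; a minimal sketch (not the exact repository code):

```python
import numpy as np

# Sketch of "uniform" label assignment: spread the sentence-level gloss
# sequence evenly across the video frames, so every frame gets a label
# even without automatic alignments.
def uniform_frame_labels(gloss_ids, num_frames):
    # e.g. gloss_ids=[5, 17, 3], num_frames=9 -> [5,5,5,17,17,17,3,3,3]
    bounds = np.linspace(0, num_frames, len(gloss_ids) + 1).astype(int)
    labels = np.empty(num_frames, dtype=int)
    for gid, start, end in zip(gloss_ids, bounds[:-1], bounds[1:]):
        labels[start:end] = gid
    return labels
```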

@DorinK (Author) commented May 8, 2021

Gul, thanks for your reply. I would appreciate an update as soon as the phoenix2014T code changes are done, along with which files were updated.

Also, I had tried the alternative you suggested even before opening this issue, but in the evaluation process I encountered incompatibilities in evaluate.py as well. They start with the `gt` variable in the `aggregate_clips` function and propagate through the entire evaluation, preventing it from completing.

For that matter, I used the following command for the evaluation:

```
python main.py --checkpoint /home/nlp/dorink/project/bsl1k/checkpoint/phoenix2014/bug_fix/test_050 --datasetname phoenix2014 --num_gpus 4 -j 32 -e --evaluate_video 1 --pretrained /home/nlp/dorink/project/bsl1k/checkpoint/phoenix2014t_i3d_pkinetics_bug_fix/checkpoint_050.pth.tar --num-classes 1233 --num_in_frames 16 --save_features 1 --include_embds 1 --test_set test --phoenix_path /home/nlp/dorink/project/bsl1k/data_phoenix
```

I would be happy if you could also update the evaluation code accordingly.

In addition, I would be happy to receive your answer to the technical question in my first comment. Is the trained model you provided necessary as a starting point for training? And in particular, is it suitable for phoenix2014T?

@DorinK (Author) commented Jun 5, 2021

Is the updated code ready, and will it be pushed to the repo soon? If not, how much longer do you think these updates will take?
Thanks in advance!

@gulvarol (Owner) commented

Sorry for the slow response. I clearly failed to update the code on time, so I would prefer not to make another estimate now. I will try to find some time for it. Find other answers below:

  1. Please use `evaluate_seq.py` to run the evaluation on phoenix.
  2. The released model for Phoenix2014T was trained in multiple stages:
     a. Training on Phoenix2014 with automatic labels (BSL-1K pretraining) [1296 classes] => 50 epochs: 53.7 WER;
     b. Finetuning on Phoenix2014T with uniform labels [1232 classes] => 50 epochs: 48.2 WER;
     c. Finetuning on Phoenix2014T with the CTC loss, freezing the I3D layers up to Mixed_5c [1233 classes, adding a background class] => 6 epochs: 41.5 WER;
     d. Finetuning on Phoenix2014T with the CTC loss, unfreezing all layers [1233 classes] => 4 epochs: 39.5 WER.

Step a was initialized from this model, or equivalently by setting `--pretrained misc/pretrained_models/bsl1k.pth.tar`, so I would suggest using this when training from scratch. The link you asked about corresponds to the controlled experiment that follows steps a through d, but with Kinetics pretraining in step a instead.

Steps c and d are heavy. I'd like to check whether I can train one model with a single step so that it's simpler. If it helps: the result of training with a single stage only on Phoenix2014T uniform labels (without CTC, without Phoenix2014 pretraining, with BSL-1K pretraining) was 53.7 WER.
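For reference, the CTC objective in steps c and d is the standard one; a minimal PyTorch sketch (the removed repository code may differ, and treating the added background class as the CTC blank is an assumption on my part):

```python
import torch
import torch.nn as nn

# Minimal sketch of the CTC objective in steps c/d. Assumption: the extra
# "background" class in the 1233-class setting plays the role of the blank.
num_classes = 1233                       # 1232 glosses + 1 background/blank
ctc = nn.CTCLoss(blank=num_classes - 1, zero_infinity=True)

T, N, S = 40, 2, 7                       # time steps, batch size, target length
# Stand-in for per-clip I3D class scores; shape (T, N, C) as CTCLoss expects.
log_probs = torch.randn(T, N, num_classes, requires_grad=True).log_softmax(-1)
targets = torch.randint(0, num_classes - 1, (N, S))   # gloss id sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```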

@rabeya-akter commented

Can you provide the pretrained model from step d? I want to extract I3D features using it.

@gulvarol (Owner) commented Oct 8, 2023

Hi, sorry, I don't have much capacity to provide support at the moment. But from what I read above, the released model is already the result of step d.
