Describe the bug
When I start training with modified data, I get this error:
--> TIME: 2024-03-18 13:31:18 -- STEP: 0/496 -- GLOBAL_STEP: 0
| > current_lr: 2.5e-07
| > step_time: 0.9393 (0.9392588138580322)
| > loader_time: 0.4176 (0.4175543785095215)
[!] train_step() retuned None outputs. Skipping training step.
(the same warning is printed 10 times)

Then training continues normally:
--> TIME: 2024-03-18 13:31:34 -- STEP: 25/496 -- GLOBAL_STEP: 25
| > loss: 3.6332433223724365 (3.534811576207479)
| > log_mle: 0.632981538772583 (0.6383022785186767)
| > loss_dur: 3.0002617835998535 (2.896509297688802)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7261, device='cuda:0') (tensor(9.8709, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.2053 (0.21109932899475098)
| > loader_time: 0.4624 (1.924059352874756)
--> TIME: 2024-03-18 13:31:52 -- STEP: 50/496 -- GLOBAL_STEP: 50
| > loss: 3.623731851577759 (3.5723715841770174)
| > log_mle: 0.6547501683235168 (0.6443059176206588)
| > loss_dur: 2.9689817428588867 (2.928065669536591)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7175, device='cuda:0') (tensor(10.3839, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.1739 (0.20701711177825927)
| > loader_time: 0.5336 (1.2170554876327515)
........
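For context, this warning typically comes from a guard in the training loop that discards any batch for which the model's step function returns `None` instead of an outputs/loss dict. A minimal sketch of that pattern (the names `run_epoch` and `train_step` are illustrative, not the actual Trainer API):

```python
# Minimal sketch of a "skip on None" guard in a training loop.
# run_epoch/train_step are illustrative names, not the real Trainer code.

def train_step(batch):
    # A real model step returns a dict of outputs/losses, or None
    # when the batch cannot be used (e.g. invalid or filtered samples).
    if not batch["valid"]:
        return None
    return {"loss": sum(batch["x"]) / len(batch["x"])}

def run_epoch(batches):
    skipped, losses = 0, []
    for batch in batches:
        outputs = train_step(batch)
        if outputs is None:
            # This is the situation the log shows: the optimizer step
            # is skipped and only a warning is printed.
            print("[!] train_step() retuned None outputs. Skipping training step.")
            skipped += 1
            continue
        losses.append(outputs["loss"])
    return skipped, losses
```

With a mix of valid and invalid batches, the loop skips the bad ones and keeps training on the rest, which matches the behavior in the log above.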
To Reproduce
Start training with the modified data; the same log as above is produced.
Expected behavior
Expected behavior is that training starts without printing:

[!] train_step() retuned None outputs. Skipping training step.

Logs
No response
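Since the console output alone does not say which samples are affected, one way to narrow it down is to wrap the step function and record the indices of batches that produce `None`. This is a sketch under the assumption that you can intercept the step calls; `log_none_outputs` is a hypothetical helper, not part of the Trainer:

```python
def log_none_outputs(step_fn):
    """Wrap a step function and record the indices of batches for
    which it returns None, so the offending samples can be inspected.
    Hypothetical debugging helper, not a Trainer API."""
    bad_indices = []
    state = {"i": -1}  # call counter shared with the closure

    def wrapped(batch):
        state["i"] += 1
        out = step_fn(batch)
        if out is None:
            bad_indices.append(state["i"])
        return out

    return wrapped, bad_indices
```

Running the wrapped step over the dataset once and then printing `bad_indices` points directly at the samples that trigger the skipped steps.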
Environment
Additional context
No response