When training Conformer U2++ with version 3.0.1, I noticed that the training time per batch increased from 1-2 minutes in the early epochs to 4-5 minutes in later epochs. I have checked the items listed below and they all look normal. Could you suggest anything else I should check? Thanks!
Change
1. In this version, I added an online data augmentation step (after speed perturbation and before spectral enhancement) that dynamically mixes noise and reverberation into the audio during training. It is not enabled for the validation set.
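For readers unfamiliar with this kind of augmentation, here is a minimal sketch of mixing noise into a waveform at a target signal-to-noise ratio. It is not the code used in this training run: the function name `mix_at_snr`, the plain-list representation, and the SNR values are all assumptions for illustration, and reverberation is left out. A real pipeline would operate on tensors.

```python
import math
from typing import List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Hypothetical helper: add `noise` to `speech` at the requested SNR in dB."""
    if not speech or not noise:
        return list(speech)

    # Repeat the noise clip until it covers the speech, then crop to length.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    tiled = (list(noise) * repeats)[: len(speech)]

    # Average power of each signal.
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in tiled) / len(tiled)
    if speech_power == 0.0 or noise_power == 0.0:
        return list(speech)

    # Scale the noise so that speech_power / scaled_noise_power == 10^(snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, tiled)]
```

Because a step like this runs on the CPU for every training utterance and is skipped for validation, its cost is worth measuring when only training slows down.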
Observation
1. Only training has slowed down. Validation runs at the same speed as before.
2. The overall loss is still decreasing.
Investigation
1. I read the data in shards mode over HTTP, so I checked the network and IO on the machine that stores the samples. Both look normal.
2. I checked the network, CPU, IO, and disk on the training machine and found no performance bottleneck.
3. I checked the GPU on the training machine. Memory usage, power, and temperature are all normal, and GPU utilization is still at 100%.
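One further check that might help is to time, for each batch, how long the loop waits for data and how long the training step itself takes. The sketch below assumes a generic iterator-style loader and a `train_step` callable. Both names, along with `timed_batches` and `profile_epoch`, are hypothetical and not part of the toolkit's API.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def timed_batches(loader: Iterable) -> Iterator[Tuple[object, float]]:
    """Yield (batch, seconds spent waiting for that batch)."""
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - start


def profile_epoch(
    loader: Iterable,
    train_step: Callable[[object], object],
    log_every: int = 100,
) -> List[Tuple[float, float]]:
    """Run one pass over `loader`, returning (data_wait_s, step_s) per batch."""
    timings: List[Tuple[float, float]] = []
    for i, (batch, wait_s) in enumerate(timed_batches(loader)):
        start = time.perf_counter()
        # On a GPU, kernels launch asynchronously, so train_step should end
        # with a synchronization (e.g. torch.cuda.synchronize()) for this
        # measurement to reflect the real compute time.
        train_step(batch)
        step_s = time.perf_counter() - start
        timings.append((wait_s, step_s))

        if (i + 1) % log_every == 0:
            recent = timings[-log_every:]
            avg_wait = sum(w for w, _ in recent) / len(recent)
            avg_step = sum(s for _, s in recent) / len(recent)
            print(f"batch {i + 1}: data wait {avg_wait:.3f}s, step {avg_step:.3f}s")
    return timings
```

If the data wait grows across epochs while the step time stays flat, the slowdown is more likely in the input pipeline (for example the augmentation step) than in the model. Comparing the average batch duration or number of frames per batch between early and late epochs would also show whether the batches themselves are getting larger.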
srdfjy changed the title from "Training is getting slower and slower" to "During the later epochs, the training speed decreases by 2-3 times (3.0.1)" on May 7, 2024