During the later epochs, the training speed decreases by 2-3 times(3.0.1) #2514

Closed
srdfjy opened this issue May 6, 2024 · 0 comments

srdfjy commented May 6, 2024

Hi

When training Conformer U2++ with version 3.0.1, I noticed that the training time per batch grew from 1-2 minutes in the early epochs to 4-5 minutes in the later epochs. I have checked the items listed below and they all look normal. Could you suggest anything else I should check? Thanks!

Change

1. In this version, I added an online data augmentation step (after speed perturbation and before spectral augmentation) that dynamically mixes noise and reverberation into the audio during training. It is not enabled for the validation set.
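
For readers unfamiliar with this kind of augmentation, here is a minimal sketch of mixing noise into an utterance at a target signal-to-noise ratio. It is not the code used in this issue: `rms`, `mix_noise`, and the `snr_db` parameter are hypothetical names, samples are plain Python floats, and reverberation (typically a convolution with a room impulse response) is left out.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_noise(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; `speech` and `noise` are lists of floats.
    """
    # Tile, then crop, the noise so it covers the whole utterance.
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * reps)[:len(speech)]
    # Choose gain so that rms(speech) / (gain * rms(noise)) == 10 ** (snr_db / 20).
    # The small epsilon avoids division by zero for an all-zero noise clip.
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20) + 1e-12)
    return [s + gain * n for s, n in zip(speech, noise)]
```

In a sketch like this, the per-utterance cost depends on the utterance length and on how the noise clip is obtained, so it may be worth timing the augmentation step separately from the rest of the pipeline.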

Observation

1. Only training has slowed down; the speed of validation is unchanged.
2. The overall loss is still decreasing.

Investigation

1. I read data in shards mode over HTTP, so I checked the network and I/O on the machine that stores the samples. Both were normal.
2. I checked the network, CPU, I/O, and disk on the training machine and did not find any performance bottleneck.
3. I checked the GPU on the training machine. Memory usage, power, and temperature are all normal, and GPU utilization is still at 100%.
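
One further check, beyond the machine-level ones above, is to measure inside the training loop how long each step waits for the next batch versus how long the step itself takes. The following is a minimal sketch under stated assumptions: `profile_steps`, `batches`, `train_step`, and `log_every` are hypothetical names, not part of any library, and `train_step` stands in for the forward pass, backward pass, and optimizer update.

```python
import time


def profile_steps(batches, train_step, log_every=100):
    """Run train_step over batches, separating data wait time from step time.

    Returns a list of (step, avg_wait_seconds, avg_step_seconds) tuples,
    one per `log_every` completed steps.
    """
    records = []
    wait_total = 0.0
    step_total = 0.0
    step = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent blocked on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        # On a GPU, train_step should synchronize the device before returning
        # (for example with torch.cuda.synchronize()); otherwise asynchronous
        # kernels make the step look faster than it is.
        train_step(batch)
        t2 = time.perf_counter()
        wait_total += t1 - t0
        step_total += t2 - t1
        step += 1
        if step % log_every == 0:
            records.append((step, wait_total / log_every, step_total / log_every))
            wait_total = 0.0
            step_total = 0.0
    return records
```

If the average wait grows across epochs while the step time stays flat, the slowdown is on the data side (reading, decoding, or augmentation); if the step time grows, the cause is in the model or optimizer step.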

srdfjy changed the title from "Training is getting slower and slower" to "During the later epochs, the training speed decreases by 2-3 times(3.0.1)" on May 7, 2024
srdfjy closed this as completed on May 16, 2024