Hi,
Thanks for your great work!
I have a problem with training. When I train on two 32 GB V100 GPUs, training becomes slower after several steps.
While the speed is normal, multiprocessing spawn has two processes running; once the step time increases, one of the processes has been killed.
I can't find out why the process was killed. The process state shown by ps -ef is Sl+, and CPU memory and GPU memory are both sufficient. I also tried decreasing batch_size to 256, but that did not solve it.
The slow part is the inference call:
output, x_norm = model(input, target)
How should I deal with this problem?
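
One way to narrow down why a worker disappears is to look at its exit code: on POSIX systems a negative return code means the process was terminated by a signal, and a SIGKILL that the training script did not send itself usually comes from outside the process. The following is only a minimal sketch under stated assumptions: it runs on a POSIX system, the child process is a hypothetical stand-in that just sleeps rather than the actual spawned training worker from this report, and `describe_exit` is an illustrative helper, not part of any library.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a child's return code into a readable cause."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status " + str(returncode)


# Hypothetical stand-in for a training worker: a child that only sleeps.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
worker.kill()  # simulate an external kill (sends SIGKILL on POSIX)
worker.wait()  # reap the child so that returncode is populated
print(describe_exit(worker.returncode))
```

Logging the same information for the real workers might show whether the missing process received a signal or exited on its own.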