Training with continue and fork mode terminated due to unhandled system error #1017

Open
drremo1 opened this issue Mar 31, 2023 · 0 comments

Hello, I recently installed wav2letter v0.2 on Ubuntu 18.04. I am now trying to continue training from the pretrained dev-clean transformer models of the sota/2019 recipe for only 1 epoch. However, training does not start: the process terminates immediately with these errors:

I0401 05:59:19.868680 17296 Train.cpp:80] Parsing command line flags
I0401 05:59:19.868815 17296 Train.cpp:81] Overriding flags should be mutable when using `continue`
I0401 05:59:19.868882 17296 Train.cpp:85] Reading flags from file /mnt/d/198/train.cfg
terminate called after throwing an instance of 'std::runtime_error'
  what():  unhandled system error
*** Aborted at 1680299961 (unix time) try "date -d @1680299961" if you are using GNU date ***
PC: @     0x7f5e92f1ce87 gsignal
*** SIGABRT (@0x3e800004390) received by PID 17296 (TID 0x7f5ec06ac380) from PID 17296; stack trace: ***
    @     0x7f5ebf583980 (unknown)
    @     0x7f5e92f1ce87 gsignal
    @     0x7f5e92f1e7f1 abort
    @     0x7f5e93911957 (unknown)
    @     0x7f5e93917ae6 (unknown)
    @     0x7f5e93917b21 std::terminate()
    @     0x7f5e93917d54 __cxa_throw
    @     0x55cf42b5c6f8 fl::detail::ncclCheck()
    @     0x55cf42b5ddd7 fl::distributedInit()
    @     0x55cf42acb387 w2l::initDistributed()
    @     0x55cf4283eab2 main
    @     0x7f5e92effc87 __libc_start_main
    @     0x55cf428a7e4a _start
Aborted

The same error also happens when I try it with fork.

The error above was produced by running this command:

wav2letter/build/Train continue /mnt/d/198 --flagsfile /mnt/d/198/train.cfg --logtostderr=1 --minloglevel=0 --rndv_filepath=

At first I thought the flagsfile was the cause, but removing it from the command line gives the same error.
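
Reading the stack trace, the frame directly below __cxa_throw is fl::detail::ncclCheck(), called from fl::distributedInit(), so the exception seems to be raised during distributed (NCCL) initialization rather than while parsing flags. Below is a minimal Python sketch for pulling that frame out of a log. It assumes the glog frame format shown in the trace above, and the helper names `frames` and `throwing_frame` are hypothetical, not part of wav2letter or flashlight.

```python
import re

# Matches glog stack-trace frames such as:
#     @     0x55cf42b5c6f8 fl::detail::ncclCheck()
# The "PC: @ ..." header line is skipped because it does not start with "@".
FRAME_RE = re.compile(r"^\s*@\s+0x[0-9a-f]+\s+(.+?)\s*$")


def frames(log_text):
    """Return the symbol names of all stack frames, innermost first."""
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def throwing_frame(log_text):
    """Return the frame listed right after __cxa_throw (the function that threw)."""
    names = frames(log_text)
    if "__cxa_throw" in names:
        i = names.index("__cxa_throw")
        if i + 1 < len(names):
            return names[i + 1]
    return None


# Excerpt copied from the trace in this issue.
TRACE = """\
    @     0x7f5e93917b21 std::terminate()
    @     0x7f5e93917d54 __cxa_throw
    @     0x55cf42b5c6f8 fl::detail::ncclCheck()
    @     0x55cf42b5ddd7 fl::distributedInit()
    @     0x55cf42acb387 w2l::initDistributed()
    @     0x55cf4283eab2 main
"""

print(throwing_frame(TRACE))  # prints: fl::detail::ncclCheck()
```

This only locates where the exception originates. It does not explain why the NCCL call failed.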
