Problem in train process #799

younes-mirinezhad · 2024-04-25T10:01:33Z

Hi,

I need to use MFA to align my audio recordings with their transcripts, so I have started training MFA for my target language.

First, I prepare a folder (speech_corpus) that contains *.wav (audio) and *.lab (transcript) files:

speech_corpus
   |___ln_01_00001.lab
   |___ln_01_00001.wav
   |___ln_01_00002.lab
   |___ln_01_00002.wav
   |___ln_01_00003.lab
   |___ln_01_00003.wav
   ...
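
For reference, the one-to-one pairing of .wav and .lab files in a flat folder like this can be checked before running MFA. This is only a minimal sketch: the function name `find_unpaired` is made up for illustration, and it assumes all files sit directly in speech_corpus with matching base names, as in the listing above.

```python
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (stems of .wav files lacking a .lab, stems of .lab files lacking a .wav)."""
    corpus = Path(corpus_dir)
    # Compare files by base name, e.g. "ln_01_00001" for ln_01_00001.wav
    wav_stems = {p.stem for p in corpus.glob("*.wav")}
    lab_stems = {p.stem for p in corpus.glob("*.lab")}
    return sorted(wav_stems - lab_stems), sorted(lab_stems - wav_stems)


if __name__ == "__main__":
    corpus_dir = Path("speech_corpus")  # folder name taken from the listing above
    if corpus_dir.is_dir():
        missing_lab, missing_wav = find_unpaired(corpus_dir)
        print("wav without lab:", missing_lab)
        print("lab without wav:", missing_wav)
```

Both lists should come back empty for a corpus where every recording has a transcript.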

After that, I prepare a pronunciation dictionary for my target language (pronunciation_dictionary.txt).
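
Dictionary coverage can be estimated independently of MFA with a short script. This is a rough sketch under two assumptions: each dictionary line starts with the word followed by its pronunciation, and transcripts are split on whitespace only. MFA applies its own text normalization, so its counts may differ. The helper names `load_dictionary_words` and `count_oovs` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def load_dictionary_words(dict_path):
    """Collect the first whitespace-separated field of every non-empty line."""
    words = set()
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                words.add(fields[0])
    return words


def count_oovs(corpus_dir, dict_words):
    """Count transcript tokens that do not appear in dict_words."""
    oovs = Counter()
    for lab in sorted(Path(corpus_dir).glob("*.lab")):
        # Assumption: plain whitespace tokenization, no case or punctuation handling
        for token in lab.read_text(encoding="utf-8").split():
            if token not in dict_words:
                oovs[token] += 1
    return oovs


if __name__ == "__main__":
    dict_path = Path("pronunciation_dictionary.txt")
    corpus_dir = Path("speech_corpus")
    if dict_path.is_file() and corpus_dir.is_dir():
        oovs = count_oovs(corpus_dir, load_dictionary_words(dict_path))
        print(len(oovs), "OOV word types,", sum(oovs.values()), "OOV tokens")
        for word, count in oovs.most_common(20):
            print(count, word)
```

Sorting by frequency makes it easy to see whether a few very common tokens (for example punctuation attached to words) account for most of the misses.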

Then I set up my working environment:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner

As far as I can tell, my dataset has only one speaker, so I use --single_speaker in all commands.

Then I validate my data:
mfa validate --single_speaker ./speech_corpus pronunciation_dictionary.txt
This is the validation output:

INFO     Setting up corpus information...                                                                                                                                           
 INFO     Loading corpus from source files...                                                                                                                                        
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,892/100  [ 0:00:00 < 0:00:00 , 6,034 it/s ]
 INFO     Found 1 speaker across 3300 files, average number of utterances per speaker: 3300.0                                                                                        
 INFO     Initializing multiprocessing jobs...                                                                                                                                       
 INFO     Normalizing text...                                                                                                                                                        
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3,300/3,300  [ 0:00:01 < 0:00:00 , 2,293 it/s ]
 INFO     Generating MFCCs...                                                                                                                                                        
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3,300/3,300  [ 0:00:20 < 0:00:00 , 171 it/s ]
 INFO     Calculating CMVN...                                                                                                                                                        
 INFO     Generating final features...                                                                                                                                               
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3,300/3,300  [ 0:00:02 < 0:00:00 , 1,445 it/s ]
 INFO     Creating corpus split...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3,300/3,300  [ 0:00:01 < 0:00:00 , ? it/s ]
 INFO     Corpus                                                                                                                                                                     
 INFO     3300 sound files                                                                                                                                                           
 INFO     3300 text files                                                                                                                                                            
 INFO     1 speakers                                                                                                                                                                 
 INFO     3300 utterances                                                                                                                                                            
 INFO     27678.298 seconds total duration                                                                                                                                           
 INFO     Sound file read errors                                                                                                                                                     
 INFO     There were no issues reading sound files.                                                                                                                                  
 INFO     Feature generation                                                                                                                                                         
 INFO     There were no utterances missing features.                                                                                                                                 
 INFO     Files without transcriptions                                                                                                                                               
 INFO     There were no sound files missing transcriptions.                                                                                                                          
 INFO     Transcriptions without sound files                                                                                                                                         
 INFO     There were no transcription files missing sound files.                                                                                                                     
 INFO     Dictionary                                                                                                                                                                 
 INFO     Out of vocabulary words                                                                                                                                                    
 WARNING  80 OOV word types                                                                                                                                                          
 WARNING  64551total OOV tokens                                                                                                                                                      
 WARNING  For a full list of the word types, please see: /home/chiko/Documents/MFA/speech_corpus/oovs_found.txt. For a by-utterance breakdown of missing words, see:                 
          /home/chiko/Documents/MFA/speech_corpus/utterance_oovs.txt                                                                                                                 
 INFO     Training                                                                                                                                                                   
 INFO     Creating subset directory with 2000 utterances...                                                                                                                          
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:01 < 0:00:00 , ? it/s ]
 INFO     Initializing training for monophone...                                                                                                                                     
 INFO     Compiling training graphs...                                                                                                                                               
 INFO     Generating initial alignments...                                                                                                                                           
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 713 it/s ]
 INFO     Initialization complete!                                                                                                                                                   
 INFO     monophone - Iteration 1 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:41 < 0:00:00 , 47 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
  27%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 535/2,000  [ 0:00:01 < 0:00:05 , 348 it/s ]
 INFO     monophone - Iteration 2 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:12 < 0:00:00 , 162 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 949 it/s ]
 INFO     monophone - Iteration 3 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:08 < 0:00:00 , 251 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 947 it/s ]
 INFO     monophone - Iteration 4 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:07 < 0:00:00 , 276 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 930 it/s ]
 INFO     monophone - Iteration 5 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:06 < 0:00:00 , 302 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 929 it/s ]
 INFO     monophone - Iteration 6 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:06 < 0:00:00 , 312 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 922 it/s ]
 INFO     monophone - Iteration 7 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:06 < 0:00:00 , 318 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 914 it/s ]
 INFO     monophone - Iteration 8 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:06 < 0:00:00 , 331 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 925 it/s ]
 INFO     monophone - Iteration 9 of 40                                                                                                                                              
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 336 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:01 < 0:00:00 , 913 it/s ]
 INFO     monophone - Iteration 10 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 345 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 897 it/s ]
 INFO     monophone - Iteration 11 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:01 < 0:00:00 , 903 it/s ]
 INFO     monophone - Iteration 12 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 344 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 885 it/s ]
 INFO     monophone - Iteration 13 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 880 it/s ]
 INFO     monophone - Iteration 14 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 351 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 850 it/s ]
 INFO     monophone - Iteration 15 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 890 it/s ]
 INFO     monophone - Iteration 16 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 354 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 905 it/s ]
 INFO     monophone - Iteration 17 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 878 it/s ]
 INFO     monophone - Iteration 18 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 342 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:01 < 0:00:00 , 868 it/s ]
 INFO     monophone - Iteration 19 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 891 it/s ]
 INFO     monophone - Iteration 20 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 353 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:01 < 0:00:00 , 848 it/s ]
 INFO     monophone - Iteration 21 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 866 it/s ]
 INFO     monophone - Iteration 22 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 871 it/s ]
 INFO     monophone - Iteration 23 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 336 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 796 it/s ]
 INFO     monophone - Iteration 24 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:01 < 0:00:00 , 854 it/s ]
 INFO     monophone - Iteration 25 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 858 it/s ]
 INFO     monophone - Iteration 26 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 342 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 859 it/s ]
 INFO     monophone - Iteration 27 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 863 it/s ]
 INFO     monophone - Iteration 28 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 858 it/s ]
 INFO     monophone - Iteration 29 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 340 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 819 it/s ]
 INFO     monophone - Iteration 30 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 827 it/s ]
 INFO     monophone - Iteration 31 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 833 it/s ]
 INFO     monophone - Iteration 32 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 334 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 840 it/s ]
 INFO     monophone - Iteration 33 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 843 it/s ]
 INFO     monophone - Iteration 34 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 844 it/s ]
 INFO     monophone - Iteration 35 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 333 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 832 it/s ]
 INFO     monophone - Iteration 36 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 847 it/s ]
 INFO     monophone - Iteration 37 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 839 it/s ]
 INFO     monophone - Iteration 38 of 40                                                                                                                                             
 INFO     Generating alignments...                                                                                                                                                   
 100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/2,000  [ 0:00:05 < 0:00:00 , 340 it/s ]
 INFO     Accumulating statistics...                                                                                                                                                 
 100%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 834 it/s ]
 INFO     monophone - Iteration 39 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 830 it/s ]
 INFO     monophone - Iteration 40 of 40                                                                                                                                             
 INFO     Accumulating statistics...                                                                                                                                                 
 100%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/2,000  [ 0:00:02 < 0:00:00 , 845 it/s ]
 INFO     Training complete!                                                                                                                                                         
 INFO     Compiling training graphs...                                                                                                                                               
 INFO     Generating alignments...                                                                                                                                                   
  61% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/3,300  [ 0:00:05 < 0:00:04 , 333 it/s ]
 INFO     Accumulating transition stats...                                                                                                                                           
  61%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,003/3,300  [ 0:00:03 < 0:00:03 , 639 it/s ]
 INFO     Finished accumulating transition stats!                                                                                                                                    
 INFO     Collecting phone and word alignments from monophone_ali lattices...                                                                                                        
  61%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2,000/3,300  [ 0:00:06 < 0:00:05 , 321 it/s ]
 WARNING  Alignment analysis not available without using postgresql                                                                                                                  
 INFO     Beginning phone LM training...                                                                                                                                             
 INFO     Collecting training data...                                                                                                                                                
   0%  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3,300  [ 0:00:01 < -:--:-- , ? it/s ]
 INFO     Training model...                                                                                                                                                          
 INFO     Completed training in 296.4236822128296 seconds!                                                                                                                           
 INFO     Done! Everything took 333.653 seconds

As you can see, after Iteration 40 of 40 the progress bars for Generating alignments..., Accumulating transition stats..., Collecting phone and word alignments from monophone_ali lattices..., and Collecting training data... do not reach 100%.
Why?

After that, I validate my pronunciation dictionary:
mfa validate_dictionary --single_speaker pronunciation_dictionary.txt
And this is my output:

WARNING  Skipped the following configuration keys: clean                                                                                                                            
 INFO     Not using a pretrained G2P model, training from the dictionary...                                                                                                          
 INFO     Training aligner                                                                                                                                                           
 INFO     Calculating alignments...                                                                                                                                                  
  89%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 223/250  [ 0:22:08 < 0:04:40 , 0 it/s ]
 INFO     Best likelihood: -13.7423                                                                                                                                                  
 INFO     Completed computing alignments!                                                                                                                                            
 INFO     Encoding the alignments as FSAs                                                                                                                                            
 INFO     Success! FAR path: /home/chiko/Documents/MFA/pronunciation_dictionary/train_g2p/pronunciation_dictionary.far; encoder path:                                                
          /home/chiko/Documents/MFA/pronunciation_dictionary/train_g2p/pronunciation_dictionary.enc                                                                                  
 INFO     Saved model to /home/chiko/Documents/MFA/pronunciation_dictionary/train_g2p/log/g2p_model.zip                                                                              
 WARNING  Skipped the following configuration keys: temporary_directory and num_jobs                                                                                                 
 WARNING  Skipped the following configuration keys: temporary_directory and num_jobs                                                                                                 
 INFO     Generating pronunciations...                                                                                                                                               
 INFO     Generating pronunciations...                                                                                                                                               
  92%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38,175/41,617  [ 0:00:47 < 0:00:03 , 1,229 it/s ]
 INFO     WER:    0.21                                                                                                                                                               
 INFO     WER:    0.21                                                                                                                                                               
 INFO     LER:    0.02                                                                                                                                                               
 INFO     LER:    0.02                                                                                                                                                               
 INFO     Done! Everything took 1407.547 seconds                                                                                                                                     
 INFO     Done! Everything took 1407.547 seconds

Now I want to train the model, so I run this command:
mfa train --single_speaker speech_corpus/ pronunciation_dictionary.txt newModel.zip
And this is the error I get:

INFO     Using previous initialization.                                                                                                                                             
 ERROR    There was an error in the run, please see the log.                                                                                                                         
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/chiko/anaconda3/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/command_line/train_acoustic_model.py", line 144, in train_acoustic_model_cli
    trainer.train()
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 537, in train
    self.set_current_workflow(trainer.identifier)
  File "/home/chiko/anaconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 356, in set_current_workflow
    wf.current = True
AttributeError: 'NoneType' object has no attribute 'current'

What is the problem and what should I do?

younes-mirinezhad · 2024-04-28T13:23:49Z

I started training without validation and the model was trained.

bdthanh · 2024-05-17T06:16:00Z

Hi, may I ask how you were able to run the training process? I am having the same problem now.

mmcauliffe · 2024-05-17T20:06:57Z

@bdthanh can you try rerunning your mfa train command with --clean appended and see if that works?
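
For anyone scripting this, the suggestion above can be sketched in Python as a small helper that assembles the `mfa train` call from the original post with `--clean` appended. The helper name `build_train_command` and its parameters are illustrative only and not part of MFA; the paths are the ones used earlier in this thread.

```python
import shlex


def build_train_command(corpus_dir, dictionary_path, output_model,
                        clean=True, single_speaker=True):
    """Assemble the argument list for `mfa train` (hypothetical helper, not an MFA API)."""
    cmd = ["mfa", "train"]
    if single_speaker:
        cmd.append("--single_speaker")
    cmd += [corpus_dir, dictionary_path, output_model]
    if clean:
        # Appended at the end, as suggested in the comment above.
        cmd.append("--clean")
    return cmd


command = build_train_command(
    "speech_corpus/", "pronunciation_dictionary.txt", "newModel.zip"
)
# Print the command so it can be checked before running it.
print(shlex.join(command))
# To actually run it inside the conda environment, one option is:
#   import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the corpus path contains spaces.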

bdthanh · 2024-05-18T10:13:13Z

> @bdthanh can you try rerunning your mfa train command with --clean appended and see if that works?

I solved it already, thanks for your attention. It turns out the corpus path in my command was wrong.
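
Since a wrong corpus path was the cause here, a quick sanity check before training may help. The following is a minimal sketch, assuming the flat layout from the original post (paired `*.wav` and `*.lab` files in one folder); `check_corpus` is a hypothetical helper, not something MFA provides, and the demo runs on a temporary folder with made-up empty files.

```python
import tempfile
from pathlib import Path


def check_corpus(corpus_dir):
    """Return (stems with .wav but no .lab, stems with .lab but no .wav).

    Hypothetical helper; assumes all files sit directly in corpus_dir.
    """
    root = Path(corpus_dir)
    if not root.is_dir():
        # A mistyped corpus path is caught here, before MFA is started.
        raise FileNotFoundError(f"Corpus directory not found: {root}")
    wav = {p.stem for p in root.glob("*.wav")}
    lab = {p.stem for p in root.glob("*.lab")}
    return sorted(wav - lab), sorted(lab - wav)


# Demo on a throwaway folder: one complete pair and one .wav without a transcript.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ln_01_00001.wav", "ln_01_00001.lab", "ln_01_00002.wav"):
        (Path(tmp) / name).touch()
    missing_lab, missing_wav = check_corpus(tmp)
    print("wav without lab:", missing_lab)
    print("lab without wav:", missing_wav)
```

On a real dataset, calling `check_corpus("speech_corpus")` from the directory where `mfa train` is run would raise immediately if the path does not resolve.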
