TSE with Librimix: mismatch in number of speakers #5728

Open
AntoineBlanot opened this issue Apr 2, 2024 · 4 comments
Labels: Bug (bug should be fixed), SE (Speech enhancement)

Comments

@AntoineBlanot

Describe the bug
There is a mismatch in the number of speech references and the number of speakers (which is 2 for the Librimix dataset).
Because of this issue, we cannot run the recipe training.

Basic environments:

  • OS information: Linux 5.4.0-173-generic #191-Ubuntu SMP Fri Feb 2 13:55:07 UTC 2024 x86_64
  • python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
  • espnet version: espnet 202402
  • pytorch version: pytorch 2.0.1
  • Git hash: 3858d84051d6bed263cefb968bb1727452012cf2
    • Commit date: Thu Mar 28 13:55:11 2024 +0000

Environments from torch.utils.collect_env:

Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-173-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB

Nvidia driver version: 470.239.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      43 bits physical, 48 bits virtual
CPU(s):                             256
On-line CPU(s) list:                0-255
Thread(s) per core:                 2
Core(s) per socket:                 64
Socket(s):                          2
NUMA node(s):                       8
Vendor ID:                          AuthenticAMD
CPU family:                         23
Model:                              49
Model name:                         AMD EPYC 7742 64-Core Processor
Stepping:                           0
Frequency boost:                    enabled
CPU MHz:                            3386.151
CPU max MHz:                        2250.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4491.76
Virtualization:                     AMD-V
L1d cache:                          4 MiB
L1i cache:                          4 MiB
L2 cache:                           64 MiB
L3 cache:                           512 MiB
NUMA node0 CPU(s):                  0-15,128-143
NUMA node1 CPU(s):                  16-31,144-159
NUMA node2 CPU(s):                  32-47,160-175
NUMA node3 CPU(s):                  48-63,176-191
NUMA node4 CPU(s):                  64-79,192-207
NUMA node5 CPU(s):                  80-95,208-223
NUMA node6 CPU(s):                  96-111,224-239
NUMA node7 CPU(s):                  112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Vulnerable
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.0.1
[pip3] torch-complex==0.4.3
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.0.2
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.8           py310h5eee18b_0  
[conda] mkl_random                1.2.4           py310hdb19cb5_0  
[conda] numpy                     1.23.5          py310h5f9d8c6_1  
[conda] numpy-base                1.23.5          py310hb5e798b_1  
[conda] pytorch                   2.0.1           py3.10_cuda11.8_cudnn8.7.0_0    pytorch
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch                     2.2.2+cu118              pypi_0    pypi
[conda] torch-complex             0.4.3                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                2.2.2+cu118              pypi_0    pypi
[conda] torchtriton               2.0.0                     py310    pytorch

Task information:

  • Task: ENH-TSE
  • Recipe: librimix
  • ESPnet2

To Reproduce
Steps to reproduce the behavior:

  1. move to the recipe directory: `cd egs2/librimix/tse1`
  2. execute `./run.sh`

Error logs

Traceback (most recent call last):                                                                                                                                                                             
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap                                                                                       
    self.run()                                                                                                                                                                                                 
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/multiprocessing/process.py", line 108, in run                                                                                              
    self._target(*self._args, **self._kwargs)                                                                                                                                                                  
  File "/home/c.maeda/espnet/espnet2/tasks/abs_task.py", line 1471, in main_worker                                                                                                                             
    cls.trainer.run(                                                                                                                                                                                           
  File "/home/c.maeda/espnet/espnet2/train/trainer.py", line 317, in run                                                                                                                                       
    all_steps_are_invalid = cls.train_one_epoch(                                                                                                                                                               
  File "/home/c.maeda/espnet/espnet2/train/trainer.py", line 614, in train_one_epoch                                                                                                                           
    retval = model(**batch)                                                                                                                                                                                    
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                                                        
    return forward_call(*args, **kwargs)                                                                                                                                                                       
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward                                                                     
    output = self._run_ddp_forward(*inputs, **kwargs)                                                                                                                                                          
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward                                                            
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]                                                                                                                                       
  File "/home/c.maeda/espnet/tools/.venv/envs/espnet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl                                                                        
    return forward_call(*args, **kwargs)                                                                                                                                                                       
  File "/home/c.maeda/espnet/espnet2/enh/espnet_model_tse.py", line 104, in forward                                                                                                                            
    assert len(speech_ref) == num_spk, (len(speech_ref), num_spk)                                                                                                                                              
AssertionError: (1, 2)
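For context, the assertion at espnet2/enh/espnet_model_tse.py:104 compares the number of reference signals handed to the model against the configured number of speakers. A minimal standalone sketch of that check (a toy reproduction, not the actual ESPnet code):

```python
# Toy reproduction of the shape check that fails in espnet_model_tse.py.
# The model receives a list of reference waveforms and asserts its length
# equals the configured num_spk.

def check_references(speech_ref, num_spk):
    # Mirrors: assert len(speech_ref) == num_spk, (len(speech_ref), num_spk)
    assert len(speech_ref) == num_spk, (len(speech_ref), num_spk)

# The preprocessor yields a single reference, but the Librimix config
# sets num_spk=2, so the check fails with args (1, 2).
try:
    check_references(speech_ref=["ref_for_target_speaker"], num_spk=2)
except AssertionError as e:
    print(e.args[0])
```

The tuple `(1, 2)` in the error message is exactly `(len(speech_ref), num_spk)`: one reference was loaded, two were expected.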
@AntoineBlanot added the Bug (bug should be fixed) label Apr 2, 2024
@sw005320 added the SE (Speech enhancement) label Apr 2, 2024
@sw005320
Contributor

sw005320 commented Apr 2, 2024

Thanks for raising the issue.
@Emrys365, can you answer it for me?

@Emrys365
Collaborator

Emrys365 commented Apr 3, 2024

@AntoineBlanot Could you paste the content of run.sh and the model config file (.yaml) you used?

@AntoineBlanot
Author

AntoineBlanot commented Apr 4, 2024

@AntoineBlanot Could you paste the content of run.sh and the model config file (.yaml) you used?

Sure! Here they are:
run.sh

#!/usr/bin/env bash
# Set bash to 'debug' mode, it will exit on :
# -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
set -e
set -u
set -o pipefail

sample_rate=16k # 8k or 16k
min_or_max=min  # "min" or "max". This is to determine how the mixtures are generated in local/data.sh.


train_set="train"
valid_set="dev"
test_sets="test "

CUDA_VISIBLE_DEVICES=0,1 ./enh.sh \
    --is_tse_task true \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" \
    --fs "${sample_rate}" \
    --ref_num 2 \
    --local_data_opts "--sample_rate ${sample_rate} --min_or_max ${min_or_max}" \
    --lang en \
    --ngpu 2 \
    --enh_config ./conf/train.yaml \
    "$@"

train.yaml

optim: adam
max_epoch: 100
batch_type: folded
batch_size: 16
iterator_type: chunk
chunk_length: 48000
# exclude keys "enroll_ref", "enroll_ref1", "enroll_ref2", ...
# from the length consistency check in ChunkIterFactory
chunk_excluded_key_prefixes:
- "enroll_ref"
num_workers: 4
optim_conf:
    lr: 1.0e-03
    eps: 1.0e-08
    weight_decay: 0
unused_parameters: true
patience: 20
accum_grad: 1
grad_clip: 5.0
val_scheduler_criterion:
- valid
- loss
best_model_criterion:
-   - valid
    - snr
    - max
-   - valid
    - loss
    - min
keep_nbest_models: 1
scheduler: reducelronplateau
scheduler_conf:
   mode: min
   factor: 0.7
   patience: 3

model_conf:
    num_spk: 2
    share_encoder: true

train_spk2enroll: data/train-100/spk2enroll.json
enroll_segment: 48000
load_spk_embedding: false
load_all_speakers: false

encoder: conv
encoder_conf:
    channel: 256
    kernel_size: 32
    stride: 16
decoder: conv
decoder_conf:
    channel: 256
    kernel_size: 32
    stride: 16
extractor: td_speakerbeam
extractor_conf:
    layer: 8
    stack: 4
    bottleneck_dim: 256
    hidden_dim: 512
    skip_dim: 256
    kernel: 3
    causal: False
    norm_type: gLN
    pre_nonlinear: prelu
    nonlinear: relu
    # enrollment related
    i_adapt_layer: 7
    adapt_layer_type: mul
    adapt_enroll_dim: 256
    use_spk_emb: false

# A list for criterions
# The overall loss in the multi-task learning will be:
# loss = weight_1 * loss_1 + ... + weight_N * loss_N
# The default `weight` for each sub-loss is 1.0
criterions:
  # The first criterion
  - name: snr
    conf:
      eps: 1.0e-7
    wrapper: fixed_order
    wrapper_conf:
      weight: 1.0

@Emrys365
Collaborator

Emrys365 commented Apr 4, 2024

Thank you! I think the error is caused by the default value of the argument load_all_speakers (=false) in TSEPreprocessor. With that setting, the preprocessor only prepares one reference signal (corresponding to one of the speakers in each mixture sample) as the target, while the model expects num_spk (=2) references.

To avoid this error, you could set load_all_speakers to true in train.yaml.

Sorry about this mistake, I will also make a PR to update the related files.
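To illustrate the diagnosis above, here is a simplified, hypothetical sketch (not the actual TSEPreprocessor implementation) of how a load_all_speakers-style flag changes the number of references returned per sample:

```python
# Hypothetical illustration of the load_all_speakers behavior described above
# (names and signature are assumptions, not the real ESPnet API).

def select_references(all_refs, target_idx, load_all_speakers):
    """Return the reference signals to hand to the model.

    all_refs: one reference waveform per speaker in the mixture.
    target_idx: index of the enrolled target speaker.
    load_all_speakers: when False, keep only the target speaker's reference.
    """
    if load_all_speakers:
        return list(all_refs)        # num_spk references -> matches num_spk=2
    return [all_refs[target_idx]]    # 1 reference -> AssertionError: (1, 2)

refs = ["spk1_wav", "spk2_wav"]
print(len(select_references(refs, 0, load_all_speakers=False)))  # 1
print(len(select_references(refs, 0, load_all_speakers=True)))   # 2
```

With the flag off, only the target speaker's reference survives, which is why `len(speech_ref)` comes out as 1 against `num_spk=2`.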
