
hypo.word file missing during MMS ASR inference #5117

Open
ahazeemi opened this issue May 22, 2023 · 90 comments

@ahazeemi

❓ Questions and Help

What is your question?

I'm facing the following issue while running the MMS ASR inference script examples/mms/asr/infer/mms_infer.py:

  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/workspace/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tmpsjatjyxt/hypo.word'

Code

python examples/mms/asr/infer/mms_infer.py --model "/workspace/fairseq/mms1b_fl102.pt" --lang "urd-script_arabic" --audio "/workspace/audio.wav"

What have you tried?

Tried running the ASR on different audio files and languages.

What's your environment?

  • fairseq Version (e.g., 1.0 or main): main
  • PyTorch Version (e.g., 1.0): 2.0.0
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): pip
  • Build command you used (if compiling from source): N/A
  • Python version: 3.10.10
  • CUDA/cuDNN version: 11.6
  • GPU models and configuration: NVIDIA A6000
  • Any other relevant information: N/A
@shsagnik

shsagnik commented May 22, 2023

Facing the exact same issue

@vineelpratap
Contributor

Hi, can you share the entire log? I just tested the code again and it works fine from my end.

@audiolion

You need to check what the actual error is. Change your mms_infer.py to

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL,)
print(out)

to see the error. For me, the issue was that I needed to pass cpu=True because I don't have CUDA installed. I did this by modifying my infer_common.yaml file to add a new top-level key common with the cpu: true key/val in it:

common:
  cpu: true
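
For reference, here is a minimal sketch of that debugging change (the helper name run_inference is made up; cmd is the command string that process() in mms_infer.py already builds). It surfaces whatever the inference subprocess printed instead of silently discarding it and failing later on hypo.word:

import subprocess
import sys

def run_inference(cmd: str) -> None:
    # Capture stdout/stderr so the real failure is visible.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stderr.write(result.stdout)
        sys.stderr.write(result.stderr)
        raise RuntimeError(f"inference subprocess failed with exit code {result.returncode}")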

@audiolion

I am hitting this though, and I am not sure what I am doing wrong. I'm not sure if I am using the right lang_code; it doesn't say what the lang codes are or what standard they reference. I have tried en and en-US so far.

[screenshot of the error]

@shsagnik

shsagnik commented May 22, 2023

Sure, here is my full log:

(base) hello_automate_ai@machinelearningnotebook:~/fairseqmmstest/fairseq$ python "examples/mms/asr/infer/mms_infer.py" --model "/home/hello_automate_ai/fairseqmmstest/mms1b_all.pt" --lang hin --audio "/home/hello_automate_ai/fairseqmmstest/audio.wav"
preparing tmp manifest dir ...
loading model & running inference ...
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 18, in <module>
    import editdistance
ModuleNotFoundError: No module named 'editdistance'
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp6u8grbxl/hypo.word'

@shsagnik

This is after the fix suggested by audiolion

@vineelpratap
Contributor

@audiolion We expect a 3-letter language code. See the 'Supported languages' section in the README file for each model.
For example - use 'eng' for English.
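
So the command from the issue description would look something like this (paths are just illustrative):

python examples/mms/asr/infer/mms_infer.py --model /path/to/mms1b_fl102.pt --lang eng --audio /path/to/audio.wav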

@vineelpratap
Contributor

@shsagnik
No module named 'editdistance' - You should install the missing module.

@audiolion

@shsagnik

ModuleNotFoundError: No module named 'editdistance'

you need to install the modules that are used

@shsagnik

shsagnik commented May 22, 2023

Got these errors this time

preparing tmp manifest dir ...
loading model & running inference ...
/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/core/plugins.py:202: UserWarning:
Error importing 'hydra_plugins.hydra_colorlog'.
Plugin is incompatible with this Hydra version or buggy.
Recommended to uninstall or upgrade plugin.
ImportError: cannot import name 'SearchPathPlugin' from 'hydra.plugins' (/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/plugins/__init__.py)
  warnings.warn(
Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai/INFER/None'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai/INFER'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/hello_automate_ai'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 140, in sweep
    sweep_dir.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/home/hello_automate_ai/miniconda3/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/checkpoint'
Traceback (most recent call last):
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/home/hello_automate_ai/fairseqmmstest/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp0mcwde4n/hypo.word'

@altryne

altryne commented May 22, 2023

Getting pretty much the same thing. I used the right 3-letter language code (while waiting on #5119 to be answered) and it doesn't seem to have an effect; the hypo.word error is still showing up.

@dakouan18

dakouan18 commented May 22, 2023

I got this error when I tried to run ASR on Google Colab:

/content/fairseq
>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/content/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/content/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/content/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/content/fairseq/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "/content/fairseq/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "/content/fairseq/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "/content/fairseq/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "/content/fairseq/fairseq/dataclass/configs.py", line 12, in <module>
    from omegaconf import II, MISSING
ModuleNotFoundError: No module named 'omegaconf'
CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'/content/mms1b_fl102.pt\'" task.data=/tmp/tmp79w8mawp dataset.gen_subset="eng:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmp79w8mawp\n        ', returncode=1)
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp79w8mawp/hypo.word'

@audiolion

Please y'all read the error messages and try to debug yourself.

@dakouan18

ModuleNotFoundError: No module named 'omegaconf'

you need to install the missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

@shsagnik your hydra install has some issues, and you need to specify a checkpoint directory. It was set up to run on Linux where you can make directories off the root (probably in a container), so change infer_common.yaml:

[screenshot of the infer_common.yaml change]

@altryne

altryne commented May 22, 2023

Thanks @audiolion
It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a command, as it obscures the errors that pop up there.

Here's the full output I'm getting (added a print out of the cmd command as well)

$ python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --audio output_audio.mp3 --lang tur
>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='mms1b_l1107.pt'" task.data=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve

>>> loading model & running inference ...
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 53, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'

@dakouan18

Hi @audiolion, after installing omegaconf & hydra a new error appeared:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
2023-05-22 22:22:29.307454: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-22 22:22:30.440434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/content/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/content/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/content/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/content/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/content/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'/content/mms1b_fl102.pt\'" task.data=/tmp/tmpk2ot70rk dataset.gen_subset="eng:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpk2ot70rk\n        ', returncode=1)
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 53, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpk2ot70rk/hypo.word'

@audiolion

Thanks @audiolion It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a command, as it obscures the errors that pop up there.

Here's the full output I'm getting (added a print out of the cmd command as well)

$ python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --audio output_audio.mp3 --lang tur
>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='mms1b_l1107.pt'" task.data=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve

>>> loading model & running inference ...
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 53, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'

You need to do what I said in my first comment and output the process error message. The hypo.word file is not found because the actual ASR never ran and never produced output.

@altryne

altryne commented May 22, 2023

SIGH, I am, it prints the command and that's it.

>>> loading model & running inference ...
CompletedProcess(args='\nPYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path=\'mms1b_l1107.pt\'" task.data=C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_ dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_\n', returncode=0)
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 55, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 47, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmp9t2lty3_\\hypo.word'

However, when I go back and recreate that temp dir, and run the command manually myself I do seem to get errors.

Just for some reason not via the way you mentioned.

Had to install many packages on the way, here's a partial list (in case it helps anyone)

pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install hydra-core
pip install editdistance
pip install soundfile
pip install omegaconf
pip install hydra-core
pip install fairseq
pip install scikit-learn
pip install tensorboardX

Still getting nowhere. Running the subprocess command even with check=True and printing the output returns status code 0 with no errors.

@altryne

altryne commented May 22, 2023

Got the model to finally load and run. Apparently Windows doesn't allow : in directory names, and the above code adds :dev to the directory name.

So if you pass --lang tur like I did, it will try to create a directory named /tur:dev inside /checkpoint, which per @audiolion I also had to change, as /checkpoint doesn't seem to do anything on Windows.

I think the full inference ran, as the process got stuck for a few minutes, the GPU went to 8GB (impressive), and after a while I had 2 errors again.

The hypo.word error seems to be a "catch-all" error that can mean many things went wrong; hopefully the authors will clean it up?

I'm currently staring at this error, and am pretty sure it's due to me removing the : from the dir name:

  File "C:\Users\micro\projects\mms\examples\speech_recognition\new\infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "C:\Users\micro\projects\mms\examples\speech_recognition\new\infer.py", line 132, in __init__
    self.task.load_dataset(
  File "C:\Users\micro\projects\mms\fairseq\tasks\audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "C:\Users\micro\projects\mms\fairseq\tasks\audio_pretraining.py", line 175, in load_dataset
    for key, file_name in data_keys:
ValueError: not enough values to unpack (expected 2, got 1)

@bbz662

bbz662 commented May 22, 2023

I had the same error with Google Colab and have investigated.

my error

>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='/content/mms1b_fl102.pt'" task.data=/content/tmp dataset.gen_subset="jpn:dev" common_eval.post_process=letter decoding.results_path=/content/tmp
        
>>> loading model & running inference ...
2023-05-22 22:02:52.055738: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-05-22 22:02:58,730][HYDRA] Launching 1 jobs locally
[2023-05-22 22:02:58,730][HYDRA] 	#0 : decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 common_eval.path='/content/mms1b_fl102.pt' task.data=/content/tmp dataset.gen_subset=jpn:dev common_eval.post_process=letter decoding.results_path=/content/tmp
[2023-05-22 22:02:59,254][__main__][INFO] - /content/mms1b_fl102.pt
Killed
Traceback (most recent call last):
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 54, in <module>
    process(args)
  File "/content/fairseq/examples/mms/asr/infer/mms_infer.py", line 46, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/content/tmp/hypo.word'

As it turns out, it was crashing at the following location.

self.layers = nn.ModuleList(

Looking at the RAM status, I believe the crash was caused by lack of memory.
[screenshot of Colab RAM usage]

So I feel that perhaps increasing the memory will solve the problem.
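
If it helps anyone checking the same thing, here is a rough sketch for seeing how much RAM is free before launching inference (psutil is an extra dependency, not part of fairseq, and the threshold is a guess rather than a measured number):

import psutil

free_gb = psutil.virtual_memory().available / 1e9
print(f"Available RAM: {free_gb:.1f} GB")
# The 1B-parameter MMS checkpoints are several GB on disk and need noticeably
# more than that to load, so a small Colab instance can get OOM-killed here.
if free_gb < 16:
    print("Probably not enough memory to load an mms1b_* checkpoint on this instance.")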

I hope this helps you in your investigation.

@betimd

betimd commented May 22, 2023

Getting the same error. Also, the documentation to run the sample is horrible.

@audiolion

audiolion commented May 22, 2023

I would say it isn't a catch-all error, but rather that error handling for the subprocess call is not done, so if the call to run the inference fails for any reason, the hypo.word file will not have been created, and thus the open() call will fail and throw that error. So you have to dig backwards through the subprocess command to find out what happened. This just got open-sourced, so it makes sense there are some rough edges; contribute back to the repo!

edit: @altryne my bad, I thought by your message you were printing the command itself, not the output of running the command. Your error does look like it's failing because of the lack of :. Good news is it's open source, so you could change : to another character, or run it on Windows Subsystem for Linux, or run it in Docker.

@altryne

altryne commented May 23, 2023

I would say it isn't a catch-all error, but rather that error handling for the subprocess call is not done, so if the call to run the inference fails for any reason, the hypo.word file will not have been created, and thus the open() call will fail and throw that error. So you have to dig backwards through the subprocess command to find out what happened. This just got open-sourced, so it makes sense there are some rough edges; contribute back to the repo!

Yeah, that's what I mean, if anything happens within the subprocess for any reason, folks are going to get the above mentioned error. Then they will likely google their way into this issue, which covers many of the possible ways it can fail.
I was trying to be extra verbose for other folks to potentially help.

edit: @altryne my bad, I thought by your message you were printing the command itself, not the output of running the command. Your error does look like it's failing because of the lack of :. Good news is it's open source, so you could change : to another character, or run it on Windows Subsystem for Linux, or run it in Docker.

Thanks! You helped a lot, I eventually had to rewrite that whole block like so:

        import os
        os.environ["TMPDIR"] = str(tmpdir)
        os.environ["PYTHONPATH"] = "."
        os.environ["PREFIX"] = "INFER"
        os.environ["HYDRA_FULL_ERROR"] = "1"
        os.environ["USER"] = "micro"

        cmd = f"""python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}" common_eval.post_process={args.format} decoding.results_path={tmpdir}
"""

To even have the command execute and do something and not fail outright.

@audiolion

glad you got it working!

@fcecagno

fcecagno commented May 23, 2023

Hi, thanks for this discussion - I've learned a lot. This is the Dockerfile I created after a few hours trying to make it work:

FROM python:3.8

WORKDIR /usr/src/app

COPY . .

RUN pip install --no-cache-dir . \
 && pip install --no-cache-dir soundfile \
 && pip install --no-cache-dir torch \
 && pip install --no-cache-dir hydra-core \
 && pip install --no-cache-dir editdistance \
 && pip install --no-cache-dir soundfile \
 && pip install --no-cache-dir omegaconf \
 && pip install --no-cache-dir scikit-learn \
 && pip install --no-cache-dir tensorboardX \
 && python setup.py build_ext --inplace \
 && apt update \
 && apt -y install libsndfile-dev \
 && rm -rf /var/lib/apt/lists/* \
 && wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq \
 && chmod +x /usr/bin/yq \
 && yq -i '.common.cpu = true' examples/mms/asr/config/infer_common.yaml

CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

I built the image with:

docker build -t fairseq:dev .

And run it with:

docker run --rm -it -e USER=root -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/mms1b_fl102.pt --lang eng --audio /mms/audio.wav

@MohamedAliRashad

I kept tracing errors and solving them until I met this error:


  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Traceback (most recent call last):

Does anyone know a solution?

@didadida-r

didadida-r commented May 23, 2023

Hi, thanks for this discussion - I've learned a lot. This is the Dockerfile I created after a few hours trying to make it work:

FROM python:3.8

WORKDIR /usr/src/app

COPY . .

RUN pip install --no-cache-dir . \
 && pip install --no-cache-dir soundfile \
 && pip install --no-cache-dir torch \
 && pip install --no-cache-dir hydra-core \
 && pip install --no-cache-dir editdistance \
 && pip install --no-cache-dir soundfile \
 && pip install --no-cache-dir omegaconf \
 && pip install --no-cache-dir scikit-learn \
 && pip install --no-cache-dir tensorboardX \
 && python setup.py build_ext --inplace \
 && apt update \
 && apt -y install libsndfile-dev \
 && rm -rf /var/lib/apt/lists/* \
 && wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq \
 && chmod +x /usr/bin/yq \
 && yq -i '.common.cpu = true' examples/mms/asr/config/infer_common.yaml

CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

I built the image with:

docker build -t fairseq:dev .

And run it with:

docker run --rm -it -e USER=root -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/mms1b_fl102.pt --lang eng --audio /mms/audio.wav

I ran the code using this Docker image, but it fails again:

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "examples/speech_recognition/new/infer.py", line 499, in <module>
    cli_main()
  File "examples/speech_recognition/new/infer.py", line 495, in cli_main
    hydra_main()  # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 354, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/utils.py", line 355, in <lambda>
    lambda: hydra.multirun(
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 136, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 154, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/usr/local/lib/python3.8/site-packages/hydra/_internal/core_plugins/basic_launcher.py", line 76, in launch
    ret = run_job(
  File "/usr/local/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "examples/speech_recognition/new/infer.py", line 460, in hydra_main
    distributed_utils.call_main(cfg, main)
  File "/usr/src/app/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "examples/speech_recognition/new/infer.py", line 407, in main
    with InferenceProcessor(cfg) as processor:
  File "examples/speech_recognition/new/infer.py", line 132, in __init__
    self.task.load_dataset(
  File "/usr/src/app/fairseq/tasks/audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "/usr/src/app/fairseq/tasks/audio_pretraining.py", line 150, in load_dataset
    if task_cfg.multi_corpus_keys is None:
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 305, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 303, in __getattr__
    return self._get_impl(key=key, default_value=DEFAULT_VALUE_MARKER)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 361, in _get_impl
    node = self._get_node(key=key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 383, in _get_node
    self._validate_get(key)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 135, in _validate_get
    self._format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigAttributeError: Key 'multi_corpus_keys' is not in struct
        full_key: task.multi_corpus_keys
        reference_type=Any
        object_type=dict
Traceback (most recent call last):
  File "examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4o9kxdyr/hypo.word'

@EklavyaFCB

EklavyaFCB commented May 23, 2023

Same error.

$ python examples/mms/asr/infer/mms_infer.py --model /idiap/temp/esarkar/cache/fairseq/mms1b_all.pt --lang shp --audio /idiap/temp/esarkar/Data/shipibo/downsampled_single_folder/short/shp-ROS-2022-03-14-2.1.wav

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/__init__.py", line 1, in <module>
    from . import criterions, models, tasks  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/__init__.py", line 15, in <module>
    importlib.import_module(
  File "/idiap/temp/esarkar/miniconda/envs/fairseq/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py", line 13, in <module>
    from fairseq import utils
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions  # noqa
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/fairseq/criterions/__init__.py", line 18, in <module>
    (
TypeError: cannot unpack non-iterable NoneType object
Traceback (most recent call last):
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "/remote/idiap.svm/user.active/esarkar/speech/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '/idiap/temp/esarkar/tmp/tmpnhi5rrui/hypo.word'

@hrishioa

Same issue.

python examples/mms/asr/infer/mms_infer.py --model "models/mms1b_fl102.pt" --lang eng --audio "../testscripts/audio.wav"
>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
  File "~/fairseq/examples/speech_recognition/new/infer.py", line 21, in <module>
    from examples.speech_recognition.new.decoders.decoder_config import (
  File "~/fairseq/examples/__init__.py", line 7, in <module>
    from fairseq.version import __version__  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/fairseq/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "~/fairseq/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "~/fairseq/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "~/fairseq/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "~/fairseq/fairseq/dataclass/configs.py", line 1127, in <module>
    @dataclass
     ^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 1223, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 1213, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<location>/opt/anaconda3/envs/mms/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
Traceback (most recent call last):
  File "~/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    process(args)
  File "~/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/7r/6k64fzpn6sx5ml6pb2h67kbw0000gn/T/tmp9ubxk363/hypo.word'

@MinSukJoshyOh

OK, for anyone who still has the FileNotFoundError: [Errno 2] No such file or directory error for hypo.word and just wants to test the inference:

It's really what the error says. :D
During inference the program accesses the tmp folder and needs to write some files, including hypo.word.
As the error says, in line 44 of mms_infer.py it tries to open and write hypo.word:
with open(tmpdir/"hypo.word") as fr:
As you can see, no rights are defined for the open method, so just give Python the right to write and read the file:
with open(tmpdir/"hypo.word", "w+") as fr:
This should be all.

You can see this in the code:

def process(args):    
    with tempfile.TemporaryDirectory() as tmpdir:
        print(">>> preparing tmp manifest dir ...", file=sys.stderr)
        tmpdir = Path("/home/divisio/projects/tmp/")
        with open(tmpdir / "dev.tsv", "w") as fw:
            fw.write("/\n")
            for audio in args.audio:
                nsample = sf.SoundFile(audio).frames
                fw.write(f"{audio}\t{nsample}\n")
        with open(tmpdir / "dev.uid", "w") as fw:
            fw.write(f"{audio}\n"*len(args.audio))
        with open(tmpdir / "dev.ltr", "w") as fw:
            fw.write("d u m m y | d u m m y\n"*len(args.audio))
        with open(tmpdir / "dev.wrd", "w") as fw:
            fw.write("dummy dummy\n"*len(args.audio))
        cmd = f"""
        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}:dev" common_eval.post_process={args.format} decoding.results_path={tmpdir}
        """
        print(">>> loading model & running inference ...", file=sys.stderr)
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,)
        with open(tmpdir/"hypo.word", "w+") as fr:
            for ii, hypo in enumerate(fr):
                hypo = re.sub("\(\S+\)$", "", hypo).strip()
                print(f'===============\nInput: {args.audio[ii]}\nOutput: {hypo}')

Python should already have created the files dev.tsv, dev.uid, dev.ltr and dev.wrd in the same tmp folder. If you want to check this, simply change

tmpdir = Path(tmpdir) to a static folder, for instance in your user directory, like
tmpdir = Path("/home/myuser/path/to/my/project/test")

and you will see that those files will be created, including hypo.word if you did the changes like I described before.

Now examples/speech_recognition/new/infer.py will be triggered in line 40,
and it might fail writing the inference log file, like @v-yunbin described:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/.../INFER/None'

And it's again just a problem with permissions to write some files.
Next to the mms_infer.py file is a config folder including infer_common.yaml, and there is this property:

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

So it tries to write into the checkpoint folder at root level. If you cannot do that, simply change this folder to some folder in your user directory,
for instance:

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /home/myuser/my/project/folder/tmp/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

So now the script will have access to those folders and will write the inference log (infer.log) into that folder, which includes the result of the ASR.
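
Once the inference subprocess really has produced hypo.word, reading the transcriptions back out takes only a few lines (this mirrors the loop at the end of mms_infer.py; the results directory below is just an example path):

import re
from pathlib import Path

results_dir = Path("/home/myuser/path/to/my/project/test")  # same dir as decoding.results_path
with open(results_dir / "hypo.word") as fr:
    for line in fr:
        # each hypothesis line ends with the utterance id in parentheses; strip it
        hypo = re.sub(r"\(\S+\)$", "", line).strip()
        print(hypo)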

@KyattPL

KyattPL commented May 26, 2023

I would say it isn't a catch-all error, but rather that error handling for the subprocess call is not done, so if the call to run the inference fails for any reason, the hypo.word file will not have been created, and thus the open() call will fail and throw that error. So you have to dig backwards through the subprocess command to find out what happened. This just got open-sourced, so it makes sense there are some rough edges; contribute back to the repo!

Yeah, that's what I mean, if anything happens within the subprocess for any reason, folks are going to get the above mentioned error. Then they will likely google their way into this issue, which covers many of the possible ways it can fail. I was trying to be extra verbose for other folks to potentially help.

edit: @altryne my bad, I thought by your message you were printing the command itself, not the output of running the command. Your error does look like it's failing because of the lack of :. Good news is it's open source, so you could change : to another character, or run it on Windows Subsystem for Linux, or run it in Docker.

Thanks! You helped a lot, I eventually had to rewrite that whole block like so:

        import os
        os.environ["TMPDIR"] = str(tmpdir)
        os.environ["PYTHONPATH"] = "."
        os.environ["PREFIX"] = "INFER"
        os.environ["HYDRA_FULL_ERROR"] = "1"
        os.environ["USER"] = "micro"

        cmd = f"""python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}" common_eval.post_process={args.format} decoding.results_path={tmpdir}
"""

To even have the command execute and do something and not fail outright.

I'm pretty sure I made the same changes and I still get the unpack error. I changed the ENV vars before the cmd string and copied your entire cmd string. Maybe I'm missing something in infer_common.yaml or in how I'm running it with args? (Windows paths do be scuffed)

@hebochang

There is a problem with the mms1b_fl102.pt model; the replacement model is mms1b_all.pt.
That is how I solved this problem.

@aberaud

aberaud commented May 28, 2023

Not sure what I missed, but running this I ran into this error. Maybe it's a quick permission issue? Apologies, I don't work with Docker regularly.

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
 File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
   self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER/None'**

During handling of the above exception, another exception occurred:

I edited the script and it's now working for me, with an Ubuntu 22.04 image, tested with both CUDA 11.8 and 12.1.
Note that I added permissions for /checkpoint/${USERNAME}.

Dockerfile.mms:

# Also works with CUDA 12.1:
#FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app

RUN apt-get update \
    && apt-get install -y python-is-python3 git python3-pip sudo wget curl

RUN git clone https://github.com/facebookresearch/fairseq.git \
    && cd fairseq \
    && pip install pip -U \
    && pip install --no-cache-dir . \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir torch \
    && pip install --no-cache-dir hydra-core \
    && pip install --no-cache-dir editdistance \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir omegaconf \
    && pip install --no-cache-dir scikit-learn \
    && pip install --no-cache-dir tensorboardX \
    && python setup.py build_ext --inplace

ENV USERNAME=user
RUN echo "root:root" | chpasswd \
    && adduser --disabled-password --gecos "" "${USERNAME}" \
    && echo "${USERNAME}:${USERNAME}" | chpasswd \
    && echo "%${USERNAME}    ALL=(ALL)   NOPASSWD:    ALL" >> /etc/sudoers.d/${USERNAME} \
    && chmod 0440 /etc/sudoers.d/${USERNAME}

RUN mkdir -p /checkpoint/${USERNAME}/INFER \
    && chown -R ${USERNAME}:${USERNAME} /checkpoint/${USERNAME}

USER ${USERNAME}
WORKDIR /usr/src/app/fairseq
CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

Building with:

docker build -t fairseq:dev -f Dockerfile.mms .

Running with:

docker run --rm -it --gpus all -e USER=user -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/examples/mms/mms1b_l1107.pt --lang fra --audio /mms/examples/mms/test16k.wav

@bekarys0504

bekarys0504 commented May 29, 2023

Not sure what I missed, but running this I ran into this error. Maybe it's a quick permission issue? Apologies, I don't work with Docker regularly.

>>> preparing tmp manifest dir ...
>>> loading model & running inference ...
Traceback (most recent call last):
 File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
   self._accessor.mkdir(self, mode)
**FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/user/INFER/None'**

During handling of the above exception, another exception occurred:

I edited the script and it's now working for me, with an Ubuntu 22.04 image, tested with both CUDA 11.8 and 12.1. Note that I added permissions for /checkpoint/${USERNAME}.

Dockerfile.mms:

# Also works with CUDA 12.1:
#FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app

RUN apt-get update \
    && apt-get install -y python-is-python3 git python3-pip sudo wget curl

RUN git clone https://github.com/facebookresearch/fairseq.git \
    && cd fairseq \
    && pip install pip -U \
    && pip install --no-cache-dir . \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir torch \
    && pip install --no-cache-dir hydra-core \
    && pip install --no-cache-dir editdistance \
    && pip install --no-cache-dir soundfile \
    && pip install --no-cache-dir omegaconf \
    && pip install --no-cache-dir scikit-learn \
    && pip install --no-cache-dir tensorboardX \
    && python setup.py build_ext --inplace

ENV USERNAME=user
RUN echo "root:root" | chpasswd \
    && adduser --disabled-password --gecos "" "${USERNAME}" \
    && echo "${USERNAME}:${USERNAME}" | chpasswd \
    && echo "%${USERNAME}    ALL=(ALL)   NOPASSWD:    ALL" >> /etc/sudoers.d/${USERNAME} \
    && chmod 0440 /etc/sudoers.d/${USERNAME}

RUN mkdir -p /checkpoint/${USERNAME}/INFER \
    && chown -R ${USERNAME}:${USERNAME} /checkpoint/${USERNAME}

USER ${USERNAME}
WORKDIR /usr/src/app/fairseq
CMD [ "python", "examples/mms/asr/infer/mms_infer.py" ]

Building with:

docker build -t fairseq:dev -f Dockerfile.mms .

Running with:

docker run --rm -it --gpus all -e USER=user -v $(pwd):/mms:ro fairseq:dev python examples/mms/asr/infer/mms_infer.py --model /mms/examples/mms/mms1b_l1107.pt --lang fra --audio /mms/examples/mms/test16k.wav

Worked for me, thanks! For anyone not proficient with Docker: in the directory where your Dockerfile is located, just make sure to create a directory examples/mms and place your model and audio files in that directory. What $(pwd):/mms:ro does is mount the current directory (the present working directory) as a read-only volume inside the container at the path /mms.

@abdeladim-s

Hi all,
If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files.
Hope it will be useful! :)

@bekarys0504

bekarys0504 commented May 30, 2023

Hi all,
If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files.
Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

@abdeladim-s

Hi all,
If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files.
Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

I just noticed that the MMS project is not included yet in the released version of fairseq, so you will need to install it from source until then:

pip uninstall fairseq && pip install git+https://github.com/facebookresearch/fairseq

The installation steps are updated accordingly.
Let me know @bekarys0504 if that solved the issue?

@bekarys0504

bekarys0504 commented May 30, 2023

Hi all,
If someone is still struggling to run the code, I tried to create a Python package to easily use the MMS project, instead of calling subprocess and dealing with yaml files.
Hope it will be useful! :)

I get the following error after following all the steps:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:16
     15 try:
---> 16     from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     17     from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'fairseq.examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:21
     20 try:
---> 21     from examples.mms.data_prep.align_and_segment import get_alignments
     22     from examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans

ModuleNotFoundError: No module named 'examples.mms'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 from easymms.models.asr import ASRModel
      3 asr = ASRModel(model='/bekarys/fairseq/models/mms1b_fl102.pt')
      4 files = val_data_annotated.audio_path.to_list()[:2]

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py:37
     35 from easymms import utils
     36 from easymms._logger import set_log_level
---> 37 from easymms.models.alignment import AlignmentModel
     38 from easymms.constants import CFG, HYPO_WORDS_FILE, MMS_LANGS_FILE
     40 logger = logging.getLogger(__name__)

File /scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/alignment.py:27
     25 import fairseq
     26 sys.path.append(str(Path(fairseq.__file__).parent))
---> 27 from fairseq.examples.mms.data_prep.align_and_segment import get_alignments
     28 from fairseq.examples.mms.data_prep.align_utils import get_uroman_tokens, get_spans
     29 from fairseq.examples.mms.data_prep.text_normalization import text_normalize

ModuleNotFoundError: No module named 'fairseq.examples.mms'

I just noticed that the MMS project is not included yet in the released version of fairseq, so you will need to install it from source until then:

pip uninstall fairseq && pip install git+https://github.com/facebookresearch/fairseq

The installation steps are updated accordingly. Let me know @bekarys0504 if that solved the issue?

I have the following error now :( @abdeladim-s

Traceback (most recent call last):
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_2570058/32768016.py", line 6, in <module>
    transcriptions = asr.transcribe(files, lang='kaz', align=False)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/easymms/models/asr.py", line 170, in transcribe
    self.wer = hydra_main(cfg)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/hydra/main.py", line 27, in decorated_main
    return task_function(cfg_passthrough)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 436, in hydra_main
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    if cfg.distributed_training.distributed_init_method is None:
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 383, in main
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 103, in __init__
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/examples/speech_recognition/new/infer.py", line 205, in load_model_ensemble
    out_file.write(line)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 367, in load_model_ensemble
    arg_overrides (Dict[str,Any], optional): override model args that
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 482, in load_model_ensemble_and_task
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2056, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Wav2VecCtc:
	Unexpected key(s) in state_dict: "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.0.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.1.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.2.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.3.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.4.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.5.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.6.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.7.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.8.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.b_b", 
"w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.9.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.10.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.11.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.12.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.13.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.14.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.15.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.16.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.17.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.18.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.W_a", 
"w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.19.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.20.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.21.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.22.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.23.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.24.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.25.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.26.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.27.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.b_b", 
"w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.28.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.29.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.30.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.31.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.32.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.33.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.34.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.35.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.36.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.37.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.W_a", 
"w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.38.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.39.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.40.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.41.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.42.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.43.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.44.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.45.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.b_b", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.46.adapter_layer.ln_b", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.W_a", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.W_b", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.b_a", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.b_b", 
"w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.ln_W", "w2v_encoder.w2v_model.encoder.layers.47.adapter_layer.ln_b". 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2102, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1310, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1199, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 1052, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 978, in format_exception_as_a_whole
    frames.append(self.format_record(record))
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 878, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/IPython/core/ultratb.py", line 712, in lines
    return self._sd.lines
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 681, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/stack_data/core.py", line 660, in executing_piece
    return only(
  File "/scriptur/nemo_asr/env/lib/python3.8/site-packages/executing/executing.py", line 190, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

@abdeladim-s
Copy link

@bekarys0504, what model are you using ? I think you are using a wrong model!

@bekarys0504
Copy link

bekarys0504 commented May 30, 2023

@bekarys0504, what model are you using ? I think you are using a wrong model!

this one mms1b_fl102.pt downloaded through this link https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt

It should be the right one; it is for ASR. @abdeladim-s

@abdeladim-s
Copy link

@bekarys0504, what model are you using ? I think you are using a wrong model!

this one mms1b_fl102.pt downloaded through this link https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt

It should be the right one; it is for ASR. @abdeladim-s

@bekarys0504, yes, it seems to be the right model.
Could you please submit an issue on the project repo so we can debug this further together?
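Before opening that issue, one quick sanity check is to peek at the checkpoint's top-level structure with torch. A minimal sketch, assuming the file was saved as ./mms1b_fl102.pt:

import torch

# Load on CPU purely to inspect the checkpoint layout; this does not run inference.
ckpt = torch.load("./mms1b_fl102.pt", map_location="cpu")
print(list(ckpt.keys()))  # fairseq checkpoints typically expose keys such as 'cfg' and 'model'

state = ckpt.get("model", {})
adapter_keys = [k for k in state if "adapter_layer" in k]
print(f"{len(state)} tensors in total, {len(adapter_keys)} adapter-related keys")

If the adapter keys show up, the download itself is probably fine and the state_dict mismatch above happens on the loading side.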

@andergisomon
Copy link

Ok, for anyone who still has the FileNotFoundError: [Errno 2] No such file or directory error for hypo.word and just wants to test the inference:

It's really what the error says. :D During inference the program accesses the tmp folder and needs to write some files, including hypo.word. As the error says, in line 44 of mms_infer.py it tries to open hypo.word via with open(tmpdir/"hypo.word") as fr:. As you can see, no mode is passed to the open call, so just give Python the right to write and read the file: with open(tmpdir/"hypo.word", "w+") as fr: should be all.

you can see in the code

def process(args):    
    with tempfile.TemporaryDirectory() as tmpdir:
        print(">>> preparing tmp manifest dir ...", file=sys.stderr)
        tmpdir = Path("/home/divisio/projects/tmp/")
        with open(tmpdir / "dev.tsv", "w") as fw:
            fw.write("/\n")
            for audio in args.audio:
                nsample = sf.SoundFile(audio).frames
                fw.write(f"{audio}\t{nsample}\n")
        with open(tmpdir / "dev.uid", "w") as fw:
            fw.write(f"{audio}\n"*len(args.audio))
        with open(tmpdir / "dev.ltr", "w") as fw:
            fw.write("d u m m y | d u m m y\n"*len(args.audio))
        with open(tmpdir / "dev.wrd", "w") as fw:
            fw.write("dummy dummy\n"*len(args.audio))
        cmd = f"""
        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='{args.model}'" task.data={tmpdir} dataset.gen_subset="{args.lang}:dev" common_eval.post_process={args.format} decoding.results_path={tmpdir}
        """
        print(">>> loading model & running inference ...", file=sys.stderr)
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,)
        with open(tmpdir/"hypo.word", "w+") as fr:
            for ii, hypo in enumerate(fr):
                hypo = re.sub("\(\S+\)$", "", hypo).strip()
                print(f'===============\nInput: {args.audio[ii]}\nOutput: {hypo}')

Python should already have created the files dev.tsv, dev.uid, dev.ltr and dev.wrd in the same tmp folder. If you want to check this, simply change

tmpdir = Path(tmpdir) into a static folder, for instance in your user directory, like tmpdir = Path("/home/myuser/path/to/my/project/test")

and you will see that those files get created, including hypo.word if you made the changes described before.
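A small illustrative check of that, assuming the static folder from the example above is used as tmpdir:

from pathlib import Path

# Point this at the static tmpdir hard-coded above; adjust the path to your setup.
tmpdir = Path("/home/myuser/path/to/my/project/test")

for name in ("dev.tsv", "dev.uid", "dev.ltr", "dev.wrd", "hypo.word"):
    path = tmpdir / name
    status = f"{path.stat().st_size} bytes" if path.exists() else "missing"
    print(f"{name}: {status}")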

Now examples/speech_recognition/new/infer.py will be triggered in line 40, and it might fail writing the inference log file, like @v-yunbin described: FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/.../INFER/None'

And it's again just a problem with permissions to write some files. Next to the mms_infer.py file is a config folder containing an infer_common.yaml, and there is the property

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

So it tries to write into the checkpoint folder at root level. If you cannot do that, simply change this folder to some folder in your user directory, for instance:

hydra:
  run:
    dir: ${common_eval.results_path}/${dataset.gen_subset}
  sweep:
    dir: /home/myuser/my/project/folder/tmp/${env:USER}/${env:PREFIX}/${common_eval.results_path}
    subdir: ${dataset.gen_subset}

So now the script will have access to those folders and will write the inference log (infer.log) into that folder, which includes the result of the ASR.

I did what you described, and while it ran for 6 minutes, I got a "Killed" in the output with no other information. The RAM was basically maxed out throughout, and there was no hypo.word not found error. The model is probably just too big to run on free Colab.

@andergisomon
Copy link

How many resources does it really take to run the l1107 model anyway? Running it on Colab maxed out 12GB of system RAM, which feels like overkill for a 10 second audio input.

@patrickvonplaten
Copy link
Contributor

It takes less than 8GB with the code snippet of https://huggingface.co/facebook/mms-1b-all and can easily be run on CPU - give it a try ;-)
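For anyone who wants to try that route, here is a minimal sketch along the lines of that model card, assuming a mono audio file named audio.wav and the transformers, torch and librosa packages installed:

import librosa
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)  # ships with the English adapter by default
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS expects 16 kHz mono input, so resample while loading.
speech, _ = librosa.load("audio.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(pred_ids))

Other languages can be selected with processor.tokenizer.set_target_lang(...) together with model.load_adapter(...), as described on the same model card.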

@andergisomon
Copy link

andergisomon commented Jun 2, 2023

It takes less than 8GB with the code snippet of https://huggingface.co/facebook/mms-1b-all and can easily be run on CPU - give it a try ;-)

That's good to know B). Even after tweaking the line where asr.py was supposed to write hypo.word, it ran on Colab but was killed after 6 minutes of maxing out the 12GB of RAM. The audio file wasn't even long; it was less than 10 seconds.

By the way, I have yet to try it using 🤗 transformers; I'm referring to the Colab notebook demoing ASR that's having trouble running.

@patrickvonplaten
Copy link
Contributor

Here we go: https://colab.research.google.com/drive/1jqREwuNUn0SrzcVjh90JSLleSVEcx1BY?usp=sharing simple 4 cell colab

@bagustris
Copy link

I also ran into this "hypo.word" error on one machine (Ubuntu 20.04) while there was no problem on another (Ubuntu 22.04). There is actually an error before No such file or directory: /tmp/hypo.word:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

After updating NumPy (from 1.21.5 to 1.24.3) the error was gone and the ASR output is shown at the bottom.
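If you want to guard against that in a script, a tiny illustrative check like the one below makes the version mismatch obvious before fairseq is even imported (the 1.24 threshold just mirrors the versions reported above):

import numpy as np

# The binary-incompatibility error went away after moving from NumPy 1.21.5 to 1.24.3,
# so warn if the installed version is older than 1.24.
major, minor = (int(x) for x in np.__version__.split(".")[:2])
if (major, minor) < (1, 24):
    print(f"NumPy {np.__version__} is older than 1.24; consider pip install -U numpy")
else:
    print(f"NumPy {np.__version__} looks recent enough")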

@bmox
Copy link

bmox commented Jun 27, 2023

missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

Yes you are right. Smaller model is working 😓

@HironTez
Copy link

HironTez commented Jul 4, 2023

To fix this issue, add open(tmpdir/"hypo.word", 'w').close() before the line 48 in "fairseq\examples\mms\asr\infer\mms_infer.py"

@Jackylee2032
Copy link

What files need to be changed on Windows?

@jackylee1
Copy link

BTW, it should now be very simple to use MMS with transformers:

See:

Your project is perfect, but I want to know how to use my own voice to translate.

@SalmaZakaria
Copy link

Please y'all read the error messages and try to debug yourself.

@dakouan18

ModuleNotFoundError: No module named 'omegaconf'

you need to install the missing modules, one of them being omegaconf

@altryne you need to print the error output to debug

@shsagnik your hydra install has some issues, and you need to specify a checkpoint directory. It was set up to run on Linux, where you can make directories off the root (probably in a container), so change infer_common.yaml:
(screenshot of the edited infer_common.yaml)

I have the same error as @shsagnik
What should I do? I ran it on Ubuntu.

@spanta28
Copy link

Sorry to bother you here. I am unable to run MMS ASR transcribe. I am using Python 3.11 and facing a range of issues, from hypo.word not found to AttributeError: 'PosixPath' object has no attribute 'find' and so on.

Going through the issues, there are no settled solutions, just a lot of comments:
#5284 (already tried that solution and lead to my error posted below)
#5117 (has no solution)

There are just way too many threads relating to MMS ASR transcribe issues but no working solutions posted. If there is one set of installation instructions that actually works and is documented somewhere, that would be great.

Here is my error:

os.environ["TMPDIR"] ='/Users/spanta/Downloads/fairseq-main/temp_dir'

os.environ["PYTHONPATH"] = "."

os.environ["PREFIX"] = "INFER"

os.environ["HYDRA_FULL_ERROR"] = "1"

os.environ["USER"] = "micro"

os.system('python3.11 examples/mms/asr/infer/mms_infer.py --model "/Users/spanta/Downloads/fairseq/models_new/mms1b_fl102.pt" --lang "tel" --audio "/Users/spanta/Documents/test_wav/1.wav"')

preparing tmp manifest dir ...

loading model & running inference ...

/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py:440: UserWarning:

The version_base parameter is not specified.

Please specify a compatability version level, or None.

Will assume defaults for version 1.1

@hydra.main(config_path=config_path, config_name="infer")

Traceback (most recent call last):

File "/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py", line 499, in

cli_main()

File "/Users/spanta/Downloads/fairseq-main/examples/speech_recognition/new/infer.py", line 495, in cli_main

hydra_main()  # pylint: disable=no-value-for-parameter

^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main

_run_hydra(

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 355, in _run_hydra

hydra = run_and_report(

        ^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report

raise ex

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report

return func()

       ^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/utils.py", line 356, in

lambda: Hydra.create_main_hydra2(

        ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 61, in create_main_hydra2

config_loader: ConfigLoader = ConfigLoaderImpl(

                              ^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_loader_impl.py", line 48, in init

self.repository = ConfigRepository(config_search_path=config_search_path)

                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 65, in init

self.initialize_sources(config_search_path)

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 72, in initialize_sources

scheme = self._get_scheme(search_path.path)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/opt/homebrew/lib/python3.11/site-packages/hydra/_internal/config_repository.py", line 143, in _get_scheme

idx = path.find("://")

      ^^^^^^^^^

AttributeError: 'PosixPath' object has no attribute 'find'


@didi222-lqq
Copy link

Thanks @audiolion. It wasn't immediately clear that mms_infer.py calls the whole hydra thing via a command, as that obscures the errors that pop up there.
Here's the full output I'm getting (added a printout of the cmd command as well):

$ python examples/mms/asr/infer/mms_infer.py --model mms1b_l1107.pt --audio output_audio.mp3 --lang tur
>>> preparing tmp manifest dir ...

        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=4000000 distributed_training.distributed_world_size=1 "common_eval.path='mms1b_l1107.pt'" task.data=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve dataset.gen_subset="tur:dev" common_eval.post_process=letter decoding.results_path=C:\Users\micro\AppData\Local\Temp\tmpxzum3zve

>>> loading model & running inference ...
Traceback (most recent call last):
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 53, in <module>
    process(args)
  File "C:\Users\micro\projects\mms\examples\mms\asr\infer\mms_infer.py", line 45, in process
    with open(tmpdir/"hypo.word") as fr:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\micro\\AppData\\Local\\Temp\\tmpxzum3zve\\hypo.word'

You need to do what I said in my first comment and output the process error message. The hypo.word file is not found because the actual ASR never ran and never produced an output.

Hello, I output the error message according to your comment, and it printed the following error
“CompletedProcess(args='\n PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path='./models_new/mms1b_all.pt'" task.data=/tmp/tmpepozridd dataset.gen_subset="adx:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpepozridd \n ', returncode=1)”
The complete log is as follows:
(screenshot of the complete log)

@didi222-lqq
Copy link

After a series of tries, I was able to get it to infer on Linux, but it could probably work on Windows also. The hypo.word file missing error is due to exceptions thrown during subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,), so first I suggest you replace that line with the following:

out = subprocess.run(cmd, check=True, shell=True, stdout=subprocess.DEVNULL, )
print(out)

This will enable you to see what's causing the error. Also provide the full paths of your model and audio files, like this: python examples/mms/asr/infer/mms_infer.py --model "/home/hunter/Downloads/mms1b_all.pt" --lang eng --audio "/home/hunter/Downloads/audio.wav"

After I replaced the code, the error message output is as follows:

CompletedProcess(args='\n        PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1 python examples/speech_recognition/new/infer.py -m --config-dir examples/mms/asr/config/ --config-name infer_common decoding.type=viterbi dataset.max_tokens=1440000 distributed_training.distributed_world_size=1 "common_eval.path=\'models_new/mms1b_all.pt\'" task.data=/tmp/tmpuarv2nsi dataset.gen_subset="adx:dev" common_eval.post_process=letter decoding.results_path=/tmp/tmpuarv2nsi \n        ', returncode=1)

The complete log is as follows:
(screenshot of the complete log)
