Diarization Pipeline config on diarize.py #773
Hi, I'm having the same problem with network restrictions. If there is any solution, I would be interested to know. Thank you in advance.
It is possible to download all the required models and reference them from the local file system. This article from AWS describes downloading all of the models locally, which is similar to the approach I took. I was able to build a Docker image that loads all the models from AWS S3 into the container during the build, and then reference every model via its local path when running whisperx. Specifically for diarization, where `print(">> Loading Diarization Pipeline")`
@alejandrogranizo the above is my config.yaml. You can go to Hugging Face, sign the agreement with pyannote, download the respective model.bin files, and change the paths in the config to point to them. Implemented and working on my machine, both in the cloud and on prem. DM me if you have any problems.
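For reference, the config.yaml attached above did not survive in this thread. As an illustrative sketch only, a locally-pointing config for pyannote/speaker-diarization-3.1 can look like the following (the file paths are placeholders, and the parameter values should be taken from the config.yaml shipped with the version you actually downloaded):

```yaml
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    # Local paths instead of Hugging Face model IDs (placeholder paths)
    embedding: /models/wespeaker-voxceleb-resnet34-LM/pytorch_model.bin
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: /models/segmentation-3.0/pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
```

A file like this can then be loaded directly with pyannote.audio, e.g. `Pipeline.from_pretrained("/models/config.yaml")`, which bypasses the Hugging Face Hub entirely.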
@Hyprnx which version of pyannote.audio are you using? I got an error with pyannote.audio==3.1.1: "threshold parameter doesn't exist".
@Dmitriuso the pyannote.audio I use comes with WhisperX when I install it; I didn't install it separately. This should work.
If you are looking for a way to run whisperx completely offline, I have a script for that: https://github.com/nkilm/offline-whisperx. You have to manually download the models and then specify their paths in the script. The script works 100% locally, without internet access.
Hi, I'm opening this issue since we are working from a place with connection restrictions. Hugging Face downloads fall under these restrictions, so the configuration of the DiarizationPipeline class becomes a problem when trying to use the diarization feature of the library.
We are trying to run the following code in our project:
self.diarize_model = whisperx.DiarizationPipeline(model_name='pyannote/speaker-diarization-3.1', use_auth_token='OUR_VALID_TOKEN', device='cuda')
This is the recommended way to create the diarization pipeline to later start the diarization feature.
The issue comes from the network restrictions. The DiarizationPipeline class calls pyannote.audio's Pipeline.from_pretrained with (model_name, auth_token) and offers no way to check local resources first (such as passing the local path of a config.yml or config.yaml file as an argument). Because the model name is never detected as a yml/yaml file, due to how DiarizationPipeline handles it, instantiation always reaches the hf_hub_download call. Since that request does not go through under the network restrictions present in many territories, execution is delayed until multiple request timeouts occur, only to finally fall back to the local model that has existed on the filesystem since the beginning of the run.
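One mitigation for the repeated timeouts (a workaround on the caller's side, not a fix in whisperx itself) is to put huggingface_hub into offline mode before importing whisperx, so that hf_hub_download resolves models from the local cache immediately instead of retrying the network:

```python
import os

# Tell huggingface_hub to skip all network calls and serve models from the
# local cache only. This must be set before whisperx / huggingface_hub is
# imported, and it assumes the models were already cached (e.g. downloaded
# once on a machine with access, then copied into ~/.cache/huggingface).
os.environ["HF_HUB_OFFLINE"] = "1"
```

With this set, a missing model fails fast with a clear cache error rather than hanging through several connection timeouts.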
Is there any solution available, please? When using this in a bigger project, it is very annoying to wait for the multiple timeouts just to test and debug.
Thanks