I have successfully fine-tuned a Pyannote Audio model for speaker diarization using a custom dataset and now I'm facing difficulties testing the fine-tuned model. Despite following the documentation and adjusting the paths for the model checkpoint and configuration file, I encounter errors when attempting to test the model on a new audio file.
Here's the training code snippet I used for fine-tuning:
```python
# Training code snippet
import os
import torch

os.environ["PYANNOTE_DATABASE_CONFIG"] = "/yedek/pyannote/gsmDatasets202/datasets.yaml"

from pyannote.database import registry, FileFinder

registry.load_database("/yedek/pyannote/gsmDatasets202/datasets.yaml")
dataset = registry.get_protocol(
    "DATATEST.SpeakerDiarization.main", {"audio": FileFinder()}
)

from pyannote.audio.tasks import SpeakerDiarization
from pyannote.audio.models.segmentation import PyanNet

task = SpeakerDiarization(
    dataset,
    duration=5.0,
    max_speakers_per_chunk=2,
    max_speakers_per_frame=2,
    batch_size=128,
    num_workers=8,
    loss="bce",
)
model = PyanNet(task=task)

# this takes approximately 15 min to run on a Google Colab GPU
torch.set_float32_matmul_precision("high")

from types import MethodType
from torch.optim import Adam
from pytorch_lightning.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    RichProgressBar,
)

# we use the Adam optimizer with a 1e-4 learning rate
def configure_optimizers(self):
    return Adam(self.parameters(), lr=1e-4)

model.configure_optimizers = MethodType(configure_optimizers, model)

# we monitor diarization error rate on the validation set
# and use it to keep the best checkpoint and stop early
monitor, direction = task.val_monitor
checkpoint = ModelCheckpoint(
    monitor=monitor,
    mode=direction,
    save_top_k=1,
    every_n_epochs=1,
    save_last=False,
    save_weights_only=False,
    filename="{epoch}",
    verbose=False,
)
early_stopping = EarlyStopping(
    monitor=monitor,
    mode=direction,
    min_delta=0.0,
    patience=10,
    strict=True,
    verbose=False,
)

callbacks = [RichProgressBar(), checkpoint, early_stopping]

# we train for at most 200 epochs (might be shorter in case of early stopping)
from pytorch_lightning import Trainer

trainer = Trainer(
    accelerator="gpu",
    callbacks=callbacks,
    max_epochs=200,
    gradient_clip_val=0.5,
)
trainer.fit(model)

finetuned_model = checkpoint.best_model_path
print(finetuned_model)
```
And this is the testing code that leads to errors:
```python
from pyannote.audio import Model
import json

# paths to the model checkpoint and configuration file
MODEL_PATH = "lightning_logs/version_24/checkpoints/epoch=57.ckpt"
CONFIG_PATH = "lightning_logs/version_9/hparams.yaml"
AUDIO_FILE_PATH = "wav2/20240123_112622.mp3"  # audio file to test

# load the model for speaker diarization
pipeline = Model.from_pretrained(MODEL_PATH)

# run diarization on the audio file
diarization = pipeline(AUDIO_FILE_PATH)

# collect the diarization results
output = []
for segment, _, speaker in diarization.itertracks(yield_label=True):
    start = round(segment.start, 2)  # time the speech starts (in seconds)
    end = round(segment.end, 2)  # time the speech ends (in seconds)
    output.append({"speaker": speaker, "start": start, "end": end})

# print the results as JSON
print(json.dumps(output, indent=4))
```
```
Traceback (most recent call last):
  File "C:\Users\serca\PycharmProjects\pyannote\nemoo.py", line 14, in <module>
    diarization = pipeline(AUDIO_FILE_PATH)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\pyannote\audio\models\segmentation\PyanNet.py", line 172, in forward
    outputs = self.sincnet(waveforms)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\pyannote\audio\models\blocks\sincnet.py", line 81, in forward
    outputs = self.wav_norm1d(waveforms)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\instancenorm.py", line 71, in forward
    self._check_input_dim(input)
  File "C:\Users\serca\PycharmProjects\pyannote\venv2\lib\site-packages\torch\nn\modules\instancenorm.py", line 161, in _check_input_dim
    if input.dim() not in (2, 3):
AttributeError: 'str' object has no attribute 'dim'
```
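Reading the traceback, the file-path string reaches `InstanceNorm1d`, which calls `.dim()` on it; a bare `Model` expects a waveform tensor in its forward pass rather than a path. A stdlib-only sketch of that failure mode (the `FakeInstanceNorm` class below is hypothetical, standing in for `torch.nn.InstanceNorm1d`):

```python
# Hypothetical stand-in for torch.nn.InstanceNorm1d: it only performs the
# dimensionality check, which is enough to reproduce the AttributeError.
class FakeInstanceNorm:
    def __call__(self, waveform):
        # a real tensor has a .dim() method; a file-path string does not
        if waveform.dim() not in (2, 3):
            raise ValueError("expected 2D or 3D input")
        return waveform

norm = FakeInstanceNorm()
try:
    norm("wav2/20240123_112622.mp3")  # passing a path string, as in the traceback
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'dim'
```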
I'm looking for guidance on how to properly test the fine-tuned Pyannote Audio model, or on any specific step I might be missing. Any help or pointers towards resolving this issue would be greatly appreciated.
Thank you in advance for your assistance.
### Minimal reproduction example (MRE)
https://colab.research.google.com/github/pyannote/pyannote-audio/blob/develop/tutorials/MRE_template.ipynb#scrollTo=gVrDtBcusDbK
I think you are confusing pyannote's models (`pyannote.audio.models.*`) with pyannote's pipelines (`pyannote.audio.pipelines.*`).
The model you fine-tune/train is the segmentation model; it performs the speaker diarization task on `duration=5.0` second windows. To process a whole file from a path, you need to wrap it in a pipeline.
There may be examples in a pyannote tutorial notebook, but I can't remember which one, so here is a fairly complete notebook about training a model and testing its pipeline (in particular the "Adapted pipeline output" section).
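To illustrate the distinction, here is a minimal sketch, assuming pyannote.audio 3.x: the checkpoint path is the one from the question, while the embedding model and the clustering hyperparameters are placeholders that would need tuning on a development set.

```python
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization as SpeakerDiarizationPipeline

# load the fine-tuned *segmentation* model from the Lightning checkpoint
segmentation = Model.from_pretrained(
    "lightning_logs/version_24/checkpoints/epoch=57.ckpt"
)

# wrap it in a speaker diarization *pipeline*, which handles audio loading,
# sliding-window inference, speaker embedding, and clustering
pipeline = SpeakerDiarizationPipeline(
    segmentation=segmentation,
    embedding="speechbrain/spkrec-ecapa-voxceleb",  # placeholder embedding model
)

# hyperparameter values below are placeholders; tune them on a dev set
pipeline.instantiate(
    {
        "segmentation": {"min_duration_off": 0.0},
        "clustering": {
            "method": "centroid",
            "min_cluster_size": 12,
            "threshold": 0.7,
        },
    }
)

# the pipeline (unlike a bare Model) accepts a file path directly
diarization = pipeline("wav2/20240123_112622.mp3")
for segment, _, speaker in diarization.itertracks(yield_label=True):
    print(speaker, round(segment.start, 2), round(segment.end, 2))
```

Calling the pipeline on the path works because the pipeline loads the audio itself, whereas calling the bare model on a string produced the `AttributeError` in the traceback.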
### Tested versions
pyannote.audio 3.1.1
### System information
Windows 11 - pyannote.audio 3.1.1