LABLoader raises ValueError("path must contain the {uri} placeholder.") even if the placeholder is configured correctly #99

Open
alephpi opened this issue Mar 15, 2024 · 3 comments

alephpi commented Mar 15, 2024

Part of my configuration:

Databases:
  # tell pyannote.database where to find AMI wav files.
  # {uri} is a placeholder for the session name (eg. ES2004c).
  # you might need to update this line to fit your own setup.
  AMI: amicorpus/{uri}/audio/{uri}.Mix-Headset.wav
  AMI-SDM: amicorpus/{uri}/audio/{uri}.Array1-01.wav

Protocols:

  AMI-SDM:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab
        test:
            uri: ../lists/test.meetings.txt
            annotation: ../only_words/rttms/test/{uri}.rttm
            annotated: ../uems/test/{uri}.uem
            lab: ../only_words/labs/test/{uri}.lab

When I comment out the following two lines, the program runs fine and file['lab'] returns an Annotation object, as expected:

if "uri" not in self.placeholders_:
    raise ValueError("`path` must contain the {uri} placeholder.")

This sanity check does not seem to work as expected. Other loaders (e.g. RTTMLoader) do not have this check, although I would expect their logic to be similar.
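
One way to narrow this down is to check which placeholders Python's format parser actually sees in the path string that reaches the loader. The sketch below uses only the standard library. `extract_placeholders` is a hypothetical helper, and it is an assumption that the loader fills `placeholders_` in a similar way.

```python
import string


def extract_placeholders(path_template: str) -> set:
    """Return the named placeholders found in a format-style path string."""
    # string.Formatter().parse yields tuples of
    # (literal_text, field_name, format_spec, conversion);
    # field_name is None for a chunk of purely literal text.
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(path_template)
        if field_name is not None
    }


# Path template copied from the configuration above.
template = "../only_words/labs/train/{uri}.lab"
print(extract_placeholders(template))  # {'uri'}

# Hypothetical path in which {uri} has already been substituted.
resolved = "../only_words/labs/train/ES2004c.lab"
print(extract_placeholders(resolved))  # set()
```

If the check fails although the YAML contains {uri}, printing the path right before the check shows whether the loader received the template or an already substituted path.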

alephpi (Author) commented Mar 16, 2024

Another observation:
load_rttm() returns a dict of the form {uri: annotation}, while load_lab() returns a single Annotation object. I wonder whether this is a deliberate design, as I see no reason for two such similar functions to behave differently.

hbredin (Member) commented Mar 29, 2024

The difference between rttm and lab lies in the fact that

  • The lab format has no filename field, so one lab file can contain annotations for only one audio file. The uri must therefore be inferred from the lab file name.
  • The rttm format has a filename field, so one rttm file can contain annotations for multiple audio files.
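
The difference can be illustrated with a small standalone sketch. It does not use pyannote.database: `parse_rttm_text`, `parse_lab_text` and `uri_from_lab_path` are hypothetical helpers, the sample lines are made up, and the lab layout (one `start end label` triple per line) is an assumption.

```python
from collections import defaultdict
from pathlib import Path


def parse_rttm_text(text):
    """Group RTTM segments by file field: {uri: [(start, end, label), ...]}."""
    segments = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        uri = fields[1]  # the filename field that lab files lack
        onset, duration = float(fields[3]), float(fields[4])
        segments[uri].append((onset, onset + duration, fields[7]))
    return dict(segments)


def parse_lab_text(text):
    """Return [(start, end, label), ...]; the lines carry no file information."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        segments.append((float(fields[0]), float(fields[1]), fields[2]))
    return segments


def uri_from_lab_path(path):
    """Infer the uri from the lab file name, since the content cannot give it."""
    return Path(path).stem


rttm_text = (
    "SPEAKER ES2004c 1 0.5 1.5 <NA> <NA> A <NA> <NA>\n"
    "SPEAKER ES2004d 1 2.0 1.0 <NA> <NA> B <NA> <NA>\n"
)
lab_text = "0.5 2.0 speech\n3.0 4.5 speech\n"

print(parse_rttm_text(rttm_text))  # one entry per uri found in the file
print(uri_from_lab_path("../only_words/labs/train/ES2004c.lab"), parse_lab_text(lab_text))
```

One RTTM text yields a mapping with several uris, whereas a lab text yields a single list of segments whose uri has to come from the path. This is consistent with a loader that needs {uri} in the lab path.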

alephpi (Author) commented Mar 30, 2024

Then what's the proper way to configure lab? Could you give me an example?
