Is it best to extract features before calling trim_to_supervision? #1191

Answered by desh2608
RuABraun asked this question in Q&A

Lhotse only loads the relevant segments of the audio, so the full recording is not loaded. I usually follow two rules of thumb when deciding whether to trim before or after feature extraction:

  • If I will be using different segmentations of the same audio, I extract features before. This is so that all the segmentations can use the same underlying features. For example, I used this approach recently when working on long-form decoding with different segment sizes.
  • If the supervisions are only referring to a small part of the full recording, I extract after trimming. This is the case for some corpora such as VoxPopuli, where the full recordings are several hundred hours, whereas the transcriptio…
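
The two rules of thumb above can be sketched as a small decision helper. This is only an illustration, not part of Lhotse: the function names, the return labels, and the `sparse_threshold` cutoff of 0.5 are all hypothetical, and letting the "multiple segmentations" rule win when both rules apply is an assumption the answer does not state.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def supervised_fraction(recording_duration: float, segments: List[Segment]) -> float:
    """Fraction of the recording covered by at least one supervision segment."""
    if recording_duration <= 0:
        raise ValueError("recording_duration must be positive")
    covered = 0.0
    current_start, current_end = None, None
    for start, end in sorted(segments):
        # Clip each segment to the recording boundaries.
        start, end = max(0.0, start), min(recording_duration, end)
        if end <= start:
            continue
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping segments are merged so they are not counted twice.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / recording_duration


def suggest_order(
    recording_duration: float,
    segments: List[Segment],
    num_segmentations: int = 1,
    sparse_threshold: float = 0.5,  # assumed cutoff, not taken from the answer
) -> str:
    """Apply the two rules of thumb; the labels returned are illustrative."""
    if num_segmentations > 1:
        # Rule 1: several segmentations can share one set of features.
        return "extract_then_trim"
    if supervised_fraction(recording_duration, segments) < sparse_threshold:
        # Rule 2: supervisions cover little of the recording, so trim first.
        return "trim_then_extract"
    return "either"
```

For example, `suggest_order(100.0, [(0.0, 10.0)])` suggests trimming first, because only a tenth of the recording is supervised. In Lhotse the two orders correspond to calling `trim_to_supervisions` on the cut set before or after computing features; the helper only captures the decision, not those calls.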

Answer selected by RuABraun