Pytorch dataloader cannot compute length #1337

Open
njellinas opened this issue May 10, 2024 · 1 comment

njellinas commented May 10, 2024

I have prepared a dataset with Cuts as mentioned in the tutorial:

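import lhotse
from lhotse import CutSet
from torch.utils.data import DataLoader

recs = CutSet(...)  # manifest construction elided in the original post
trainset = lhotse.dataset.unsupervised.UnsupervisedWaveformDataset(recs)
sampler = lhotse.dataset.sampling.SimpleCutSampler(recs, max_cuts=16, shuffle=True)  # at most 16 cuts per batch
trainloader = DataLoader(trainset, sampler=sampler, batch_size=None)  # the sampler already yields whole batches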

I want a batch size of 16, so I set the max_cuts argument. However, when I compute the total number of iterations for my training loop with len(trainloader), I get the error TypeError: object of type 'SimpleCutSampler' has no len().
When I define my own samplers without lhotse, there is always a __len__ method that returns the total number of batches. Is this not implemented here?

@pzelasko
Collaborator

You seem to have a very outdated example. I see now that I missed a few places to update in the docs.

Samplers don't support len() because of dynamic batch sizes in lhotse. In the general case, you can't know the exact number of iterations up-front.
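
To illustrate the point about dynamic batch sizes, here is a minimal pure-Python sketch. It does not use lhotse: `ToyDurationSampler`, `count_batches`, and the `durations` list are hypothetical names made up for this example, and the batching rule (close a batch once a total-duration budget would be exceeded) is an assumed simplification, not lhotse's actual implementation. Under that assumption, the number of batches depends on the shuffled order, so the sampler deliberately has no `__len__`, and the loop counts steps as it goes instead of asking for a total up front.

```python
import random
from typing import Iterator, List


class ToyDurationSampler:
    """Hypothetical stand-in for a dynamic-batch sampler (not lhotse code)."""

    def __init__(self, durations: List[float], max_duration: float, seed: int = 0):
        self.durations = durations
        self.max_duration = max_duration
        self.seed = seed

    def __iter__(self) -> Iterator[List[int]]:
        order = list(range(len(self.durations)))
        random.Random(self.seed).shuffle(order)
        batch: List[int] = []
        total = 0.0
        for idx in order:
            dur = self.durations[idx]
            # Close the current batch once adding this item would exceed the budget.
            if batch and total + dur > self.max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(idx)
            total += dur
        if batch:
            yield batch  # final, possibly smaller, batch


def count_batches(sampler: ToyDurationSampler) -> int:
    """Count batches by exhausting one full pass; there is no shortcut."""
    return sum(1 for _ in sampler)


durations = [1.0, 9.0, 2.0, 8.0, 5.0, 5.0, 3.0, 7.0]  # made-up durations in seconds
sampler = ToyDurationSampler(durations, max_duration=10.0, seed=0)

# Step-based loop: track progress with a counter instead of len(sampler).
steps = 0
for steps, batch in enumerate(sampler, start=1):
    pass  # a real loop would run the forward/backward pass here

# The batch count per shuffle seed is only known after iterating.
counts_by_seed = {seed: count_batches(ToyDurationSampler(durations, 10.0, seed))
                  for seed in range(5)}
print(steps, counts_by_seed)
```

If a total is needed for something like a progress bar or a learning-rate schedule, one option in this sketch is to fix a step budget and stop when it is reached, or to do one counting pass as `count_batches` does, keeping in mind that the count may differ between epochs when the order changes.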
