Restore shuffling of the validation set #123
base: main
Conversation
Thanks for the change. I looked it over and am ok with going with v2. One nice thing is that now only our trainer is abstractly stateful; the data sampler no longer keeps any state.
Thanks @odelalleau! I am fine with v2 as well, though maybe in this case, to make it clearer, we should document the
Only for the validation sampler; the training sampler is still stateful, unless I'm misunderstanding something?
I have a slight personal preference for v1, as it results in less code to maintain and removes an extra API-binding contract that an algo writer must obey by providing the update directly to the dataloader/sampler, which is a bit of an odd dynamic. That being said, the computer science professor in me prefers the explicit safety of v2 and notes that it is the philosophically more correct of the two. So, honestly, hard to say; I would be happy with either approach. I note that we have 1 vote for v1, 1 vote for v2, and Olivier prefers v2, so to avoid a deadlock, count me in for v2 as well.
LGTM!
It is stateful, but the trainer now overwrites the statefulness, so our trainer state determines everything.
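A minimal sketch of the dynamic described above, with hypothetical names (not the actual NeMo-Aligner API): the sampler may keep its own counter, but on checkpoint load the trainer writes its saved value back onto the sampler, so the trainer's checkpointed state is authoritative.

```python
class Sampler:
    """Toy sampler that carries a resume counter (hypothetical name)."""

    def __init__(self):
        self.consumed_samples = 0  # whatever the sampler has accumulated


class Trainer:
    """Toy trainer whose checkpoint state overrides the sampler's."""

    def __init__(self, sampler):
        self.sampler = sampler
        self.consumed_samples = 0

    def state_dict(self):
        return {"consumed_samples": self.consumed_samples}

    def load_state_dict(self, state):
        self.consumed_samples = state["consumed_samples"]
        # Overwrite whatever the sampler accumulated on its own:
        # the trainer's checkpointed state determines everything on resume.
        self.sampler.consumed_samples = self.consumed_samples
```

On resume, any stale counter the sampler holds is simply replaced, which is why the trainer can be treated as the single source of truth.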
What does this PR do?
This is a follow-up to #112 (comment) to restore the shuffling of the validation set.
I implemented two versions and would appreciate feedback from @gshennvm / @gleibovich-nvidia / @trias702 on what you all prefer:

1. v1: keep the implicit update of `consumed_samples` inside the sampler (as is done in NeMo)
2. v2: update `consumed_samples` in the trainer manually

Personally, I have a slight preference for v2 because I prefer explicit over implicit: the implicit update of `consumed_samples` in the sampler can be dangerous (as we've experienced), and it also leads to a weird behavior where the `len()` of the sampler decreases as we iterate over it (which is quite unusual for objects we typically iterate over). I realize, however, that this adds an extra line to the trainers and diverges from what happens within NeMo, so I'm ok with v1 as well if I'm in the minority here.

I will also add the sister PR to NeMo, but I want to get some feedback first to decide on the final implementation.
(Currently marked as draft until we agree and I've properly tested this, but please do provide feedback now.)
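To make the trade-off concrete, here is a minimal sketch of the v2 approach under the assumptions above; all names are hypothetical, not the actual NeMo-Aligner API. The sampler reads a `consumed_samples` offset but never mutates it, so its `len()` stays constant, and the trainer pushes the value explicitly (the "one extra line" mentioned above).

```python
import random


class ShuffledSampler:
    """Toy shuffling sampler that resumes from a trainer-provided offset."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.consumed_samples = 0  # set by the trainer, never by the sampler

    def __len__(self):
        # v2 keeps this constant. The implicit v1 variant would instead
        # return self.num_samples - self.consumed_samples, so len()
        # shrinks as iteration proceeds.
        return self.num_samples

    def __iter__(self):
        order = list(range(self.num_samples))
        random.Random(self.seed).shuffle(order)  # deterministic shuffle
        # Skip already-consumed samples, but do NOT mutate
        # consumed_samples here: the trainer owns that state.
        yield from order[self.consumed_samples:]


class Trainer:
    """Toy trainer that explicitly pushes its state onto the sampler (v2)."""

    def __init__(self, sampler):
        self.sampler = sampler
        self.consumed_samples = 0

    def train_step(self, batch_size):
        self.consumed_samples += batch_size
        # The one extra line v2 adds to the trainer:
        self.sampler.consumed_samples = self.consumed_samples
```

With this shape, iterating over the sampler after a few training steps yields exactly the tail of the original shuffled order, while `len(sampler)` never changes mid-epoch.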