
Restore shuffling of the validation set #123

Draft · wants to merge 3 commits into main
Conversation

@odelalleau
Collaborator

What does this PR do?

This is a follow-up to #112 (comment) to restore the shuffling of the validation set.

I implemented two versions and would appreciate feedback from @gshennvm / @gleibovich-nvidia / @trias702 on what you all prefer:

  • v1 (first commit 2b24550) is the more "minimal" version
  • v2 (second commit 1c02cdd) enforces setting consumed_samples in the trainer manually

Personally I have a slight preference for v2 because I prefer explicit to implicit -- the implicit update of consumed_samples in the sampler can be dangerous (as we've experienced), and also leads to a weird behavior where the len() of the sampler decreases as we iterate over it (which is quite unusual for objects we typically iterate on). I realize however that this adds an extra line to the trainers and diverges from what happens within NeMo, so I'm ok with v1 as well if I'm in the minority here.
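To make the trade-off concrete, here is a toy sketch of the two designs. All names (`ImplicitSampler`, `ExplicitSampler`, `consumed_samples`) are illustrative only, not the actual NeMo-Aligner classes: a v1-style sampler implicitly advances its own counter while it is iterated (so its `len()` shrinks mid-iteration), whereas a v2-style sampler's counter is only ever written from outside, e.g. by the trainer.

```python
class ImplicitSampler:
    """v1-style sketch: the sampler mutates its own consumed_samples.

    Side effect: len() shrinks while you iterate, which is unusual.
    """

    def __init__(self, total_samples, batch_size):
        self.total_samples = total_samples
        self.batch_size = batch_size
        self.consumed_samples = 0

    def __len__(self):
        # Remaining batches, so this decreases during iteration.
        return (self.total_samples - self.consumed_samples) // self.batch_size

    def __iter__(self):
        while self.consumed_samples < self.total_samples:
            start = self.consumed_samples
            self.consumed_samples += self.batch_size  # implicit state update
            yield list(range(start, start + self.batch_size))


class ExplicitSampler:
    """v2-style sketch: stateless iteration; only the trainer writes the counter."""

    def __init__(self, total_samples, batch_size):
        self.total_samples = total_samples
        self.batch_size = batch_size
        self.consumed_samples = 0  # set by the trainer, never by __iter__

    def __len__(self):
        return (self.total_samples - self.consumed_samples) // self.batch_size

    def __iter__(self):
        for start in range(self.consumed_samples, self.total_samples,
                           self.batch_size):
            yield list(range(start, start + self.batch_size))


implicit = ImplicitSampler(total_samples=8, batch_size=2)
print(len(implicit))   # 4
next(iter(implicit))
print(len(implicit))   # 3 -- len() shrank mid-iteration

explicit = ExplicitSampler(total_samples=8, batch_size=2)
for _ in explicit:
    pass
print(len(explicit))   # still 4 -- iteration left no state behind
```

The v2 shape is what makes the trainer the single owner of the resume state, at the cost of one extra assignment in each trainer.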

I will also add the sister PR to NeMo but I want to get some feedback first to decide on the final implementation.

(currently marked as draft until we agree and I properly test this, but please do provide feedback now)

@gshennvm
Collaborator

thanks for the change. I looked over it and am ok with going with 2.

One thing that is nice is that now only our trainer is abstractly stateful: the data sampler is no longer keeping any state.

@gleibovich-nvidia
Collaborator

Thanks @odelalleau!
I actually prefer v1, as I find this version clearer and more explicit in terms of the difference between val/train dataloader use-cases. It explicitly defines an extra parameter to the dataloader instantiation here.
For instance, when I first worked with PPO in Aligner, I was surprised to see (from the wandb experiment) the same single batch being used for validation again and again. Seeing an explicit difference in the dataloader parameters would make this clearer to a new user. With v2, I think it would be harder for a new user reading the code to grasp the difference between the train and val use-cases.

I am fine with v2 as well, though in that case, to make it clearer, maybe we should document the val_dataloader's irregular usage (a new iterator is created for every validation run) near the dataloader instantiation or in the run_validation method?
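The surprise described above can be sketched in a few lines (hypothetical names, not the actual NeMo-Aligner API): recreating an unshuffled iterator for every validation run yields the same leading batch each time, while a freshly shuffled iterator per run restores variety.

```python
import random

val_data = list(range(100))  # stand-in for the validation set


def run_validation_buggy(num_batches=1):
    # Recreating an unshuffled iterator every run always yields the
    # same leading batch(es): the "same single batch" surprise.
    it = iter(val_data)
    return [next(it) for _ in range(num_batches)]


def run_validation_shuffled(seed, num_batches=1):
    # A freshly shuffled iterator per validation run samples different
    # batches across runs (deterministically per seed).
    data = val_data[:]
    random.Random(seed).shuffle(data)
    it = iter(data)
    return [next(it) for _ in range(num_batches)]


print(run_validation_buggy())  # [0] on every call
```

Documenting exactly this "new iterator per validation run" behavior next to the dataloader instantiation would address the concern above.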

@trias702
Collaborator

> thanks for the change. I looked over it and am ok with going with 2.
>
> One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state

Only for the validation sampler, the training sampler is still stateful, unless I'm misunderstanding something?

@trias702
Collaborator

I have a slight personal preference for v1 as it results in less code to maintain, and removes an extra API-binding-contract which an algo writer must obey in terms of providing the update directly to the dataloader/sampler, which is a bit of an odd dynamic.

That being said, the comp sci professor in me prefers the explicit safety of v2, and notes that it is the philosophically more correct of the two.

So honestly, hard to say, I would be happy with either approach. I note that we have 1 vote for v1, 1 vote for v2, and Olivier prefers v2, so to not create a deadlock, count me in for v2 as well.

@trias702
Collaborator
LGTM!

@gshennvm
Collaborator

>> thanks for the change. I looked over it and am ok with going with 2.
>> One thing that is nice is that now only our trainer is abstractly stateful, the data sampler is no longer keeping any state
>
> Only for the validation sampler, the training sampler is still stateful, unless I'm misunderstanding something?

https://github.com/NVIDIA/NeMo-Aligner/pull/123/files#diff-37bc718ace18b8050161b5dd54c54ac6fd8e58c874db0b3ca4a8649bef80095bR212

It is stateful but the trainer now overwrites the statefulness, so our trainer state determines everything.
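As a hedged illustration of that point (all names hypothetical, not the actual NeMo-Aligner code): if the trainer writes its own counter into the sampler on every step, any state the sampler accumulated on its own is overwritten, so the trainer's state is the single source of truth.

```python
class Sampler:
    def __init__(self):
        self.consumed_samples = 0  # may drift or be stale on its own


class Trainer:
    def __init__(self, sampler):
        self.sampler = sampler
        self.consumed_samples = 0  # single source of truth

    def train_step(self, batch_size):
        self.consumed_samples += batch_size
        # Overwrite the sampler's view so resume/shuffle logic only
        # ever depends on the trainer's counter.
        self.sampler.consumed_samples = self.consumed_samples


sampler = Sampler()
trainer = Trainer(sampler)
sampler.consumed_samples = 999   # stale/implicit state
trainer.train_step(batch_size=8)
print(sampler.consumed_samples)  # 8 -- the trainer's state won
```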
