
Why the "val" data-set is subset of "train" data-set? #36

Open

bhooshan-supe-gmail opened this issue Mar 12, 2020 · 12 comments

@bhooshan-supe-gmail

Hi Xiaodong Yang, Zhedong Zheng,

I am planning to use your model in one of our experimental projects as a base model for transfer learning.
While studying the code, I noticed that your "val" (validation) data-set is a subset of the "train" (training) data-set. (Refer to https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111)

This is quite against my understanding, so could you kindly explain why you decided to make the validation data-set a subset of the training data-set?

@bhooshan-supe-gmail

BTW, I am a software engineer at LG Electronics US.

@bhooshan-supe-gmail bhooshan-supe-gmail changed the title Why there is an "val" data-set is subset of "train" data-set? Why the "val" data-set is subset of "train" data-set? Mar 12, 2020
@layumi

layumi commented Mar 12, 2020

Hi @bhooshan-supe-gmail
Yes. Since the original dataset does not provide a validation set, we split the validation set from the training set.

@bhooshan-supe-gmail

@layumi
I am sorry to be nitpicky, but you have not actually split the data-set; part of the training data-set is duplicated as the validation data-set.
In my own data-set, on the other hand, I made sure the training and validation sets are completely disjoint, and the side effect is that my training and validation curves are not converging.
Please refer to the following image:
[attached image: "train" — training/validation curves]

So I am wondering: is this OK? Is this training reliable?

@layumi

layumi commented Mar 13, 2020

Hi @bhooshan-supe-gmail

  1. Please check this line: https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111
    There are no overlapping images between the training and validation sets.
    If you use train-all, there will be overlapping images.

  2. I do not know how you split your dataset. Actually, there are two ways to split it (see the sketch after this list).

  • One easy way is as shown above: we select the first image of every class in the training set as the validation set and evaluate the performance in a classification style.

  • Another way is retrieval style. Given the 751 classes in the Market-1501 dataset, we take the first 651 classes as the training set and leave out the remaining 100 classes as the validation set. We can then use the images of those 100 classes as query and gallery to evaluate the retrieval performance. However, since the 100 classes have never been seen by the model, the model cannot classify images of those 100 classes.
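For concreteness, here is a minimal sketch of both split styles for a Market-1501-style training folder, where each file is named `<person_id>_<camera>_<frame>.jpg`. The paths, function names, and flat-folder layout are assumptions for illustration; this is not the exact code in prepare-market.py.

```python
import os
import shutil

# Sketch only: assumes a flat training folder of files named
# "<person_id>_<camera>_<frame>.jpg", as in Market-1501's bounding_box_train.

def split_classification_style(train_dir, val_dir):
    """Move the FIRST image of every person ID into val_dir (classification style)."""
    seen_ids = set()
    for name in sorted(os.listdir(train_dir)):
        if not name.endswith('.jpg'):
            continue
        pid = name.split('_')[0]          # person-ID prefix of the filename
        if pid not in seen_ids:           # first image of this ID -> validation
            seen_ids.add(pid)
            dst = os.path.join(val_dir, pid)
            os.makedirs(dst, exist_ok=True)
            shutil.move(os.path.join(train_dir, name), os.path.join(dst, name))

def split_retrieval_style(train_dir, val_dir, n_train_ids=651):
    """Hold out WHOLE IDs: the first n_train_ids stay in train, the rest move to val."""
    pids = sorted({n.split('_')[0] for n in os.listdir(train_dir) if n.endswith('.jpg')})
    val_ids = set(pids[n_train_ids:])     # e.g. the last 100 of 751 IDs
    for name in sorted(os.listdir(train_dir)):
        if name.endswith('.jpg') and name.split('_')[0] in val_ids:
            dst = os.path.join(val_dir, name.split('_')[0])
            os.makedirs(dst, exist_ok=True)
            shutil.move(os.path.join(train_dir, name), os.path.join(dst, name))
```

Note the difference: with the classification-style split, every validation ID still appears in training (only the specific images differ), whereas the retrieval-style split keeps entire IDs unseen, which is why only the latter supports a query/gallery evaluation.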

@bhooshan-supe-gmail

Hi @layumi

To be honest, I am quite new to computer vision and machine learning.
Thanks a lot for your guidance!

@bhooshan-supe-gmail

bhooshan-supe-gmail commented Mar 13, 2020

Hi @layumi ,

We have our own, very small data-set (about 21 person IDs but roughly 1,500 images), and I am fine-tuning your model on it.
Basically, we are looking into how we can re-identify a person from an almost top-down view (a very steep angle) instead of a side and/or front view.

@layumi

layumi commented Mar 13, 2020

@bhooshan-supe-gmail
You may start from my tutorial, which is more straightforward: https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial

Recently I also released a dataset and code for satellite-view, drone-view, and ground-view geo-localization. You are welcome to check it out: https://github.com/layumi/University1652-Baseline

@nikky4D

nikky4D commented Apr 20, 2022

> Another way is retrieval style. Given the 751 classes in the Market-1501 dataset, we take the first 651 classes as the training set and leave out the remaining 100 classes as the validation set. We can then use the images of those 100 classes as query and gallery to evaluate the retrieval performance. However, since the 100 classes have never been seen by the model, the model cannot classify images of those 100 classes.

How would you go about adding this retrieval-style evaluation? Does it make sense here to add a retrieval-style evaluation in addition to the classification evaluation, which has the model classify images into person/object IDs?

@layumi

layumi commented Apr 20, 2022

Hi @nikky4D
Sorry, what is "00 classes"? Could you provide more details?

@nikky4D

nikky4D commented Apr 20, 2022

Sorry, I quoted it incorrectly. Please see the edited comment above.

@layumi

layumi commented Apr 21, 2022

Hi @nikky4D

  1. Validation (classification setting)
    I wrote it into the training code, so you do not need to modify the split.

  2. Validation (retrieval setting)
    If you want to evaluate on the 651/100 split (751 IDs in total), you need to modify the data preparation to split it that way.
    Since the IDs are random, I simply use the first 651 IDs as train and the last 100 IDs as val.
    For validation in retrieval style, you need to use test.py to evaluate the validation set the same way as the test setting; a simplified sketch of such an evaluation follows below.
    (The validation result printed during training is not correct.)
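For reference, here is a minimal sketch of what a retrieval-style evaluation computes, given extracted query and gallery features and their person IDs. It follows the usual re-ID protocol (rank-1 accuracy and mAP from a similarity ranking) but ignores camera-ID filtering and is not the exact logic of test.py; the function name and the random stand-in data are illustrative.

```python
import numpy as np

def evaluate_retrieval(qf, q_ids, gf, g_ids):
    """Rank-1 accuracy and mAP from L2-normalized query/gallery features.

    Assumes every query ID appears at least once in the gallery.
    """
    sims = qf @ gf.T                        # cosine similarity matrix, shape (Nq, Ng)
    rank1_hits, aps = 0, []
    for i in range(qf.shape[0]):
        order = np.argsort(-sims[i])        # gallery indices, best match first
        matches = g_ids[order] == q_ids[i]  # True where the ranked item shares the query ID
        if matches[0]:
            rank1_hits += 1
        hits = np.where(matches)[0]         # 0-indexed ranks of the true matches
        # precision at each true match: the k-th match is found at rank hits[k-1] + 1
        aps.append(((np.arange(len(hits)) + 1) / (hits + 1)).mean())
    return rank1_hits / qf.shape[0], float(np.mean(aps))

# Usage with random stand-in features (512-D, a typical re-ID embedding size):
rng = np.random.default_rng(0)
qf = rng.normal(size=(10, 512));  qf /= np.linalg.norm(qf, axis=1, keepdims=True)
gf = rng.normal(size=(100, 512)); gf /= np.linalg.norm(gf, axis=1, keepdims=True)
q_ids, g_ids = rng.integers(0, 5, 10), rng.integers(0, 5, 100)
r1, mAP = evaluate_retrieval(qf, q_ids, gf, g_ids)
print(f"rank-1: {r1:.3f}  mAP: {mAP:.3f}")
```

This is why the 651/100 split makes sense for validation here: the held-out 100 IDs can serve as query and gallery for exactly this kind of ranking evaluation, even though the classifier head cannot name them.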

@nikky4D

nikky4D commented Apr 21, 2022

Thank you. Then for the teacher training, is it better to use the retrieval split or the classification setting for a more robust DG-Net setup, or does the dataset split not matter for the final model?
