dataset in table 5 #13

Open
freedom6927 opened this issue Feb 10, 2022 · 11 comments

Comments

@freedom6927

Hi, could you release the dataset used in Table 5? Thank you.

@TheShadow29
Owner

@freedom6927 please see https://github.com/TheShadow29/zsgnet-pytorch/blob/master/DATA_README.md

You need to download the annotation files in the drive link.

Let me know if you run into any error.

@freedom6927
Author

freedom6927 commented Feb 11, 2022 via email

@TheShadow29
Owner

@freedom6927 it is under vg_split / csv_dir.

For training and validation, you could use training_balanced.csv and val_balanced.csv

For testing, you would use test_balanced_c2.csv and test_balanced_c3.csv.

In Table 5, we train on train_balanced, validate on val_balanced, and finally test on test_balanced_c2.csv (VG-2B) and test_balanced_c3.csv (VG-3B).

For the distances, you would take each object name from the test_...csv file, find the closest object in the training set, and then bucket the test objects by that distance into 3-4, 4-5 and so on, and report results for each subset.
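The bucketing step can be sketched roughly as below. This is only an illustration, not the code used for the paper: it assumes unit-width buckets obtained by flooring the distance, assumes accuracy is the per-bucket score, and uses hypothetical names (`bucket_label`, `accuracy_by_bucket`, `dist_by_object`, `records`) and made-up toy values.

```python
import math
from collections import defaultdict


def bucket_label(dist):
    """Map a distance to a unit-width bucket label, e.g. 3.2 -> '3-4' (assumed scheme)."""
    lo = math.floor(dist)
    return f"{lo}-{lo + 1}"


def accuracy_by_bucket(records, dist_by_object):
    """Compute accuracy per distance bucket.

    records: iterable of (object_name, is_correct) pairs, one per test sample.
    dist_by_object: dict mapping each test object name to its distance
        from the closest training object.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for obj, is_correct in records:
        label = bucket_label(dist_by_object[obj])
        totals[label] += 1
        hits[label] += int(is_correct)
    return {label: hits[label] / totals[label] for label in totals}


# Toy values for illustration only; real distances come from the embeddings.
dist_by_object = {"kitten": 3.2, "truck": 3.9, "zebra": 4.5}
records = [("kitten", True), ("truck", False), ("zebra", True), ("zebra", True)]
scores = accuracy_by_bucket(records, dist_by_object)
```

Here `records` stands in for whatever per-sample evaluation output you have; any other metric could be aggregated per bucket in the same way.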

Let me know if that answers your question.

@freedom6927
Author

freedom6927 commented Feb 12, 2022 via email

@freedom6927
Author

freedom6927 commented Feb 12, 2022 via email

@TheShadow29
Owner

@freedom6927 sorry for the late reply, we only used the balanced validation set.

@freedom6927
Author

freedom6927 commented Feb 15, 2022 via email

@TheShadow29
Owner

@freedom6927 Sorry, I don't understand your question. What do you mean by dataset division?

@freedom6927
Author

freedom6927 commented Feb 15, 2022 via email

@TheShadow29
Owner

@freedom6927 I don't have it with me, but something along the following lines of code should be sufficient:

train_df = ... # read train csv
test_df = ... # read test csv

train_objects = train_df['object_name'].unique()
test_objects = test_df['object_name'].unique()

glove_emb = ... # read glove embeddings

train_obj_emb = glove_emb(train_objects)
test_obj_emb = glove_emb(test_objects)

test_dist_dict = {}
for test_obj in test_objects:
    # find closest train object
    closest_train_obj = ....
    closest_train_obj_dist = ....
    test_dist_dict[test_obj] = closest_train_obj_dist

# bucket by distances

# compute scores for each object
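
The "find closest train object" step could look something like the following sketch. It assumes Euclidean distance between embedding vectors (the metric is not stated in this thread) and that the embeddings are already stacked into NumPy arrays with one row per object; `closest_train_distance` and the toy 2-D vectors are hypothetical stand-ins for real GloVe vectors.

```python
import numpy as np


def closest_train_distance(test_emb, train_emb):
    """For each test embedding, return the index of and the Euclidean
    distance to its nearest training embedding (assumed metric)."""
    # Pairwise differences via broadcasting: shape (n_test, n_train, dim).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n_test, n_train)
    nearest_idx = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(test_emb)), nearest_idx]
    return nearest_idx, nearest_dist


# Toy 2-D vectors for illustration only; real GloVe vectors are much larger.
train_objects = ["cat", "car"]
test_objects = ["kitten", "truck"]
train_emb = np.array([[0.0, 0.0], [10.0, 0.0]])
test_emb = np.array([[3.0, 4.0], [10.0, 3.0]])

idx, dist = closest_train_distance(test_emb, train_emb)
test_dist_dict = {obj: float(d) for obj, d in zip(test_objects, dist)}
closest_train_obj = {obj: train_objects[i] for obj, i in zip(test_objects, idx)}
```

For large vocabularies the full (n_test, n_train, dim) array may not fit in memory, in which case the test objects can be processed in chunks.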

Let me know if this answers your question.

@freedom6927
Author

freedom6927 commented Feb 15, 2022 via email
