Issue in train_ful, test_full, dev_full files #24

Open
sajidaraz opened this issue Mar 13, 2021 · 4 comments

@sajidaraz

I prepared the data following the dataproc_mimic_III.ipynb notebook and got six files: train_50, test_50, dev_50, train_full, test_full, and dev_full. I am having a problem with the three full files: train_full contains 8686 unique labels, test_full contains 4075 unique labels, and dev_full contains 3009 unique labels. I don't know why the number of unique labels differs between the files, or how to make them equal so that I can train my model.

Kindly help me.

@airingzhang

This is because some of the codes occur only once, so each of them appears in just one split. As a result, none of the three splits contains all of the unique codes.
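
To make this concrete, here is a minimal sketch with made-up data. The code lists are toy values, not taken from MIMIC-III, and `unique_codes` is a hypothetical helper, not something from the notebook. It shows how a code that occurs in only one record ends up in only one split, so the per-split counts differ.

```python
# Toy label lists per split (assumption: one list of codes per record).
train_labels = [["401.9", "428.0"], ["250.00"], ["401.9", "V10.1"]]
dev_labels = [["401.9"], ["038.9"]]
test_labels = [["428.0", "584.9"]]


def unique_codes(records):
    """Return the set of distinct codes across all records in a split."""
    return {code for record in records for code in record}


train_codes = unique_codes(train_labels)
dev_codes = unique_codes(dev_labels)
test_codes = unique_codes(test_labels)

# The union is the full label space; no single split covers it.
all_codes = train_codes | dev_codes | test_codes

# Codes that appear in dev or test but never in train.
unseen_in_train = (dev_codes | test_codes) - train_codes

print(len(train_codes), len(dev_codes), len(test_codes), len(all_codes))
print(sorted(unseen_in_train))
```

Running the same kind of set comparison on the real files should show which codes are missing from each split.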

@sajidaraz
Author

Can you kindly guide me on how to make the label sets the same size? I need this to train a model, because the model does not accept y_train, y_test, and y_valid with different numbers of labels.

@airingzhang

I am not the author, but I guess this is actually the setting of this task (the full-label scenario): the training set does not see all of the unique labels.
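
If the model needs the same label dimension for every split, one common approach is to build a single vocabulary from the union of the three splits and encode each split against it. This is only a sketch under that assumption and is not confirmed by the repository authors. The data is made up, and `build_vocab` and `multi_hot` are hypothetical helpers.

```python
# Toy label lists per split (assumption: one list of codes per record).
train_labels = [["401.9", "428.0"], ["250.00"], ["401.9", "V10.1"]]
dev_labels = [["401.9"], ["038.9"]]
test_labels = [["428.0", "584.9"]]


def build_vocab(*splits):
    """Map every code seen in any split to a fixed column index."""
    codes = sorted({code for split in splits for record in split for code in record})
    return {code: index for index, code in enumerate(codes)}


def multi_hot(records, vocab):
    """Encode each record as a 0/1 row with one column per code in vocab."""
    rows = []
    for record in records:
        row = [0] * len(vocab)
        for code in record:
            row[vocab[code]] = 1
        rows.append(row)
    return rows


vocab = build_vocab(train_labels, dev_labels, test_labels)
y_train = multi_hot(train_labels, vocab)
y_valid = multi_hot(dev_labels, vocab)
y_test = multi_hot(test_labels, vocab)

# Every row now has the same width, whichever split it came from.
print(len(vocab), len(y_train[0]), len(y_valid[0]), len(y_test[0]))
```

With a shared vocabulary, columns for codes that never occur in the training split are all zero in `y_train`, so the model cannot learn them; that is consistent with the full-label setting described above. scikit-learn's `MultiLabelBinarizer` with its `classes` argument set to the union of codes should give an equivalent encoding.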

@monk1337

monk1337 commented Oct 10, 2021

@sajidaraz @sarahwie Have you found a solution?
