
Comparison in MIMIC-CXR dataset #16

Open
Markin-Wang opened this issue Sep 13, 2023 · 3 comments

Comments

@Markin-Wang

Hi, thanks for your work.

I have a question about the comparison to previous works in MIMIC-CXR dataset.

Previous methods in report generation utilized the official MIMIC-CXR data split to report the report generation results.

Nonetheless, your work uses the Chest ImaGenome v1.0.0 data split which is different from the MIMIC-CXR data split.

Therefore, the RGRG report generation results seem not directly comparable to previous works?

I would be grateful if you could provide more information on this, and sorry if I have misunderstood the testing procedure.

@ttanida
Owner

ttanida commented Sep 16, 2023

Hi,

Thank you for your question.

You're right in noting that we utilized the Chest ImaGenome v1.0.0 split instead of the MIMIC-CXR split. However, since both splits come from the same underlying dataset, they should inherently have a similar data distribution, ensuring the comparability of our results with previous studies.

Best,
Tim

@Markin-Wang
Author

Markin-Wang commented Sep 16, 2023

> Hi,
>
> Thank you for your question.
>
> You're right in noting that we utilized the Chest ImaGenome v1.0.0 split instead of the MIMIC-CXR split. However, since both splits come from the same underlying dataset, they should inherently have a similar data distribution, ensuring the comparability of our results with previous studies.
>
> Best,
> Tim

Hi Tim,

Thank you for your reply.
However, I respectfully disagree with your claim, as the dataset split in MIMIC-CXR does not appear to be random, and the distribution of the test set seems somewhat different from that of the training and validation sets. For example, the paper releasing the dataset states that "The test set contains all studies for patients who had at least one report labelled in our manual review." In addition, as shown in Table 3 of that paper, only ~69% of studies in the training/validation sets have a findings section, while this figure is 98.3% in the test set. Moreover, the average report length on the MIMIC-CXR test split is 66.4 tokens, versus 53 and 53.05 in the training and validation sets, as shown in the paper.
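The kind of discrepancy described above is easy to check empirically. A minimal sketch (not the authors' code; the records below are made-up toy data, and a real check would instead load the MIMIC-CXR and Chest ImaGenome split files) that computes the two statistics in question per split:

```python
# Illustrative sketch: per-split fraction of studies with a FINDINGS
# section, and mean report length in whitespace tokens. The toy records
# are invented; real use would read the actual split metadata/reports.

def split_stats(reports):
    """Return (fraction with findings, mean token length) for one split."""
    with_findings = sum(1 for r in reports if r["findings"])
    lengths = [len(r["report"].split()) for r in reports]
    return with_findings / len(reports), sum(lengths) / len(lengths)

train_split = [
    {"findings": True, "report": "Heart size is normal. Lungs are clear."},
    {"findings": False, "report": "No acute cardiopulmonary process."},
]
test_split = [
    {"findings": True,
     "report": "Heart size is mildly enlarged. No focal consolidation, "
               "pleural effusion or pneumothorax is seen."},
]

for name, split in [("train", train_split), ("test", test_split)]:
    frac, mean_len = split_stats(split)
    print(f"{name}: findings={frac:.2f}, mean_len={mean_len:.1f}")
```

Running this over the real official split versus the Chest ImaGenome split would quantify how far the two distributions actually diverge.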

@fuying-wang

fuying-wang commented Oct 17, 2023

Thanks for the awesome work!

I have also noticed that previous splits contain lateral view images, which may also make the data distribution slightly different.
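One way to control for this when comparing splits is to drop lateral views before computing any statistics. A minimal sketch, assuming each image record carries a "ViewPosition" field (as in the mimic-cxr-2.0.0-metadata.csv file; the records below are illustrative):

```python
# Sketch: restrict a split to frontal-view images so that comparisons
# between splits are not skewed by lateral views. "PA"/"AP"/"LATERAL"
# are standard ViewPosition values in the MIMIC-CXR metadata.
FRONTAL_VIEWS = {"PA", "AP"}

def keep_frontal(records):
    """Return only records whose view position is frontal (PA or AP)."""
    return [r for r in records if r.get("ViewPosition") in FRONTAL_VIEWS]

records = [
    {"dicom_id": "img_a", "ViewPosition": "PA"},
    {"dicom_id": "img_b", "ViewPosition": "LATERAL"},
    {"dicom_id": "img_c", "ViewPosition": "AP"},
]
frontal = keep_frontal(records)
print([r["dicom_id"] for r in frontal])
```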
