Improve dataset COVID-19 vs Normal #13

Open
elcronos opened this issue Mar 21, 2020 · 6 comments

@elcronos
Owner

elcronos commented Mar 21, 2020

We should try to find more images of COVID-19 and normal cases, both CT scans and X-rays.

TO-DO:

  • Create and curate a dataset including images of COVID-19, and Normal cases.
  • Create folders for test, train and validation with their corresponding subfolders (normal, covid19) and split the data accordingly.
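
A minimal sketch of the folder layout described above, assuming the three splits each get a `normal` and a `covid19` subfolder. The function name `make_layout` and the root path are hypothetical, chosen only for illustration:

```python
from pathlib import Path

# Split and class names taken from the TO-DO item above.
SPLITS = ("train", "test", "validation")
CLASSES = ("normal", "covid19")


def make_layout(root):
    """Create root/<split>/<class> for every split and class.

    Returns the list of folders, whether they were just created
    or already existed.
    """
    root = Path(root)
    folders = []
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            # parents=True builds intermediate folders; exist_ok=True
            # makes the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            folders.append(folder)
    return folders
```

Calling `make_layout("dataset")` would produce `dataset/train/normal`, `dataset/train/covid19`, and so on for `test` and `validation`.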

Recommendations:

  • If you have to split the dataset, the split should ideally be train: 80%, test: 10%, validation: 10%.
  • Images should be resized or cropped so that their sizes are 224x224 or 300x300.
  • Please be consistent with the type of images. Only use .jpeg or .jpg images.
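
One possible way to apply these recommendations, as a sketch rather than the project's actual tooling. The helper names `split_files` and `center_crop_box` are hypothetical. The split keeps only `.jpg`/`.jpeg` files, shuffles them with a fixed seed, and puts whatever remains after the train and test shares into validation. The crop helper only computes the largest centered square; an imaging library (for example Pillow, whose `Image.crop` takes a box in this order) would then do the actual crop and the resize to 224x224 or 300x300.

```python
import random
from pathlib import PurePath

ALLOWED_SUFFIXES = {".jpg", ".jpeg"}


def split_files(filenames, train=0.8, test=0.1, seed=0):
    """Split file names into train/test/validation lists.

    Files with other extensions are dropped. The remainder after the
    train and test shares goes to validation.
    """
    keep = sorted(
        f for f in filenames
        if PurePath(f).suffix.lower() in ALLOWED_SUFFIXES
    )
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(keep)
    n_train = int(len(keep) * train)
    n_test = int(len(keep) * test)
    return {
        "train": keep[:n_train],
        "test": keep[n_train:n_train + n_test],
        "validation": keep[n_train + n_test:],
    }


def center_crop_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)
```

Splitting each class (normal, covid19) separately would keep the class balance the same across the three folders.
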
@davidcp82

Hi Camilo, here is the link to the database: https://stanfordmlgroup.github.io/competitions/chexpert/. Sorry, I forgot to post it earlier.

@davidcp82

https://www.ajronline.org/doi/full/10.2214/AJR.20.23034 (I'm still trying to find the image data.)

@divait

divait commented Mar 23, 2020

Hi @elcronos, I'd love to help with this task if no one is working on it yet.

I have a startup called LinkedAI and we help people with all kinds of data issues, mostly labeling, so we'd be more than glad to help you with this project.

You can assign me this issue to work on it.

@elcronos
Owner Author

Yes, please do

@ayhyap

ayhyap commented Mar 24, 2020

Hi Camilo, here is the link to the database: https://stanfordmlgroup.github.io/competitions/chexpert/. Sorry, I forgot to post it earlier.

If you're using that dataset, remember to use the high-resolution images (439GB).
The compressed version (11GB) is not of diagnostic quality.

Unfortunately, I've only been able to find one open source of COVID-19 positive CXRs/CTs:
https://github.com/ieee8023/covid-chestxray-dataset
The image formats, resolutions and quality are all over the place, so it will require cleaning.
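
As a first cleaning step, it might help to see which file formats are actually present. A small sketch, assuming the dataset has been downloaded to a local folder; the function name `count_suffixes` is hypothetical:

```python
from collections import Counter
from pathlib import Path


def count_suffixes(folder):
    """Count files under folder (recursively) by lowercased extension."""
    return Counter(
        p.suffix.lower()
        for p in Path(folder).rglob("*")
        if p.is_file()
    )
```

Anything that is not `.jpg` or `.jpeg` in the result would need converting before it fits the recommendations at the top of this issue.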

There are some here as well, but not available for download:
https://bit.ly/BSTICovid19_Teaching_Library
You might have to ask for permission, see: https://www.bsti.org.uk/training-and-education/covid-19-bsti-imaging-database/

Here are some other CXR datasets, similar to Stanford's CheXpert:
NIH ChestXray
https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
PadChest (requires authorization)
http://bimcv.cipf.es/bimcv-projects/padchest/
MIMIC-CXR (requires authorization)
https://physionet.org/content/mimic-cxr/

@arthurfigueiredo

I created a repository to gather chest X-ray and CT images. The goal is to create a collection that can be useful for other projects that are analyzing COVID-19 with computer vision.

https://github.com/arthurfigueiredo/covid-dataset/

5 participants