mimic3wdb-matched RECORDS file has too many entries #466

Open
tecamenz opened this issue Sep 22, 2023 · 1 comment

Comments

@tecamenz

We are trying to download the mimic3wdb-matched database via wfdb.io.dl_database like so:

wfdb.io.dl_database("mimic3wdb-matched", "mimic3wdb-matched", records='all', annotators='all', keep_subdirs=True, overwrite=False)

After a long wait, we get an error indicating a missing file:
wfdb.io._url.NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/3783537_10000.hea

While investigating we found that the corresponding RECORDS file contains more records than there are in the database:
https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/RECORDS

RECORDS file:
image

Actual content:
image

wfdb.io.dl_database builds its download URLs from this RECORDS file, so every entry without a matching file on the server leads to the 404 error above.
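To make the mismatch concrete, here is a minimal sketch of how the RECORDS entries could be compared against the files that actually exist. It assumes each RECORDS line names a record whose header is `<name>.hea` in the same directory, as the failing URL suggests. The helper names `header_urls` and `missing_records` are hypothetical and not part of wfdb, and the record name `3783537_0001` is only an illustrative placeholder.

```python
from urllib.parse import urljoin


def _record_names(records_text):
    """Return the non-empty, stripped lines of a RECORDS file."""
    return [line.strip() for line in records_text.splitlines() if line.strip()]


def header_urls(dir_url, records_text):
    """Map each RECORDS entry to the URL of its .hea header file."""
    if not dir_url.endswith("/"):
        dir_url += "/"  # urljoin needs a trailing slash to keep the last path part
    return {name: urljoin(dir_url, name + ".hea") for name in _record_names(records_text)}


def missing_records(records_text, available_files):
    """List RECORDS entries whose .hea file is not among the available file names."""
    available = set(available_files)
    return [n for n in _record_names(records_text) if n + ".hea" not in available]


if __name__ == "__main__":
    base = "https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488"
    records = "3783537_0001\n3783537_10000\n"  # illustrative excerpt, not the real file
    print(header_urls(base, records)["3783537_10000"])
    print(missing_records(records, ["3783537_0001.hea"]))
```

The list of available files would have to come from a directory listing or a local mirror; how to obtain it is left open here.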

Some questions:

  1. Can someone update the RECORDS file to reflect the actual database content?
  2. The download via wfdb.io.dl_database is excruciatingly slow. Would it make sense to rewrite wfdb.io.dl_database to use multi-threading? Or what approach do you use to dump the whole database efficiently?
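
For the second question, a multi-threaded download could look roughly like the sketch below. It uses only the Python standard library and is not how wfdb.io.dl_database works internally. `fetch_to_file` and `download_all` are hypothetical names, and the sketch assumes the bottleneck is per-file request latency rather than bandwidth. Failed URLs (such as the 404 above) are collected instead of aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.request import urlopen


def fetch_to_file(url, dest):
    """Download one URL to a local path, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=60) as resp:
        dest.write_bytes(resp.read())


def download_all(jobs, fetch=fetch_to_file, max_workers=8):
    """Run fetch(url, dest) for every (url, dest) pair on a thread pool.

    Returns (done, failed): a list of URLs that succeeded and a dict
    mapping each failed URL to the exception it raised.
    """
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, dest): url for url, dest in jobs}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                fut.result()
                done.append(url)
            except OSError as exc:  # urllib's HTTPError/URLError are OSError subclasses
                failed[url] = exc
    return done, failed
```

The `fetch` parameter is injectable so the pool logic can be tried out without touching the network. A modest `max_workers` seems prudent to avoid putting too much load on the server.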
@bemoody
Collaborator

bemoody commented Sep 29, 2023

Thanks for pointing this out. This is not a bug in wfdb-python, it's a bug in the database.

The RECORDS file is (probably) correct; the set of files on PhysioNet is wrong. It looks like some of the files are present in mimic3wdb but were not properly linked into the mimic3wdb-matched directory.

(One may also ask why on earth this record is split into over 10000 tiny segments. I have no idea.)
