MIMIC Waveforms #86

Closed
joel1391 opened this issue Mar 10, 2016 · 13 comments

@joel1391
Contributor

On the MIMIC-II query builder there were a couple of tables related to the MIMIC waveform database. Is this something that will be implemented in MIMIC-III?

Also, are there any plans to update the waveform database with more matched patients and new waveforms?

@alistairewj
Member

There are plans to update the matched database. The matching process is still ongoing.

Regarding the waveform tables, I'm not convinced that was the simplest method of distributing the matches. While we will release a map of some form, it may not be in the form of relational database tables.

I will leave this issue open for now and re-address it when there is an update about the waveforms.

@alistairewj
Member

alistairewj commented Jun 16, 2016

I've attached a sample of matched headers for MIMIC-III patients here: if you have time, could you comment on whether this is a useful format, and whether you think additional information (HADM_ID, ICUSTAY_ID) would make things easier. See here for how to use matched waveform headers: http://physionet.org/physiobank/database/mimic2wdb/matched/

We do not currently plan to add tables to the MIMIC-III clinical database to match to the waveforms, but we do plan on releasing headers, such as those in the above file.

@joel1391
Contributor Author

joel1391 commented Jul 6, 2016

Thanks Alistair, I think the most important thing for the header would be the ICUSTAY_ID, as that indicates when the patient was admitted to the hospital. The current date listed in the headers is when the actual recording starts as opposed to the date of ICU admission. So if we have the ICUSTAY_ID, I should be able to link the rest of the patient data from there.

Could there be any cases where there is a recording but no ICUSTAY_ID associated with it?

@alistairewj
Member

alistairewj commented Jul 6, 2016

Yes, there are. ICUSTAY_ID and the waveform records are collected independently. We have to map them back and that's not always trivial. There is a host of issues that can happen (different clocks, waveform records with erroneous medical record numbers, alignment issues, ...). Also, minor correction, the ICUSTAY_ID starts when the patient enters the ICU, not the hospital. The HADM_ID is associated with the hospital.

From my calculations around 73% of records have an ICUSTAY_ID, and 87% have an HADM_ID.

Here's a map of the above headers to ICUSTAY_ID/HADM_ID: mimic-iii-matched-waveforms-sample.xlsx

@Dubrzr

Dubrzr commented Mar 10, 2017

Hi,

I work with @parisni at APHP on MIMIC3 data.
I just found this csv file: https://physionet.org/physiobank/database/mimic3wdb/matched/matched_waveform_info.csv and I would like to know whether it is the definitive version of the matches between the waveforms and the HADM_ID/ICUSTAY_ID. Also, can you explain what the 'hadm_overlap', 'icustay_overlap', 'rih' and 'rii' columns are?

The page https://mimic.physionet.org/mimicdata/waveforms/ indicates that the work is not finished yet but it seems to be finished.

In the issue #166, @tompollard states that "The waveform database for MIMIC-III has not yet been released, but we are working on it.", however, it seems to be available at /mimic3wdb.

Thanks! :)

@tompollard
Member

Thanks for highlighting this @Dubrzr. Essentially @alistairewj created a header file to match previously released waveforms to the MIMIC-III clinical data, but no additional waveforms have been released yet. We'll update documentation etc to clarify this point.

@Dubrzr

Dubrzr commented Mar 13, 2017

Thanks for your answer and also for your work! :D

I am working on extracting all the data in the .hea header files to load it into a database, and I would like to know whether it would be of interest to merge this work into this repository.

It works like this:

  1. Download all .hea files from Physionet into a local directory:
mimic3wdb/
  s00020/
    3544749_0001.hea
    3544749_0002.hea
    3544749_0003.hea
    3544749_0004.hea
    3544749_0005.hea
    3544749_0006.hea
    3544749_0007.hea
    3544749_0008.hea
    3544749_layout.hea
    s00020-2183-04-28-17-47.hea
    s00020-2183-04-28-17-47n.hea
  s00033/
    ....
  ....

  2. Download matched_waveform_info.csv to get information about each record
  3. Extract all information from all .hea files (each sxxxxx-yyyy-mm-dd-hh-mm{n}.hea file corresponds to one record, and each file listed in that header corresponds to one entry)
  4. Write the metadata from the csv file and the .hea files to two separate new csv files:
  • wfr.csv, which contains one row per record
  • wfe.csv, which contains one row per entry
> wfr.csv: record_id, subject_id, starttime, endtime, starting_hadm, ending_hadm, starting_icustay, ending_icustay, hadmmatch, icumatch, rih, rii, hadm_overlap, icustay_overlap, comments
> wfe.csv: record_id, type, segment_index, start_datedatetime, end_datedatetime, nsamp, nsig, fs, fmt, sampsperframe, skew, byteoffset, gain, units, baseline, initvalue, signame, comments
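
The sxxxxx-yyyy-mm-dd-hh-mm{n}.hea naming convention described above can be split into its parts with a regular expression. The following is only a minimal sketch and is not taken from the linked scripts: the function name `parse_record_name` and the `is_numerics` key are illustrative, and reading the digits after `s` as the zero-padded SUBJECT_ID is an assumption.

```python
import re
from datetime import datetime

# Pattern for record-level headers such as "s00020-2183-04-28-17-47n.hea".
# A trailing "n" marks a numerics record; the ".hea" suffix is optional here.
RECORD_NAME = re.compile(
    r"^s(?P<subject>\d+)-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2})"
    r"(?P<numerics>n?)(?:\.hea)?$"
)


def parse_record_name(name):
    """Split a record header name into its parts, or return None if it does not match."""
    match = RECORD_NAME.match(name)
    if match is None:
        # Segment and layout headers such as "3544749_0001.hea" end up here.
        return None
    return {
        "record_id": name[:-4] if name.endswith(".hea") else name,
        # Assumption: the digits after "s" are the zero-padded SUBJECT_ID.
        "subject_id": int(match.group("subject")),
        "starttime": datetime.strptime(match.group("stamp"), "%Y-%m-%d-%H-%M"),
        "is_numerics": match.group("numerics") == "n",
    }


print(parse_record_name("s00020-2183-04-28-17-47n.hea"))
print(parse_record_name("3544749_layout.hea"))
```

Names that do not follow the record pattern return `None`, which gives a simple way to separate record headers from the segment headers they list.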

My scripts are available here: https://github.com/Dubrzr/mimic3-scripts

If you are interested in the resulting files, ask me.

@Dubrzr

Dubrzr commented Mar 15, 2017

Hi,

While exploring the data gathered with my script, I found erroneous dates in header files.

Only headers of numerics (s*n.hea) have this problem. For example, in the following file, https://physionet.org/physiobank/database/mimic3wdb/matched/s00052/s00052-2191-01-10-02-21n.hea, the date is 14/03/3036 while the filename indicates that the date is 10/01/2191.

There are 888 numerics headers with this problem.
For the files concerned, can I assume that the date in the filename is the correct one? It seems to be concordant with the admission table.
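
A check of this kind could be sketched roughly as follows. This is not the script used to produce the counts above. It assumes that the first line of a record header ends with a base date written as dd/mm/yyyy (the form in which the example date above is written), the helper names are hypothetical, and the header lines in the usage example are made up for illustration rather than copied from the real files.

```python
import re
from datetime import date

# Date embedded in names like "s00052-2191-01-10-02-21n.hea" (yyyy-mm-dd).
FILENAME_DATE = re.compile(r"-(\d{4})-(\d{2})-(\d{2})-\d{2}-\d{2}n?(?:\.hea)?$")
# Assumption: the record line ends with a base date written as dd/mm/yyyy.
HEADER_DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})\s*$")


def filename_date(name):
    """Return the date encoded in a record file name, or None."""
    m = FILENAME_DATE.search(name)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # not a valid calendar date


def header_date(record_line):
    """Return the trailing dd/mm/yyyy date of a header's first line, or None."""
    m = HEADER_DATE.search(record_line)
    if m is None:
        return None
    day, month, year = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # not a valid calendar date


def dates_agree(name, record_line):
    """True or False when both dates parse, None when either is missing."""
    expected = filename_date(name)
    found = header_date(record_line)
    if expected is None or found is None:
        return None
    return expected == found


# Illustrative header line only; the fields before the date are invented.
line = "s00052-2191-01-10-02-21n 6 1 3600 02:21:00 14/03/3036"
print(dates_agree("s00052-2191-01-10-02-21n.hea", line))
```

Returning `None` instead of `False` for unparseable input keeps headers that are malformed in other ways separate from headers whose date simply disagrees with the filename.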

There are also header files that are totally wrong:

You can see all the files with those problems here: https://gist.github.com/Dubrzr/6a22ae48980a549cc5883f3750ec0578

The script that generated this output is here: https://github.com/Dubrzr/mimic3-scripts/blob/master/headers_checker.py

Thanks!

@alistairewj
Member

Thanks for the bug report. I will be fixing the data later today - it was a sloppy regex! The date in the filename is the correct one. I'll post again when the data is updated on PhysioNet.

Regarding the crazy years, there are four of them to my knowledge:

  • s27446/s27446-8838-01-26-18-03
  • s27446/s27446-8838-01-26-18-03n
  • s29799/s29799-8921-03-11-17-16
  • s29799/s29799-8921-03-11-17-16n

No idea why the years are ridiculous. Probably a bad setting on a monitor. I would just exclude them like you're doing.
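
For anyone who wants to exclude such records automatically, a minimal sketch is below. The function name and the default bounds are assumptions: the 2100 to 2200 window reflects the usual range of shifted dates in MIMIC-III and should be adjusted if your data differs.

```python
import re

# Year embedded at the start of a record name such as "s27446-8838-01-26-18-03".
YEAR_IN_NAME = re.compile(r"^s\d+-(\d{4})-")


def has_plausible_year(record_path, min_year=2100, max_year=2200):
    """Return True if the year in the record name lies within [min_year, max_year].

    The default bounds are an assumption, not something stated in this thread.
    """
    base = record_path.rsplit("/", 1)[-1]  # drop the "sNNNNN/" directory part
    m = YEAR_IN_NAME.match(base)
    return m is not None and min_year <= int(m.group(1)) <= max_year


records = [
    "s27446/s27446-8838-01-26-18-03",
    "s29799/s29799-8921-03-11-17-16n",
    "s00052/s00052-2191-01-10-02-21n",
]
kept = [r for r in records if has_plausible_year(r)]
print(kept)  # ['s00052/s00052-2191-01-10-02-21n']
```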

@alistairewj
Member

The matched header files on PhysioNet should be updated. Specifically, you should only need to redownload the s#####*.hea files. Let me know if you succeed with your next iteration of the script!

Regarding your scripts, I do think they'd be of interest to the community, but we'd have to think about where best to put them. For now I would tag your repository with mimic-iii and physionet which should help some.

@tompollard
Member

Just a quick update: we are pleased to say that a new batch of matched waveforms is being uploaded to PhysioNet right now (~10k patients in total). Once the waveforms are uploaded and checked, they will be made available for analysis.

@Dubrzr

Dubrzr commented Jul 28, 2017

This is a super-exciting announcement! Thanks a lot for both of your work!

@tompollard
Member

@bemoody and @cx1111 are the guys to thank for this - we'll pass on your praise!
