Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Installing MIMIC-III in a local Postgres database is slow #181

Closed
postgres-newbie opened this issue Feb 28, 2017 · 22 comments
Closed

Installing MIMIC-III in a local Postgres database is slow #181

postgres-newbie opened this issue Feb 28, 2017 · 22 comments

Comments

@postgres-newbie
Copy link

Hi !

I'm trying to load the MIMIC-III data into a local postgres database by following instructions from this link: https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/

So far despite leaving it overnight, it consistently hangs at this stage:

$ psql -f postgres_load_data.sql -U mimic -v mimic_data_dir='/Documents/MIMIC_III/'
SET

COPY 58976

COPY 34499

COPY 7567

Here's the configuration of my machine:
MacBook Air (13-inch, Early 2014)
Processor: 1.7 GHz Intel Core i7
Memory: 8 GB 1600 MHz DDR3

How long should it take to load this data on a machine with my configuration tentatively? The website states it might take several hours, but I didn't find any explicit benchmarking information.

Should I attempt to load this local instance using a machine with more RAM available?

I look forward to your guidance on this. Thanks!

@postgres-newbie
Copy link
Author

Mac OS - Sierra, version 10.12.3

@postgres-newbie
Copy link
Author

MIMIC-III version 1.4

@tompollard
Copy link
Member

The first few tables are loading, so you are going in the right direction. Loading MIMIC may take a while, particularly the chartevents table, which is the point that you get to. Are you sure that you have enough disk space on the Macbook Air? You'll need around 90GB of free space for the database.

@postgres-newbie
Copy link
Author

Thanks for letting me know I would need to have 90GB of free space available for the database. I have 389 GB available, so space isn't an issue.

When you were loading it, could you please tell me what your machine configuration was and how long it took to load MIMIC in postgres?

@tompollard
Copy link
Member

I have loaded it on several different machines, but the nearest system to yours is a 2013 Macbook Pro 2.9 GHz Intel Core i5 with 16GB RAM and 1TB solid state disk. I don't recall exactly how long the build takes, but overnight is usually sufficient.

As your system is lower spec, you may need to leave it a little longer. Alternatively, try building on a higher spec machine or if you are just looking to explore the data then check out the MIMIC querybuilder: https://mimic.physionet.org/gettingstarted/querybuilder/

@tompollard
Copy link
Member

Hi Krupa, as you posted the question at #182, I assume this issue is now resolved.

@alistairewj
Copy link
Member

alistairewj commented Mar 1, 2017

Just to add to this, I just built MIMIC with Postgres on a mid-2012 MacBook Pro with 8GB of RAM.

image

The longest single command was inserting data into chartevents, which took almost 4 hours. I imagine the entire build did not take longer than ~6 hours (I ran it overnight). I highly recommend disabling any hibernation/sleep that your computer automatically performs as that may interrupt the build. For Mac OS X the application "caffeine" installable via Homebrew is really useful for this purpose.

@postgres-newbie
Copy link
Author

Thanks so much for your help Dr. Pollard and alistairewj!

@lbulgarelli
Copy link
Member

@alistairewj @tompollard Maybe something like this may be useful: (https://github.com/ossc-db/pg_bulkload)

@brokejoker
Copy link

I have the same problem - the postgres_load_data.sql has been running for two days straight! The tables are getting populated as I had the same printout as postgres-newbie above. I also checked inside pgadmin4, and select limit returns a few rows of admissions, callout, and caregivers tables, but nothing for chartevents or chartevents_{N} tables.

When I had halted it the first time (I started it again) it was at around 40 million rows after 1 day, which means it'll take a week to load the whole thing! Do you have any other suggestions for loading this csv into postgres? For my day job, I use spark dataframes, which can easily load 1 billion+ row csv into a df, so this is quite a surprising task for me! Appreciate any and all help you can provide!

@alistairewj
Copy link
Member

alistairewj commented Dec 6, 2017 via email

@brokejoker
Copy link

command line via:
psql 'dbname=mimic user=rohunkshirsagar options=--search_path=mimiciii' -f postgres_load_data.sql -v mimic_data_dir='/Users/rohunkshirsagar/Documents/mimic-iii/data_files'

@eruca
Copy link

eruca commented Dec 6, 2017

similar to my situation, I load the data to the postgresql on windows10 for more than 72hours, I found the chartevents and it's subtable full of data(330712483 rows), but I found the process is stuck for a day, the tables after the table chartevents is empty, and I check the postgresql server is not running(CPU load is near to 0, before about 33%), I don't know how to do, stop it which took me so many times?

@alistairewj
Copy link
Member

There's quite a few things you can do to speed up the import, but I'd check first that your computer isn't hibernating or something similar. It only takes ~4 hours on my laptop, which is nearly 5 years old now. My Windows PC is ~3 years old and it took a similar amount of time.

If you are still having issues, there are a lot of suggestions which you could implement here: https://stackoverflow.com/questions/12206600/how-to-speed-up-insertion-performance-in-postgresql

@eruca
Copy link

eruca commented Dec 10, 2017

It has took more than 24 hours for load data to the postgresql on the macbook pro 2016 i5-8G Ram, a single process postgre with one thread is running.
I follow the tutorial https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/, and install caffiene, why windows 10 and macbook pro both take so many time ? any configure for the postgresql?

@eruca
Copy link

eruca commented Dec 10, 2017

I both install postgresql 9.6.6 on macbook pro and windows10 with default configuration.

@tompollard tompollard changed the title Installing MIMIC-III in a local Postgres database takes forever!!! Installing MIMIC-III in a local Postgres database is slow Dec 11, 2017
@alistairewj
Copy link
Member

I honestly don't know what to say, aside from "well, it works for me!". Your computer sounds more than capable of importing the data so I don't think the issue is with the scripts here or specifically to do with MIMIC-III.

@gscfwid
Copy link

gscfwid commented Dec 13, 2017

I have the same problem in my macbook pro. But I had ever built the database in my PC. And my solution was to backup the database and restore it in my MBP, using 'pg_dump dababase -U username -f dbdump.sql' and 'psql -U username -d database -f dbdump.sql' in the cmd and terminal respectively.

@sanfordbaran
Copy link

I'm running on macOS Sierra on a fairly new iMAC and Postgres 10. What fixed the extreme slowness for me was to use the 'postgres_create_tables_pg10.sql' script to create the Tables instead of the postgres_create_tables.sql script. Since I was using the Makefile to build everything, I edited lines 75 and 115 in the Makefile, substituting 'postgres_create_tables_pg10.sql' for 'postgres_create_tables.sql. My load script then ran in about 2 hours!

@alistairewj
Copy link
Member

alistairewj commented Dec 22, 2017 via email

@david-perez
Copy link

Just wanted to weigh in.

Took 29 hours to run setup.sh within a Docker container. I suspect the reason why it took so long is that the data was written to an HDD (it takes up 71 GiB with indices). Also, the fact that I was unable to get Docker to utilize more than 1 CPU core (on an i7 4770k) when running the COPY commands from postgres (or anything else for that matter).

@alistairewj
Copy link
Member

See #362 where we are discussing this as it's likely related to a recent change around partitioning chartevents.

We are still testing the speed of the build though so would be great to get your times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants