Fail to import large CSV, no documentation or reason #901
I wanted to say the same thing. I'm running the Docker container capped at 10 GB of RAM. Here is a 2 GB CSV file to download: https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD I tried both with and without sandboxing. With the sandbox, I get this in the logs pretty quickly:
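For reference, a rough sketch of the setup (assuming the stock gristlabs/grist image with its default port and volume; GRIST_SANDBOX_FLAVOR is how grist-core toggles sandboxing, as far as I can tell):

```sh
# Fetch the 2 GB Chicago crimes CSV.
curl -L -o rows.csv \
  "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD"

# Run Grist capped at 10 GB of RAM. GRIST_SANDBOX_FLAVOR=gvisor enables the
# sandbox; GRIST_SANDBOX_FLAVOR=unsandboxed turns it off.
docker run --rm --memory=10g -p 8484:8484 \
  -v "$PWD/persist:/persist" \
  -e GRIST_SANDBOX_FLAVOR=gvisor \
  gristlabs/grist
```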
I guess there's a low implicit memory limit for the sandbox? With no sandbox, I get this:
and this in dmesg:
So it's using more than 10 GB of RAM to parse the 2 GB CSV file. Let's give it 20 GB... aaaand, boom:
Hey, it guessed the headers now... Let's give it 40 GB. No more OOM, that's nice. The container process seems to use at most 21 GB, even though the container group itself peaks at 31 GB. Still, after a couple of minutes the UI breaks down, and there's no data.

Ok, let's cut down on the file size. After truncating to the first 600k rows, 180 MB (see the sketch below), RAM usage tops out at 10 GB (a 50x increase over the file size). It seems fine so far, but then it shows this screen with no data, and clicking the "new table" button on the left starts this spinner. ...After 15 minutes: GREAT SUCCESS! Still, clicking "OK" jumps into another spinner... I guess I'll wait 15 more minutes. Victory! We have almost a million Chicago crimes now.

Here's the final RAM usage for the 180 MB CSV file: uploading a 180 MB CSV takes 10 GB of server RAM. Opening the doc then takes around 4 GB of server RAM, regardless of the number of open tabs for the same user. Finally, the SQLite file is 2.1 GB (13x amplification):
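For completeness, the truncation was just keeping the header plus the first 600,000 data rows, something like this:

```sh
# Keep the header line plus the first 600,000 data rows (~180 MB).
head -n 600001 rows.csv > rows-600k.csv
```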
After uploading finished, I restarted the container with sandboxing enabled, and reading & searching work (while the server container still takes 2-4 GB of RAM). If the sandbox has a low RAM limit, I guess these 2-4 GB of memory are used by the Node.js server part? From this I can see two things:
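One way to watch these per-container numbers (not necessarily how they were measured above) is:

```sh
# One-shot snapshot of per-container memory and CPU usage.
docker stats --no-stream
```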
Both of these things can be fixed, but it seems this was designed to fit everything into RAM...

Related issues and comments:
Yes, but the crimes are happening at about 4 MB of CSV per day. @fulldecent: turn off sandboxing and give your container at least 16 GB of RAM, and it might work as above. Then turn sandboxing back on again. Questions for the devs:
I am using Grist as recommended, with this omnibus setup.
Importing a small CSV file was successful using this:
However, importing a large CSV file failed.
The file I need to load is 200 MB, with 30 columns and 400,000 rows.
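For a reproducible case without the original data, a file of roughly that shape can be generated synthetically; the column names and cell values below are made up:

```sh
# Generate a synthetic CSV: 30 columns, 400,000 rows, roughly 200 MB.
awk 'BEGIN {
  for (c = 1; c <= 30; c++) printf "col%d%s", c, (c < 30 ? "," : "\n")
  for (r = 1; r <= 400000; r++)
    for (c = 1; c <= 30; c++)
      printf "value-%06d-%02d%s", r, c, (c < 30 ? "," : "\n")
}' > big.csv
```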
Work plan