

How can I change sheet to stream and save it into workbook #46

Open
mut0u opened this issue Apr 15, 2016 · 3 comments

Comments

@mut0u

mut0u commented Apr 15, 2016

I have a large amount of data to save.

So I call add-rows! on the sheet in a loop and then save the sheet to the workbook.

I suspect the data is too large: Clojure throws an OutOfMemoryError ("GC overhead limit exceeded").

So I need to turn the sheet into a stream and save the workbook through an OutputStream.

What should I do?
Thanks.

@bagl

bagl commented Apr 16, 2016

I would also appreciate a tip on how to handle big data.

@mjul
Owner

mjul commented Apr 18, 2016

Thanks for the feedback. It sounds like an interesting use case.

Docjure's current streaming support only covers stream I/O: the document is still built up in memory.

I have not run into the memory problem myself, so I can offer no better advice than throwing more memory at it or rolling up your sleeves and adding streaming to Docjure.

The underlying Apache POI library supports a limited streaming model for big datasets, so please have a look at that and see if you can find a way to let Docjure leverage it.

You will find the documentation here: POI documentation, in particular the streaming API, SXSSF.
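For context, SXSSF keeps its memory footprint low by holding only a sliding window of the most recent rows in memory and flushing older rows to temporary files on disk. In POI itself the window size is the constructor argument, e.g. new SXSSFWorkbook(100). The sketch below illustrates only that sliding-window idea using the Java standard library, so it runs without POI. RowWindowWriter and its methods are hypothetical names, not part of POI or Docjure, and it writes plain comma-separated lines rather than xlsx.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical illustration of a sliding-window row writer: at most
 * windowSize rows are held in memory; older rows are written to the sink.
 */
public class RowWindowWriter implements Closeable {
    private final int windowSize;
    private final Writer sink;
    private final Deque<List<String>> window = new ArrayDeque<>();
    private int maxBuffered = 0; // largest number of rows ever held at once

    public RowWindowWriter(int windowSize, Writer sink) {
        if (windowSize < 1) throw new IllegalArgumentException("windowSize must be >= 1");
        this.windowSize = windowSize;
        this.sink = sink;
    }

    /** Adds a row; if the window is already full, the oldest row is written out first. */
    public void addRow(List<String> row) throws IOException {
        if (window.size() == windowSize) {
            writeRow(window.removeFirst());
        }
        window.addLast(row);
        maxBuffered = Math.max(maxBuffered, window.size());
    }

    /** Writes every row still in the window, in order, and flushes the sink. */
    @Override
    public void close() throws IOException {
        while (!window.isEmpty()) {
            writeRow(window.removeFirst());
        }
        sink.flush();
    }

    public int maxBuffered() {
        return maxBuffered;
    }

    private void writeRow(List<String> row) throws IOException {
        sink.write(String.join(",", row));
        sink.write('\n');
    }

    public static void main(String[] args) throws IOException {
        // A StringWriter keeps the demo self-contained; a real export would
        // write to a file so flushed rows actually leave the heap.
        StringWriter out = new StringWriter();
        try (RowWindowWriter w = new RowWindowWriter(100, out)) {
            for (int r = 0; r < 10_000; r++) {
                w.addRow(List.of("row" + r, Integer.toString(r * 2)));
            }
            System.out.println("max rows in memory: " + w.maxBuffered());
        }
    }
}
```

The trade-off mirrors SXSSF's: once a row has left the window it can no longer be modified, which is why the streaming model only suits data written top to bottom.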

@mut0u
Author

mut0u commented Apr 18, 2016

My large dataset comes from a database. I use SQL LIMIT/OFFSET in a loop to process a small part of the data each time. But I did not realize that references to the data might stay alive the whole time, so the GC never reclaims the data I have already processed. As a result, I used 30 GB of memory to create a 20 MB xlsx file.
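The thread does not show the code that caused the leak. As a general illustration only, here is a Java sketch of a LIMIT/OFFSET paging loop in which each page is handed off row by row and never collected, so at most one page is reachable at a time. fetchPage is a hypothetical stand-in for a query such as "SELECT ... LIMIT ? OFFSET ?" and simply generates rows; PagedExport and exportAll are hypothetical names too. In Clojure a common cause of this kind of retention is holding on to the head of a lazy sequence, though the thread does not say whether that was the case here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of LIMIT/OFFSET paging that keeps only one page alive at a time. */
public class PagedExport {
    // Stand-in for a database query like "SELECT ... LIMIT ? OFFSET ?";
    // it generates rows in memory so the sketch runs without a database.
    static List<String> fetchPage(int totalRows, int offset, int limit) {
        List<String> page = new ArrayList<>();
        for (int i = offset; i < Math.min(offset + limit, totalRows); i++) {
            page.add("row" + i);
        }
        return page;
    }

    /** Sends every row to rowSink page by page; returns the number of rows exported. */
    static long exportAll(int totalRows, int pageSize, Consumer<String> rowSink) {
        long exported = 0;
        int offset = 0;
        while (true) {
            List<String> page = fetchPage(totalRows, offset, pageSize);
            if (page.isEmpty()) break; // no more data
            for (String row : page) {
                rowSink.accept(row); // hand each row off immediately
                exported++;
            }
            offset += page.size();
            // `page` is not stored anywhere else, so it becomes unreachable
            // when the variable is reassigned on the next iteration.
        }
        return exported;
    }

    public static void main(String[] args) {
        // The sink must not keep the rows either, or memory grows again.
        long n = exportAll(25_000, 1_000, row -> { /* e.g. write to a streaming sheet */ });
        System.out.println("exported " + n + " rows");
    }
}
```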

I am rewriting my code to find a way to solve the problem.

At first, I guessed that creating the sheet used lots of memory, so I wanted it to work like a stream.

In the end, I figured out that the problem was in my own code.


3 participants