How to work with buffer/ReadableStream? #121
There are many questions here. Some are not actually about hummus, but I think I can help with all of them:
Amazon stream: downloading, using, and writing to something that hummus can use
To download an Amazon S3 object as a stream, you can use the getObject method and then create a read stream from it, which you can use as is or pipe to something else (like a file write stream).
For hummus, due to the requirement of random access, you can't use readable streams directly. You'll have to pipe the stream either to a file or to a memory buffer, which can then be used as input to hummus. One option is to pipe the read stream into a file.
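The original snippet is not reproduced here, so the following is only a minimal sketch of the two options mentioned above, using nothing but Node.js core modules. The helper names `streamToFile` and `streamToBuffer` are hypothetical. Assuming the AWS SDK for JavaScript v2, the readable stream would typically come from `s3.getObject(params).createReadStream()`, but any readable stream works.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Write a readable stream to disk. The returned promise resolves once the
// file has been fully written, and rejects if either stream errors.
function streamToFile(readable, filePath) {
  return pipelineAsync(readable, fs.createWriteStream(filePath));
}

// Collect a readable stream into a single in-memory Buffer.
function streamToBuffer(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    readable.on('error', reject);
    readable.on('end', () => resolve(Buffer.concat(chunks)));
  });
}
```

Once the promise from `streamToFile` resolves, the file path can be handed to hummus. The buffer variant avoids touching the disk but holds the whole PDF in memory.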
Parsing with hummus, getting the number of pages
Parsing with hummus, including getting things like the number of pages, is explained here.
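As a rough illustration of getting the page count (a sketch, not the snippet from the original comment): the function below uses the `createReader` and `getPagesCount` calls from HummusJS. The wrapper name `countPages` is hypothetical, and the `hummus` module is passed in as a parameter, which is purely a choice made for this sketch.

```javascript
// `hummus` is the HummusJS module supplied by the caller, e.g. require('hummus').
// `source` is either a path to a PDF file or a hummus read stream, such as
// new hummus.PDFRStreamForBuffer(buffer) for an in-memory PDF.
function countPages(hummus, source) {
  const pdfReader = hummus.createReader(source);
  return pdfReader.getPagesCount();
}
```

With a file on disk this would be called as `countPages(require('hummus'), '/path/to/file.pdf')`.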
Splitting
Splitting is done simply by creating a new PDF file for each page in the original PDF and copying the original page content to a new page object in the new file. This works directly with a plain file as the source.
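The per-page approach described above can be sketched roughly as follows. This is not the original sample script. It assumes a source PDF on disk and uses the HummusJS calls `createReader`, `createWriter`, `createPDFCopyingContext`, and `appendPDFPageFromPDF`. The function name `splitPdf`, the `page-N.pdf` naming scheme, and passing the `hummus` module in as a parameter are all choices made for this sketch.

```javascript
const path = require('path');

// Split the PDF at `sourcePath` into one single-page PDF per page, written
// into `outputDir` (which must already exist). Returns the output paths.
// `hummus` is the HummusJS module supplied by the caller.
function splitPdf(hummus, sourcePath, outputDir) {
  const pageCount = hummus.createReader(sourcePath).getPagesCount();
  const outputs = [];
  for (let i = 0; i < pageCount; i++) {
    const outPath = path.join(outputDir, `page-${i + 1}.pdf`);
    const pdfWriter = hummus.createWriter(outPath);
    // Copy page i (zero-based) from the source into the new file.
    pdfWriter.createPDFCopyingContext(sourcePath).appendPDFPageFromPDF(i);
    pdfWriter.end();
    outputs.push(outPath);
  }
  return outputs;
}
```

This opens a fresh copying context on the source for every page, which keeps the sketch simple. It may be worth checking whether your HummusJS version lets you reuse an already-parsed reader instead.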
Insert this code in a file and use it like a PDFRStreamForFile.
@galkahana Could you please give an example of how I can use the 'memory buffer' that you mentioned instead of a file? I've done a lot of googling and research but still cannot find any constructor for it exported from HummusJS.
You got it here: PDFRStreamForBuffer
This might be helpful for someone: http://stackoverflow.com/questions/42512982/node-js-get-the-first-page-of-pdf-buffer |
FYI, I ran into some performance issues with the example provided. It looks like Hummus will read bytes beyond the buffer, so I ended up doing this instead:
(If you're not using the latest Node.js or Babel, just use
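For readers who want to try the idea, here is a minimal sketch of a buffer-backed reader that never returns bytes past the end of the buffer. It is not the code from the comment above. The class name `BoundedBufferReader` is hypothetical, and the method names (`read`, `notEnded`, `setPosition`, `setPositionFromEnd`, `skip`, `getCurrentPosition`) are assumed to match what HummusJS expects from a custom read stream, so check them against the library's documentation before relying on this.

```javascript
class BoundedBufferReader {
  constructor(buffer) {
    this.buffer = buffer;
    this.position = 0;
  }

  // Keep any position inside [0, buffer.length].
  clamp(position) {
    return Math.max(0, Math.min(position, this.buffer.length));
  }

  // Return up to `amount` bytes as a plain array of numbers, stopping at
  // the end of the buffer instead of reading beyond it.
  read(amount) {
    const end = this.clamp(this.position + amount);
    const bytes = Array.from(this.buffer.subarray(this.position, end));
    this.position = end;
    return bytes;
  }

  notEnded() {
    return this.position < this.buffer.length;
  }

  setPosition(position) {
    this.position = this.clamp(position);
  }

  // `offset` is counted backwards from the end of the buffer.
  setPositionFromEnd(offset) {
    this.position = this.clamp(this.buffer.length - offset);
  }

  skip(amount) {
    this.position = this.clamp(this.position + amount);
  }

  getCurrentPosition() {
    return this.position;
  }
}
```

If the interface assumption holds, an instance could be passed wherever a PDFRStreamForBuffer is accepted, for example `hummus.createReader(new BoundedBufferReader(buffer))`.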
Thank you so much for this code, it saved my day! God bless.
Hi! I have to work with some PDF files coming from Amazon S3 as a buffer. I'm using the S3 getObject method, which returns the file as a buffer in the Body property.
How can I parse it? I want to get the total number of pages and then split the whole PDF into separate pages.
Thanks!