Can we get all PDF data into the String variable, instead of getting data page by page? #8

Open
Phannd7 opened this issue Oct 12, 2016 · 1 comment

Comments

Phannd7 commented Oct 12, 2016

Hi a. Tho,

Currently, I'm using the "get" method to get the PDF data from a specific page. Is there a way to get all the PDF data at once instead of getting it page by page like that?
My code:

public static int rowNumberOfPDFFile(String pdfLink, int pagePDFNumber) throws IOException {
    PDFTableExtractor extractor = new PDFTableExtractor();
    List<Table> tables = extractor.setSource(pdfLink).extract();
    // Get the data of one page into the String html. Page numbers start from 0.
    String html = tables.get(pagePDFNumber).toHtml();

    // Skip past the opening table tag (11 is the length of "border='1'>").
    html = html.substring(html.indexOf("border='1'>") + 11);
    // Each row ends with a closing tr tag, so counting "/tr" gives the row count.
    int rowNumber = org.apache.commons.lang3.StringUtils.countMatches(html, "/tr");
    return rowNumber;
}

I would like to get all the PDF data into the "html" variable. Could you please help?

Thanks,
Phan Nguyen

Owner

thoqbk commented Oct 14, 2016

Hi Phan Nguyen,

I think you can do it by getting the HTML content of the tables on all pages,
then using an HTML parser such as Jsoup to parse the table content and put it
all together. Alternatively, you can loop through all the table models
returned by PDFTableExtractor.extract().
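
A minimal sketch of the loop-based option, not code from the library itself. To keep it self-contained, `HtmlTable` is a hypothetical stand-in for the table model (anything with a `toHtml()` method), and `AllTablesHtml`, `joinHtml`, and `countRows` are illustrative names. It assumes each table renders as a complete HTML table whose rows end with `</tr>`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the table model: anything that can render
// itself as an HTML table string, like the toHtml() call in the question.
interface HtmlTable {
    String toHtml();
}

class AllTablesHtml {

    // Concatenate the HTML of every extracted table into one String.
    static String joinHtml(List<? extends HtmlTable> tables) {
        StringBuilder all = new StringBuilder();
        for (HtmlTable table : tables) {
            all.append(table.toHtml());
        }
        return all.toString();
    }

    // Count rows in the combined HTML by counting closing </tr> tags.
    static int countRows(String html) {
        final String closingTag = "</tr>";
        int count = 0;
        int from = 0;
        while ((from = html.indexOf(closingTag, from)) != -1) {
            count++;
            from += closingTag.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Two fake tables; real code would use the list returned by extract().
        List<HtmlTable> tables = new ArrayList<>();
        tables.add(() -> "<table border='1'><tr><td>a</td></tr><tr><td>b</td></tr></table>");
        tables.add(() -> "<table border='1'><tr><td>c</td></tr></table>");

        String html = joinHtml(tables);
        System.out.println(countRows(html)); // prints 3
    }
}
```

With the real extractor, the same loop would run directly over the list returned by `extract()`, appending each element's `toHtml()` result to the `StringBuilder`, so no page index is needed.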

Sorry for my late reply.

Regards,
Tho Q Luong
