[PROBLEM] ExtractText() Behaviour differences in new version( 3.0.3 to 3.14.0) #427

cyberlord29 · 2020-11-25T09:15:56Z

Description

We have migrated from v3.0.3 to v3.14.0 to get the table extraction features.

The behaviour of ExtractText() has changed for lines that are part of a table: it now seems to output each cell of a column row by row and then move on to the next column, instead of emitting the whole row as it did in previous versions.
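To make the two orderings concrete, here is a toy sketch in plain Go. It does not call unipdf; the table contents and the helper names `rowMajor` and `columnMajor` are made up for illustration, and the column-by-column ordering is only an interpretation of the description above, not a statement of how the library works internally.

```go
package main

import (
	"fmt"
	"strings"
)

// rowMajor joins the cells of each row into one line, which is how the
// older output is described: one line of text per table row.
func rowMajor(cells [][]string) string {
	lines := make([]string, 0, len(cells))
	for _, row := range cells {
		lines = append(lines, strings.Join(row, " "))
	}
	return strings.Join(lines, "\n")
}

// columnMajor emits every cell on its own line, walking down one column
// before moving to the next, which is how the newer output is described.
func columnMajor(cells [][]string) string {
	if len(cells) == 0 {
		return ""
	}
	var lines []string
	for col := 0; col < len(cells[0]); col++ {
		for _, row := range cells {
			if col < len(row) { // tolerate short (sparse) rows
				lines = append(lines, row[col])
			}
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Hypothetical table, not taken from the PDF in this issue.
	table := [][]string{
		{"Item", "Qty", "Price"},
		{"Pen", "2", "1.50"},
		{"Book", "1", "12.00"},
	}
	fmt.Println(rowMajor(table))
	fmt.Println("---")
	fmt.Println(columnMajor(table))
}
```

Downstream code that splits the extracted text on newlines and expects one table row per line would break under the second ordering, which is the regression being reported.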

Please let me know if you are able to recognize this issue; I can add detailed screenshots etc. if you aren't.

Thanks a lot.

github-actions · 2020-11-25T09:16:38Z

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

gunnsth · 2020-11-25T10:08:36Z

@cyberlord29 Can you define the problem and provide the actual files where the regression can be clearly seen? As such, it is not unexpected that the output can change, as long as the quality, in terms of visual comparison, is getting better. The engine is still under active development and is being refined continuously.

cyberlord29 · 2020-11-25T11:55:57Z

@gunnsth So here is the PDF screenshot:

Here is the output of ExtractText() for versions v3.0.3 - v3.8.0:

Here is the output of ExtractText() for versions v3.9.0+:

Can we still use the legacy behaviour in the newer packages?

Tables are not an option here, as PageText.Tables() is not able to parse the table properly because there are some sparse rows in between (will add screenshots for those shortly).

Thanks.

gunnsth · 2020-11-25T12:00:15Z

Thanks, that makes sense. We will look into this and get back to you. It might make sense to have some options here, as for some cases the tables make sense whereas for others they do not.

cyberlord29 · 2020-11-25T12:21:01Z

@gunnsth Yeah, thanks. Please let us know, as we are deciding on a license renewal as well.

cyberlord29 · 2020-11-25T12:26:18Z

@gunnsth Also, please let us know if there is any dirty fix to get this behaviour 😅. Thanks.
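One possible workaround of the kind asked about here, sketched under stated assumptions: if each piece of extracted text is available together with its position on the page, rows can be rebuilt by grouping pieces that share roughly the same baseline and then sorting each group left to right. The `word` type, the `groupIntoRows` helper and the tolerance value below are hypothetical and not part of unipdf; wiring this up to the library's positioned text output is assumed and not shown.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// word is a hypothetical stand-in for a piece of extracted text and the
// lower-left corner of its bounding box (PDF coordinates, Y grows upward).
type word struct {
	Text string
	X, Y float64
}

// groupIntoRows rebuilds text lines from positioned words: words whose Y
// lies within tol of the first word of a row are treated as the same row,
// and each row is then ordered left to right.
func groupIntoRows(words []word, tol float64) []string {
	// Work on a copy so the caller's slice order is untouched.
	sorted := append([]word(nil), words...)
	// Top of the page first.
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Y > sorted[j].Y })

	var rows [][]word
	for _, w := range sorted {
		n := len(rows)
		if n > 0 && math.Abs(rows[n-1][0].Y-w.Y) <= tol {
			rows[n-1] = append(rows[n-1], w)
		} else {
			rows = append(rows, []word{w})
		}
	}

	lines := make([]string, 0, len(rows))
	for _, r := range rows {
		sort.SliceStable(r, func(i, j int) bool { return r[i].X < r[j].X })
		texts := make([]string, len(r))
		for i, w := range r {
			texts[i] = w.Text
		}
		lines = append(lines, strings.Join(texts, " "))
	}
	return lines
}

func main() {
	// Made-up input arriving in column order, as in the reported output.
	words := []word{
		{"Item", 10, 700}, {"Pen", 10, 680},
		{"Qty", 100, 700.5}, {"2", 100, 680},
	}
	for _, line := range groupIntoRows(words, 2.0) {
		fmt.Println(line)
	}
}
```

The tolerance has to be tuned per document: too small and cells with slightly different baselines split into separate lines, too large and adjacent rows merge. Multi-line cells would also need extra handling that this sketch does not attempt.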

peterwilliams97 · 2020-11-25T21:47:15Z

Hi @cyberlord29
It looks like my table extraction code changes caused this problem. Those changes improve extraction of many other types of tables. We can give you a better experience by fixing those changes to work with your table than by reverting them.
Are you able to share the PDF file that contains the table with us?

cyberlord29 · 2020-11-25T22:39:56Z

@peterwilliams97 That sounds great. Can you leave an email ID here so I can send it to you?

peterwilliams97 · 2020-11-30T21:31:02Z

> @peterwilliams97 That sounds great , can you leave an email Id here so I can send it to you ?

peter.williams.97@gmail.com

peterwilliams97 · 2021-01-11T11:16:07Z

Hi Maneesh, sorry for the late reply. This is my email.


cyberlord29 changed the title ~~[PROBLEM]~~ [PROBLEM] ExtractText() Behaviour differences in new version( 3.0.3 to 3.14.0) Nov 25, 2020
