Line break within a word leads to 'non-searchability' #1131

nordleuchte · 2019-10-23T09:30:04Z

Hi everybody,

I was redirected here by the Asciidoctor folks, because this seems to be an issue with Prawn.

I would like to point out a problem I noticed when rendering a document to PDF.

In my AsciiDoc document I import some CSV tables that contain very long property names like the following:

webcontroller.outboundservice.cmis.repositoryId

The problem is that these property names are too long to fit within a cell of the table column, so the text is wrapped in the middle of the name.
In the rendered PDF the table cell looks like this:

webcontroller.outboundservice.cm
is.repositoryId

If I now search for this property in the generated PDF document, it cannot be found, because the line break "cuts" the text. The PDF document appears to treat each line as a separate piece of text.

My first thought was that this was a limitation of the PDF format itself. But if I use the same table in Microsoft Word and save it as an "accessible" PDF, I can find the property as a "whole word".

Are there any workarounds to fix this problem in Asciidoctor/Prawn?

Thanks in advance

mojavelinux · 2019-11-13T10:22:55Z

According to the PDF specification (14.9.4 Replacement Text), the recommended way to deal with this situation is to use the ActualText tag. This is also used to make shy hyphens invisible when searching.
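
To make that concrete, here is a minimal sketch of what the content stream could look like, assuming the two wrapped fragments are emitted inside one marked-content sequence whose ActualText carries the unbroken property name. The helpers `pdf_literal` and `span_with_actual_text` are hypothetical names made up for this illustration and are not part of Prawn; the sketch also assumes ASCII-only text, a fixed 12 pt leading, and that the output is placed inside an existing `BT`/`ET` text object.

```ruby
# Escape the characters that are special inside a PDF literal string.
def pdf_literal(text)
  '(' + text.gsub(/[\\()]/) { |c| "\\#{c}" } + ')'
end

# Hypothetical helper (not Prawn API): wrap the visible fragments of one
# wrapped word in a single /Span marked-content sequence whose ActualText
# entry holds the unbroken word.
def span_with_actual_text(actual_text, fragments, leading: 12)
  ops = ["/Span << /ActualText #{pdf_literal(actual_text)} >> BDC"]
  fragments.each_with_index do |fragment, index|
    # Move down one line before every fragment except the first.
    ops << "0 -#{leading} Td" if index > 0
    ops << "#{pdf_literal(fragment)} Tj"
  end
  ops << 'EMC'
  ops.join("\n")
end

puts span_with_actual_text(
  'webcontroller.outboundservice.cmis.repositoryId',
  ['webcontroller.outboundservice.cm', 'is.repositoryId']
)
```

Whether a given viewer actually uses the ActualText when searching is a separate question; this only shows the shape of the markup.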

@McFly83 it would probably help if you provided a simple PDF document that uses this feature so the source can be studied.

nordleuchte · 2019-11-14T12:35:10Z

I'm not familiar with the PDF specification or any technical details about that. However, I created a PDF using MS Word that contains a text that is automatically broken into multiple lines but is still searchable. Find it attached. Hope this helps.
MultiLineText.pdf

mojavelinux · 2019-11-14T20:57:52Z

> is still searchable

This seems highly dependent on the PDF viewer. The feature you're describing does not work in evince or the default PDF viewer on Windows. It does work in Adobe Reader, but in a very quirky way. Instead of highlighting the search term, it highlights the whole span of text in which the word is found. That tells me that it's searching the accessible text, then highlighting the rendered text with which the accessible text is associated. It seems to have something to do with /StructElem, but I can't find where the text is located in that PDF.

mojavelinux mentioned this issue Nov 13, 2019

Line break within a word leads to 'non-searchability' asciidoctor/asciidoctor-pdf#1253

Closed
