How can I use OpenPDF to make the exported PDF support Khmer language versions? #1156

wang0331 · 2024-04-28T06:10:00Z

I tried using OpenPDF with a suitable font to export Khmer text, but the result was not entirely correct. OpenPDF together with Apache FOP seems to solve the problem of joining several code points into a single rendered character, but the glyphs are drawn in the wrong order.

iText 8 with its pdfCalligraph add-on is known to export Khmer PDFs correctly.

This is my example code, based on https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts

```java
import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.FontFactory;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public class HelloWorld {

    public static void main(String[] args) {
        // Register a TrueType font that supports Khmer
        FontFactory.register("C:\\Users\\xr\\Desktop\\fonts\\KhmerOS.ttf");
        Document document = new Document();
        try {
            PdfWriter.getInstance(document,
                    new FileOutputStream("C:\\Users\\xr\\Desktop\\fonts\\openPDF.pdf"));

            document.open();
            // Identity-H encoding, size 10, style 0 (normal), default color
            document.add(new Chunk(
                    "យើងខ្ញុំសូមថ្លែងអំ",
                    FontFactory.getFont("Khmer OS", "Identity-H", false, 10, 0, null)));
        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        document.close();
    }
}
```

I would appreciate help from the maintainers.

Can OpenPDF export PDF files containing Khmer text?
If so, could you please tell me which version to use and how to use it to solve this problem?
If not, is OpenPDF interested in supporting this feature in the future?
Suggestions for implementing other functions

OS: Windows, JDK 1.8.362 or JDK 17.0.9
Used font: Khmer OS
OpenPDF version:

    <dependency>
        <groupId>com.github.librepdf</groupId>
        <artifactId>openpdf</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.xmlgraphics</groupId>
        <artifactId>fop</artifactId>
        <version>2.9</version>
    </dependency>
    <dependency>
        <groupId>org.apache.xmlgraphics</groupId>
        <artifactId>xmlgraphics-commons</artifactId>
        <version>2.9</version>
    </dependency>

From Wang Xueren


vk-github18 · 2024-04-29T21:02:09Z

See also https://github.com/LibrePDF/OpenPDF/wiki/Accents,-DIN-91379,-non-Latin-scripts

wang0331 · 2024-05-09T08:33:41Z

Thank you very much for your answer. It has helped me make great progress with exporting Khmer PDFs!

The output looks almost correct, but I noticed a small issue: there is a case that OpenPDF may not handle well.

Below is an example with images. The OpenPDF version I am using is 1.3.43.

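```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.FontFactory;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.LayoutProcessor;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public class HelloWorld {

    public static void main(String[] args) {
        // Enable kerning and ligatures via LayoutProcessor before creating any fonts
        LayoutProcessor.enableKernLiga();
        // Register a TrueType font that supports Khmer under the alias "khmerFont"
        FontFactory.register("D:\\devwork\\thirddemo\\openPDF\\src\\main\\resources\\KhmerOSSiemreap.ttf", "khmerFont");

        Document document = new Document();
        try {
            PdfWriter.getInstance(document,
                    new FileOutputStream("C:\\Users\\xr\\Desktop\\fonts\\openPDF.pdf"));

            document.open();
            document.add(new Paragraph(
                    "បន្ថែមនេះនឹងមានសុពលភាពចាប់ពី  ថ្អែទី ២០ ខែ កញ្ញា ឆ្នាំ ២០២៣ តទៅ។ ក្រុមហ៊ុនមិនតម្រូវឲ្យលោកអ្នកធ្វើអ្វីបន្ថែមឡើយ ហើយបុព្វលាភរ៉ាប់រងរបស់",
                    FontFactory.getFont("khmerFont", BaseFont.IDENTITY_H, false, 10)));
        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        document.close();
    }
}
```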
Correct:

Wrong:

vk-github18 · 2024-05-09T11:32:09Z

@wang0331 , could you provide a smaller example only with the incorrect letters?

Please compare the output of OpenPdf/LayoutProcessor with the output of HarfBuzz hb-view, see https://github.com/harfbuzz/harfbuzz/releases/tag/8.4.0

wang0331 · 2024-05-10T07:08:35Z

@vk-github18

Thank you very much for your reply. For a minimum example, please refer to this:
ហ៍្វ

I compared this with the output of iText 8 with pdfCalligraph, and the result it displays is clearly correct.

vk-github18 · 2024-05-10T20:04:00Z

The minimal example is rendered as

with OpenPDF/LayoutProcessor (2.0.x trunk).
This should be correct.

wang0331 · 2024-05-11T09:26:05Z

The OpenPDF version I am using is 1.3.43.

I did not use OpenPDF 2.0.x because I need a Java 8 development environment to evaluate whether OpenPDF can be integrated. If this is not supported in OpenPDF 1.3.x, can the cause be identified and fixed?

I tested 1.4.2 and 2.0.2, and both export this character correctly when paired with the matching JDK version. Only 1.3.43 with Java 8 cannot export this character properly. If 1.3.x is still being maintained, can it be adapted?

@vk-github18

vk-github18 · 2024-05-11T09:38:26Z

@wang0331, so you are not talking about displaying the characters in the PDF, but about extracting text from the PDF file using a PDF viewer.

This task is quite complicated, and the extracted characters seem incorrect even with the current source code on GitHub.

OpenPDF (master branch, compiled on 2024-05-11): ហ៍្

With LayoutProcessor.setWriteActualText(): ហ៍្វ

Only the output with the experimental option
LayoutProcessor.setWriteActualText();
seems correct.

vk-github18 · 2024-05-11T13:03:51Z

Analysis:

Font used: https://fonts.google.com/specimen/Siemreap
Using LayoutProcessor the input: '0x17a0', '0x17cd', '0x17d2', '0x179c'
is converted by java.awt.Font.layoutGlyphVector to glyph array
[68, 111, 165]
These glyphs map to the following Unicode characters according to
GlyphOrder of the font (converted using ttx):

68  uni17A0
111 uni17CD
165 uni17D2_uni179C.zz02

The glyph 165 is a ligature and corresponds to two Unicode
characters.

The method java.awt.font.GlyphVector.getGlyphCharIndex does not return this correspondence.

I don't see a way to store a one-to-many correspondence in the toUnicode map of TrueTypeFontUnicode.

So if the text shown in a PDF viewer is selected and copied, the last character is lost.
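
To repeat this kind of analysis with another font or string, a small standalone program along the following lines may help. It is only a sketch: the class name `GlyphDump` and its helper methods are made up for illustration, the font path is a placeholder passed on the command line, and the glyph list it prints depends on the Java version and the font. It uses only standard `java.awt` calls and does not go through OpenPDF.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.io.File;

// Hypothetical helper for inspecting what java.awt.Font.layoutGlyphVector returns.
final class GlyphDump {

    /** Formats each UTF-16 code unit of the text as lowercase hex, separated by spaces. */
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(text.charAt(i)));
        }
        return sb.toString();
    }

    /** Lists the glyph code and the reported source character index of every glyph. */
    static String describe(GlyphVector gv) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < gv.getNumGlyphs(); i++) {
            sb.append("glyph ").append(gv.getGlyphCode(i))
              .append(" <- char index ").append(gv.getGlyphCharIndex(i))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // The minimal example from this thread: U+17A0 U+17CD U+17D2 U+179C
        String text = "\u17a0\u17cd\u17d2\u179c";
        System.out.println("chars: " + hex(text));
        if (args.length == 0) {
            System.out.println("usage: java GlyphDump <path-to-ttf>");
            return;
        }
        // args[0] is the path to a Khmer TrueType font file (placeholder)
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(10f);
        FontRenderContext frc = new FontRenderContext(null, false, true);
        char[] chars = text.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        System.out.print(describe(gv));
    }
}
```

The glyph codes printed by `describe` can then be looked up in the GlyphOrder table that ttx produces for the same font file.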

vk-github18 · 2024-05-12T10:30:40Z

Using Branch 1.3 or Branch 1.4 with LayoutProcessor I get a correct visual appearance

and incorrect text export ហ៍¥

wang0331 · 2024-05-13T03:40:51Z

Thank you for your patient answer! @vk-github18

But I think you may have misunderstood me. I am not trying to copy text out of the PDF. I am only trying to correctly export Khmer text that was copied from Microsoft Office Word.

I am unable to export the minimal example correctly using Java 8 and OpenPDF 1.3, but versions 1.4 and 2.0 work. If you can export this minimal Khmer example correctly with 1.3, please share your OpenPDF 1.3 code example.

vk-github18 · 2024-05-13T21:01:58Z

@wang0331 , I tested the minimal example:

OpenJDK Java 1.8.0 OpenPDF Branch 1.3-Java8
chars: 17a0 17cd 17d2 179c
glyphVector = awtFont.layoutGlyphVector(...)
glyphVector.getNumGlyphs()=5
glyphs: 68 111 694 165 65535
charIndizes=0
charIndizes=1
charIndizes=2
charIndizes=2
charIndizes=3

ttx/GlyphOrder
68 uni17A0
111 uni17CD
694 uni25CC ???
165 uni17D2_uni179C.zz02
65535 ???

The method java.awt.Font.layoutGlyphVector() seems to return incorrect results in Java 1.8. Java 11 and newer return correct results.
This is a problem with the built-in Java classes in version 1.8. I don't see a way to deal with this.
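
If an application has to run on several Java versions, one option is to check the runtime before relying on the AWT-based layout. The following is a minimal sketch, assuming the Java 11 threshold observed above holds in general. `JavaRuntimeCheck` and its methods are hypothetical names and not part of OpenPDF.

```java
// Hypothetical helper: decides from the Java version whether AWT text shaping
// is expected to work for complex scripts, based on the observation in this thread.
final class JavaRuntimeCheck {

    /** Parses a java.specification.version value: "1.8" -> 8, "11" -> 11, "17" -> 17. */
    static int majorVersion(String specVersion) {
        // Versions up to Java 8 use the "1.x" scheme
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /** Assumption taken from this thread: results were correct with Java 11 or newer. */
    static boolean awtShapingExpectedToWork() {
        return majorVersion(System.getProperty("java.specification.version")) >= 11;
    }

    public static void main(String[] args) {
        if (awtShapingExpectedToWork()) {
            System.out.println("Java 11+: LayoutProcessor.enableKernLiga() can be tried");
        } else {
            System.out.println("Java 8: glyph layout for Khmer may be wrong");
        }
    }
}
```

An application could call `LayoutProcessor.enableKernLiga()` only when `awtShapingExpectedToWork()` returns true and otherwise fall back to another approach.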

vk-github18 · 2024-05-13T21:54:20Z

Using OpenJDK Java 1.8.0 OpenPDF Branch 1.3-Java8
with FOP dependency the result is:

I used
System.out.println(FopGlyphProcessor.isFopSupported()?"fop is supported":"fop is NOT supported");
to verify that FOP is found. (I had to use Project/Context Menu/Maven/Reload Project so that IntelliJ found FOP.)

See https://github.com/LibrePDF/OpenPDF/wiki/Multi-byte-character-language-support-with-TTF-fonts
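
When it is unclear whether the IDE or build has really put the FOP jars on the runtime classpath, a generic probe can complement the check above. This is a sketch with a hypothetical `ClasspathProbe` class. The class names probed in `main` are my choice for illustration, and a positive result only shows that the class can be loaded, not that OpenPDF will actually use FOP.

```java
// Hypothetical helper: checks whether a class can be loaded at runtime.
final class ClasspathProbe {

    /** Returns true if the named class is loadable, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // FopFactory ships with fop-core; Document is OpenPDF's main document class
        System.out.println("FOP on classpath: "
                + isPresent("org.apache.fop.apps.FopFactory"));
        System.out.println("OpenPDF on classpath: "
                + isPresent("com.lowagie.text.Document"));
    }
}
```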

wang0331 · 2024-05-14T02:42:11Z

@vk-github18
I have imported the two Maven dependencies for FOP with JDK 1.8 and OpenPDF 1.3, and the check reports that FOP is supported.

Unfortunately, the exported PDF still shows problems. Can you share your code example for this JDK version? I want to know whether I missed some details myself.

vk-github18 · 2024-05-14T21:33:51Z

@wang0331 , sure here is the example file:
App.java.txt

Running under Linux:
/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -cp lib/commons-io-2.16.1.jar:lib/commons-logging-1.3.1.jar:lib/fop-core-2.9.jar:lib/openpdf-1.3.43.jar:lib/xmlgraphics-commons-2.9.jar:target/openpdf-khmer-1.0-SNAPSHOT.jar khmer.App

wang0331 · 2024-05-15T02:01:49Z

@vk-github18
I used the code example you provided, but found a very interesting situation.

If I don't use LayoutProcessor.enableKernLiga(), the display of 'ហ៍្វ' appears to be correct, but other Khmer words are exported incorrectly, so the result is not usable.

If I use LayoutProcessor.enableKernLiga(), the display of 'ហ៍្វ' is incorrect, but from my simple verification, the export of other Khmer text seems to be correct.

Can I conclude that with JDK 1.8 and OpenPDF 1.3.x, I am unable to export Khmer text fully correctly?

vk-github18 · 2024-05-15T21:05:49Z

@wang0331 , I don't see a simple solution for Java 1.8.
Possibly you could create the PDF file directly with Apache FOP if you can't use a current Java version.

vk-github18 · 2024-05-15T21:51:34Z

Using FOP for your examples looks as follows
k.pdf
Input and configuration file:
fop.xconf.txt
k.fo.txt
fop -c fop.xconf -fo k.fo -pdf k.pdf

wang0331 added the bug label Apr 28, 2024

wang0331 closed this as completed May 27, 2024
