Unicode Fonts #128

Open
pierre1451 opened this issue Jun 7, 2022 · 18 comments

Comments

@pierre1451

Hello Andre, thanks for making and sharing PDFGen. It's clean, small, without dependencies. Great. I see you looked into adding fonts to the PDF. Any update / beta / idea on loading and using fonts supporting unicode characters? Pierre

@AndreRenaud
Owner

Hi Pierre. Unfortunately I haven't had a look at this in any real depth. I think it requires quite a lot of font-decoding code to do things like determine character bounding boxes, which I haven't looked at. Unfortunately the built-in PDF fonts are super limited in terms of character set. Do you have a specific requirement?

@pierre1451
Author

Here's the story: we have a web app, backend in PHP and C/C++, that needs to produce PDF reports and certificates. Nothing fancy, but with some customization at user level, images, tables, and all kinds of languages. Usually a few pages. We used TCPDF, got stuck, moved to wkhtmltopdf, stuck again (project stopped / CSS issues). I'm looking at headless chrome, it works, but it's enormous (268 MB), and feels like overkill. I'm a low level developer, I like the idea of a simple API. All in all, PDFGen is really close, the ideal improvement would be to load a font (windows for me), and 'pdf_add_text' a unicode string. The other capability, but for later, might be transparency on images: I had to convert an RGBA PNG to RGB to get the image to render. It's very useful for watermarks and signatures.
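
For readers hitting the same RGBA limitation, here is a minimal sketch of the workaround described above: flattening RGBA pixels onto an opaque white background to get plain RGB. The function name `rgba_to_rgb_on_white` is hypothetical and not part of PDFGen, and the sketch assumes 8-bit, non-premultiplied alpha and an already decoded pixel buffer. Note that this discards transparency rather than preserving it, so it does not give true watermark-style blending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PDFGen function): composite 8-bit RGBA pixels
 * with straight (non-premultiplied) alpha onto a white background and write
 * tightly packed RGB. `rgb` must hold pixel_count * 3 bytes. */
static void rgba_to_rgb_on_white(const uint8_t *rgba, uint8_t *rgb,
                                 size_t pixel_count)
{
    assert(rgba != NULL && rgb != NULL);
    for (size_t i = 0; i < pixel_count; i++) {
        unsigned alpha = rgba[i * 4 + 3];
        for (int c = 0; c < 3; c++) {
            unsigned fg = rgba[i * 4 + c];
            /* out = fg * a + white * (1 - a), in 0..255 with rounding */
            rgb[i * 3 + c] =
                (uint8_t)((fg * alpha + 255u * (255u - alpha) + 127u) / 255u);
        }
    }
}
```

The resulting RGB buffer would still need to be re-encoded in whatever image format is handed to the PDF library.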

@AndreRenaud
Owner

Yeah, that sounds reasonable. At the moment I don't have loads of time to look at this feature. If you want to have a go at it, I'm able to assist, but I doubt I'll have time to write it myself in the near future.

@AndreRenaud
Owner

As a note on this (possibly to my own future self for implementation). There are some details on how this works in this stackoverflow answer - https://stackoverflow.com/questions/3488042/how-can-i-extract-embedded-fonts-from-a-pdf-as-valid-font-files
We could probably use a hugely cut down version of STB Truetype (https://github.com/nothings/stb/blob/master/stb_truetype.h), with all of the rendering removed, to just extract the font metadata (basically we just need to work out the glyph widths). Another option would be to initially just ignore widths, so that if you're using a custom font you can't do things like word wrapping. This would be a bit poor, but at least it would let you render single lines of text in a custom font. That would mean we wouldn't need STB Truetype at all (I think). It's possible that this implementation is fairly small.
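
To illustrate why glyph widths matter for word wrapping, here is a rough sketch of the arithmetic involved. It assumes the advance widths have already been read from the font (for example via stb_truetype) in font units, and converts them to the 1/1000 em units that PDF uses for glyph widths in simple fonts. Both function names and the 256-entry width table are hypothetical and are not PDFGen APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale an advance width from font units (as stored
 * in a TrueType file) to thousandths of an em, rounding to nearest. */
static int font_units_to_pdf_width(int advance, int units_per_em)
{
    assert(units_per_em > 0);
    return (advance * 1000 + units_per_em / 2) / units_per_em;
}

/* Hypothetical helper: width of a single-byte-encoded string in points,
 * given a 256-entry table of widths in 1/1000 em and the font size. */
static double text_width_points(const char *text, const int widths[256],
                                double font_size)
{
    long total = 0;
    assert(text != NULL && widths != NULL);
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++)
        total += widths[*p];
    return (double)total * font_size / 1000.0;
}
```

A word-wrapping routine would compare the value returned by `text_width_points` against the available line width, which is why skipping width extraction rules out wrapping for custom fonts.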

@pierre1451
Author

Hi Andre. Doing my homework on PDF: got the spec, v1.7 (2006), got the 'hello world' working, I understand better what you did (objects / offsets). Next for me: see what a 'Hello World' with an embedded ttf font looks like.

@AndreRenaud
Owner

Sounds great. If you put together any example stuff, please push it up to a branch/repo on Github and we can discuss it there. After looking at the details in the spec, I think this might be less work than I'd initially worried. I'll try and have a poke around next week if I can find some time.

@pierre1451
Author

Hi Andre, I see you're running with this improvement! Going through STB now, good find: I looked into something like that some time ago but fell back to Windows GDI to print text in a bitmap. The ramp-up to embedded fonts in PDF is steep (for me): My ABC in Arial Narrow is reasonably small, but I have to understand the ttf format now, and how the relevant glyphs are extracted. I'd like to be more helpful, but you're running too fast! I'm offline this weekend but let's connect next week, my email is pmissud@spectrum.center

@AndreRenaud
Owner

AndreRenaud commented Jun 11, 2022

I had a poke around. It looks like we'd need to extract the font metadata regardless, which essentially means we would need STB TrueType. I dropped all the rendering aspects of it, and it comes in at around 1000 lines, which seems acceptable to me. There is also some question about how UTF-8 text strings get encoded inside the PDF document. It might need to become UTF16, but I'm not clear on it yet. If you've put together a 'hello world' pdf, with embedded fonts, please send it through here, or to me at andre@ignavus.net.
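
On the UTF-8 versus UTF-16 question, here is a minimal sketch of a UTF-8 to UTF-16BE conversion with a leading byte order mark (FE FF), which is the form PDF uses for Unicode text strings such as document metadata. Whether text drawn with an embedded font would use this form is an open question in this thread; that depends on the font's encoding, so treat this only as one possible building block. The function names are hypothetical and not part of PDFGen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence. Returns bytes consumed, or 0 if malformed.
 * Minimal: overlong encodings are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c;
    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else return 0;
    if (len < n) return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}

/* Hypothetical helper: write BOM + UTF-16BE into `out`.
 * Returns bytes written, or -1 on malformed input or lack of space. */
static long utf8_to_utf16be(const char *utf8, size_t len,
                            uint8_t *out, size_t out_cap)
{
    size_t in = 0, o = 0;
    assert(utf8 != NULL && out != NULL);
    if (out_cap < 2) return -1;
    out[o++] = 0xFE;
    out[o++] = 0xFF;
    while (in < len) {
        uint32_t cp;
        size_t n = utf8_decode((const unsigned char *)utf8 + in, len - in, &cp);
        if (n == 0) return -1;
        in += n;
        if (cp >= 0x10000) { /* outside the BMP: emit a surrogate pair */
            uint32_t v = cp - 0x10000;
            uint32_t hi = 0xD800 + (v >> 10), lo = 0xDC00 + (v & 0x3FF);
            if (o + 4 > out_cap) return -1;
            out[o++] = (uint8_t)(hi >> 8); out[o++] = (uint8_t)(hi & 0xFF);
            out[o++] = (uint8_t)(lo >> 8); out[o++] = (uint8_t)(lo & 0xFF);
        } else {
            if (o + 2 > out_cap) return -1;
            out[o++] = (uint8_t)(cp >> 8); out[o++] = (uint8_t)(cp & 0xFF);
        }
    }
    return (long)o;
}
```

The output is raw bytes; writing them into a PDF would additionally require hex encoding or escaping of the string delimiters.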

@pierre1451
Author

Here's a minimal HTML2PDF Blaec Hello World using headless chrome. I have cloned the ttf_font branch, still have a couple of fixes to look at with Win11/Visual Studio (fileno, BMP size, location of file), more later.
Blaec_HelloWorld.pdf

@pierre1451
Author

Hi Andre, I added stb_truetype.h and modified main.c to load and read the font tags. Works pretty much out of the box. I suggest you give me rights to the ttf_fonts branch, but that's your project, let me know how you prefer to work. I'll email you the modified files for now.

@AndreRenaud
Owner

The easiest method is for you to fork the repo under your own account, make your changes there, then issue a pull request back to this repo for the final version. I've made some changes to the ttf_fonts branch locally to try and bring things in, but I haven't got the widths working yet, or unicode encoding (so really, very little works 😄). I'm probably out of time to look at it for another week or so though.

I've pushed what I've got - if you run the testprog now, it will draw the text with the correct font. But the widths calculations are bogus, and it still doesn't support unicode characters. Supporting TTF fonts & supporting Unicode output are two separate things, so it's possible it's easier to get TTF working first (without unicode, still restricted to the PDFDocEncoding characters), and then deal with Unicode separately. I'm not 100% sure.

@AndreRenaud
Owner

In the long run, I think I'll probably end up copy/pasting the cut-down contents of stb_truetype.h inside pdfgen.c. It's a bit horrible, but I don't want to change the installation requirements for users (at the moment it's just two files, pdfgen.c & pdfgen.h, and I'd like to keep it that way).

@pierre1451
Author

Nothing wrong with copy/paste. FPDF is the PHP version of what we'd like to get to (http://www.fpdf.org/en/download.php). The fonts folder has functions to read and write a ttf, as a subset, you may find that useful.

@cblc

cblc commented Apr 29, 2023

One quick question: Does this issue mean that accented vowels, ñ, ç, and punctuation marks such as ¿ or ¡ don't work? What about Greek chars? Just like the OP, my use case would be generating reports, but I need to at least support English and Spanish, with perhaps some Greek chars for referring to math symbols. Although, thinking twice, maybe I can just generate the outlines of the text from STB_truetype and make a graphical PDF instead...

@AndreRenaud
Owner

Hi. At this stage, since PDFGen doesn't (yet?) support embedded TTF or Type1 fonts, we're stuck with whatever characters Adobe decided to enable in their original encodings. If you have a look at appendix D of this document, you can see what is available:
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf

You should be good for anything in Spanish & Greek. The simplest option would be to edit utf8_to_pdfencoding in pdfgen.c to add in the characters you want - basically you have to put in the utf8 character you're sending, and the corresponding number from the Win encoding column.
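
As an illustration of the kind of mapping described above, here is a small standalone sketch that maps a Unicode code point to its byte value in the WIN (WinAnsiEncoding) column of appendix D. It is not the actual `utf8_to_pdfencoding` code from pdfgen.c; the name `codepoint_to_winansi` and the choice of table entries are assumptions for illustration, and only a handful of the non-Latin-1 characters are listed. Greek letters do not appear in the WIN column, so this sketch returns -1 for them; the standard Symbol font covers Greek with its own encoding.

```c
#include <stdint.h>

/* Hypothetical helper (not the real pdfgen.c function): return the
 * WinAnsiEncoding byte for a Unicode code point, or -1 if this sketch
 * has no mapping for it. */
static int codepoint_to_winansi(uint32_t cp)
{
    if (cp >= 0x20 && cp <= 0x7E)
        return (int)cp; /* printable ASCII maps to itself */
    if (cp >= 0xA1 && cp <= 0xFF)
        return (int)cp; /* Latin-1 range: covers ñ, ç, ¿, ¡, accented vowels */
    switch (cp) {
    case 0x20AC: return 0x80; /* Euro sign */
    case 0x2026: return 0x85; /* horizontal ellipsis */
    case 0x2022: return 0x95; /* bullet */
    case 0x2014: return 0x97; /* em dash */
    default:     return -1;   /* e.g. Greek letters: not in WinAnsi */
    }
}
```

For example, ñ (U+00F1) maps to 0xF1 and ¿ (U+00BF) maps to 0xBF, which is why Spanish text fits in a single-byte encoding.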

Alternatively, if you're not 100% sure what to do, send me a list of the specific characters you need, and I'll try and sort it out.

@pierre1451
Author

Here's where I am with PDF: I thought (probably wrongly) that the easiest route to get a working C/C++ API in my app was to port FPDF from PHP. Plenty of sweat and tears later (UTF8, TTF font subsetting and embedding etc.), I have a working 4000-line C++ API. It's raw. The 'only' Windows dependency I have is the use of fontsub.h, which does the font subsetting: not sure it's a big deal, but I didn't want to dive into this thing. I'm happy to share and contribute if you want to update pdfgen.

@LinArcX

LinArcX commented Apr 26, 2024

I want to use a different font installed on my machine which is not part of the default fonts PDFGen supports. How should I do that?

Should I use stb_ttf? if yes, how? Is there any sample for it?

Here's what I get when I try to use Cambria font:

PDF Error: -22 - Unable to determine width for font 'Cambria'

@AndreRenaud
Owner

> I want to use a different font installed on my machine which is not part of the default fonts PDFGen supports. How should I do that?
>
> Should I use stb_ttf? if yes, how? Is there any sample for it?
>
> Here's what I get when I try to use Cambria font:
>
> PDF Error: -22 - Unable to determine width for font 'Cambria'

At this stage there hasn't really been significant work done on TTF support. It's not a feature of PDFGen at the moment, so there is no support for fonts outside of the current list from the PDF spec.

 * @param font New font to use. This must be one of the standard PDF fonts:
 *  Courier, Courier-Bold, Courier-BoldOblique, Courier-Oblique,
 *  Helvetica, Helvetica-Bold, Helvetica-BoldOblique, Helvetica-Oblique,
 *  Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic,
 *  Symbol or ZapfDingbats
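
Until TTF support exists, a caller could guard against this error by checking the font name before passing it to the library. The following is a minimal sketch using the font list quoted above; `is_standard_pdf_font` is a hypothetical helper and not part of the PDFGen API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The 14 standard PDF fonts, as listed in the PDFGen documentation above. */
static const char *const standard_pdf_fonts[] = {
    "Courier", "Courier-Bold", "Courier-BoldOblique", "Courier-Oblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-BoldOblique",
    "Helvetica-Oblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
};

/* Hypothetical helper: true if `name` exactly matches a standard font. */
static bool is_standard_pdf_font(const char *name)
{
    size_t count = sizeof standard_pdf_fonts / sizeof standard_pdf_fonts[0];
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, standard_pdf_fonts[i]) == 0)
            return true;
    }
    return false;
}
```

With this check, a name such as "Cambria" would be rejected up front, and the caller could fall back to one of the listed fonts.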

4 participants