Badly rendered Times New Roman PS #10665

dmisdm · 2019-03-22T01:22:25Z

Attach (recommended) or Link to PDF file here:
EMG8 -Cambridge Essentail Gold Maths 8-B 117.pdf

Configuration:

Web browser and its version: Chrome 73
Operating system and its version: macOS 10.14.1
PDF.js version: 2.2.91
Is a browser extension: No

Steps to reproduce the problem:
1. Open the PDF (it's a single page extracted from a book) and see the badly rendered fonts.

What is the expected behavior? (add screenshot)
It renders fine in any other PDF reader.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
Current github pages viewer: https://mozilla.github.io/pdf.js/web/viewer.html

Notes:
We've managed to make it work by editing the fonts within the PDF. The font is currently "Times New Roman PS"; re-rendering it with plain "Times New Roman" seems to fix it.

There are no console errors or other visible clues pointing to a cause, such as missing CMaps.

Unfortunately we are not allowed to alter PDFs, so re-rendering each one is not a viable solution.

If anyone can give any insight into this and a possible solution, that would be massively appreciated 🤠

janpe2 · 2019-03-26T21:43:15Z

The PDF defines the same fonts many times. For example, font LULQLP+TimesLTStd-Roman is defined nine times. Each one refers to the same FontDescriptor and the same embedded CFF data stream.

There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds the entries Encoding, ToUnicode, and Widths to the hash. Some fonts in the PDF get identical hash codes because all of these entries are identical, even Widths. Only the FirstChar and LastChar entries differ. If fonts get identical hash codes, could that cause a font to be skipped so that it won't be converted to OpenType?
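A minimal Python sketch of why identical Widths arrays are not enough, using a hypothetical, simplified font dictionary rather than real pdf.js objects (pdf.js is JavaScript and uses its own MurmurHash3-based hash; `hashlib.sha256` is only a stand-in). `font_hash`, `widths_by_code`, and the sample values are illustrative assumptions. In a PDF font dictionary, `Widths[i]` is the width of character code `FirstChar + i`, so two fonts that share a Widths array but start at different codes have different metrics, yet they hash identically when only Encoding, ToUnicode, and Widths are included:

```python
import hashlib
import json


# Hypothetical model of a font dictionary; NOT pdf.js code.
def font_hash(font: dict, keys: list[str]) -> str:
    """Hash only the listed entries of a simplified font dictionary."""
    h = hashlib.sha256()
    for key in keys:
        h.update(key.encode())
        # json.dumps gives a stable byte form of the entry's value
        h.update(json.dumps(font.get(key)).encode())
    return h.hexdigest()


def widths_by_code(font: dict) -> dict[int, int]:
    """Widths[i] is the width of character code FirstChar + i."""
    return {font["FirstChar"] + i: w for i, w in enumerate(font["Widths"])}


OLD_KEYS = ["Encoding", "ToUnicode", "Widths"]

font_a = {
    "Encoding": "WinAnsiEncoding",
    "ToUnicode": None,
    "Widths": [250, 333, 408],
    "FirstChar": 32,
    "LastChar": 34,
}
# Identical entries, except the Widths array starts at a different code.
font_b = dict(font_a, FirstChar=65, LastChar=67)

same_hash = font_hash(font_a, OLD_KEYS) == font_hash(font_b, OLD_KEYS)
same_metrics = widths_by_code(font_a) == widths_by_code(font_b)
print(same_hash, same_metrics)  # True False
```

The equal hashes combined with unequal per-code widths are exactly the mismatch described above.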

Here is a reduced PDF that contains two of the fonts from the original PDF:
issue10665_reduced.pdf

Snuffleupagus · 2019-03-27T00:03:24Z

> There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds entries Encoding, ToUnicode, and Widths in the hash. Some fonts in the PDF get identical hash codes because all the mentioned entries are identical, even Widths. Only entries FirstChar and LastChar differ.

Really excellent analysis, thank you; this made the bug easy to fix!

> If fonts get identical hash codes, could it cause a font to be skipped so that it won't be converted to OpenType?

In some badly generated PDF files there can be huge numbers of identical fonts, and the purpose of preEvaluateFont was simply to avoid having to load/parse duplicate ones. Hence loadFont compares hashes and, if possible, uses an already loaded/parsed font.
Obviously this all hinges on the hashes actually being correct/unique, but fortunately there have been relatively few bugs in that code over the years.
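A rough Python sketch of hash-based de-duplication and why a collision makes one font silently stand in for another. `FontCache`, `load_font`, `parse_count`, and the sample dictionaries are hypothetical illustrations, not the pdf.js API, and `hashlib.sha256` stands in for pdf.js's own hash. The "fixed" key set mirrors the title of the fix below (taking FirstChar/LastChar into account):

```python
import hashlib
import json


def font_hash(font: dict, keys: tuple[str, ...]) -> str:
    """Hash only the listed entries of a simplified font dictionary."""
    h = hashlib.sha256()
    for key in keys:
        h.update(key.encode())
        h.update(json.dumps(font.get(key)).encode())
    return h.hexdigest()


class FontCache:
    """Hypothetical cache keyed on font hashes; NOT pdf.js code."""

    def __init__(self, keys: tuple[str, ...]):
        self.keys = keys
        self.loaded: dict[str, dict] = {}
        self.parse_count = 0

    def load_font(self, font: dict) -> dict:
        digest = font_hash(font, self.keys)
        if digest in self.loaded:
            # Reuse is only correct if equal hashes really mean equal fonts.
            return self.loaded[digest]
        self.parse_count += 1  # stands in for the expensive parse/convert step
        self.loaded[digest] = font
        return font


font_a = {
    "Encoding": "WinAnsiEncoding",
    "ToUnicode": None,
    "Widths": [250, 333, 408],
    "FirstChar": 32,
    "LastChar": 34,
}
font_b = dict(font_a, FirstChar=65, LastChar=67)

old = FontCache(("Encoding", "ToUnicode", "Widths"))
fixed = FontCache(("Encoding", "ToUnicode", "Widths", "FirstChar", "LastChar"))

for cache in (old, fixed):
    cache.load_font(font_a)
    result = cache.load_font(font_b)
    print(result is font_b, cache.parse_count)
# old:   False 1  (font_b is replaced by the cached font_a)
# fixed: True 2
```

Adding more entries to the hash costs little, while a missing entry turns the cache from an optimization into a correctness bug.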

dmisdm · 2019-03-28T00:31:28Z

This is awesome, thanks so much for the help!

timvandermeij added the font-conversion label Mar 22, 2019

Snuffleupagus mentioned this issue Mar 26, 2019

Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665) #10685 (merged)

timvandermeij closed this as completed in #10685 Mar 27, 2019

csalzano mentioned this issue Dec 12, 2023

Embeded PDF Font Style viewed on a cell phone. breakfastco/embed-pdf-wpforms#2 (closed)
