
Badly rendered Times New Roman PS #10665

Closed
dmisdm opened this issue Mar 22, 2019 · 3 comments

dmisdm commented Mar 22, 2019

Attach (recommended) or Link to PDF file here:
EMG8 -Cambridge Essentail Gold Maths 8-B 117.pdf

Configuration:

  • Web browser and its version: Chrome 73
  • Operating system and its version: macOS 10.14.1
  • PDF.js version: 2.2.91
  • Is a browser extension: No

Steps to reproduce the problem:
1. Open the PDF (it's a single page extracted from a book) and note the badly rendered fonts.


What is the expected behavior? (add screenshot)
It renders fine in every other PDF reader.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
The current GitHub Pages viewer: https://mozilla.github.io/pdf.js/web/viewer.html

Notes:
We've managed to make it work by editing the fonts within the PDF. The font is currently "Times New Roman PS"; re-rendering it with plain "Times New Roman" seems to fix it.

There are no console errors or other visible clues, such as missing CMaps.

Unfortunately we are not allowed to alter PDFs, so re-rendering each one is not a viable solution.

If anyone can give any insight into this and a possible solution, that would be massively appreciated 🤠

janpe2 (Contributor) commented Mar 26, 2019

The PDF defines the same fonts many times. For example, font LULQLP+TimesLTStd-Roman is defined nine times. Each one refers to the same FontDescriptor and the same embedded CFF data stream.

There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds entries Encoding, ToUnicode, and Widths in the hash. Some fonts in the PDF get identical hash codes because all the mentioned entries are identical, even Widths. Only entries FirstChar and LastChar differ. If fonts get identical hash codes, could it cause a font to be skipped so that it won't be converted to OpenType?
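To illustrate the collision described above, here is a minimal TypeScript sketch. It is not pdf.js code: pdf.js computes an actual hash in `preEvaluateFont()`, whereas this sketch simply joins the entries into a string key, and `FontDictSketch`, `keyWithoutRange` and `keyWithRange` are hypothetical names. Including FirstChar and LastChar is an assumed fix, shown only to make the point.

```typescript
// Hypothetical model of the simple-font entries discussed above; not pdf.js code.
interface FontDictSketch {
  encoding: string;   // stand-in for the /Encoding entry
  toUnicode: string;  // stand-in for the /ToUnicode CMap contents
  widths: number[];   // the /Widths array
  firstChar: number;  // the /FirstChar entry
  lastChar: number;   // the /LastChar entry
}

// Key built only from Encoding, ToUnicode and Widths, as described above.
function keyWithoutRange(f: FontDictSketch): string {
  return [f.encoding, f.toUnicode, f.widths.join(",")].join("|");
}

// Assumed fix: FirstChar and LastChar also contribute to the key.
function keyWithRange(f: FontDictSketch): string {
  return [keyWithoutRange(f), f.firstChar, f.lastChar].join("|");
}

// Two fonts sharing everything except FirstChar/LastChar (values invented).
const fontA: FontDictSketch = {
  encoding: "WinAnsiEncoding",
  toUnicode: "",
  widths: [250, 333, 408],
  firstChar: 32,
  lastChar: 34,
};
const fontB: FontDictSketch = { ...fontA, firstChar: 65, lastChar: 67 };

console.log(keyWithoutRange(fontA) === keyWithoutRange(fontB)); // true: same key
console.log(keyWithRange(fontA) === keyWithRange(fontB)); // false: distinct keys
```

The range matters because, in a simple font, `Widths[i]` is the width of character code `FirstChar + i`. Identical Widths arrays with different FirstChar values therefore assign those widths to different character codes.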

Here is a reduced PDF that contains two of the fonts from the original PDF:
issue10665_reduced.pdf

Snuffleupagus (Collaborator) commented

> There is a font hash computation in PartialEvaluator.prototype.preEvaluateFont() in core/evaluator.js. It adds entries Encoding, ToUnicode, and Widths in the hash. Some fonts in the PDF get identical hash codes because all the mentioned entries are identical, even Widths. Only entries FirstChar and LastChar differ.

Really excellent analysis, thank you; this made the bug easy to fix!

> If fonts get identical hash codes, could it cause a font to be skipped so that it won't be converted to OpenType?

In some badly generated PDF files there can be huge numbers of identical fonts, and the purpose of preEvaluateFont was simply to avoid having to load/parse duplicate ones. Hence loadFont compares hashes and, where possible, reuses an already loaded/parsed font.
Obviously this all hinges on the hashes actually being correct/unique, but fortunately there have been relatively few bugs in that code over the years.
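The reuse described above can be sketched as a cache keyed by the font's identity. This is a minimal TypeScript sketch under stated assumptions: `FontCacheSketch`, `ParsedFont`, `keyWithoutRange` and `keyWithRange` are hypothetical names, not pdf.js internals, and "parsing" is reduced to recording which font dictionary a cache entry was built from. It shows why a key that ignores FirstChar/LastChar makes the second font silently reuse the first one.

```typescript
// Hypothetical, simplified font dictionary; not a pdf.js type.
interface FontDictSketch {
  widths: number[];
  firstChar: number;
  lastChar: number;
}

// Stand-in for a parsed/converted font.
interface ParsedFont {
  id: number;        // which parse produced this font
  firstChar: number; // the FirstChar the font was built for
}

// Hypothetical dedup cache: reuse a parsed font whenever the key matches.
class FontCacheSketch {
  private cache = new Map<string, ParsedFont>();
  parseCount = 0;

  constructor(private readonly keyOf: (f: FontDictSketch) => string) {}

  load(f: FontDictSketch): ParsedFont {
    const key = this.keyOf(f);
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      return cached; // duplicate detected: skip parsing entirely
    }
    this.parseCount++;
    const font: ParsedFont = { id: this.parseCount, firstChar: f.firstChar };
    this.cache.set(key, font);
    return font;
  }
}

// Key that ignores the character range vs. one that includes it.
const keyWithoutRange = (f: FontDictSketch): string => f.widths.join(",");
const keyWithRange = (f: FontDictSketch): string =>
  `${f.widths.join(",")}|${f.firstChar}|${f.lastChar}`;

const a: FontDictSketch = { widths: [250, 333], firstChar: 32, lastChar: 33 };
const b: FontDictSketch = { widths: [250, 333], firstChar: 65, lastChar: 66 };

const naive = new FontCacheSketch(keyWithoutRange);
naive.load(a);
const wrong = naive.load(b); // reuses the font built for `a`

const fixed = new FontCacheSketch(keyWithRange);
fixed.load(a);
const right = fixed.load(b); // parsed separately for `b`
```

With `keyWithoutRange`, only one font is parsed and `wrong.firstChar` is 32 even though `b` starts at 65; with `keyWithRange`, both fonts are parsed. The cache is only as correct as its key.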

dmisdm (Author) commented Mar 28, 2019

This is awesome, thanks so much for the help!
