disableFontFace mutually-exclusive font rendering #11955

Closed
Snipeye opened this issue Jun 2, 2020 · 29 comments

Comments

@Snipeye

Snipeye commented Jun 2, 2020

Attach (recommended) or Link to PDF file here:
renderExample.pdf

Configuration:

  • Web browser and its version: Node 12.x
  • Operating system and its version: AmazonLinux2
  • PDF.js version: 2.3.200
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Explicitly set "disableFontFace" to "false"
  2. Try to render the PDF

What is the expected behavior? (add screenshot)
image

What went wrong? (add screenshot)
image

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

I've managed - after a LOT of tinkering - to install some fonts on my amazonlinux2 instance. It was not fun. This means that the bulk of the text visible in that PDF, which is Helvetica, renders properly (instead of not at all, like it did before I installed the fonts). Unfortunately, for whatever stupid reason, it also means that the fonts that RENDER CORRECTLY when "disableFontFace" is set to true (like it is by default in Node) cease rendering correctly. I know for a fact that the included PDF has the necessary included subset of Arial to render correctly, but instead we're getting weird... well, to be honest, I don't even know what they are.

In short: disableFontFace = false gives me the Helvetica I want, but renders the PDF-embedded fonts wrongly. disableFontFace = true gives me the embedded fonts, but fails when it tries to fall back on system fonts.

@Snuffleupagus
Collaborator

Duplicate of issues such as e.g. #11347, #11311, and #4244.

@Snipeye
Author

Snipeye commented Jun 3, 2020

Hey, guys! I dug into the errors quite thoroughly before posting this one, and I actually disagree with the designation of duplicate - though I'd agree that it seems to revolve around similar issues. #11347 is actually closed (and unresolved), so closing this as a duplicate of that doesn't accomplish much. #11311 is more "how do I determine which to use," which I am not asking, and #4244 is quite old and was created initially to resolve the issue of system fonts not rendering properly at all - a problem that seems (partially) resolved by "disableFontFace".

This issue is to address the problems with the "disableFontFace" feature, specifically recognizing that (A) embedded fonts work properly with "disableFontFace" set to "true" for node, and (B) system fonts work properly with "disableFontFace" set to "false," but there is no happy medium that makes use of BOTH embedded AND system fonts when trying to render in node.

I've provided a PDF that demonstrates this exact behavior, which should help make debugging at least more straightforward (if not "easy" in a library as impressive as this). I'm willing to help out - I'll happily provide a docker container with the exact environment, as well as my programming experience - but closing this as a duplicate of a 6-year-old as-of-yet-unresolved issue seems a bit like a door slam. Is there anything I can do to help move this forward instead?

@timvandermeij reopened this Jun 3, 2020
@timvandermeij
Contributor

I think those are valid points. Let's reopen this.

@Snipeye
Author

Snipeye commented Jun 3, 2020

What would help resolve this most? I can set up a docker that would be an exact environment in which I see the problem.

@bobsingor

I am dealing with exactly the same problem and would love to see this issue resolved! Happy you reopened it.

@Snuffleupagus
Collaborator

Unfortunately #4244 (comment) is still (mostly) accurate here, and the only real solution would be to start embedding (standard) fonts in the PDF.js library (there are potentially copyright/filesize reasons that would complicate doing that).
Since the code runs in a browser we thus cannot really load font data directly from the system, which is why the PDF.js library would need to bundle font data such that src/core/fonts.js would be able to fetch fallback font data for fonts which do not include any font program.

Hence why the duplicate is still correct as far as I'm concerned, while that may indeed be unfortunate.

@Snipeye
Author

Snipeye commented Jun 4, 2020

I definitely think embedding (standard) fonts would solve the issue, but not head on - it more... avoids the issue. What I'm trying to focus on in this specific issue is the fact that "disableFontFace: false" will allow system fonts to render, while "disableFontFace: true" will allow embedded fonts to render - thus demonstrating that both types of fonts are capable of rendering/loading - but there is no setting that would allow both system AND embedded fonts to render - at least, not in node.

Pre-loading the fonts would be one solution, since then I could just rely on embedded fonts and know that it has 'em all.

Adding an entry point to force-embed fonts would be another solution, and that may well be a more general solution: it would mean that we could "fix" rendering for PDFs that never embedded necessary fonts in the first place, too. (This would have the added benefit of avoiding any sort of copyright/file size issue, too.)

But in this case, I KNOW that pdf.js is capable of rendering every part of the PDF I provided, it just can't seem to do them at the same time - so that's what this issue is about. Perhaps this issue could be addressed those other ways, but...

@Snipeye
Author

Snipeye commented Jun 4, 2020

That explains why, when "disableFontFace: true" (as is default for node) the system fonts don't render (unless embedded, of course). That makes total sense.

What doesn't make as much sense is the fact that when "disableFontFace: false", suddenly the embedded fonts from the PDF fail to render properly, instead appearing as some strange font-point glyph. That kind of indicates the "conver[sion] to OpenType fonts and load[ing] via font face rules" (referenced in the documentation you provided) is failing. I'll definitely dig a little bit more around the code that manages that, IIRC I saw it doing some verification/loading via dataurl - unless I'm thinking of the wrong thing.

@Snuffleupagus
Collaborator

Note how disableFontFace is described in the JSDocs:

pdf.js/src/display/api.js

Lines 135 to 138 in 96ad60f

* @property {boolean} [disableFontFace] - By default fonts are
* converted to OpenType fonts and loaded via font face rules. If disabled,
* fonts will be rendered using a built-in font renderer that constructs the
* glyphs with primitive path commands. The default value is `false`.

As a consequence of drawing glyphs manually, there needs to be font data present to create said path commands from; see e.g. https://github.com/mozilla/pdf.js/blob/master/src/core/font_renderer.js

Unless the font program is embedded in the PDF file, we thus have no way of accessing the necessary data to build the paths; hence why we'd need to bundle (standard) font data in the library such that things would work.


Finally, note that when disableFontFace = false (i.e. the default value), the environment itself (normally the browser) falls back on whatever fonts are available in the system. (As mentioned above, we cannot directly access fonts from the system.)
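
Editor's sketch: the behaviour described so far in this thread can be summarized as a small decision function. This is a toy model, not PDF.js code; the function name, its parameters, and the result strings are all invented for illustration. The `fontLoaderWorks` flag stands in for "the environment can register converted fonts via font face rules", which the thread suggests holds in browsers and probably does not hold in Node.js.

```javascript
// Toy model of the outcomes reported in this thread. Not part of PDF.js;
// every name and result string here is hypothetical.
function expectedOutcome({ hasEmbeddedData, disableFontFace, fontLoaderWorks }) {
  if (disableFontFace) {
    // Path mode: glyph outlines are built from the font program in the PDF.
    // Without an embedded font program there is no data to build paths from.
    return hasEmbeddedData ? "paths from embedded font" : "nothing to draw";
  }
  if (!hasEmbeddedData) {
    // Font-face mode, no font program: the environment picks a system font.
    return "system fallback font";
  }
  // Font-face mode with a font program: the font has to be registered with
  // the environment first, which is the step suspected to fail in Node.js.
  return fontLoaderWorks ? "embedded font via font face" : "wrong glyphs";
}

// The two Node.js situations reported in the issue:
console.log(expectedOutcome({ hasEmbeddedData: false, disableFontFace: true, fontLoaderWorks: false }));
console.log(expectedOutcome({ hasEmbeddedData: true, disableFontFace: false, fontLoaderWorks: false }));
```

Read this way, no single value of disableFontFace gives a good result for both kinds of font when `fontLoaderWorks` is false, which matches the report.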

@Snipeye
Author

Snipeye commented Jun 4, 2020

I just want to make sure I'm understanding the implications correctly here - when disableFontFace = false we fall back on system fonts, and don't even try to use the pdf-embedded fonts? Could we not try to use a pdf-embedded font first, and, if/when that fails, THEN 'fall back' to system?

@Snuffleupagus
Collaborator

> when disableFontFace = false we fall back on system fonts,

Only for those fonts without embedded font data, since browsers are able to handle that situation (as opposed to Node.js).

@Snipeye
Author

Snipeye commented Jun 4, 2020

It seems like something, then, is going awry:

In my environment - the one I described in this issue - I've INSTALLED system fonts. When disableFontFace = false (which I have to force, since node defaults it to true), the system fonts render just fine - so I know they're working. The embedded fonts, however - namely a subset of Arial for my PDF - only load weird glyphs/fontpoint icons. Perhaps this is because for the embedded fonts we NEED to use some sort of path-generation approach (at least, in node), but for system fonts we don't? Maybe we just need to create a hybrid mode that uses path generation for embedded fonts, and falls back for system fonts? I'm just trying to figure out where it's going wrong, honestly.

@Snuffleupagus
Collaborator

Please keep in mind that the PDF.js library was developed for use in browsers, and whatever Node.js support there is was "bolted on" afterwards so to speak; this obviously shows unfortunately :-(

I remember seeing other issues, which I (obviously) cannot find right now, where it was suggested that https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js doesn't really support Node.js, which is a fairly likely explanation for the troubles.

@Snipeye
Author

Snipeye commented Jun 4, 2020

That reasoning makes sense - for both the difficulties with node and the explanation of the fontloader.

IIRC when the environment is node the "isFontLoadingAPISupported" is false, so it wouldn't surprise me if that is one of the problems. If we KNOW that the path-based approach works for the embedded fonts, and the system fonts/fontface rules work for the non-embedded fonts, how difficult would it be to create a hybrid approach that checked the registry ("commonObjs" or something IIRC from digging around) for an embedded font and used path-based rendering, and if that failed tried the normal (font face) rendering?
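
Editor's sketch: the hybrid idea proposed here could look roughly like the following. It is hypothetical and not how PDF.js works; the `hasEmbeddedProgram` field and the `getPathGenerator` callback are stand-ins invented for this example, and only the canvas 2D calls (`save`, `translate`, `beginPath`, `fill`, `restore`, `fillText`) are real APIs.

```javascript
// Hypothetical hybrid glyph drawing: paths for fonts that carry a font
// program, fillText (system font lookup) for fonts that do not.
// `font.hasEmbeddedProgram` and `getPathGenerator` are assumptions.
function drawGlyphHybrid(ctx, font, character, x, y, fontSize, getPathGenerator) {
  if (font.hasEmbeddedProgram) {
    // Build the outline from embedded data and fill it as a path.
    const addToPath = getPathGenerator(font, character);
    ctx.save();
    ctx.translate(x, y);
    ctx.beginPath();
    addToPath(ctx, fontSize);
    ctx.fill();
    ctx.restore();
    return "path";
  }
  // No font program available: let the environment choose a system font.
  ctx.fillText(character, x, y);
  return "fillText";
}
```

The per-font decision is the point of the sketch: the existing flag is applied to every font of a document at once, whereas the proposal is to choose the strategy font by font.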

@Snuffleupagus
Collaborator

The correct approach would be to extend https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js to be able to register custom fonts in Node.js environments.

@Snipeye
Author

Snipeye commented Jun 4, 2020

Looks like some of the code is in place already - check out https://github.com/mozilla/pdf.js/blob/master/src/display/font_loader.js#L158 . It says you can treat node as if sync loading is supported.

I'm curious what advantage this would give - correct me if I'm wrong, but the font loader isn't used when we have disableFontFace = true, and that's when the embedded fonts work properly. Are you suggesting that by enabling the font loader, you could then load in system default fonts in addition to the embedded fonts?

@Snuffleupagus
Collaborator

Snuffleupagus commented Jun 4, 2020

> but the font loader isn't used when we have disableFontFace = true

That's when the glyphs are rendered as path operators, as explained above, i.e. no font data is actually being loaded/registered in the browser/environment.
If the FontLoader supported Node.js properly, I'm assuming that things should "just work" (with the caveat that I don't know how feasible registering custom fonts is in Node.js environments).

Edit: Note how the code is essentially assuming a browser-compatible environment in e.g.

insertRule(rule) {
  let styleElement = this.styleElement;
  if (!styleElement) {
    styleElement = this.styleElement = document.createElement("style");
    styleElement.id = `PDFJS_FONT_STYLE_TAG_${this.docId}`;
    document.documentElement
      .getElementsByTagName("head")[0]
      .appendChild(styleElement);
  }
  const styleSheet = styleElement.sheet;
  styleSheet.insertRule(rule, styleSheet.cssRules.length);
}
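
Editor's sketch: the method above reaches for the global `document`, which a plain Node.js process does not define. A caller could detect that before attempting a font face rule. The helper below is hypothetical and not part of PDF.js; it only illustrates the kind of environment check involved.

```javascript
// Hypothetical guard: is there a DOM that a <style> element can be added to?
// `globalObj` is a parameter so the check can be exercised with stub objects.
function canInsertFontFaceRule(globalObj = globalThis) {
  const doc = globalObj.document;
  return (
    typeof doc === "object" &&
    doc !== null &&
    typeof doc.createElement === "function"
  );
}

// A bare object with no `document` models a plain Node.js global scope.
console.log(canInsertFontFaceRule({}));
// A minimal stub models a browser-like global scope.
console.log(canInsertFontFaceRule({ document: { createElement() {} } }));
```

A check like this only identifies the unsupported case; it does not by itself give Node.js a way to register the font.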

@Snipeye
Author

Snipeye commented Jun 4, 2020

OK, so what you're saying - and correct me if I'm wrong, it's very early in the morning here - is this:

Right now, we don't do font loading. If we see embedded fonts, we can use the data from them to generate paths. This happens when disableFontFace = true. We cannot, however, generate paths for the system fonts, because we can't actually pull the font data down from the system - we can only humbly request that the system render things with its knowledge about its own fonts. Thus embedded fonts work because we can draw paths, and non-embedded fonts fail because we can't draw paths. Correct so far?

When disableFontFace = false, we ALWAYS just ask the system to render stuff. This means the embedded fonts, if they aren't also included in the system, will fail. The fontloader, however, allows us to take an embedded font and tell the system about it, so when we ask the system to render stuff it knows how.

That's my understanding, is that reasonable?

@Snuffleupagus
Collaborator

Snuffleupagus commented Jun 4, 2020

> disableFontFace = true
> [...]
> we can only humbly request that the system render things with its knowledge about its own fonts.

There's no attempt to fall back in this mode; we'll only render glyphs as path commands (which requires an existing font program to generate) and nothing else; note

createNativeFontFace() {
  if (!this.data || this.disableFontFace) {
    return null;
  }

and

createFontFaceRule() {
  if (!this.data || this.disableFontFace) {
    return null;
  }

and finally

pdf.js/src/display/canvas.js

Lines 1503 to 1554 in 96ad60f

  var addToPath;
  if (font.disableFontFace || isAddToPathSet || patternFill) {
    addToPath = font.getPathGenerator(this.commonObjs, character);
  }
  if (font.disableFontFace || patternFill) {
    ctx.save();
    ctx.translate(x, y);
    ctx.beginPath();
    addToPath(ctx, fontSize);
    if (patternTransform) {
      ctx.setTransform.apply(ctx, patternTransform);
    }
    if (
      fillStrokeMode === TextRenderingMode.FILL ||
      fillStrokeMode === TextRenderingMode.FILL_STROKE
    ) {
      ctx.fill();
    }
    if (
      fillStrokeMode === TextRenderingMode.STROKE ||
      fillStrokeMode === TextRenderingMode.FILL_STROKE
    ) {
      ctx.stroke();
    }
    ctx.restore();
  } else {
    if (
      fillStrokeMode === TextRenderingMode.FILL ||
      fillStrokeMode === TextRenderingMode.FILL_STROKE
    ) {
      ctx.fillText(character, x, y);
    }
    if (
      fillStrokeMode === TextRenderingMode.STROKE ||
      fillStrokeMode === TextRenderingMode.FILL_STROKE
    ) {
      ctx.strokeText(character, x, y);
    }
  }
  if (isAddToPathSet) {
    var paths = this.pendingTextPaths || (this.pendingTextPaths = []);
    paths.push({
      transform: ctx.mozCurrentTransform,
      x,
      y,
      fontSize,
      addToPath,
    });
  }
},


> The fontloader, however, allows us to take an embedded font and tell the system about it, so when we ask the system to render stuff it knows how.

That sounds about right, and it works perfectly well in browsers. Most likely, this part simply isn't working in Node.js environments.

@Snipeye
Author

Snipeye commented Jun 4, 2020

It may be time for me to sleep, but I'm still hung up on a couple things.

The fontloader serves to take the embedded font, and teach the system about it. Doesn't that mean, at the times we would be trying to use it, that "disableFontFace" would be false? So we should make it past those checks you mentioned.

@Snuffleupagus
Collaborator

> The fontloader serves to take the embedded font, and teach the system about it.

Yes, but as already mentioned there's no Node.js-specific code in the FontLoader so it's not really surprising if things don't work :-)

@Snipeye
Author

Snipeye commented Jun 4, 2020

I think I'm understanding. Was the font loader written specifically for this project, or was it a derivative of something else? I'm trying to figure out where I need to go to dig into it to see if I can find anybody who's got it (or something similar) working on Node. I really appreciate your feedback so far!

@Snuffleupagus
Collaborator

> Was the font loader written specifically for this project,

Yes, at least as far as I know.


A quick search seems to suggest that loading fonts in Node.js, and preferably in an at least similar way to what's possible in the browser, is perhaps not that straightforward in general (which is probably why there's no support for Node.js in the FontLoader).

The closest I can find is the node-canvas package, which apparently has some support for registering fonts. However, I've got absolutely no idea if that would be useful/sufficient for the PDF.js use-case.
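
Editor's sketch: to the editor's knowledge, node-canvas exposes `registerFont(path, { family })`, which takes a font file path rather than in-memory bytes and is meant to be called before the canvas is created. Under that assumption, registering a converted embedded font might start by writing its bytes to a temporary file. The helper name, the temp-file layout, and the `.otf` extension below are hypothetical, and whether this would be sufficient for PDF.js is exactly the open question raised above. The registration function is passed in as a parameter so the sketch runs without node-canvas installed.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: persist font bytes to a temp file and hand the path
// to a registerFont-style function, e.g. `require("canvas").registerFont`
// (an assumption about the caller's setup, not something PDF.js does).
function registerEmbeddedFont(registerFont, family, fontBytes) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pdf-font-"));
  const file = path.join(dir, "embedded.otf");
  fs.writeFileSync(file, fontBytes);
  // Register under the family name that later text drawing will request.
  registerFont(file, { family });
  return file;
}
```

With node-canvas installed, a call might look like `registerEmbeddedFont(require("canvas").registerFont, familyName, bytes)`, where `familyName` and `bytes` would have to come from PDF.js's font objects; how to obtain them is not covered in this thread.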

@Snuffleupagus
Collaborator

Duplicate of #4244

@Snuffleupagus marked this as a duplicate of #4244 Jun 8, 2021
@reite

reite commented Mar 17, 2023

@Snipeye wondering if you ever managed to solve this issue and render a PDF with both embedded and non-embedded fonts on nodejs?

@Snipeye
Author

Snipeye commented Mar 17, 2023 via email

@ojtramp

ojtramp commented Jun 20, 2023

Also struggling with this issue. Is there an alternative library people would recommend for creating images from a PDF in memory, without saving to the file system?

Thanks for your help

@snowfluke

What about using a headless browser, e.g. Puppeteer, to simulate browser behavior?

@jkgenser

> Also struggling with this issue. Is there an alternative library people would recommend for creating images from a PDF in memory, without saving to the file system?
>
> Thanks for your help

I have done this with pdfium
