This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

Wrong image paths when HTML input comes from stdin #4981

Open
getreu opened this issue Mar 27, 2021 · 8 comments


@getreu

getreu commented Mar 27, 2021

wkhtmltopdf version(s) affected: 0.12.6

OS information
Debian 10

Description

When HTML input comes from stdin the relative image path
<img src="installation-error.png" alt="Sorry, an error has occurred" /> is wrongly
transformed into file:///tmp/installation-error.png (see below).

$ cat input.html | wkhtmltopdf --enable-local-file-access - tmp.pdf
Loading pages (1/6)
Warning: Failed to load file:///tmp/installation-error.png (ignore)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done

How to reproduce

  1. Create the file `input.html`:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<p><img src="installation-error.png" alt="Sorry, an error has occurred" /></p>
Anleitung zur Installation des Produktupdates
</body>
</html>
  2. Create an image file installation-error.png in the same directory.
  3. Execute cat "input.html" | wkhtmltopdf --enable-local-file-access - tmp.pdf

Expected behavior
That the image is rendered and included in the resulting pdf.

Possible Workaround
none

@PhilterPaper

  1. Does /tmp come from the output PDF file name, or is it apparently an internal working directory?
  2. Does this work correctly if you create an .html file and use that as input?
  3. You may have to enable local file access (default behavior changed with 0.12.6).

@getreu
Author

getreu commented Mar 28, 2021

  1. My working dir is Pictures, so tmp must be some internal thing.
  2. Yes, the following works fine:
$ wkhtmltopdf --enable-local-file-access Hueber\ Lernmaterialen.md.html Hueber\ Lernmaterialen.md.html.pdf
Loading pages (1/6)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done

This does NOT work:

$ cat Hueber\ Lernmaterialen.md.html | wkhtmltopdf --enable-local-file-access - Hueber\ Lernmaterialen.md.html.pdf
Loading pages (1/6)
Warning: Failed to load file:///tmp/installation-error.png (ignore)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done
  3. I did (both commands above pass --enable-local-file-access).

@dmjohnsson23

dmjohnsson23 commented Jun 10, 2021

I have the same error happening with 0.12.6 on Windows.

Warning: Failed to load file:///C:/WINDOWS/TEMP/hwhl-logo.png (ignore)

The working directory when the command is run is C:\wamp64\www, not C:\WINDOWS\TEMP, so I'm not sure why it's looking in the temp folder. I did pass the --enable-local-file-access flag, so that isn't the issue.

I notice that, when you read a file from the filesystem rather than stdin, paths are relative to the file and not the working directory (as expected). I wonder if wkhtmltopdf assumes that the system temp folder is the "real" location of the "file" read from stdin, and so assigns that as the base url. Personally, I would expect when reading from stdin that the current working directory would be considered the base url that images are referenced from.
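The hypothesis above can be illustrated with ordinary URL resolution. The following is a minimal sketch, not wkhtmltopdf code: it assumes the stdin content is treated as if it lived in a file under the system temp folder (the file name `wktemp-example.html` and the `/home/user` prefix are made up for illustration) and shows how a relative `src` resolves against that base versus a base pointing at the working directory.

```python
from urllib.parse import urljoin


def resolve_src(base_url: str, src: str) -> str:
    """Resolve a (possibly relative) image src against a document base URL."""
    return urljoin(base_url, src)


# Assumption: stdin input ends up with a base URL inside the temp folder.
# The file name is hypothetical; only the directory part matters here.
stdin_base = "file:///tmp/wktemp-example.html"

# Assumption: what the reporters expected, a base URL at the working directory.
cwd_base = "file:///home/user/Pictures/"

print(resolve_src(stdin_base, "installation-error.png"))
# file:///tmp/installation-error.png
print(resolve_src(cwd_base, "installation-error.png"))
# file:///home/user/Pictures/installation-error.png
```

The first result matches the path in the warning from the original report, which is consistent with the temp-folder explanation but does not prove it.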

@PhilterPaper

I'm trying to understand what you're describing here. Apparently you are piping in the input HTML file. Within that file, you have an image file -- is that specified with a relative address such as src="hwhl-logo.png" (or ./ ), rather than an absolute path to the image? And what you're seeing is that it's looking for the image file relative to some system temp folder rather than, say, the HTML's location? I don't think this is going to work anyway -- the act of piping (or redirecting) a file should destroy all evidence of from where it came, wouldn't it? It should just be a stream of bytes. It would be reasonable for the cwd to be assumed to be the base from which a relative path is calculated. I think that's the normal practice, so wkHTMLtoPDF may have an error.

@dmjohnsson23

That is correct. The piped HTML is programmatically generated from a template, and the paths are relative to the template's folder rather than absolute. Relative paths are much easier to work with, especially as the development environment is Windows and the server environment is Linux; absolute paths would be totally different for each OS. When I call wkhtmltopdf, I am doing so with the template's folder as the current working directory. My expectation would be that this would allow wkhtmltopdf to understand relative paths as being relative to the current working directory. Having paths relative to the system temp folder doesn't make much sense to me.

I ended up using weasyprint instead for the specific project I'm working on currently, as its behavior in this regard was more in line with what I expected. However, I'm happy to provide any further details you'd like; I think there is a lot to like about wkhtmltopdf.
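One way to keep relative paths in the template while still piping HTML through stdin is to rewrite them to absolute `file://` URIs just before the pipe. The following is a rough sketch under stated assumptions: the helper name `absolutize_img_src` is hypothetical, it only handles double-quoted `src` attributes on `<img>` tags via a regular expression (a real HTML parser would be more robust), and it has not been verified against wkhtmltopdf itself. Because the base directory is resolved at run time, the same template works on Windows and Linux.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Matches <img ... src="..."> with a double-quoted src value (assumption:
# the generated HTML uses this form, as in the reproduction above).
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def absolutize_img_src(html: str, base_dir: str) -> str:
    """Rewrite relative <img src> values to absolute file:// URIs under base_dir."""
    base = Path(base_dir).resolve()

    def repl(match: re.Match) -> str:
        src = match.group(2)
        # Leave URLs with a scheme (http:, data:, file:) and rooted paths alone.
        if urlsplit(src).scheme or src.startswith("/"):
            return match.group(0)
        return match.group(1) + (base / src).as_uri() + match.group(3)

    return _IMG_SRC.sub(repl, html)
```

Calling `absolutize_img_src(html, ".")` from the template's folder would turn `src="hwhl-logo.png"` into a `file://` URI for that folder, so the result no longer depends on which base URL the renderer picks.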

@PhilterPaper

It's starting to sound like wkHTMLtoPDF is either using a fixed "working directory" of /WINDOWS/TEMP/, or Windows is passing the wrong working directory. Either is an error. Perhaps you can wrap the whole thing in a script (.bat file) to explicitly cd to /WINDOWS/TEMP/ and run everything from there? You would have to move the resulting PDF file back to your real working directory. Clumsy to have to do this, but maybe it could work for you. There is another ticket open about problems piping to stdout or something similar (#3119) that perhaps has a similar root cause?

@gboddin

gboddin commented Jul 15, 2021

I ran into this behavior too when generating a template through a golang io.Writer (that is, sending it to wkhtml's stdin).

When stdin is used as html input, wkhtml assumes /tmp to be the current folder.

Instead, it could assume the current folder is the running process's current folder or provide a switch to define the root for assets.

This makes it difficult to secure local files when it is used in a stream, and might unexpectedly disclose information from /tmp if --allow ./ is used.

@apcaselli

I'm facing the same issue. Currently my only workaround is to store the input in a temporary file and pass it as a parameter to the process.
Using the current directory as the default instead of /tmp/ would be great (though having a "base folder" option would be ideal).
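The temporary-file workaround described above might look roughly like the sketch below. It relies on the observation earlier in this thread that, for file input, paths are resolved relative to the HTML file, so the temp file is created inside the folder that holds the images rather than in the system temp folder. The helper names (`stage_html`, `build_command`, `render`) are hypothetical; the only wkhtmltopdf option used is `--enable-local-file-access`, as in the commands above.

```python
import os
import subprocess
import tempfile


def stage_html(html: str, asset_dir: str) -> str:
    """Write html to a temporary .html file inside asset_dir and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html", dir=asset_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def build_command(html_path: str, pdf_path: str) -> list:
    """Build the wkhtmltopdf invocation with a file argument instead of '-'."""
    return ["wkhtmltopdf", "--enable-local-file-access", html_path, pdf_path]


def render(html: str, asset_dir: str, pdf_path: str) -> None:
    """Stage the HTML next to its assets, convert it, then clean up."""
    html_path = stage_html(html, asset_dir)
    try:
        # Assumes wkhtmltopdf is installed and on PATH.
        subprocess.run(build_command(html_path, pdf_path), check=True)
    finally:
        os.remove(html_path)
```

For example, `render(generated_html, ".", "tmp.pdf")` run from the template's folder would replace the `cat input.html | wkhtmltopdf ... -` pipeline. The temp file is removed even if the conversion fails.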


5 participants