utf-8' codec can't decode byte 0xff in position 0: invalid start byte #82

Open
adiptamartulandi opened this issue Mar 31, 2022 · 1 comment
@adiptamartulandi

I am using a MacBook Air M1 with:
python 3.7
imgkit==1.2.2
wkhtmltopdf==0.2
wkhtmltoimage 0.12.6

Hello, I want to render HTML code to an image, but I get this error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Here is my code:

import imgkit
import base64
from IPython.display import display, HTML

body = '''
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>equations</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
</head>
<body>
<p>Professional Format</p>
<meta charset="utf-8" />
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>∫</mo><mn>0</mn><mn>1</mn></msubsup><mi>x</mi></mrow><annotation encoding="application/x-tex">\int_{0}^{1}x</annotation></semantics></math></p>
<p>Linear Format</p>
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>∖</mo><mi>i</mi><mi>n</mi><mi>t</mi><mi>_</mi><mo stretchy="false" form="prefix">{</mo><mn>0</mn><mo stretchy="false" form="postfix">}</mo><mover><mrow></mrow><mo accent="true">̂</mo></mover><mo stretchy="false" form="prefix">{</mo><mn>1</mn><mo stretchy="false" form="postfix">}</mo><mi>x</mi></mrow><annotation encoding="application/x-tex">\backslash int\_\{ 0\}\hat{}\{ 1\} x</annotation></semantics></math></p>
<p>Linear Format with lt</p>
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>∖</mo><mi>i</mi><mi>n</mi><msub><mi>t</mi><mrow><mo stretchy="true" form="prefix">{</mo><mn>0</mn><mo stretchy="true" form="postfix">}</mo></mrow></msub><mo>&lt;</mo><mrow><mo stretchy="true" form="prefix">{</mo><mn>1</mn><mo stretchy="true" form="postfix">}</mo></mrow><mi>x</mi><mo>&lt;</mo><mn>5</mn></mrow><annotation encoding="application/x-tex">\backslash int_{\left\{ 0 \right\}} &lt; \left\{ 1 \right\} x &lt; 5</annotation></semantics></math></p>
</body>
</html>
'''

options = {
    "quiet": ""
}

img = imgkit.from_string(body, False, options=options)
@UrbanKeith

UrbanKeith commented Jun 27, 2022

I had the same issue. After digging through the library, I found that with the --quiet flag, wkhtmltopdf apparently writes the same data to stderr as to stdout, that is, the byte string containing the image. The library, however, decodes the stderr stream as UTF-8, which causes the error.
Until the bug is fixed, it can be worked around as follows:

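import imgkit

try:
    jpeg = imgkit.from_string(html_string, False, options={'quiet': ''})
except UnicodeDecodeError as err:
    # err.object (the same value as err.args[1]) is the raw byte string that failed to decode
    jpeg = err.object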
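
For anyone wondering why this workaround works: a UnicodeDecodeError keeps the bytes it could not decode in its `object` attribute, and a byte 0xff at position 0 is consistent with JPEG data, which begins with FF D8. Below is a minimal sketch of the same idea as a reusable helper. It does not call imgkit at all: `render_with_fallback` and `fake_render` are hypothetical names introduced here for illustration, and `fake_render` only imitates the failure described above, so that the snippet runs without wkhtmltoimage installed.

```python
JPEG_MAGIC = b"\xff\xd8"  # every JPEG stream starts with these two bytes


def render_with_fallback(render, *args, **kwargs):
    """Call render(); if it raises UnicodeDecodeError, return the bytes
    that failed to decode when they look like a JPEG, otherwise re-raise."""
    try:
        return render(*args, **kwargs)
    except UnicodeDecodeError as err:
        data = err.object  # the input that could not be decoded
        if isinstance(data, bytes) and data.startswith(JPEG_MAGIC):
            return data
        raise  # not image data, so this is a genuine decoding error


def fake_render(html, output_path, options=None):
    """Hypothetical stand-in for the real renderer (assumption, not imgkit).

    It reproduces the reported failure by decoding image bytes as UTF-8.
    """
    image = JPEG_MAGIC + b"\xff\xe0" + b"\x00" * 8
    image.decode("utf-8")  # raises: can't decode byte 0xff in position 0
    return image


jpeg = render_with_fallback(fake_render, "<p>hi</p>", False, options={"quiet": ""})
print(jpeg[:2].hex())  # ffd8
```

With imgkit installed, the same helper could presumably be called as `render_with_fallback(imgkit.from_string, html_string, False, options={'quiet': ''})`, but I have only checked it against the stand-in above. Checking the magic bytes before returning means an unrelated decoding error is still raised instead of being silently treated as an image.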
