UTF-8 characters in `lstlisting` breaks pdf conversion #131

rossbar · 2020-02-08T19:19:20Z

Bug Report

Describe the bug

This is not necessarily an IPyPublish bug: a limitation in the LaTeX listings package, which provides the lstlisting environment, causes PDF conversion to fail if Unicode characters are used within an lstlisting environment. I stumbled upon this using the %timeit IPython magic in a code cell, as the output of %timeit includes Unicode characters (the plus-minus sign, Greek letters such as µ in time-unit prefixes, etc.).

To Reproduce

Steps to reproduce the behavior:

Create a file called example.Rmd with the following contents, then run the nbpublish command shown after it:

\```{python}
%timeit a = 2 + 2
\```

nbpublish -f latex_ipypublish_all.exec -pdf example.Rmd

Minimal Notebook Example

timeit_nb.ipynb.txt

Same build instructions as above (with the different filename, of course). Note that this issue occurs downstream in the build process (at the LaTeX -> PDF step), so it does not depend on whether the input file is .Rmd, .ipynb, etc.

Expected Behaviour

Currently, the conversion fails with errors from pdflatex. The desired behavior is a successful build with Unicode characters properly represented in lstlisting environments.

Runtime Information

IPyPublish: 0.10.10
Python: 3.8.1
OS: Arch Linux (5.5.2-arch1-1)
Pandoc: 2.8
(optional for pdf issues) texlive: 3.14159265
(optional for pdf issues) latexmk: 4.65

Additional context

The .log file provided by pdflatex is not particularly helpful as it makes it seem as though the problem is with the utf8x or ucs packages/options. After some digging, I was able to trace the problem back to a limitation with lstlisting. A simple procedure for confirming this:

1. Open the converted/timeit.tex file generated by the nbpublish process.
2. Navigate to the lstlisting environment around the output from the code cell.
3. Comment out the \begin{lstlisting} and \end{lstlisting} lines of that environment.
4. Build with pdflatex: pdflatex timeit.tex

The build will complete without errors and the output from the code cell will be properly rendered, albeit in plain LaTeX.
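To find the affected environments without reading the .tex file by hand, something like the following rough sketch could be used. It scans LaTeX source for lstlisting blocks that contain non-ASCII characters. `find_non_ascii_listings` is a hypothetical helper written for this issue, not part of IPyPublish or nbconvert, and the sample text is made up to resemble the generated output.

```python
import re

# Matches one lstlisting environment, including its \begin and \end lines.
LISTING_RE = re.compile(
    r"\\begin\{lstlisting\}.*?\\end\{lstlisting\}", re.DOTALL
)


def find_non_ascii_listings(tex_source):
    """Return (line_number, non_ascii_chars) for each affected lstlisting block.

    line_number is the 1-based line of the block's \\begin{lstlisting}.
    Hypothetical helper for illustration only.
    """
    hits = []
    for match in LISTING_RE.finditer(tex_source):
        block = match.group(0)
        bad = sorted({ch for ch in block if ord(ch) > 127})
        if bad:
            line_no = tex_source.count("\n", 0, match.start()) + 1
            hits.append((line_no, bad))
    return hits


# Made-up sample: one listing with %timeit-style output, one plain ASCII.
sample = "\n".join([
    r"\documentclass{article}",
    r"\begin{document}",
    r"\begin{lstlisting}[language={}]",
    "11.1 ns ± 2.64 ns per loop",
    r"\end{lstlisting}",
    r"\begin{lstlisting}",
    "plain ascii output",
    r"\end{lstlisting}",
    r"\end{document}",
])

for line_no, chars in find_non_ascii_listings(sample):
    print(f"lstlisting at line {line_no} contains: {' '.join(chars)}")
```

In practice the input would be the contents of the generated file (converted/timeit.tex in my case), read with an explicit UTF-8 encoding.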

Proposed solution

The limitations of lstlisting with respect to Unicode input are documented, and there is a proposed solution in section 2.5 of the documentation. It involves adding an escapeinside= option to the lstlisting environment, which passes the handling of the delimited text back to LaTeX. For example, here is the original lstlisting in timeit.tex as generated by the build process:

\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt]
11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)   
                                                                                
\end{lstlisting}

Here is the modified version, which adds escapeinside and fixes the issue:

\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt,escapeinside={*(}{)*}]
*(11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)   )*

\end{lstlisting}

Note that the characters that define the escaped section (*( and )* in my example) are configurable and could be specified for the entire document with \lstset.
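To make the idea more concrete, here is a minimal sketch of how the generated .tex could be post-processed. It is only an illustration under several assumptions: `escape_listing_unicode` is a hypothetical function (nothing like it exists in IPyPublish as far as I know), it assumes any option list sits on the same line as \begin{lstlisting}, and it assumes the delimiters `*(` and `)*` never occur in the listing text itself. Unlike my hand-edited example, it wraps only the runs of non-ASCII characters rather than the whole line, so that characters such as % or _ elsewhere on the line are not handed back to LaTeX.

```python
import re

# \begin{lstlisting}, optional [options] on the same line, body, \end{lstlisting}
BLOCK_RE = re.compile(
    r"(\\begin\{lstlisting\})(\[[^\n]*\])?(\n.*?)(\\end\{lstlisting\})",
    re.DOTALL,
)
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def escape_listing_unicode(tex_source, left="*(", right=")*"):
    """Wrap non-ASCII runs inside lstlisting blocks in escape delimiters.

    Hypothetical helper: adds an escapeinside option only to blocks that
    actually contain non-ASCII characters and leaves other blocks untouched.
    """
    option = "escapeinside={%s}{%s}" % (left, right)

    def fix_block(match):
        begin, opts, body, end = match.groups()
        if not NON_ASCII_RUN.search(body):
            return match.group(0)  # ASCII-only listing: nothing to do
        if opts is None:
            opts = "[" + option + "]"
        elif "escapeinside" not in opts:
            opts = opts[:-1] + "," + option + "]"
        body = NON_ASCII_RUN.sub(lambda m: left + m.group(0) + right, body)
        return begin + opts + body + end

    return BLOCK_RE.sub(fix_block, tex_source)


original = "\n".join([
    r"\begin{lstlisting}[language={},numbers=none]",
    "11.1 ns ± 2.64 ns per loop",
    r"\end{lstlisting}",
])
print(escape_listing_unicode(original))
```

The sketch is not idempotent (running it twice wraps the characters twice), and whether LaTeX can then typeset each escaped character still depends on the document's input-encoding setup, so it is a starting point for discussion rather than a finished fix.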

If the proposed solution sounds workable to you, I'm happy to attempt to implement it. Some discussion would be required to hammer out details (e.g. appropriate escape characters). I wanted to create an issue first to see if there were any additional insights/ideas.

Logging

nbpublish log for minimal example: timeit.nbpub.log
Latexmk log for minimal example: timeit.log

chrisjsewell · 2020-02-19T11:37:38Z

Hey @rossbar, that's funny, I forgot you had raised an issue here. Thanks for the feedback. Obviously I am busy working on the myst parser at the moment, so I won't be able to look into this too much in the immediate future. But this will probably end up feeding into that project 😄

@choldgraf, this is related to the conversion of source code to LaTeX, which will obviously be part of ExecutableBookProject/sphinx-notebook at some point.

choldgraf · 2020-02-19T14:57:26Z

@rossbar if you like you could open an issue in the sphinx-notebook repo to flag this as a future item to tackle

rossbar · 2020-02-19T22:16:24Z

Thanks for the reply! This is not a high priority, especially in light of all the fantastic work being done with the ExecutableBookProject.

Thanks for the suggestion, @choldgraf. I don't think this is a general issue, just a limitation of LaTeX's lstlisting environment. The nbconvert project manages to side-step this problem by using different LaTeX environments for code input/output. Just something to keep in mind for the -> latex component of the build chain down the road. If there is a repo where concerns about the -> latex conversion step will live, I'm happy to open a "caveat" or "suggestion" issue there to document the limitations.

rossbar added the bug label Feb 8, 2020
