
UTF-8 characters in lstlisting breaks pdf conversion #131

Open
rossbar opened this issue Feb 8, 2020 · 4 comments

@rossbar

rossbar commented Feb 8, 2020

Bug Report

Describe the bug

This is not necessarily an IPyPublish bug: a limitation of the LaTeX listings package (which provides the lstlisting environment) causes pdf conversion to fail if Unicode characters are used within an lstlisting environment. I stumbled upon this using the %timeit IPython magic in a code cell, because the output of %timeit includes Unicode characters (the plus-minus sign ±, the µ prefix in µs, etc.).
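A quick way to see which characters in a cell's output will trip up lstlisting is to list its non-ASCII characters. The sketch below is a hypothetical helper (find_non_ascii is not part of IPyPublish or nbconvert) that uses only the standard-library unicodedata module:

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[str, str]]:
    """Return each distinct non-ASCII character in `text` with its Unicode
    name, in order of first appearance."""
    found = {}  # dicts preserve insertion order (Python 3.7+)
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = unicodedata.name(ch, "<unnamed>")
    return list(found.items())


# Output line copied from the lstlisting example in this issue.
line = "11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)"
print(find_non_ascii(line))  # [('±', 'PLUS-MINUS SIGN')]
```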

To Reproduce

Steps to reproduce the behavior:

  1. Create a file called example.Rmd with the following contents:
     ```{python}
     %timeit a = 2 + 2
     ```
  2. Run nbpublish -f latex_ipypublish_all.exec -pdf example.Rmd

Minimal Notebook Example

timeit_nb.ipynb.txt

Same build instructions as above (with the different filename, of course). Note that this issue arises downstream in the build process (at the latex -> pdf step), so it does not depend on whether the input file is .Rmd, .ipynb, etc.

Expected Behaviour

Currently, the conversion fails with errors from pdflatex. The desired behavior is a successful build with unicode characters properly represented in lstlisting environments.

Runtime Information


  • IPyPublish: 0.10.10

  • Python: 3.8.1

  • OS: Arch linux (5.5.2-arch1-1)

  • Pandoc: 2.8

  • (optional for pdf issues) texlive: 3.14159265

  • (optional for pdf issues) latexmk: 4.65

Additional context

The .log file provided by pdflatex is not particularly helpful as it makes it seem as though the problem is with the utf8x or ucs packages/options. After some digging, I was able to trace the problem back to a limitation with lstlisting. A simple procedure for confirming this:

  1. Open the converted/timeit.tex file generated by the nbpublish process
  2. Navigate to the lstlisting environment around the output from the code cell
  3. Comment out the lstlisting environment
  4. Build with pdflatex: pdflatex timeit.tex

The build will complete without errors and the output from the code cell will be properly rendered, albeit in plain LaTeX.
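Step 3 of the procedure above can be scripted. The following is a minimal sketch, assuming the generated .tex file has each \begin{lstlisting} and \end{lstlisting} on its own line; comment_out_lstlisting is a hypothetical helper, not an IPyPublish function. It only prefixes those delimiter lines with % so the listing body is typeset as ordinary LaTeX:

```python
import re

# A line that opens or closes an lstlisting environment, with or without options.
# [ \t]* (not \s*) so the match never reaches back across a preceding blank line.
_LST_DELIM_RE = re.compile(r"^([ \t]*\\(?:begin|end)\{lstlisting\}.*)$", re.MULTILINE)


def comment_out_lstlisting(tex: str) -> str:
    """Prefix each line that begins or ends an lstlisting environment with '% ',
    leaving the listing body itself untouched."""
    return _LST_DELIM_RE.sub(r"% \1", tex)


sample = (
    "before\n"
    "\\begin{lstlisting}[numbers=none]\n"
    "11.1 ns ± 2.64 ns per loop\n"
    "\\end{lstlisting}\n"
    "after\n"
)
print(comment_out_lstlisting(sample))
```

To repeat the check, apply it to the text of converted/timeit.tex, write the result back, and run pdflatex timeit.tex. This is only a diagnostic: with the environment commented out, any %, _ or # in the cell output is interpreted by LaTeX as well.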

Proposed solution

The limitations of lstlisting with respect to Unicode input are documented, and section 2.5 of the listings documentation proposes a workaround: adding an escapeinside= option to the lstlisting environment so that the characters inside the escaped section are handled by LaTeX itself rather than by listings. For example, here is the original lstlisting in timeit.tex as generated by the build process:

```latex
\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt]
11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

\end{lstlisting}
```

Here is the modified version that includes escapeinside that fixes the issue:

```latex
\begin{lstlisting}[language={},postbreak={},numbers=none,xrightmargin=7pt,belowskip=5pt,aboveskip=5pt,breakindent=0pt,escapeinside={*(}{)*}]
*(11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)   )*

\end{lstlisting}
```

Note that the characters that define the escaped section (*( and )* in my example) are configurable and could be specified for the entire document with \lstset.
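The manual edit above could, in principle, be applied as a post-processing step on the generated .tex. A minimal sketch follows; add_escapeinside is a hypothetical helper (not an existing IPyPublish API), and it assumes the *( and )* delimiters from the example, option lists that contain no ], and \begin{lstlisting}[...] on its own line. It appends the escapeinside option and wraps each non-blank body line in the delimiters, dropping trailing whitespace:

```python
import re

# Delimiters taken from the example above; any pair absent from the output would do.
ESCAPE_OPEN, ESCAPE_CLOSE = "*(", ")*"
ESCAPE_OPTION = "escapeinside={*(}{)*}"

# \begin{lstlisting}, optional [options], newline, body (lazy), \end{lstlisting}
_LST_RE = re.compile(
    r"(\\begin\{lstlisting\})(?:\[([^\]]*)\])?\n(.*?)(\\end\{lstlisting\})",
    re.DOTALL,
)


def add_escapeinside(tex: str) -> str:
    """Add the escapeinside option to every lstlisting environment in `tex`
    and wrap each non-blank body line in the escape delimiters."""

    def _rewrite(match):
        begin, opts, body, end = match.groups()
        opts = f"{opts},{ESCAPE_OPTION}" if opts else ESCAPE_OPTION
        lines = []
        for line in body.split("\n"):
            text = line.rstrip()  # drop trailing padding spaces
            lines.append(f"{ESCAPE_OPEN}{text}{ESCAPE_CLOSE}" if text else "")
        return f"{begin}[{opts}]\n" + "\n".join(lines) + end

    return _LST_RE.sub(_rewrite, tex)


generated = (
    "\\begin{lstlisting}[language={},numbers=none]\n"
    "11.1 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)   \n"
    "\n"
    "\\end{lstlisting}\n"
)
print(add_escapeinside(generated))
```

One caveat of this approach: text inside the escape is typeset by LaTeX rather than by listings, so characters such as %, _, # or & in code output would need LaTeX escaping, and a body line containing )* would end the escape early. Alternatively, the option could be set once in the preamble with \lstset{escapeinside={*(}{)*}}, in which case only the body lines need wrapping.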

If the proposed solution sounds workable to you, I'm happy to attempt to implement it. Some discussion would be required to hammer out details (e.g. appropriate escape characters). I wanted to create an issue first to see if there were any additional insights/ideas.


@rossbar rossbar added the bug label Feb 8, 2020
@chrisjsewell
Owner

chrisjsewell commented Feb 19, 2020

Hey @rossbar, that's funny, I forgot you had raised an issue here. Thanks for the feedback; obviously I am busy working on the myst parser at the moment, so I won't be able to look into this much in the immediate future. But this will probably end up feeding into that project 😄

@choldgraf, this is related to the conversion of source code to LaTeX, which will obviously be part of ExecutableBookProject/sphinx-notebook at some point.

@choldgraf

@rossbar if you like you could open an issue in the sphinx-notebook repo to flag this as a future item to tackle


@rossbar
Author

rossbar commented Feb 19, 2020

Thanks for the reply! This is not a high priority, especially in light of all the fantastic work being done with the ExecutableBookProject.

Thanks for the suggestions, @choldgraf. I don't think this is a general issue, just a limitation of LaTeX's lstlisting environment. The nbconvert project manages to sidestep this problem by using different LaTeX environments for code input/output. Just something to keep in mind for the -> latex component of the build chain down the road. If there is a repo where concerns about the -> latex conversion step will live, I'm happy to open a "caveat" or "suggestion" issue there to document the limitations.
