
Code does not show up in pdf since v5.1.0 #10719

Closed
astonzhang opened this issue Jul 28, 2022 · 28 comments
Comments

@astonzhang

astonzhang commented Jul 28, 2022

Describe the bug

Given the following LaTeX content produced by Sphinx from Jupyter notebooks:

\sphinxAtStartPar
Next, we train the model.

\begin{sphinxVerbatim}[commandchars=\\\{\}]
\PYG{n}{model} \PYG{o}{=} \PYG{n}{DropoutMLP}\PYG{p}{(}\PYG{o}{*}\PYG{o}{*}\PYG{n}{hparams}\PYG{p}{)}
\PYG{n}{trainer}\PYG{o}{.}\PYG{n}{fit}\PYG{p}{(}\PYG{n}{model}\PYG{p}{,} \PYG{n}{data}\PYG{p}{)}
\end{sphinxVerbatim}

\begin{figure}[H]
\centering

\noindent\sphinxincludegraphics{{output_dropout_880ae5_11_0}.pdf}
\end{figure}

With sphinx==5.0.2 the PDF renders correctly:
Screen Shot 2022-07-28 at 3 55 30 PM

However, sphinx==5.1.0 produces a PDF without the code:

Screen Shot 2022-07-28 at 3 56 01 PM

How to reproduce this bug

git clone git@github.com:d2l-ai/d2l-en.git
cd d2l-en
pip install d2lbook==0.2.0
pip install sphinx==5.1.0

Modify https://github.com/d2l-ai/d2l-en/blob/master/config.ini#L32 as

eval_notebook = False

so that the code is not evaluated to generate output (this avoids installing extra libraries to run the code and saves time when reproducing the error).
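If you prefer to script this configuration step, here is a minimal sketch. It assumes `eval_notebook` appears as a plain `key = value` line in `config.ini` and does not depend on the section name; `set_ini_value` is a hypothetical helper, not part of d2lbook.

```python
import re


def set_ini_value(text: str, key: str, value: str) -> str:
    """Return `text` with every `key = ...` line rewritten to `key = value`.

    Hypothetical helper: a line-based regex edit keeps comments and layout
    intact, which configparser would not preserve when writing back.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # Use a function as replacement so backslashes in `value` stay literal.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"{key!r} not found")
    return new_text
```

For example, read `config.ini`, pass its text through `set_ini_value(text, "eval_notebook", "False")`, and write the result back before running the build.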

Then run

d2lbook build pdf

Finally, open the output PDF at _build/pdf/d2l-en.pdf.

Expected behavior

No response

Your project

https://github.com/d2l-ai/d2l-en/
https://github.com/d2l-ai/d2l-book

Screenshots

No response

OS

Linux

Python version

3.8

Sphinx version

5.1.0

Sphinx extensions

No response

Extra tools

No response

Additional context

No response

@jfbu
Contributor

jfbu commented Jul 29, 2022

Thanks for reporting. It would have been easier with a smaller sample project, but here is the cause of the problem: the generated .tex file contains the following lines in its preamble:

\usepackage[draft]{minted}
\fvset{breaklines=true, breakanywhere=true}

I guess this comes from the Jupyter Book template.

If you want to fix this locally, try updating your post-processing of the .tex file to remove those two consecutive lines. I expect this will cure the problem; can you please confirm later?
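A minimal sketch of such a post-processing step, assuming the two lines appear in the preamble exactly as quoted above (surrounding whitespace ignored). The function names are hypothetical, and the path you pass (for example `_build/pdf/d2l-en.tex`) depends on your build layout.

```python
from pathlib import Path

# The two preamble lines quoted above, compared after stripping whitespace.
MINTED_LINES = {
    r"\usepackage[draft]{minted}",
    r"\fvset{breaklines=true, breakanywhere=true}",
}


def strip_minted(tex: str) -> str:
    """Drop the minted/fvset preamble lines reported to hide code with Sphinx 5.1.0."""
    kept = [
        line
        for line in tex.splitlines(keepends=True)
        if line.strip() not in MINTED_LINES
    ]
    return "".join(kept)


def strip_minted_file(path: str) -> None:
    """Rewrite a .tex file in place without the minted lines."""
    p = Path(path)
    p.write_text(strip_minted(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Run it on the generated .tex file before LaTeX is invoked; every other line is left untouched.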

The minted package modifies fancyvrb (where the \fvset command originates), which is used by Sphinx. Sphinx has its own documented interface for breaking long lines. Loading minted for that purpose is very ill-advised and is not tested by the Sphinx maintainers.

Please report the issue upstream to Jupyter-Book. EDITED: this was based on a misunderstanding. The project does not use jupyter-book but d2l-book, which is the source of the above two lines via its LaTeX template.

However, note the following:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sphinx-thebe 0.1.2 requires sphinx<5,>=3.5, but you have sphinx 5.1.0 which is incompatible.
sphinx-jupyterbook-latex 0.4.6 requires sphinx<5,>=3, but you have sphinx 5.1.0 which is incompatible.
sphinx-external-toc 0.2.4 requires sphinx<5,>=3, but you have sphinx 5.1.0 which is incompatible.
sphinx-design 0.1.0 requires sphinx<5,>=3, but you have sphinx 5.1.0 which is incompatible.
sphinx-book-theme 0.3.3 requires sphinx<5,>=3, but you have sphinx 5.1.0 which is incompatible.
pydata-sphinx-theme 0.8.1 requires sphinx<5,>=3.5.4, but you have sphinx 5.1.0 which is incompatible.
myst-parser 0.15.2 requires sphinx<5,>=3.1, but you have sphinx 5.1.0 which is incompatible.
myst-nb 0.13.2 requires sphinx<5,>=3.1, but you have sphinx 5.1.0 which is incompatible.
jupyter-book 0.13.0 requires sphinx<5,>=4, but you have sphinx 5.1.0 which is incompatible.
Successfully installed PyYAML-5.4.1 astor-0.8.1 awscli-1.25.40 botocore-1.27.40 colorama-0.4.4 commonmark-0.9.1 d2lbook-0.2.0 docutils-0.16 fasteners-0.17.3 isort-5.10.1 jmespath-1.0.1 mu-notedown-2.0.3 mxtheme-0.3.17 numpydoc-1.4.0 pandoc-attributes-0.1.7 pyasn1-0.4.8 pybtex-apa-style-1.3 recommonmark-0.7.1 regex-2022.7.25 rsa-4.7.2 s3transfer-0.6.0 sphinx-autodoc-typehints-1.19.0 sphinxcontrib-svg2pdfconverter-1.2.0 yapf-0.32.0

All those packages pin Sphinx to <5, I guess for good reasons. The above misuse of the minted package should be added to the list; please try reporting it to jupyter-book (I have not checked whether it really originates the minted lines, but a priori jupyter-book must be responsible for the LaTeX template).

The above came from pip install d2lbook==0.2.0; I already had Sphinx==5.1.0. I noticed that installing d2lbook does not make pip install numpy or torch, which look like missing dependencies.

I had to modify the LaTeX font configuration because Inconsolata is not installed on my machine. Finally, git clone git@github.com:d2l-ai/d2l-en.git does not work for me; I had to use the https protocol for git clone.

@jfbu jfbu changed the title Code does not show up in pdf since v5.1.0 [Jupyter Book] Code does not show up in pdf since v5.1.0 Jul 29, 2022
@jfbu
Contributor

jfbu commented Jul 29, 2022

@astonzhang

I notice that the file chapter_convolutional-modern/batch-norm.md contains

was to standardize our input features to have
zero mean $\mathbf{\mu} = 0$ and unit variance $\mathbf{\Sigma} = \mathbf{1}$ across multiple observations :cite:`friedman1987exploratory`.

The \mathbf mark-up is wrong for LaTeX (it may work in MathJax for HTML). In LaTeX, \mathbf applies a priori only to (Latin, not Greek) letters, not to digits (except with certain font packages), and never to math symbols. LaTeX offers various ways to get bold math, and I think in the case at hand \boldsymbol is the way to go:

was to standardize our input features to have
zero mean $\boldsymbol{\mu} = 0$ and unit variance $\boldsymbol{\Sigma} = \boldsymbol{1}$ across multiple observations :cite:`friedman1987exploratory`.

is compatible with both HTML output (via MathJax, untested) and PDF via LaTeX (tested, see screenshot).

With it, the PDF looks like:
Screenshot 2022-07-29 at 12 36 37

Otherwise, the PDF rendering is:
Screenshot 2022-07-29 at 12 38 32

and the console output of the LaTeX run complains:

[257] [258] [259] [260] [261] [262] [263] [264] [265]
Missing character: There is no ^^F (U+0006) in font Source Serif Pro Bold/OT:sc
ript=latn;language=dflt;!
[266] [267] [268] [269] [270] [271] [272] [273] [274] [275] [276] [277]

For bold math, this tex.sx answer may be useful, especially this comment, although it is somewhat dated (2017).

An alternative for LaTeX (untested here) might be the unicode-math package with \symbf mark-up. But will MathJax understand that?

EDIT: regarding \boldsymbol and MathJax see https://docs.mathjax.org/en/latest/input/tex/extensions/boldsymbol.html
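To apply the \boldsymbol suggestion across many .md sources, a regex pass like the following could help. This is a sketch under assumptions: it only rewrites \mathbf whose whole argument is a single Greek-letter command or a run of digits (the cases discussed above), leaves Latin letters such as \mathbf{x} alone, and the function name `bold_greek` is hypothetical. \boldsymbol itself comes from amsmath (via amsbsy).

```python
import re

# Greek letter commands that \mathbf cannot embolden with standard LaTeX fonts.
GREEK = (
    "alpha beta gamma delta epsilon varepsilon zeta eta theta vartheta iota "
    "kappa lambda mu nu xi pi varpi rho varrho sigma varsigma tau upsilon "
    "phi varphi chi psi omega Gamma Delta Theta Lambda Xi Pi Sigma Upsilon "
    "Phi Psi Omega"
).split()

# Matches \mathbf{\<greek>} or \mathbf{<digits>}; group 1 is the argument.
_MATHBF = re.compile(r"\\mathbf\{(\\(?:%s)|\d+)\}" % "|".join(GREEK))


def bold_greek(text: str) -> str:
    """Replace \\mathbf{\\mu}, \\mathbf{1}, ... with \\boldsymbol{...}."""
    return _MATHBF.sub(lambda m: r"\boldsymbol{" + m.group(1) + "}", text)
```

Applied to the batch-norm.md sentence, it turns `$\mathbf{\mu} = 0$` into `$\boldsymbol{\mu} = 0$` and `\mathbf{1}` into `\boldsymbol{1}`.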

@jfbu jfbu changed the title [Jupyter Book] Code does not show up in pdf since v5.1.0 Code does not show up in pdf since v5.1.0 Jul 29, 2022
@jfbu
Contributor

jfbu commented Jul 29, 2022

Wait! After some digging, during which I grepped through the executablebooks projects without finding minted, I finally found out that the lines with minted come from:
https://github.com/d2l-ai/d2l-book/blob/3e5cf2fd8b689bafe95fbd95a74fdcc2991e71dd/d2lbook/sphinx_template.py#L93-L94

\usepackage[draft]{minted}
\fvset{breaklines=true, breakanywhere=true}

Remove those lines. Now, if you want line breaks at every character in code cells, you should modify line
https://github.com/d2l-ai/d2l-book/blob/3e5cf2fd8b689bafe95fbd95a74fdcc2991e71dd/d2lbook/sphinx_template.py#L106
to read

'sphinxsetup': 'verbatimwithframe=false, verbatimsep=2mm, VerbatimColor={rgb}{.95,.95,.95}, verbatimforcewraps'

You may also consider adding verbatimmaxunderfull=0. (I do not recall precisely what minted's breakanywhere does, so I don't know whether you really want to modify the default setting of verbatimmaxunderfull.)
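In a plain Sphinx project, these options live under the 'sphinxsetup' key of latex_elements in conf.py; d2lbook generates that value from sphinx_template.py, so the exact place to edit differs there. A sketch assuming a standard conf.py, with the option values taken from the suggestion above:

```python
# conf.py (sketch): let Sphinx wrap long code lines itself,
# instead of loading minted/fvextra in the LaTeX preamble.
latex_elements = {
    "sphinxsetup": ", ".join([
        "verbatimwithframe=false",
        "verbatimsep=2mm",
        "VerbatimColor={rgb}{.95,.95,.95}",
        # Allow breaks even inside highlighted tokens (Sphinx >= 3.5.0).
        "verbatimforcewraps",
        # Optional; only relevant together with verbatimforcewraps.
        "verbatimmaxunderfull=0",
    ]),
}
```

Building the string from a list keeps each option on its own line, which makes it easy to drop verbatimforcewraps if token-internal breaks turn out to be unnecessary.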

Another thing: I must investigate why, with verbatimwithframe=false, one still sees a border in your PDF output with Sphinx 5.1.0.

However, I will now close this issue, as it is not due to a Sphinx bug.

@jfbu
Contributor

jfbu commented Jul 29, 2022

Thanks to your report, I became aware of a bug in 5.1.0 (#10723), which will be fixed in the next release: code-blocks always get a frame border, even with the verbatimwithframe=false 'sphinxsetup' option.

@jfbu
Contributor

jfbu commented Jul 29, 2022

@astonzhang Update. I checked the documentation of https://github.com/gpoore/fvextra (which provides the breakanywhere option used via minted), and it appears that in the context of a Jupyter Book, where highlighting is done by https://github.com/pygments/pygments and hence involves the \PYG macro with arguments, the optional line breaks added by minted cannot occur within highlighted tokens (strings, for example) except at spaces.

So, after dropping minted (mandatory with Sphinx 5.1.0 or later), you don't need to add verbatimforcewraps, contrary to what I said above. And you don't need to worry about verbatimmaxunderfull, as it matters only with verbatimforcewraps.

verbatimforcewraps is a custom Sphinx addition which allows line breaks even inside highlighted words. This goes beyond what minted's breakanywhere allowed, so you don't necessarily want to add it.

You can, if you so desire... (if you work in genomics with long DNA strands, you will want to do that... ;-) ).

@jfbu
Contributor

jfbu commented Jul 29, 2022

To sum up, the original report is explained by an incompatibility, appearing with Sphinx 5.1.0, with a preamble \fvset{breaklines}. The latter also requires \usepackage[draft]{minted} or \usepackage{fvextra}, as the base fancyvrb.sty package used by Sphinx does not provide this breaklines option. Activating the breaklines option (provided by fvextra or minted) is a truly bad thing which is not compatible with Sphinx. It must be a legacy trick from an older era.

Edit: https://github.com/gpoore/minted added a first version of breaklines in its v2.0 (2015/01/31) release. It received various updates and was ultimately split out into https://github.com/gpoore/fvextra, first released as v1.0 (2016/06/28).

Please take note that:

  • Sphinx PDF output has known how to wrap long code lines since 1.4.2 (released May 29, 2016).
  • It can even break forcibly in the middle of highlighted tokens since 3.5.0 (released Feb 14, 2021), using verbatimforcewraps.

For this reason, I will mark this incompatibility with extra LaTeX preamble material as "wont-fix".

@jfbu
Contributor

jfbu commented Jul 29, 2022

@humitos I notice your thumbs up, so I am pinging you to read this again, because my initial comment was written at a stage when I had misunderstood the structure of the project and thought it was using jupyter-book as its sole document builder. As I explained in other comments, I was right that the minted stuff came from some LaTeX template, but I was too quick to blame jupyter-book, and I understood my mistake only after having searched in vain for minted in https://github.com/executablebooks/jupyter-book and elsewhere.

@astonzhang
Author

astonzhang commented Jul 29, 2022

@jfbu Many thanks for diving deep into this issue! I just removed

\usepackage[draft]{minted}
\fvset{breaklines=true, breakanywhere=true}

and lifted the sphinx version constraint in d2l-ai/d2l-book@cbfbdf4

Now the code shows up in our latest preview PDF using d2lbook with the above changes:

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/d2l-en.pdf

For example (note the red arrows marking wrapped lines):

Screen Shot 2022-07-29 at 3 28 30 PM

Screen Shot 2022-07-29 at 3 30 21 PM

@astonzhang
Author

Thanks! I agree that \boldsymbol should be used for Greek symbols. Would you like to send a PR to D2L: https://github.com/d2l-ai/d2l-en/ ? We acknowledge every community fix: https://d2l.ai/chapter_preface/index.html#acknowledgments :)

@astonzhang
Author

astonzhang commented Jul 29, 2022

@jfbu

Besides, a major pain point with our generated PDF is that it's hard to distinguish code input from output:
Screen Shot 2022-07-29 at 3 41 09 PM

Actually, with earlier versions of Sphinx (in 2018), it was easier to distinguish input and output. See an example below with "In [xx]:" and "Out[xx]:" at the beginning of code blocks:

Screen Shot 2022-07-29 at 4 04 00 PM

Since you mentioned verbatim configuration, how can I configure it to distinguish code input from code output? For example, even different background greyscales/colors for input and output would suffice. Unfortunately, both code input and code output now share VerbatimColor={rgb}{.95,.95,.95} in conf.py.

@jfbu
Contributor

jfbu commented Jul 30, 2022

Since you mentioned configurations of verbatim, how can I configure to distinguish code input and output? For example, even different background greyscales/colors for code input and code output will suffice. Unfortunately, now both code input and code output share VerbatimColor={rgb}{.95,.95,.95} in conf.py.

I will take a look at your d2l-book processing. You can always add LaTeX code locally to redefine VerbatimColor, either via an injected \sphinxsetup or by direct use of \definecolor or \colorlet.

Maybe the following is relevant:

@jfbu
Contributor

jfbu commented Jul 30, 2022

Now the code shows up in our latest preview PDF using d2lbook with the above changes:

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/d2l-en.pdf

e.g. (note the red arrow for wrapping up),

Good. This is the expected rendering when Sphinx wraps long code lines. Since Sphinx 5.1.0, when this happens near the bottom of a page, very long wrapped code lines can continue at the top of the next page. Previously, they could not (see #8686).

For customization of the little red arrow, check https://www.sphinx-doc.org/en/master/latex.html#the-sphinxsetup-configuration-setting, particularly verbatimvisiblespace and verbatimcontinued. These have worked since Sphinx 1.4.2.

@jfbu
Contributor

jfbu commented Jul 30, 2022

@astonzhang The fact that you see a border around the code-block with Sphinx 5.1.0 despite verbatimwithframe=false is an unfortunate bug, which has been fixed in the development branch and will be part of the next release (#10726).

@jfbu
Contributor

jfbu commented Jul 30, 2022

@astonzhang Since Sphinx 4.1.0 (released Jul 12, 2021) the rST container directive is supported in latex output.

i.e. using rst mark-up, an input such as:

.. container:: red

   contents

will render in LaTeX as

\begin{sphinxuseclass}{red}
contents
\end{sphinxuseclass}

Now you only need to add a suitable definition to the preamble:

\newenvironment{sphinxclassred}
   {... do whatever appropriate, e.g. \color{red}... }
   { stuff at end of environment}

So in your case, if you can create classes, say input and output, and ensure that input code cells are of the input class and output code cells are of the output class, then adding something such as the following to the LaTeX template via the 'preamble' key of latex_elements

\newenvironment{sphinxclassinput}
   {\definecolor{VerbatimColor}{RGB}{200,200,200}}
   {}
\newenvironment{sphinxclassoutput}
   {\definecolor{VerbatimColor}{RGB}{225,225,225}}
   {}

should achieve your goal.

In place of \definecolor (which can also use {HTML}, cf. texdoc xcolor), you can use \colorlet, having also defined some InputVerbatimColor and OutputVerbatimColor, respectively, in the LaTeX preamble.
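As a concrete sketch of the \colorlet variant inside conf.py: it assumes the rST source wraps cells in `.. container:: input` and `.. container:: output` (class names are assumptions), and that Sphinx (>= 4.1.0) then opens a `sphinxclass<name>` environment when the preamble defines one. The colour values are those used above. It uses `\sphinxcolorlet`, as in the patch later in this thread that was tested with Sphinx 5.1.0; with Sphinx < 5.1 plain `\colorlet` is assumed to be the equivalent.

```python
# conf.py (sketch): different code-block backgrounds for input and output cells.
PREAMBLE = r"""
\definecolor{InputVerbatimColor}{RGB}{200,200,200}
\definecolor{OutputVerbatimColor}{RGB}{225,225,225}
\newenvironment{sphinxclassinput}
   {\sphinxcolorlet{VerbatimColor}{InputVerbatimColor}}
   {}
\newenvironment{sphinxclassoutput}
   {\sphinxcolorlet{VerbatimColor}{OutputVerbatimColor}}
   {}
"""

latex_elements = {
    "preamble": PREAMBLE,
    # Sphinx 5.1.0+ reportedly paints a code background only if VerbatimColor
    # was set at least once via 'sphinxsetup' (or a raw \sphinxsetup).
    "sphinxsetup": "VerbatimColor={rgb}{.95,.95,.95}",
}
```

Because each colour change sits inside an environment, it ends with the container, so a cell never inherits the colour of the previous one.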

Now, if you can't do that but have some means to insert code into the LaTeX output before an input or output cell, all you need to do is inject, at the right places, local usage of \sphinxsetup, \definecolor, or \colorlet.

For example

\sphinxsetup{VerbatimColor={RGB}{225,225,225}}

You only need to find a way to inject such things. But the "container class" idea is probably simpler, as it works without worrying about whether the output is HTML or LaTeX.
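One way to inject such commands from rST is a `.. raw:: latex` directive placed just before each cell. A sketch of a hypothetical helper that builds that snippet; since raw LaTeX is not scoped, the style stays in effect for all following blocks, so each cell has to re-issue its own style.

```python
def raw_latex(*commands: str) -> str:
    """Return an rST ``raw:: latex`` directive that emits the given commands.

    Hypothetical helper. Placed just before a code block, the commands affect
    that block and, since they are not scoped, every later block as well.
    """
    body = "\n".join("   " + cmd for cmd in commands)
    return ".. raw:: latex\n\n" + body + "\n\n"


# Styles to prepend to output cells and input cells respectively.
OUTPUT_CELL_STYLE = raw_latex(r"\sphinxsetup{VerbatimColor={RGB}{225,225,225}}")
INPUT_CELL_STYLE = raw_latex(r"\sphinxsetup{VerbatimColor={rgb}{.95,.95,.95}}")
```

Prepending `OUTPUT_CELL_STYLE` to each output-cell directive and `INPUT_CELL_STYLE` to each input-cell directive in the generated rST is enough to alternate the backgrounds.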

BTW, I get the feeling that HTML input cells already have some such CSS class, but I am poorly equipped and could not check what output cells look like, because of

AssertionError: Not enough resources (CPU 2, GPU 0 ) to run the task (CPU 1, GPU 1)

I also have various problems with PyTorch on my machine...

@astonzhang
Author

astonzhang commented Jul 30, 2022

@jfbu

Happy Saturday!

Our Sphinx-built book is used by many professors and students who prefer reading the PDF (it is easier on the eyes). Since our code input and code output blocks use the same style (VerbatimColor={rgb}{.95,.95,.95}), we constantly get feedback that it would be nice to easily tell which code blocks are inputs and which are outputs.

Thanks for looking into this problem. Here is how you can evaluate our notebooks to generate code output on your machine (cd ../d2l-en):

  1. [IMPORTANT] rm -rf _build to clear cache.
  2. Make sure eval_notebook = True at https://github.com/d2l-ai/d2l-en/blob/master/config.ini#L32
  3. On https://github.com/d2l-ai/d2l-en/blob/master/config.ini#L23: set notebooks = *.md chapter_preliminaries/*.md (so only notebooks under the Preliminaries chapter will be evaluated. They don't need GPUs, so you won't encounter the "Not enough resources" error complaining that your machine has no GPU to run the code).
  4. On https://github.com/d2l-ai/d2l-en/blob/master/index.md, only keep the Preliminaries chapter:

Dive into Deep Learning
========================

```eval_rst
.. raw:: html
   :file: frontpage.html
```


```toc
:maxdepth: 2
:numbered:

chapter_preliminaries/index
```


```toc
:maxdepth: 1

chapter_references/zreferences
```

Then you will get a very small pdf that only consists of the Preliminaries chapter with both code input and code output.

@astonzhang
Author

astonzhang commented Jul 30, 2022

For more context on d2lbook: it starts from input .md files containing code, and here is how these source files in our GitHub repo are transformed into output HTML and PDF (see our doc: https://book.d2l.ai/develop/pipeline.html):
build

Thus, when you run d2lbook build pdf (source code), you'll see many intermediate results under _build, such as:
Screen Shot 2022-07-30 at 1 22 15 PM

The specific build pipeline for pdf generation is:

  1. User writes code in .md
  2. Code in .md is evaluated to generate output, then converted into .ipynb
  3. .ipynb is converted into .rst by Sphinx
  4. .rst is converted into .tex by Sphinx
  5. .tex is built into .pdf by Sphinx

@astonzhang
Author

astonzhang commented Jul 30, 2022

@jfbu Before discussing the solution, let's see what rst and tex files look like using sphinx 5.1.1.

For example, in _build/rst/chapter_preliminaries/ndarray.rst on my machine, it's pretty nice that code input and code output can be easily distinguished by checking whether :class: output is present:

Screen Shot 2022-07-30 at 1 31 23 PM

Unfortunately, in _build/pdf/d2l-en.tex on my machine, the corresponding code input and code output use the same \begin{sphinxVerbatim} ... \end{sphinxVerbatim} block, so it's hard to distinguish between code input and code output in the .tex file:

Screen Shot 2022-07-30 at 1 36 47 PM

So, even if we can edit the .tex file (actually I did post-edit the above d2l-en.tex file with this script for other ad-hoc hacking) by following your suggestion below

Now, it you can't do that but have some means to insert code in latex output before input or output cell, all you need to do is inject at right places local usage of \sphinxsetup , or \definecolor, or \colorlet.

For example

\sphinxsetup{VerbatimColor={RGB}{225,225,225}}
you only need to find a way to inject such things.

it's still very hard to tell which \begin{sphinxVerbatim} ... \end{sphinxVerbatim} block is input and which is output, in order to apply separate environments properly.

Therefore, is it possible to show some special tag, like the rST :class: output, in the generated .tex file to indicate that a specific \begin{sphinxVerbatim} ... \end{sphinxVerbatim} block belongs to code output? If this is possible, I can simply add a new function to my script that applies a code-output-specific local environment (e.g., a background color different from that of input) to \begin{sphinxVerbatim} ... \end{sphinxVerbatim} blocks with that special tag.

Besides, my tex-processing script is too hacky and strongly depends on Sphinx output/syntax. For example, if Sphinx changes any LaTeX-specific syntax in the future, my script will break...

Any suggestions are welcome. Thanks!

@jfbu
Contributor

jfbu commented Jul 30, 2022

@astonzhang
I am at

[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./chapter_preliminaries/linear-algebra.md" on CPU [1] is running for 00:29:37

and will now get some sleep...

I encountered PyTorch problems on my Mac OS X, such as pytorch/pytorch#36941, pytorch/pytorch#20030, https://stackoverflow.com/questions/61525299/cannot-import-torch-image-not-found, and will address those tomorrow as well (after making the mistake of starting some brew install stuff, I will rather go the MacPorts way and possibly use install_name_tool, if that is the way to work around PyTorch's difficulties finding libomp.dylib).

I will examine the build pipeline tomorrow too. The https://github.com/spatialaudio/nbsphinx processing is able to define all the environment wrappers needed to customize sphinxVerbatim locally.

As a last resort, I have a crazy idea about hijacking the Sphinx LaTeX writer's handling of captions for code-blocks. We could provide a caption only to signal, for example, that this is an output cell, then modify the Sphinx LaTeX macro so that it does not typeset a caption at all but instead changes VerbatimColor.

But this is no good: if we can do that while keeping the same source for both HTML and PDF outputs (which clearly must be a requirement), then we probably also have a way to inject some LaTeX code directly at the right place.

Update, now at 35:45 build time

[d2lbook:resource.py:L164] INFO       - Task "Evaluating ./chapter_preliminaries/linear-algebra.md" on CPU [1] is running for 00:35:45

As you can see, I don't have access to a mainframe or even a decently fast computer...

@astonzhang
Author

astonzhang commented Jul 30, 2022

Quick reply:

Can you simply remove that linear-algebra.md file and the corresponding linear-algebra entry from https://github.com/d2l-ai/d2l-en/blob/master/chapter_preliminaries/index.md? Will that just make your computer feel happier? :)

Good night!

Update:

@jfbu To allow you to play with d2lbook without relying on pytorch/tensorflow/mxnet, I just created a new sphinx-latex branch containing the minimal code to test any crazy idea. Could you replace all your d2l-en files with those from the sphinx-latex branch on your machine? Then it takes < 1 min to execute d2lbook build pdf to produce a minimal complete PDF with both code input and code output:

Screen Shot 2022-07-30 at 3 49 51 PM

The corresponding _build/rst/chapter_preliminaries/ndarray.rst now looks like

Screen Shot 2022-07-30 at 3 52 54 PM

and the _build/pdf/d2l-en.tex:

Screen Shot 2022-07-30 at 3 53 41 PM

Actually, if you run bash static/build_html.sh, you can also see what the output HTML looks like under _build/html (the code input and output are distinguishable by background color, so I think the rst -> html pipeline is handled well by Sphinx, possibly by using the :class: output tag in the aforementioned rst file):

Screen Shot 2022-07-30 at 3 59 03 PM

All in all, now we'll need to make the code output 2 look different from the code input 1 + 1 in the PDF 😃

@jfbu
Contributor

jfbu commented Jul 31, 2022

@astonzhang

Thanks for the trimmed-down setup, which was indeed very helpful. The build stalled yesterday due to PyTorch not finding libomp, an issue I may address later on my machine.

Edit: PyTorch officially does not support my too-old Mac OS X; besides, I don't have a GPU and don't have access to cloud services from my unfunded location. So that's pretty much all there is to it until I decide to buy new hardware, as no one will do it for me... Finally, I understand PyTorch originated in Facebook's AI team, with which I do not intend to share anything for ethical reasons.

Try this:

diff --git a/d2lbook/rst.py b/d2lbook/rst.py
index 43819f4..faec73a 100644
--- a/d2lbook/rst.py
+++ b/d2lbook/rst.py
@@ -110,7 +110,12 @@ def _process_rst(body):
                     break
                 j += 1
             i = j
+        elif line.startswith('.. code::'):
+            # reset LaTeX code-block rendering parameters
+            lines[i] = '.. raw:: latex\n\n   \\diilbookstyleinputcell\n\n' + lines[i]
         elif line.startswith('.. parsed-literal::'):
+            # reset LaTeX code-block rendering parameters
+            lines[i] = '.. raw:: latex\n\n   \\diilbookstyleoutputcell\n\n' + lines[i]
             # add a output class so we can add customized css
             lines[i] += '\n    :class: output'
             i += 1
diff --git a/d2lbook/sphinx_template.py b/d2lbook/sphinx_template.py
index f662bdb..8f16a5d 100644
--- a/d2lbook/sphinx_template.py
+++ b/d2lbook/sphinx_template.py
@@ -100,9 +100,46 @@ MONO_FONT
         \fancyhead[LE,RO]{{\py@HeaderFamily }}
      }
 \makeatother
+
+% Defines macros for code-blocks styling
+\definecolor{d2lbookOutputCellBackgroundColor}{RGB}{239,254,255}	
+\definecolor{d2lbookOutputCellBorderColor}{RGB}{204,204,204}	
+\def\diilbookstyleoutputcell
+   {\sphinxcolorlet{VerbatimColor}{d2lbookOutputCellBackgroundColor}%
+    \sphinxcolorlet{VerbatimBorderColor}{d2lbookOutputCellBorderColor}%
+    \sphinxsetup{verbatimwithframe,verbatimborder=3pt}%
+   }%
+%
+\definecolor{d2lbookInputCellBackgroundColor}{rgb}{.95,.95,.95}
+\def\diilbookstyleinputcell
+   {\sphinxcolorlet{VerbatimColor}{d2lbookInputCellBackgroundColor}%
+    \sphinxsetup{verbatimwithframe=false,verbatimborder=0pt}%
+   }%
+% memo: as this mark-up uses macros not environments we have to reset all changed
+%       settings at each input cell to not inherit those or previous output cell
+% memo: Sphinx 5.1.0, 5.1.1 ignore verbatimwithframe Boolean, so for this
+%       reason we added an extra verbatimborder=0pt above.
 ''',
-'sphinxsetup': 'verbatimwithframe=false, verbatimsep=2mm, VerbatimColor={rgb}{.95,.95,.95}'
+
+'sphinxsetup': '''verbatimsep=2mm,
+                  VerbatimColor={rgb}{.95,.95,.95},
+                  VerbatimBorderColor={rgb}{.95,.95,.95},
+                  pre_border-radius=3pt,
+               ''',
 }
+# memo: Sphinx 5.1.0+ has a "feature" that if we don't set VerbatimColor to
+# some value via the sphinxsetup key or via \sphinxsetup raw macro, it
+# considers no colouring of background is required.  Above we by-passed usage
+# of \sphinxsetup, because \sphinxcolorlet was more convenient.  So we set
+# VerbatimColor in 'sphinxsetup' global key to work around that "feature".
+# The exact same applies with VerbatimBorderColor: it has to be set at least
+# once via 'sphinxsetup' or via \sphinxsetup raw macro else frame is black.
+#
+# memo: the Sphinx 5.1.0+ added pre_border-radius must be used in 'sphinxsetup'
+# (it can be modified later via extra  raw \sphinxsetup)
+# because at end of preamble Sphinx decides whether or not to load extra package
+# for rendering boxes with rounded corners.  N.B.: pre_border-radius is
+# unknown in Sphinx < 5.1.0 and will cause breakage.
 
 SPHINX_CONFIGS
 

On my machine, with Sphinx 5.1.0 and with the current Sphinx 5.x, it produces:

Screenshot 2022-07-31 at 10 48 50

I used the rounded-corners feature added in 5.1.0; simply remove pre_border-radius=3pt for compatibility with Sphinx < 5.1.0.

(I don't have Inconsolata on my machine, so I used FreeMono, which is the font of the code cells in the above screenshot.)
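If the same template must serve both older and newer Sphinx, the rounded-corner option could be added conditionally. A sketch, assuming the other option values from the patch above; `make_sphinxsetup` is a hypothetical helper that takes the version tuple as an argument so it stays independent of the installed Sphinx.

```python
def make_sphinxsetup(version_info: tuple) -> str:
    """Build the 'sphinxsetup' string, adding rounded corners only on Sphinx >= 5.1.

    pre_border-radius is unknown to Sphinx < 5.1.0 and would break the build.
    """
    opts = [
        "verbatimsep=2mm",
        "VerbatimColor={rgb}{.95,.95,.95}",
        "VerbatimBorderColor={rgb}{.95,.95,.95}",
    ]
    if tuple(version_info[:2]) >= (5, 1):
        opts.append("pre_border-radius=3pt")
    return ", ".join(opts)
```

In conf.py or the template, call it as `make_sphinxsetup(sphinx.version_info)` after `import sphinx`.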

Indeed your setup adds an "output" class, but one would have to modify

sphinx/sphinx/writers/latex.py, lines 1751 to 1800 in c2f1f89:

def visit_literal_block(self, node: Element) -> None:
    if node.rawsource != node.astext():
        # most probably a parsed-literal block -- don't highlight
        self.in_parsed_literal += 1
        self.body.append(r'\begin{sphinxalltt}' + CR)
    else:
        labels = self.hypertarget_to(node)
        if isinstance(node.parent, captioned_literal_block):
            labels += self.hypertarget_to(node.parent)
        if labels and not self.in_footnote:
            self.body.append(CR + r'\def\sphinxLiteralBlockLabel{' + labels + '}')
        lang = node.get('language', 'default')
        linenos = node.get('linenos', False)
        highlight_args = node.get('highlight_args', {})
        highlight_args['force'] = node.get('force', False)
        opts = self.config.highlight_options.get(lang, {})
        hlcode = self.highlighter.highlight_block(
            node.rawsource, lang, opts=opts, linenos=linenos,
            location=node, **highlight_args
        )
        if self.in_footnote:
            self.body.append(CR + r'\sphinxSetupCodeBlockInFootnote')
            hlcode = hlcode.replace(r'\begin{Verbatim}',
                                    r'\begin{sphinxVerbatim}')
        # if in table raise verbatim flag to avoid "tabulary" environment
        # and opt for sphinxVerbatimintable to handle caption & long lines
        elif self.table:
            self.table.has_problematic = True
            self.table.has_verbatim = True
            hlcode = hlcode.replace(r'\begin{Verbatim}',
                                    r'\begin{sphinxVerbatimintable}')
        else:
            hlcode = hlcode.replace(r'\begin{Verbatim}',
                                    r'\begin{sphinxVerbatim}')
        # get consistent trailer
        hlcode = hlcode.rstrip()[:-14]  # strip \end{Verbatim}
        if self.table and not self.in_footnote:
            hlcode += r'\end{sphinxVerbatimintable}'
        else:
            hlcode += r'\end{sphinxVerbatim}'
        hllines = str(highlight_args.get('hl_lines', []))[1:-1]
        if hllines:
            self.body.append(CR + r'\fvset{hllines={, %s,}}%%' % hllines)
        self.body.append(CR + hlcode + CR)
        if hllines:
            self.body.append(r'\sphinxresetverbatimhllines' + CR)
        raise nodes.SkipNode

to let the LaTeX writer detect the class in the node and react to it.

The above patch is not good style. I considered using the container directive, as alluded to earlier, but then I would have to add code to your rst.py to indent all lines. That would be the cleaner way, though. Injecting macros, as done here, requires re-issuing the styling at each cell, because the previous styling is not cancelled by an environment that limits its scope.
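For reference, the container-based alternative mentioned above roughly amounts to wrapping each output cell in a container directive that carries the output class, which means indenting every line of the cell. This is a minimal sketch over a list of reST lines, not existing d2lbook code; the function name wrap_output_cell and the three-space indent are assumptions, and the LaTeX side is not shown.

```python
def wrap_output_cell(cell_lines, class_name="output", indent="   "):
    """Wrap the reST lines of one output cell in a container directive.

    Hypothetical helper: every non-blank line is indented so that it
    becomes the body of ``.. container:: <class_name>``.
    """
    wrapped = [f".. container:: {class_name}", ""]
    for line in cell_lines:
        # Blank lines stay blank; only non-blank lines need the indent.
        wrapped.append(indent + line if line.strip() else "")
    wrapped.append("")  # a trailing blank line ends the container body
    return wrapped


cell = [
    ".. parsed-literal::",
    "    :class: output",
    "",
    "    [d2lbook:config.py:L12] INFO   Load configure from config.ini",
]
print("\n".join(wrap_output_cell(cell)))
```

Because the whole cell becomes the directive's body, any styling attached to the container is naturally scoped to that one cell.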

About your remark:

Besides, my tex-processing script is too hacky and strongly depends on sphinx output/syntax. For example, if sphinx changes any latex-specific syntax in the future, my script will break...

Indeed, it is not good to rely on Sphinx output syntax... I don't have a specific suggestion, as I have not looked in detail at what your post-processing script does. I see that you add, e.g., a \protect in front of \hyperlink; if this is a Sphinx issue, please report it.

@astonzhang
Author

astonzhang commented Jul 31, 2022

@jfbu Thanks a ton!!

@jfbu
Contributor

jfbu commented Aug 1, 2022

@astonzhang you are welcome ;-)

  • If you keep using the pre_border-radius option in 'sphinxsetup', you should pin Sphinx >= 5.1.0. The option is unknown to earlier releases and will make LaTeX report a build error.

  • The .rst files use the parsed-literal directive for output cells. Its handling by Docutils+Sphinx is a bit peculiar (this applies to both HTML and PDF output): it tries to detect whether the directive contents contain some formatting mark-up and, if not, it highlights the block with the default highlighting language. This has always surprised me... Internally, both the parsed-literal and code-block directives are dispatched to the same Python handler (in each writer, be it HTML or LaTeX), which then applies a somewhat strange test to check whether the contents used some inline mark-up. For example, consider this reST input:

.. parsed-literal::

   def foo():
       return None

.. parsed-literal::

   def foo():
       return *None*

And here is its rendering to HTML (using the 'classic' theme):
Screenshot 2022-08-01 at 09 25 30

Notice that the first instance was handled exactly as if the code-block directive had been used. In the second instance, there is no highlighting. If the LaTeX writer decides to handle the contents really as parsed literal, it takes a branch entirely different from the one used for highlighted code-blocks, so there can then be no background colouring and no frame...

...it is very peculiar that one cannot be fully sure in advance of what will happen. Here is the test the html5 writer uses (the html and latex writers use the same one):

def visit_literal_block(self, node: Element) -> None:
    if node.rawsource != node.astext():
        # most probably a parsed-literal block -- don't highlight
        return super().visit_literal_block(node)
    lang = node.get('language', 'default')

Now, in your use case it seems that the parsed-literal is always interpreted as a code-block and gets default-language highlighting. We can see this at https://book.d2l.ai/user/create.html, for example. Here is a snapshot from this web page:
Screenshot 2022-08-01 at 09 39 44

Notice how some keywords are highlighted, but this highlighting is a bit strange in this context.

My suggestion is that perhaps you may consider replacing the

.. parsed-literal::

lines with

.. code-block:: none

lines. The current code, incorporating #10719 (comment), looks like this:

https://github.com/d2l-ai/d2l-book/blob/ab958bf666599b5d067be08f1ad883334675513b/d2lbook/rst.py#L116-L121

        elif line.startswith('.. parsed-literal::'):
            # reset LaTeX code-block rendering parameters
            lines[i] = '.. raw:: latex\n\n   \\diilbookstyleoutputcell\n\n' + lines[i]
            # add a output class so we can add customized css
            lines[i] += '\n    :class: output'
            i += 1

and may thus be modified to do the replacement.
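A minimal sketch of that replacement, written as a standalone function over a list of reST lines rather than as an edit of the actual loop in rst.py (the function name and its language parameter are hypothetical). It keeps the raw LaTeX style macro and the CSS "output" class from the rst.py excerpt quoted above, but switches the directive to code-block with the none language.

```python
# Raw LaTeX prefix copied from the d2lbook rst.py excerpt quoted above.
OUTPUT_STYLE_PREFIX = '.. raw:: latex\n\n   \\diilbookstyleoutputcell\n\n'


def use_code_block_for_outputs(lines, language='none'):
    """Replace '.. parsed-literal::' output cells by '.. code-block:: <language>'."""
    result = []
    for line in lines:
        if line.startswith('.. parsed-literal::'):
            # Keep the LaTeX styling reset and the "output" class for CSS.
            result.append(OUTPUT_STYLE_PREFIX
                          + f'.. code-block:: {language}'
                          + '\n    :class: output')
        else:
            result.append(line)
    return result


rst = ['.. parsed-literal::', '', '    0 notebooks are outdated']
print('\n'.join(use_code_block_for_outputs(rst)))
```

Passing another language (e.g. python) would suit output cells that actually contain code.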

This will remove any highlighting but keep the architecture for background colouring, addition of a frame, both in HTML and PDF.

However, it will inhibit the rendering of actual reST mark-up for emphasis, boldface, or embedded links.

I thus don't know what is best for d2l-book.

@jfbu
Contributor

jfbu commented Aug 1, 2022

@astonzhang

In future we may incorporate the #10736 suggestion. If it were already in place, your issue with distinguishing code-blocks in PDF would have been much easier to handle: you already add an "output" class for HTML, and with the #10736 suggestion the LaTeX side is aware of that class. It is then only a matter of defining, in the conf.py LaTeX preamble, an environment sphinxclasspre.output (if the #10736 suggestion is used as is), which would set the styling parameters. As this would be done in a LaTeX environment, its scope is automatically limited to the output cell. Maybe in Sphinx 6.0.0 we will have something like that.

@jfbu
Copy link
Contributor

jfbu commented Aug 1, 2022

Actually in earlier versions of Sphinx (in 2018), it was easier to distinguish input and output. See an example below with "In [xx]:" and "Out[xx]:" at the beginning of code blocks:

I am not sure this is related to Sphinx. The prompts are not in the .rst files; perhaps this has something to do with nbconvert? You may refer to nbsphinx to see how prompts are handled there. It required special LaTeX extras to put the information into the .tex file and to manage how to handle it.

On my machine, in the sphinx-latex branch of d2l-en that you created, I can of course see prompts when using Jupyter to open the evaluated _build/eval/chapter_preliminaries/ndarray.ipynb notebook. But the ndarray.rst file has none, so maybe something in the ipynb-to-rst conversion step of the build pipeline is related to this.

@jfbu
Copy link
Contributor

jfbu commented Aug 1, 2022

@astonzhang

Notice that the first instance was handled exactly as if code-block directive had been used. In the second instance, there is no highlighting.

I have identified an instance of this phenomenon at https://book.d2l.ai/user/create.html#project-from-scratch

Consider the following screenshot from this html web page rendering:

Screenshot 2022-08-01 at 11 26 08

This has no highlighting. In the PDF output we can see it has no background colouring and no frame. It is an instance where the Sphinx mechanism decided that the parsed-literal actually contained parsed literal contents. Something inside this output cell (which contains some humongous LaTeX console output [1]) triggered this.

[1]: to reduce the Sphinx LaTeX console output, refer to our docs.

The rst source looks like this:

Let build the PDF output, you will find
``Output written on mybook.pdf (7 pages).`` in the output logs.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    !cd mybook && d2lbook build pdf


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    [d2lbook:config.py:L12] INFO   Load configure from config.ini
    [d2lbook:build.py:L143] INFO   0 notebooks are outdated
    [d2lbook:build.py:L149] INFO   Evaluating notebooks in parallel with 4 CPU workers and 4 GPU workers
    [d2lbook:build.py:L53] INFO   === Finished "d2lbook build eval" in 00:00:02

So this is indeed an example where parsed-literal did not end up with the styling that other output cells normally get (in both HTML and PDF). The HTML contains

<pre class="output literal-block">[d2lbook:config.py:L12] INFO   Load configure from config.ini
[d2lbook:build.py:L143] INFO   0 notebooks are outdated
[d2lbook:build.py:L149] INFO   Evaluating notebooks in parallel with 4 CPU workers and 4 GPU workers
[d2lbook:build.py:L53] INFO   === Finished &quot;d2lbook build eval&quot; in 00:00:02
[d2lbook:build.py:L289] INFO   0 rst files are outdated
[d2lbook:build.py:L53] INFO   === Finished &quot;d2lbook build rst&quot; in 00:00:02
Running Sphinx v5.1.1

which we can contrast with how the previous output cell is rendered

<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">d2lbook</span><span class="p">:</span><span class="n">config</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="n">L12</span><span class="p">]</span> <span class="n">INFO</span>   <span class="n">Load</span> <span class="n">configure</span> <span class="kn">from</span> <span class="nn">config.ini</span>

If parsed-literal were replaced by code-block (with a suitable highlighting language, which a priori should simply be none), then this output cell would look like the others.

Trying out the rst in a dummy Sphinx project, I get a bunch of

/path/to/index.rst:33: WARNING: Inline interpreted text or phrase reference start-string without end-string.

warnings (7 of them); the parsed-literal starts at line 33, but the warnings are not useful, as they do not say exactly where the problem is.

I identified that the trigger is the presence of the backtick ` in the LaTeX console output:

...
    (use `make latexpdf' here to do that automatically).
...
    For additional information on amsmath, use the `?' option.
...
    Style option: `fancyvrb' v2.7a, with DG/SPQR fixes, and firstline=lastline fix 
...
    For additional information on amsmath, use the `?' option.
...
    Style option: `fancyvrb' v2.7a, with DG/SPQR fixes, and firstline=lastline fix 
...
    For additional information on amsmath, use the `?' option.
...
    Style option: `fancyvrb' v2.7a, with DG/SPQR fixes, and firstline=lastline fix 
...

So these lines trigger warnings about bad mark-up! But there is more. Lines such as these also contribute to Sphinx using the parsed-literal branch rather than the code-block branch:

    *geometry* driver: auto-detecting
    *geometry* detected driver: xetex

Finally, after some searching, I found that the presence of this in the contents

    This is XeTeX, Version 3.14159265-2.6-0.99998 (TeX Live 2017/Debian) (preloaded format=xelatex)
     restricted \write18 enabled.

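by itself causes the Sphinx HTML and LaTeX writers not to style the block as a highlighted one! The cause is the backslash in \write... (reST treats a backslash as an escape character, so the parsed text no longer matches the raw source, and the rawsource != astext() test shown earlier becomes true).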

...this is a good illustration of what I have always considered a strange handling of .. parsed-literal:: by Sphinx...

edit: I mean that it is surprising that parsed-literal behaves like code-block when no inline reST mark-up is seen in the contents. As a result, some tool (perhaps "nbconvert"?) uses parsed-literal in a misguided way. Notice how the above example triggers build warnings ("Inline interpreted text or phrase reference start-string without end-string."), which are very hard to understand.

Ideally, the ipynb-to-rst conversion tool should mark the output cell not with parsed-literal, which is a priori not appropriate, but with code-block and a suitable highlighting language. That language will often be none, because an output cell is usually not something in a programming language to highlight, but it may sometimes be python or something else.
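Until a converter does that, a rough pre-check could flag output cells whose text contains characters that reST treats specially inside parsed-literal, namely the triggers found above (backslashes, asterisks, backticks) plus a couple of other inline mark-up characters. This is a deliberately conservative heuristic, not the Docutils parser: it may flag cells that would in fact still be highlighted. The function name risky_lines is hypothetical.

```python
import re

# A simplified, conservative set of reST inline mark-up triggers:
# backslash escapes, `interpreted text`, *emphasis*, |substitutions|,
# and trailing-underscore references such as "word_".
_RISKY = re.compile(r"[\\`*|]|\w_(?=\W|$)")


def risky_lines(cell_text):
    """Return (line_number, line) pairs that may make Sphinx treat a
    parsed-literal block as real parsed literal (hence unhighlighted)."""
    return [
        (number, line)
        for number, line in enumerate(cell_text.splitlines(), start=1)
        if _RISKY.search(line)
    ]


log = """\
[d2lbook:build.py:L143] INFO   0 notebooks are outdated
 restricted \\write18 enabled.
*geometry* driver: auto-detecting
(use `make latexpdf' here to do that automatically)."""

for number, line in risky_lines(log):
    print(number, line)
```

A cell with no flagged line is safe to keep as parsed-literal; a flagged cell is a candidate for code-block (with none or another suitable language) instead.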

@astonzhang
Author

astonzhang commented Aug 1, 2022

My suggestion is that perhaps you may consider replacing the

.. parsed-literal::
lines with

.. code-block:: none
lines. The current code incorporating #10719 (comment) looks

https://github.com/d2l-ai/d2l-book/blob/ab958bf666599b5d067be08f1ad883334675513b/d2lbook/rst.py#L116-L121

    elif line.startswith('.. parsed-literal::'):
        # reset LaTeX code-block rendering parameters
        lines[i] = '.. raw:: latex\n\n   \\diilbookstyleoutputcell\n\n' + lines[i]
        # add a output class so we can add customized css
        lines[i] += '\n    :class: output'
        i += 1

and may thus be modified to do the replacement.

Great suggestion. I'll take a look at the whole book and decide whether to keep parsed-literal or switch to .. code-block:: none

@astonzhang
Author

@astonzhang

In future we may incorporate #10736 suggestion. If this was already in place your issue with distinguishing code-blocks in PDF would have been much easier to handle: you already add an "output" class for HTML, and with #10736 suggestion the LaTeX is aware of that class. it is only a matter to define in conf.py latex preamble an environment sphinxclasspre.output (if #10736 suggestion is used as is), which would set the styling parameters. As this would be done in a LaTeX environment, its scope is automatically limited to the output cell. Maybe at Sphinx 6.0.0 we will have something like that.

Agreed. Please keep me posted on this change. Once it's available in a later version of Sphinx, we may make our d2lbook implementation cleaner.

@jfbu
Contributor

jfbu commented Aug 1, 2022

I'll take a look at the whole book and decide if to keep parsed-literal or switch to .. code-block:: none

Ideally it would be .. code-block:: <some language>, with a language appropriate to the output cell, often none. In some cases, perhaps you do want .. parsed-literal::, for example for output containing reST-formatted content such as links.

Globally it looks like a rather complex topic...

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2022