EPUB: Links to anchors like #&&/2 cause a fatal error while parsing the file #1851

Open
milmazz opened this issue Jan 25, 2024 · 0 comments · May be fixed by #1854

milmazz commented Jan 25, 2024

After checking the Elixir.epub file with epubcheck I got the following summary:

$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json

Check finished with errors
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos

I'll start with the highest-severity issue: the one causing a fatal error while parsing the XHTML documents.

Filtering the result a bit with jq:

$ jq '.messages[] | select(.severity=="FATAL") | {id: .ID, message: .message, locations: .locations | map({path, line, column})}'  elixir_docs.json

We got the following:

{
  "id": "RSC-016",
  "message": "Fatal Error while parsing file: The entity name must immediately follow the '&' in the entity reference.",
  "locations": [
    {
      "path": "OEBPS/Bitwise.xhtml",
      "line": 25,
      "column": 26
    },
    {
      "path": "OEBPS/Function.xhtml",
      "line": 38,
      "column": 46
    },
    {
      "path": "OEBPS/Kernel.SpecialForms.xhtml",
      "line": 67,
      "column": 20
    },
    {
      "path": "OEBPS/Kernel.xhtml",
      "line": 116,
      "column": 38
    },
    {
      "path": "OEBPS/anonymous-functions.xhtml",
      "line": 94,
      "column": 409
    },
    {
      "path": "OEBPS/basic-types.xhtml",
      "line": 84,
      "column": 335
    },
    {
      "path": "OEBPS/code-anti-patterns.xhtml",
      "line": 257,
      "column": 275
    },
    {
      "path": "OEBPS/operators.xhtml",
      "line": 31,
      "column": 781
    },
    {
      "path": "OEBPS/patterns-and-guards.xhtml",
      "line": 158,
      "column": 534
    }
  ]
}
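For anyone without jq at hand, the same filter can be sketched in Python. This is only an illustration: the field names (`messages`, `severity`, `ID`, `message`, `locations`, `path`, `line`, `column`) are taken from the jq expression above, and the function name `fatal_messages` is made up for this example.

```python
from typing import Any


def fatal_messages(report: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only FATAL messages, reduced to id, message and path/line/column."""
    selected = []
    for msg in report.get("messages", []):
        if msg.get("severity") != "FATAL":
            continue
        selected.append({
            "id": msg.get("ID"),
            "message": msg.get("message"),
            # Drop every location field except the three the jq filter keeps.
            "locations": [
                {key: loc.get(key) for key in ("path", "line", "column")}
                for loc in msg.get("locations", [])
            ],
        })
    return selected
```

To use it, load `elixir_docs.json` with `json.load` and pass the resulting dictionary to `fatal_messages`.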

While inspecting each of these files, I noticed a pattern that matches the error description, "The entity name must immediately follow the '&' in the entity reference":

  • anonymous-functions.xhtml -> <a href="Kernel.SpecialForms.xhtml#&/1">its documentation</a>
  • basic-types.xhtml -> <a href="Kernel.xhtml#&&/2"><code class="inline">&amp;&amp;/2</code></a>
  • Bitwise.xhtml -> <a href="#&&&/2"><code class="inline">&amp;&amp;&amp;/2</code></a>

So the problem, in particular, is the links to anchors like &/1, &&/2, and so on: the & in the href attribute is not escaped, even though the link text is.
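The failure can be reproduced outside an EPUB reader with any strict XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a fragment modeled on the basic-types.xhtml link above; the helper name `parses_as_xml` is made up for this example, and it says nothing about how epubcheck or Apple Books are implemented.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Raw ampersands in the attribute value, as in the generated XHTML.
raw_link = '<a href="Kernel.xhtml#&&/2">link</a>'
# The same link with each ampersand written as an entity reference.
escaped_link = '<a href="Kernel.xhtml#&amp;&amp;/2">link</a>'

print(parses_as_xml(raw_link))
print(parses_as_xml(escaped_link))
# After parsing, the attribute value contains the literal ampersands again.
print(ET.fromstring(escaped_link).get("href"))
```

The raw form is rejected by the parser, while the escaped form parses and still resolves to the same `#&&/2` fragment once the entities are decoded.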

Why is this important?

In readers like Apple Books, you get the following warning at the beginning of the document:

[Screenshot: warning shown by Apple Books at the beginning of the document]

More importantly, once you reach the end of that document you will notice that it is truncated, at least when compared with the HTML version:

[Screenshot: truncated document in Apple Books compared with the HTML version]

Solution / Discussion

I'm putting this out there to start a discussion about the approach we want to take for the EPUB formatter. I think we can first try changing those anchors from #&/1 to #&amp;/1 and see if that works. Otherwise, given that in the EPUB format the anchor names and the links to them are all internal, we can change the anchor generation to use a hash instead.
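To make the two options concrete, here is a rough Python sketch of both. It is not ExDoc code (ExDoc is written in Elixir) and not necessarily what the fix will look like; the function names, the choice of SHA-1, the 12-character truncation, and the `a` prefix are all assumptions made for illustration.

```python
import hashlib
from xml.sax.saxutils import escape


def escaped_href(page: str, anchor: str) -> str:
    """Option 1: keep the anchor name, but escape it for use in an XML attribute."""
    return escape(f"{page}#{anchor}", {'"': "&quot;"})


def hashed_anchor(anchor: str) -> str:
    """Option 2: replace the anchor name with a stable hash-based identifier."""
    digest = hashlib.sha1(anchor.encode("utf-8")).hexdigest()
    # Prefix with a letter so the result never starts with a digit.
    return "a" + digest[:12]


print(escaped_href("Kernel.xhtml", "&&/2"))
print(hashed_anchor("&&/2"))
```

With the second option, the same function would have to be applied both where the anchor is defined and wherever a link points to it, so that the two stay in sync.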

@milmazz milmazz self-assigned this Jan 26, 2024
milmazz added a commit that referenced this issue Jan 26, 2024
After generating the EPUB file for the Elixir docs with this version,
and reviewing the result with `epubcheck`, I got the following summary:

```console
$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json

Check finished with errors
Messages: 0 fatals / 141 errors / 0 warnings / 0 infos
```

If you compare this with the previous result reported in #1851

```
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos
```

you can see that there are no longer any messages with `fatal` severity,
and the number of errors has dropped considerably =)

I manually checked the generated EPUB in Apple Books: the previously
truncated sections are fixed, the banner _Below is a rendering of the
page up to the first error_ is gone, and the links to the different
anchors seem to work.

Fixes: #1851
@milmazz milmazz linked a pull request Jan 26, 2024 that will close this issue