callout with table produces problem in docx #6357

Open
vtraag opened this issue Jul 27, 2023 · 15 comments
Labels: bug (Something isn't working), callouts (Issues with Callout Blocks), docx (Issues with the docx format), triaged-to (Issues that were not self-assigned, signals that an issue was assigned to someone)

@vtraag

vtraag commented Jul 27, 2023

Bug description

When a callout contains a table, the generated docx contains "unreadable content" according to Microsoft Word (Office 365).

(image)

Steps to reproduce

  1. Create a file test.qmd containing the following Quarto markdown:
::: {.callout-note}

| Column A | Column B |
|----------|----------|
|        A |        B |

:::

Produces a problem when converted to `.docx`
  2. Generate the docx using quarto render test.qmd --to docx
  3. Open the file in Microsoft Word.

Expected behavior

The generated docx should not contain "unreadable content".

Actual behavior

The generated docx contains "unreadable content".

Your environment

Running from the command line on both Ubuntu (22.04) and Windows reproduces this problem. Note that the error in the generated .docx is reported by Microsoft Word, not by LibreOffice.

The problem appears with Quarto version 1.3.433. I double-checked with the latest pre-release, 1.4.268, and the problem persists.

Quarto check output

[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.1: OK
      Dart Sass version 1.55.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.3.433
      Path: /opt/quarto/bin

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.9 (Conda)
      Path: /home/vtraag/mambaforge/envs/science/bin/python
      Jupyter: 4.11.1
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.2.3
      Path: /opt/R/4.2.3/lib/R
      LibPaths:
        - /home/vtraag/R/x86_64-pc-linux-gnu-library/4.2
        - /opt/R/4.2.3/lib/R/library
      knitr: 1.43
      rmarkdown: 2.23

[✓] Checking Knitr engine render......OK
@vtraag vtraag added the bug Something isn't working label Jul 27, 2023
vtraag added a commit to PathOS-project/indicator_handbook that referenced this issue Jul 27, 2023
@mcanouil
Collaborator

mcanouil commented Jul 27, 2023

The problem appears with Quarto version 1.3.433.

You seem to imply this was working before, but I cannot see it working in 1.2.475 or 1.1.251.

Anyway, I can confirm it's not working with the latest dev version. Thanks for the report.

Note that enabling collapse cannot work in a Word document, for obvious reasons.

@mcanouil mcanouil added docx Issues with the docx format callouts Issues with Callout Blocks. triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. labels Jul 27, 2023
@vtraag
Author

vtraag commented Jul 27, 2023

You seem to imply this was working before, but I cannot see it working in 1.2.475 or 1.1.251.

Apologies, my wording was indeed ambiguous. I meant that the problem shows up in 1.3.433, not that it first appears there.

Note that enabling collapse cannot work in a Word document, for obvious reasons.

Yes, indeed, this won't work, I'm aware. This was a leftover from the bigger project where this problem surfaced. It can be safely removed, and the problem will still appear. I've edited it now accordingly.

@cscheid cscheid self-assigned this Jul 28, 2023
@cscheid
Collaborator

cscheid commented Jul 28, 2023

Indeed, this document doesn't validate:

./tmp/document-pretty.xml:8: element tblBorders: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblBorders': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
./tmp/document-pretty.xml:17: element tr: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tr': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblGrid ).
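
For anyone who wants to check their own output for the second error above (a `w:tbl` with no `w:tblGrid`), here is a minimal sketch using only the Python standard library. The helper names `read_document_xml` and `tables_missing_grid` are made up for illustration, and this is not a schema validation: it only looks for that one missing element.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as shown in the validator output above.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx_path):
    """Return word/document.xml from a .docx (which is a zip archive)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def tables_missing_grid(document_xml):
    """Count w:tbl elements (nested ones included) without a w:tblGrid child."""
    root = ET.fromstring(document_xml)
    return sum(
        1
        for tbl in root.iter(f"{{{W}}}tbl")
        if tbl.find(f"{{{W}}}tblGrid") is None
    )
```

Usage would be something like `tables_missing_grid(read_document_xml("test.docx"))`, where a non-zero result means at least one table lacks its grid definition.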

@schwa021

This problem is still present. Any estimate on a fix date?

@mcanouil
Collaborator

When there is an update it's documented on the relevant issues.

@dragonstyle
Collaborator

This appears to occur only in specific situations. For example, the above repro case only fails when a title is omitted (at least in my testing using Quarto 1.4). There are OOXML validation errors in the XML that we are producing, but the validation errors are identical for versions of the above document with and without a title, and only the one produced without a title is reported as corrupt when opened.

@dragonstyle dragonstyle modified the milestones: v1.4, v1.5 Dec 12, 2023
@edwintorok
Contributor

There is a duplicate w:pPr tag in the document (one emitted by pandoc itself, the other by Quarto's Lua filter):

<w:pPr>
  <w:pStyle w:val="BodyText"/>
</w:pPr>
<w:pPr>
  <w:spacing w:before="16" w:after="16"/>
</w:pPr>

Probably due to this code:

if hasIcon and calloutImage ~= nil then
  -- Create a paragraph with the icon, spaces, and text
  local image_title = pandoc.List({
    pandoc.RawInline("openxml", '<w:pPr>\n<w:spacing w:before="0" w:after="0" />\n<w:textAlignment w:val="center"/>\n</w:pPr>'),
    calloutImage,
    pandoc.Space(),
    pandoc.Space()})
  tappend(image_title, title)
  calloutContents:insert(pandoc.Para(image_title))
else
  local titleRaw = openXmlPara(pandoc.Para(title), 'w:before="16" w:after="16"')
  calloutContents:insert(titleRaw)
end

I don't know whether that is causing the original problem, but it might be worth fixing this to see whether Word will then open the file.
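
To find every paragraph affected by the duplicate `w:pPr`, a small check along these lines should do. It is only a sketch under the assumption that `word/document.xml` has already been extracted from the .docx; the function name `paragraphs_with_duplicate_ppr` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_with_duplicate_ppr(document_xml):
    """Return the w:p elements that have more than one direct w:pPr child."""
    root = ET.fromstring(document_xml)
    return [
        para
        for para in root.iter(f"{{{W}}}p")
        if len(para.findall(f"{{{W}}}pPr")) > 1
    ]
```

An empty list means no paragraph carries more than one properties element.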

@cscheid
Collaborator

cscheid commented Dec 18, 2023

@edwintorok Thank you for catching that! That's a good theory. It will be quite tricky for us to fix if Word disallows multiple property tags (we'd have to do it in an XML postprocessor, since Pandoc is emitting one of them). I'll test it this week.

@edwintorok
Contributor

If it helps I have a workaround script here that attempts to fix some of these errors (e.g. by merging the duplicate tag into a single one): https://gist.github.com/edwintorok/27b90e6f5f8f3b3e9f89372f05df1b6c#file-gistfile1-txt-L303-L403 (it depends on python-docx). You can try to see whether running that on the broken document creates a document that is accepted by Word.
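
The merging idea can be illustrated roughly as follows. This is not the script from the gist: it is a simplified sketch that uses only the standard library instead of python-docx, and `merge_duplicate_ppr` is a name invented here. It assumes the duplicate `w:pPr` elements hold non-conflicting properties and does not reorder children to match the schema.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def merge_duplicate_ppr(document_xml):
    """Fold extra w:pPr elements of each paragraph into the first one.

    Returns the parsed root element with the merge applied.
    """
    root = ET.fromstring(document_xml)
    ppr_tag = f"{{{W}}}pPr"
    # Materialize the list first so the tree is not mutated mid-iteration.
    for para in list(root.iter(f"{{{W}}}p")):
        pprs = para.findall(ppr_tag)
        if len(pprs) < 2:
            continue
        first = pprs[0]
        for extra in pprs[1:]:
            # Move the properties across, then drop the emptied duplicate.
            first.extend(list(extra))
            para.remove(extra)
    return root
```

When writing the result back, calling `ET.register_namespace("w", W)` before `ET.tostring(root)` keeps the usual `w:` prefix rather than a generated one.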

@edwintorok
Contributor

Here are 2 files:

  • the first one is using latest master pandoc + quarto 1.3 with some xml tag ordering fixes and adding a missing tblGrid. This fails to validate due to duplicate pPr: test.quarto.docx
  • the second is after running my fix.py from above to merge the duplicate pPr tags: test.docx, and passes validation according to the XML schema

They both open fine in LibreOffice and Google Docs.

Could you try to see whether either of these opens in Word?

@vtraag
Author

vtraag commented Dec 20, 2023

Sorry, both files (test.quarto.docx and test.docx) yield the same error as reported above in Word: "Word found unreadable content ...".

@edwintorok
Contributor

edwintorok commented Dec 20, 2023

Thanks, I can actually reproduce the problem with the free version of Microsoft 365 (the web version of Word).
A quick workaround seems to be to open the .docx in LibreOffice and save it again, which produces something that Word does open (at least the web version; I don't have the desktop variant):
test3.docx

The changes made by LibreOffice are quite substantial: the original document has 2 nested tables, but the final document only has one, so apparently it has merged the tables.

According to this https://stackoverflow.com/questions/4485225/openxml-nested-tables you need an empty <w:p/> when nesting tables, which Quarto's output doesn't contain.
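
As a post-processing illustration of that rule, the sketch below appends an empty `w:p` to any table cell whose last child is a nested table. It is not the change made in Quarto itself; `pad_cells_ending_in_table` is a hypothetical helper operating on an already extracted `word/document.xml`.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def pad_cells_ending_in_table(document_xml):
    """Append an empty w:p to each w:tc whose last child is a w:tbl.

    Returns (root, number_of_paragraphs_added).
    """
    root = ET.fromstring(document_xml)
    added = 0
    # Materialize the list first so the tree is not mutated mid-iteration.
    for cell in list(root.iter(f"{{{W}}}tc")):
        children = list(cell)
        if children and children[-1].tag == f"{{{W}}}tbl":
            ET.SubElement(cell, f"{{{W}}}p")
            added += 1
    return root, added
```

Cells that already end in a paragraph are left untouched, so running it twice adds nothing the second time.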

@edwintorok
Contributor

After inserting an empty paragraph, the online version seems happy with this: test2.docx

Here is the code change: v1.3...edwintorok:v1.3

Could you confirm whether this works with the desktop version of Word? I can then open a PR for Quarto.

@vtraag
Author

vtraag commented Jan 22, 2024

Sorry for taking so long to get back to this @edwintorok, but it now seems to work well: I do not get any error when opening the file in either MS Word or LibreOffice. So from that point of view, this looks good, thanks!

edwintorok added a commit to edwintorok/quarto-cli that referenced this issue Jan 22, 2024
Apparently an empty paragraph is needed.

Fixes quarto-dev#6357

Signed-off-by: Edwin Török <edwin@etorok.net>
@edwintorok
Contributor

Thanks, I've opened a PR here #8392.

@dragonstyle dragonstyle removed their assignment Feb 22, 2024
@cscheid cscheid modified the milestones: v1.5, Future May 8, 2024