pandoc_self_contained_html() does not produce a valid HTML file #2087

Open
3 tasks done
RLesur opened this issue Mar 31, 2021 · 6 comments
Labels: bug (an unexpected problem or unintended behavior), next (to consider for next release)

Comments

@RLesur
Contributor

RLesur commented Mar 31, 2021

Hi,

I got the same bug as in #2066: the HTML file produced with pandoc_self_contained_html() is invalid (it begins with an opening p tag).

Here's a reprex:

library(rmarkdown)

writeLines("", input <- tempfile(fileext = ".md"))
pandoc_convert(
  input, 
  output = html1 <- tempfile(fileext = ".html"),
  options = "--standalone")
html2 <- pandoc_self_contained_html(html1, tempfile(fileext = ".html"))

# pandoc_self_contained_html() does not return a valid HTML file
cat(readLines(html2)[1])
#> <p>&lt;!DOCTYPE html&gt; <html xmlns="http://www.w3.org/1999/xhtml" lang xml:lang> <head> <meta charset="utf-8" /> <meta name="generator" content="pandoc" /> <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> <title>fileb9cd6adf80da</title> <style> html { line-height: 1.5; font-family: Georgia, serif; font-size: 20px; color: #1a1a1a; background-color: #fdfdfd; } body { margin: 0 auto; max-width: 36em; padding-left: 50px; padding-right: 50px; padding-top: 50px; padding-bottom: 50px; hyphens: auto; word-wrap: break-word; text-rendering: optimizeLegibility; font-kerning: normal; } @media (max-width: 600px) { body { font-size: 0.9em; padding: 1em; } } @media print { body { background-color: transparent; color: black; font-size: 12pt; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3, h4 { page-break-after: avoid; } } p { margin: 1em 0; } a { color: #1a1a1a; } a:visited { color: #1a1a1a; } img { max-width: 100%; } h1, h2, h3, h4, h5, h6 { margin-top: 1.4em; } h5, h6 { font-size: 1em; font-style: italic; } h6 { font-weight: normal; } ol, ul { padding-left: 1.7em; margin-top: 1em; } li > ol, li > ul { margin-top: 0; } blockquote { margin: 1em 0 1em 1.7em; padding-left: 1em; border-left: 2px solid #e6e6e6; color: #606060; } code { font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace; font-size: 85%; margin: 0; } pre { margin: 1em 0; overflow: auto; } pre code { padding: 0; overflow: visible; } .sourceCode { background-color: transparent; overflow: visible; } hr { background-color: #1a1a1a; border: none; height: 1px; margin: 1em 0; } table { margin: 1em 0; border-collapse: collapse; width: 100%; overflow-x: auto; display: block; font-variant-numeric: lining-nums tabular-nums; } table caption { margin-bottom: 0.75em; } tbody { margin-top: 0.5em; border-top: 1px solid #1a1a1a; border-bottom: 1px solid #1a1a1a; } th { border-top: 1px solid #1a1a1a; padding: 0.25em 0.5em 0.25em 0.5em; } td { padding: 0.125em 0.5em 0.25em 
0.5em; } header { margin-bottom: 4em; text-align: center; } #TOC li { list-style: none; } #TOC a:not(:hover) { text-decoration: none; } code{white-space: pre-wrap;} span.smallcaps{font-variant: small-caps;} span.underline{text-decoration: underline;} div.column{display: inline-block; vertical-align: top; width: 50%;} div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} ul.task-list{list-style: none;} .display.math{display: block; text-align: center; margin: 0.5rem auto;} </style> <!--[if lt IE 9]>

Created on 2021-04-01 by the reprex package (v1.0.0)
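
The symptom shown in the reprex can also be checked programmatically. This is only a minimal sketch: `starts_with_doctype()` is a hypothetical helper, not part of rmarkdown, and it inspects nothing but the first non-empty line of the file, so it is not a full HTML validator.

```r
# Hypothetical helper (not part of rmarkdown): TRUE when the first
# non-empty line of the file begins with an HTML doctype declaration.
starts_with_doctype <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]
  length(lines) > 0 &&
    grepl("^[[:space:]]*<!DOCTYPE html", lines[1], ignore.case = TRUE)
}

# Illustrative inputs that mimic the expected and the reported first lines
good <- tempfile(fileext = ".html")
bad <- tempfile(fileext = ".html")
writeLines(c("<!DOCTYPE html>", "<html></html>"), good)
writeLines("<p>&lt;!DOCTYPE html&gt; <html>", bad)

starts_with_doctype(good)  # TRUE
starts_with_doctype(bad)   # FALSE
```

Applied to the reprex, `starts_with_doctype(html2)` would be expected to return FALSE for the output described above.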

If you are interested, I wrote an alternative version here.


By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.org/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('rmarkdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rmarkdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@yihui yihui added the bug an unexpected problem or unintended behavior label Apr 1, 2021
@yihui
Member

yihui commented Apr 1, 2021

I don't know why we were forcing the input format to be markdown here:

rmarkdown/R/pandoc.R

Lines 453 to 456 in a11240d

from <- if (pandoc_available("1.17"))
  "markdown_strict"
else
  "markdown"

It seems to be more sensible to use html as the input format if the input file has the extension .html.
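
A rough sketch of that idea, for illustration only: `guess_input_format()` is a hypothetical helper, not an rmarkdown function, and it covers just the selection logic. Whether pandoc's html reader then produces the desired self-contained output is not verified here.

```r
# Hypothetical helper (not part of rmarkdown): pick the pandoc input
# format from the file extension, falling back to a Markdown variant.
guess_input_format <- function(input, fallback = "markdown_strict") {
  ext <- tolower(tools::file_ext(input))
  if (ext %in% c("html", "htm")) "html" else fallback
}

guess_input_format("report.html")  # "html"
guess_input_format("notes.md")     # "markdown_strict"
```

The fallback argument keeps the current behavior for any non-HTML input, so only files with an .html or .htm extension would be treated differently.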

@cderv
Collaborator

cderv commented Apr 1, 2021

Looking at this function more closely, I think it is a very old function that does not do what its documentation says. It only converts correctly if the input is Markdown, not HTML.

This function is 7 years old and has not changed much since then (one tweak 5 years ago). It seems it was made to convert from markdown_strict only (markdown was added in the tweak to fix a bug).

I don't really know what this function was for at the time. From a quick search, it seems it was used in the RStudio IDE, but it has been internalized there since 1.1:
rstudio/rstudio@40bc3d1

There is now a comment there that gives us some hints:

convert from markdown to html to get base64 encoding. note there is no markdown in the source
document but we still need to do this "conversion" to get the base64 encoding; we also don't
want to convert from HTML since that will cause pandoc to convert only the

So it seems that, at the time, using --from markdown_strict was a trick, because converting directly from HTML to HTML did not work.

If we dig further, it seems that some fixes have been made in the RStudio version, specifically for the issue in #2066 (fixed by rstudio/rstudio#1756). The latest version is here: https://github.com/rstudio/rstudio/blob/master/src/cpp/session/modules/ModuleTools.R#L375

htmlwidgets uses its own version for that, and it seems Pandoc 2.0 caused an issue that made it necessary to go through Markdown input. See ramnathv/htmlwidgets#292 and the original issue ramnathv/htmlwidgets#289 (comment).

The rmarkdown version of pandoc_self_contained_html() has apparently not been fixed for a long time, and does not seem to work with Pandoc 2.0+.

It is possible that this function is no longer used. Maybe it would be better to deprecate it and work on a new function that correctly does what we want it to do (convert HTML content to a self-contained version?).

The minimum we could do is port the fixes from the other packages to get a working version. But is rmarkdown the right place for this function?

Fun history research, thanks @RLesur!

Regarding the usage in your project: to create an HTML file for an htmlwidget, I believe htmlwidgets::saveWidget() is the right tool. It is self-contained by default.

@RLesur
Contributor Author

RLesur commented Apr 1, 2021

Thanks @cderv for this archivist's work!

This was the first time I tried to use pandoc_self_contained_html(); I wasn't aware of its existence before that. IMO, this function is legacy and could be deprecated, then removed.

@yihui
Copy link
Member

yihui commented Apr 1, 2021

I also found that this function was not used anywhere in this package. I'm okay with deprecating it and removing it in a future version of rmarkdown.

@cderv cderv added the next to consider for next release label Jan 4, 2022
@ismirsehregal

ismirsehregal commented Feb 15, 2022

Just stumbled over this issue - pandoc_self_contained_html() allows workarounds like this:

rstudio/htmltools#73

Which saveWidget doesn't:

library(htmltools)
myTagList <- tagList(p("A"), p("B"))
tempHTML <- tempfile(fileext = ".html")
save_html(myTagList, tempHTML)

rmarkdown::pandoc_self_contained_html(input = tempHTML, output = tempHTML) # works
utils::browseURL(tempHTML)
htmlwidgets::saveWidget(myTagList, "myTagList.html", selfcontained = TRUE) # doesn't work

There is a similar situation with crosstalk::bscols; please see this:

library(crosstalk)
library(plotly)
library(rmarkdown)
library(datasets)
library(htmltools)

shared_iris <- SharedData$new(iris)
fig <- bscols(
  plot_ly(shared_iris, x = ~Petal.Length, y = ~Petal.Width, color = ~Species),
  plot_ly(shared_iris, x = ~Sepal.Length, y = ~Sepal.Width, color = ~Species)
)
htmlwidgets::saveWidget(fig, "fig.html", selfcontained = TRUE) # doesn't work
tempPlotly <- tempfile(pattern = "plotly", fileext = ".html")
htmltools::save_html(fig, tempPlotly)
rmarkdown::pandoc_self_contained_html(input = tempPlotly, output = tempPlotly)
utils::browseURL(tempPlotly)

@cderv
Collaborator

cderv commented Jun 15, 2022

Possibly this future change would help us rework this function: #2382
