Emoji Rendering Discrepancy Between Inline and Block Elements #309

Open
bzaczynski opened this issue Jul 18, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@bzaczynski

Version of Marp Tool

v1.0.0

Operating System

Linux

Environment

Running in a Docker container. (The latest version 2.0.4 seems to suffer from the same issue.)

How to reproduce

Create the following Markdown file named slide-deck.md:

---
---

<span>Inline: &#128578;</span>

<div>Block: &#128578;</div>

Run the following CLI command with Docker to generate HTML:

$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md

Run the following CLI command with Docker to generate PDF:

$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md --pdf

Expected behavior

Both emojis are rendered the same way and are visible in the resulting PDF document.

Actual behavior

[Image: actual]

The inline emoji is rendered as an image element: <img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">, while the block element emoji is rendered literally: 🙂

This is a problem when targeting PDF as the output format:

[Image: pdf]

Additional information

No response

@bzaczynski bzaczynski added the bug Something isn't working label Jul 18, 2022
@yhatt
Member

yhatt commented Jul 18, 2022

https://markdown-it.github.io/#md3=%7B%22source%22%3A%22%3Cspan%3EInline%3A%20%26%23128578%3B%3C%2Fspan%3E%5Cn%5Cn%3Cdiv%3EBlock%3A%20%26%23128578%3B%3C%2Fdiv%3E%22%2C%22defaults%22%3A%7B%22html%22%3Afalse%2C%22xhtmlOut%22%3Afalse%2C%22breaks%22%3Afalse%2C%22langPrefix%22%3A%22language-%22%2C%22linkify%22%3Atrue%2C%22typographer%22%3Atrue%2C%22_highlight%22%3Atrue%2C%22_strict%22%3Atrue%2C%22_view%22%3A%22debug%22%7D%7D

The markdown-it token stream (AST) for the provided example looks like this:

[
  {
    "type": "paragraph_open",
    "tag": "p",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "<span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "Inline: 🙂",
        "markup": "&#128578;",
        "info": "entity",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "</span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "<span>Inline: &#128578;</span>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "paragraph_close",
    "tag": "p",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "html_block",
    "tag": "",
    "attrs": null,
    "map": [
      2,
      3
    ],
    "nesting": 0,
    "level": 0,
    "children": null,
    "content": "<div>Block: &#128578;</div>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]

Marp Core transforms each emoji found in the children of an inline markdown-it token into a marp_unicode_emoji token, and renders every marp_unicode_emoji token as a twemoji SVG image.

md.core.ruler.after('inline', 'marp_unicode_emoji', ({ tokens, Token }) => {
  for (const token of tokens) {
    if (token.type === 'inline') {
      const newChildren: any[] = []

      for (const t of token.children) {
        if (t.type === 'text') {
          const splittedByEmoji = t.content.split(regexForSplit)

          newChildren.push(
            ...splittedByEmoji.reduce(
              (splitedArr, text, idx) =>
                text.length === 0
                  ? splitedArr
                  : [
                      ...splitedArr,
                      Object.assign(new Token(), {
                        ...t,
                        content: text,
                        type: idx % 2 ? 'marp_unicode_emoji' : 'text',
                      }),
                    ],
              []
            )
          )
        } else {
          newChildren.push(t)
        }
      }

      token.children = newChildren
    }
  }
})
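
The split step in the rule above can be tried in isolation. The following is a minimal sketch, not Marp Core's actual code: `emojiSplitRegex` is a hypothetical stand-in for `regexForSplit` (whose real pattern is not shown in this thread), and `MiniToken` is a simplified placeholder for markdown-it's Token. It relies on the fact that `String.prototype.split` keeps captured groups, so the emoji end up at the odd indices.

```typescript
// Hypothetical stand-in for Marp Core's `regexForSplit` (assumption: the real
// pattern differs). The capture group makes split() keep each matched emoji.
const emojiSplitRegex = new RegExp('(\\p{Extended_Pictographic})', 'u')

// Simplified placeholder for a markdown-it token (assumption, not the real type).
interface MiniToken {
  type: 'text' | 'marp_unicode_emoji'
  content: string
}

function splitTextByEmoji(content: string): MiniToken[] {
  const parts = content.split(emojiSplitRegex)
  const result: MiniToken[] = []

  parts.forEach((text, idx) => {
    // Empty strings appear when an emoji sits at the start or end of the text.
    if (text.length === 0) return

    // Odd indices hold the captured emoji, even indices the surrounding text.
    result.push({ type: idx % 2 ? 'marp_unicode_emoji' : 'text', content: text })
  })

  return result
}

console.log(splitTextByEmoji('Inline: 🙂'))
```

For the text token from the example, this yields a text token with the content "Inline: " followed by a marp_unicode_emoji token with the content "🙂".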

On the other hand, the block element and its children are parsed as a single html_block token. Marp Core does not transform emojis within an html_block token because doing so may break raw HTML elements in some cases.
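
The token dump above also shows a second difference: the text token's content holds the decoded emoji, while the html_block content still holds the raw entity &#128578;. The sketch below illustrates why an emoji matcher alone would not find anything in the block. All names are hypothetical, the emoji pattern is an assumed stand-in for Marp Core's matcher, and the entity decoder handles only decimal numeric entities without any validation.

```typescript
// Assumed stand-in for an emoji matcher; not Marp Core's actual pattern.
const emojiTestRegex = new RegExp('\\p{Extended_Pictographic}', 'u')

// "content" values copied from the token dump in this thread.
const inlineTextContent = 'Inline: 🙂'
const htmlBlockContent = '<div>Block: &#128578;</div>'

// Converts a code point to a string via a surrogate pair where needed
// (String.fromCodePoint does the same natively in ES2015+ environments).
function codePointToString(cp: number): string {
  if (cp <= 0xffff) return String.fromCharCode(cp)
  const offset = cp - 0x10000
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff))
}

// Naive resolver: decimal numeric entities only, no named or hex entities.
function decodeDecimalEntities(html: string): string {
  return html.replace(/&#(\d+);/g, (_match: string, digits: string) =>
    codePointToString(parseInt(digits, 10))
  )
}

console.log(emojiTestRegex.test(inlineTextContent)) // true
console.log(emojiTestRegex.test(htmlBlockContent)) // false
console.log(decodeDecimalEntities(htmlBlockContent)) // <div>Block: 🙂</div>
```

So even before any HTML parsing, the entity has to be resolved for the emoji to become visible to a matcher.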

To transform emojis in html_block tokens correctly, we would need a robust HTML parser and entity resolver that work in both Node.js and the browser. Unfortunately, we have not implemented them yet because of several concerns:

  • An html_block token may contain only part of a complete HTML block, so well-known HTML-compliant parsers such as the browser's DOMParser, htmlparser2, and parse5 cannot be used in our use case.

    <div class="😄">
    
    # Markdown content 👍
    
    </div>

    In the above case, the HTML is split into two html_block tokens: <div class="😄"> and </div>. If we try to parse and transform these fragments with a well-known parser, the opening element is closed unnecessarily because HTML-compliant parsers auto-close tags, and parsing the closing element fails as invalid HTML.

  • If a simple string replacement is applied, the raw HTML block may break in some edge cases.

    • Raw JS: <script>document.title = "🙂";</script> ➡️ <script>document.title = "<img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">";</script>
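
Both concerns above can be reproduced with a few lines. This is a minimal sketch under stated assumptions: `naiveEmojiToImg` is a hypothetical helper, the emoji pattern is an assumed stand-in, and the `<img>` markup is shortened (no twemoji URL) to keep the output readable.

```typescript
// Assumed stand-in for an emoji matcher; not Marp Core's actual pattern.
const emojiReplaceRegex = new RegExp('\\p{Extended_Pictographic}', 'gu')

// Hypothetical helper: blindly swaps every emoji for a simplified <img> tag,
// with no awareness of where in the HTML the emoji sits.
function naiveEmojiToImg(html: string): string {
  return html.replace(
    emojiReplaceRegex,
    (emoji: string) => `<img class="emoji" alt="${emoji}">`
  )
}

// Emoji inside a script: the injected quotes terminate the JS string early.
console.log(naiveEmojiToImg('<script>document.title = "🙂";</script>'))

// Emoji inside an attribute of an opening-tag fragment: the attribute value
// is cut off by the quotes of the injected markup.
console.log(naiveEmojiToImg('<div class="😄">'))
```

The first call produces a script whose string literal ends right after `"<img class=`, and the second produces `<div class="<img class="emoji" alt="😄">">`, so neither result is usable as HTML.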
