Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include custom HTML attributes #170

Open
wants to merge 34 commits into
base: master
Choose a base branch
from
Open

Conversation

hyperking
Copy link

@hyperking hyperking commented Dec 9, 2022

This is similar to this PR#134

I would like to propose an alternative solution that will not alter existing render methods.

Html attribute block Proposed Spec

Line containing the following string
${html_attr_name:value, another_html_attr:value}
will describe the html attributes for the element proceeding it.

Contents within the ${...} string will be a comma separated list of key/value pairs.
The > character will separate parent attributes from child attributes.

Example Html attribute block
INPUT

${id:my-value, class:some-class}
# Mistletoe is Awesome

${id:my-list, class:foo >class:bar-items}
- Item One
- Item Two
- Item Three\

OUTPUT

<h1 id="my-value" class="some-class">Mistletoe is Awesome</h1>
<ul id="my-list" class="foo">
    <li class="bar-items">Item One</li>
    <li class="bar-items">Item Two</li>
    <li class="bar-items">Item Three</li>
</ul>

@hyperking hyperking changed the title Include any custom HTML attributes on any Token Include custom HTML attributes Dec 9, 2022
@hyperking hyperking marked this pull request as draft December 9, 2022 16:39
@hyperking hyperking marked this pull request as ready for review December 10, 2022 02:13
@anderskaplan
Copy link
Contributor

Hi, a few comments from someone who's been messing a bit with this code base recently.

I think the approach with a new block token is a good one, and it fits nicely in the markdown syntax.
It definitely has an advantage over the other proposed method to add attributes, in that you don't need to
write a custom renderer to use it.

Here are a few suggestions:

  1. The way this is added into block_tokenizer.make_tokens() feels a bit hacky. Why not treat it like
    any other block token and handle it in HTMLRenderer.render_html_attributes() instead?
    That's more in line with the extensible design of mistletoe.
  2. Since this is optional and non-standard functionality, I'd suggest to not add it to HTMLrenderer but rather to create a new
    renderer class for it in the contrib folder, derived from HTMLRenderer of course. The new
    block token class could go there as well.
  3. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...?
    Not saying it needs to be more complicated than necessary -- just that it should be well defined.
  4. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists,
    for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example,
    if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list
    because that would be invalid html.

Hope this can be helpful to make it a good addition to mistletoe!

@hyperking
Copy link
Author

Hi, a few comments from someone who's been messing a bit with this code base recently.

Thank you for the response and suggestions.

  1. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...?
    Not saying it needs to be more complicated than necessary -- just that it should be well defined.

Using the {...} characters was causing conflicts with other renderers namely the test_XWiki20Renderer.py macros test failing tests. Although your suggestion to create a new HTMLrenderer may help fix those issues here.

Per the block syntax. Lets get inline working first before multi-line. I'd imagine we could repurpose the code block functionality

  1. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists,
    for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example,
    if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list
    because that would be invalid html.

Great point, id attributes should always be unique to pass accessibility standards. We could add a numeric suffix onto the nest element id each time props are passed down from parents. I could also experiment with making the apply_props recursive for child tokens.

@hyperking
Copy link
Author

I've updated the branch to include adding unique id attribute values for deeply nested lists.

Below is an example currently in place
INPUT

${id:todos, tabindex:100 > class:list-item}
- Item One
- Item Two
    - item two.one
    - item two.two
        - apples
        - oranges
- Item Three

OUTPUT

<ul id="todos" tabindex="100">
    <li class="list-item" tabindex="1">Item One</li>
    <li class="list-item" tabindex="1">Item Two
        <ul id="todos-2" tabindex="1">
            <li class="list-item" tabindex="1">item two.one</li>
            <li class="list-item" tabindex="1">item two.two
                <ul id="todos-2-3" tabindex="1">
                    <li class="list-item" tabindex="1">apples</li>
                    <li class="list-item" tabindex="1">oranges</li>
                </ul>
            </li>
        </ul>
    </li>
    <li class="list-item" tabindex="1">Item Three</li>
</ul>

@anderskaplan
Copy link
Contributor

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

${id:todos}
- Item One
- Item Two
    - item two.one
    - item two.two
        ${id:shopping-list}
        - apples
        - oranges
- Item Three

@hyperking
Copy link
Author

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

Hmm, I had that idea in the beginning, however the current tokenization processing was a struggle and we need a system to pass parent props to children.

Or perhaps an Emmet style syntax at the root level may function better.

🐊 ${id:todos > id:shopping-list > class:shopping-list-items}

@hyperking
Copy link
Author

@anderskaplan There is still a few things I need to update in this PR and I'd like to know what is the criteria for getting code merged into master? Thanks for your help and feedback.

things todo:

  1. Implement attributes for all other html elements
  2. Add test cases for each html element
  3. Update Readme? ( not sure If the readme should include this feature )

@anderskaplan
Copy link
Contributor

Ok, that's a question for @pbodnar. I can only help with some guidance on how to make the code fit in with the overall design of the codebase.

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan There is still a few things I need to update in this PR and I'd like to know what is the criteria for getting code merged into master? Thanks for your help and feedback.

@hyperking, thank you for your contribution (and thanks to @anderskaplan for giving some useful feedback :)). I have gone through it quickly and gave you some feedback. There are no exactly described criteria. Generally, the code should be production-ready (or "fully baked" ;)) before being merged into master.

It also helps if commits and their messages are already as close to the final state as possible. So rebasing (squashing etc.) and force-pushing is usually OK to achieve that, even though this may lead to losing links from conversations to the code...


Some further quick thoughts (as I don't have much time for maintaining mistletoe these days and some open PRs are still waiting there for me ;)):

  1. PR description

This is similar to this #134

I would like to propose an alternative solution that will not alter existing render methods.

OK, it would be also good to highlight that unlike #134, this PR lets regular users specify the additional attributes, so it is not just an API extension.

  1. Selecting the syntax for attributes

Using the {...} characters was causing conflicts with other renderers namely the test_XWiki20Renderer.py macros test failing tests. Although your suggestion to create a new HTMLrenderer may help fix those issues here.

Provided we would want to use just {...} instead of ${...} and provided a test is failing, just fixing that test could be a solution?

Anyway, I wonder if we couldn't inspire at some alternative Markdown parsers or elsewhere, regarding the syntax (e.g. XWiki uses (% ... %))? Also from many tools and languages, I am quite used to read ${...} as something that should be evaluated, that's why I would like to consider some other variants.

  1. HTMLAttributes, or ExtraAttributes?

Any thoughts on making this mechanism, or token, general? I.e. the attributes maybe don't have to be necessarily just for HTML?

Also, HTMLAttributes are being added into block_token.__all__ by this PR now, so they would be already available to all the other renderers, right?

And a related question: shouldn't we do this feature / parsing optional (thinking about performance and backwards compatibility here)?

README.md Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
k, v = get_props(prop.strip())
if k and v: attr_map[k] = v
return attr_map
except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want to catch any exception and print it here. Unless the exception is really expected and we can do something useful with it. Otherwise we let it propagate, so that client code is informed about a potential bug.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see commit (e8256f6)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I would suggest though to remove the try-except block altogether then - less code, less errors, faster reading. :)

DITTO for the case below.

mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
title = ' title="{}"'.format(html.escape(token.title))
else:
title = ''
attr = '' if not hasattr(token,'html_props') else token.html_props
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract this repeated line to a dedicated helper method, as in #134, so that the logic is kept and can be changed at one place.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes are serialized ahead of time already. See here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yet I meant it literally though. :) Something like:

Suggested change
attr = '' if not hasattr(token,'html_props') else token.html_props
attr = self._render_attributes(token)
  • you can even inline this attr variable on the next line then (i.e. remove it).


class TestHTMLAttributes(TestCase):

def test_document(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically OK, but It will be better to test every aspect (variant) of this new feature in a dedicated method (and move all the common code like imports to the top).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 I knew this moment would come. Sure thing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice with some unit tests for the HTMLAttribute options as well.

test/test_html_attributes_renderer.py Outdated Show resolved Hide resolved
@hyperking
Copy link
Author

hyperking commented Dec 18, 2022

@hyperking, thank you for your contribution (and thanks to @anderskaplan for giving some useful feedback :)). I have gone through it quickly and gave you some feedback. There are no exactly described criteria. Generally, the code should be production-ready (or "fully baked" ;)) before being merged into master.

🤓 Awesome! With the holidays underway I may have some in between time to address any loose ends with this PR.

It also helps if commits and their messages are already as close to the final state as possible. So rebasing (squashing etc.) and force-pushing is usually OK to achieve that, even though this may lead to losing links from conversations to the code...

🤓 Yes, I'll make an effort to clean up the PR as close to upstream/master as possible without any conflicts. I've made a test run locally and git didn't seem to complain.

  1. Selecting the syntax for attributes
    Provided we would want to use just {...} instead of ${...} and provided a test is failing, just fixing that test could be a solution?

🤓 I agree, the original { ... } syntax would be ideal. However it does conflict with other aforementioned renders. I did make this renderer configurable so that the end user could specify an alternative character literals.

See configure statement here

  1. HTMLAttributes, or ExtraAttributes?
    Any thoughts on making this mechanism, or token, general? I.e. the attributes maybe don't have to be necessarily just for HTML?

🤓 HTMLAttributes are the common term used in web development for describing attributes on a given DOM node. per this feature, It's currently a general token. The token currently only supports basic markdown that renders HTML DOM nodes and Users are required to specify the HTMLAttributesRenderer vs the HTMLRenderer

Also, HTMLAttributes are being added into block_token.__all__ by this PR now, so they would be already available to all the other renderers, right?

🤓 Yes, It's also dependent on if the qualifying token string is present in the markdown ( outside of any code blocks )

And a related question: shouldn't we do this feature / parsing optional (thinking about performance and backwards compatibility here)?

I'd love to see how this performs, but generally this feature would come at a cost, but Its optional though 🤷‍♂️

mistletoe/base_renderer.py Outdated Show resolved Hide resolved
@@ -0,0 +1,86 @@
HTMLAttributesRenderer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice docs!

I'd suggest to change the name of the file to something along the lines of "Extension to attach custom html attributes" to make it more clear what it's about. "Features" is so very generic.

You could also describe the feature in a paragraph in the README.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we rename the file name from features.md into extensions.md
and within extensions.md we link to any available docs for new extensions or just add extension details to a single file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we really should think this through, I think there is no docs like this in mistletoe yet... Maybe keeping most of the description right at the class itself (in pydoc) could also be the way (although I have suggested creating a dedicated md myself firstly, I know)?

features.md Outdated
HTMLAttributesRenderer
----------------------

This feature allows you to write Markdown that will render 508 compliant html attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice with an explanation (maybe a link) of what "508 compliant" means. It's just a googling away but still.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See (48c8134)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, me too had a look at what the "508 compliance" means - if I understand it correctly, this standard is all about writing "accessible documents", which happens to include using concrete HTML attributes. So while this is somewhat related to this PR, I would stay generic here, focusing just on the ability to add custom HTML attributes to the output, like this:

This renderer allows you to specify extra attributes within your Markdown text, outputting them as HTML attributes of corresponding HTML tags.

I think the docs are basically fine, but I, as a quite demanding reader myself, would like to make it a little bit more "appealing". I'm just not sure if I will have enough time to do a rewrite myself... :P

recon_tokens.append(token_type)
doc_token.children = recon_tokens

def render_strong(self, token: span_token.Strong) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to the base class implementation (HTMLRenderer). Maybe I'm missing something, but if that's the case then I'd suggest to remove it from here. Less code is better.

(The same comment applies to other methods too: render_emphasis, render_inline_code, etc)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See (a7828be)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yet I can still see some identical methods in both classes - @hyperking, is it that you are about to add support for HTML attributes to those methods later on in this PR?

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi guys, I'm sorry, thanks for your efforts, but I don't really have much time for mistletoe nowadays.

So I have tried to give at least some new feedback today. I haven't gone through all the code in detail yet though...

mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
recon_tokens.append(token_type)
doc_token.children = recon_tokens

def render_strong(self, token: span_token.Strong) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yet I can still see some identical methods in both classes - @hyperking, is it that you are about to add support for HTML attributes to those methods later on in this PR?

mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/base_renderer.py Outdated Show resolved Hide resolved
test/test_html_attributes_renderer.py Outdated Show resolved Hide resolved
mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
title = ' title="{}"'.format(html.escape(token.title))
else:
title = ''
attr = '' if not hasattr(token,'html_props') else token.html_props
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yet I meant it literally though. :) Something like:

Suggested change
attr = '' if not hasattr(token,'html_props') else token.html_props
attr = self._render_attributes(token)
  • you can even inline this attr variable on the next line then (i.e. remove it).

@@ -143,6 +143,9 @@ def render_raw_text(self, token) -> str:
"""
return token.content

def render_html_attributes(self, token: block_token) -> str:
Copy link
Collaborator

@pbodnar pbodnar Jan 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After moving the new stuff out of the "base" code, also move this method to HTMLAttributesRenderer, amend its type hint and return '' from it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants