Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: support rewriting links in code blocks #52938

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

crawford
Copy link
Contributor

Within the API documentation for modules, there is a section of pseudo-code which contains links to another file. This pseudo-code is inside of a pre-formatted element (<pre>), but that prevents the build process from rewriting those links from the Markdown source to their corresponding HTML output.

This addresses that by adding a new language, "pre", whose output remains unformatted but allows code fences to be annotated with metadata - code fences must have a language before the metadata. When the right metadata ("html") is present, it indicates that the code block should be processed so that anchor tags are rewritten just like other references.

The actual change to the documentation will happen in a later commit.

(This was split out from #52883)

Within the API documentation for modules, there is a section of
pseudo-code which contains links to another file. This pseudo-code is
inside of a pre-formatted element (`<pre>`), but that prevents the
build process from rewriting those links from the Markdown source to
their corresponding HTML output.

This addresses that by adding a new language, "pre", whose output
remains unformatted but allows code fences to be annotated with
metadata - code fences must have a language before the metadata. When
the right metadata ("html") is present, it indicates that the code
block should be processed so that anchor tags are rewritten just like
other references.

The actual change to the documentation will happen in a later commit.
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/web-infra

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels May 10, 2024
@@ -1,6 +1,7 @@
import { visit } from 'unist-util-visit';

export const referenceToLocalMdFile = /^(?![+a-z]+:)([^#?]+)\.md(#.+)?$/i;
export const referenceToLocalMdFileInPre = /<a href="([^#"]+)\.md(#[^"]+)?">/g;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an odd way of using the capture groups also as a way to only run the replace if such capture groups match... This regex seems a bit expensive, IMO...

Since you're using AST (visit), wouldn't eventually the href elements also be looped? I wonder if a simple check of if the node lang tag is "a" when it is iterating over it and then checking if the href contains a .md; There's also the issue that this current regex will also match external links ending with .md 🤷 and that's definitely a no-go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an odd way of using the capture groups also as a way to only run the replace if such capture groups match

I don't follow this comment (I'm not a JavaScript developer, so it's likely I'm missing something). Is there a better way to do this? I was using the constant right above as a guide.

Since you're using AST (visit), wouldn't eventually the href elements also be looped? I wonder if a simple check of if the node lang tag is "a" when it is iterating over it and then checking if the href contains a .md

It looks like mdast can visit HTML nodes (https://github.com/syntax-tree/mdast?tab=readme-ov-file#html), of which there are roughly 4500 (mostly comments), but not the anchor tag specifically. I only see eight occurrences of HTML containing <a href, four of which are the ones I'm targeting, so it seems manageable to rewrite those. I like that approach better, since I won't have to introduce a new language tag or change that <pre> tag (though it does mean that this won't work if someone uses such a link in a code fence).

There's also the issue that this current regex will also match external links ending with .md 🤷 and that's definitely a no-go.

Yeah, good catch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like mdast can visit HTML nodes (syntax-tree/mdast#html), of which there are roughly 4500 (mostly comments), but not the anchor tag specifically. I only see eight occurrences of HTML containing <a href, four of which are the ones I'm targeting, so it seems manageable to rewrite those. I like that approach better, since I won't have to introduce a new language tag or change that

 tag (though it does mean that this won't work if someone uses such a link in a code fence).

That's genuinely odd (the fact it is not going through anchor elements 🤷)

I don't follow this comment (I'm not a JavaScript developer, so it's likely I'm missing something). Is there a better way to do this? I was using the constant right above as a guide.

Regular Expressions are not a JavaScript thing. My point here is that you're running that regex through all the node.values with .replaceAll, which of course will do nothing if there are no matches, but I'm not sure this is the best of achieving this (performance-wise at least 🤔)

It worries me this whole tooling has no unit testing at all, and such regexe's could create unattended effects, is not like you've manually tested every link... (or have you?) the Regex is pretty clear, but I'd still probably add an if statement to verify if the string matches the existence of such anchor links and then running the replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point here is that you're running that regex through all the node.values with .replaceAll, which of course will do nothing if there are no matches, but I'm not sure this is the best of achieving this (performance-wise at least 🤔)

Gotcha. The current implementation only runs over a single node, since that's the only one that matches pre html, and as a result I saw a 2‰ increase in the time taken to generate the docs. With the new implementation, it will run over ~4500 elements. I haven't implemented or tested that yet to measure the performance impact.

is not like you've manually tested every link... (or have you?)

I have. I put the results in the description for #52883 (though, that also includes the changes from f125d7c). I guess I forgot to include that context here. Once I do the new implementation, I'll be sure to post the results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I wonder what would be more perfomatic:

  • Multiple iterations on small nodes
  • One big string search/replacement on the whole HTML

Feel free to do some testing here, but otherwise, I'm fine with this change :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants