tools: support rewriting links in code blocks #52938

crawford · 2024-05-10T18:48:13Z

Within the API documentation for modules, there is a section of pseudo-code which contains links to another file. This pseudo-code is inside of a pre-formatted element (<pre>), but that prevents the build process from rewriting those links from the Markdown source to their corresponding HTML output.

This addresses that by adding a new language, "pre", whose output remains unformatted but allows code fences to be annotated with metadata - code fences must have a language before the metadata. When the right metadata ("html") is present, it indicates that the code block should be processed so that anchor tags are rewritten just like other references.

The actual change to the documentation will happen in a later commit.

(This was split out from #52883)

Within the API documentation for modules, there is a section of pseudo-code which contains links to another file. This pseudo-code is inside of a pre-formatted element (`<pre>`), but that prevents the build process from rewriting those links from the Markdown source to their corresponding HTML output. This addresses that by adding a new language, "pre", whose output remains unformatted but allows code fences to be annotated with metadata - code fences must have a language before the metadata. When the right metadata ("html") is present, it indicates that the code block should be processed so that anchor tags are rewritten just like other references. The actual change to the documentation will happen in a later commit.

nodejs-github-bot · 2024-05-10T18:48:19Z

Review requested:

@nodejs/web-infra

ovflowd · 2024-05-10T23:09:53Z

tools/doc/markdown.mjs

@@ -1,6 +1,7 @@
 import { visit } from 'unist-util-visit';

 export const referenceToLocalMdFile = /^(?![+a-z]+:)([^#?]+)\.md(#.+)?$/i;
+export const referenceToLocalMdFileInPre = /<a href="([^#"]+)\.md(#[^"]+)?">/g;


This is an odd way of using the capture groups also as a way to only run the replace if such capture groups match... This regex seems a bit expensive, IMO...

Since you're using AST (visit), wouldn't eventually the href elements also be looped? I wonder if a simple check of if the node lang tag is "a" when it is iterating over it and then checking if the href contains a .md; There's also the issue that this current regex will also match external links ending with .md 🤷 and that's definitely a no-go.

This is an odd way of using the capture groups also as a way to only run the replace if such capture groups match

I don't follow this comment (I'm not a JavaScript developer, so it's likely I'm missing something). Is there a better way to do this? I was using the constant right above as a guide.

Since you're using AST (visit), wouldn't eventually the href elements also be looped? I wonder if a simple check of if the node lang tag is "a" when it is iterating over it and then checking if the href contains a .md

It looks like mdast can visit HTML nodes (https://github.com/syntax-tree/mdast?tab=readme-ov-file#html), of which there are roughly 4500 (mostly comments), but not the anchor tag specifically. I only see eight occurrences of HTML containing <a href, four of which are the ones I'm targeting, so it seems manageable to rewrite those. I like that approach better, since I won't have to introduce a new language tag or change that <pre> tag (though it does mean that this won't work if someone uses such a link in a code fence).

There's also the issue that this current regex will also match external links ending with .md 🤷 and that's definitely a no-go.

Yeah, good catch.

It looks like mdast can visit HTML nodes (syntax-tree/mdast#html), of which there are roughly 4500 (mostly comments), but not the anchor tag specifically. I only see eight occurrences of HTML containing <a href, four of which are the ones I'm targeting, so it seems manageable to rewrite those. I like that approach better, since I won't have to introduce a new language tag or change that
tag (though it does mean that this won't work if someone uses such a link in a code fence).

That's genuinely odd (the fact it is not going through anchor elements 🤷)

I don't follow this comment (I'm not a JavaScript developer, so it's likely I'm missing something). Is there a better way to do this? I was using the constant right above as a guide.

Regular Expressions are not a JavaScript thing. My point here is that you're running that regex through all the node.values with .replaceAll, which of course will do nothing if there are no matches, but I'm not sure this is the best of achieving this (performance-wise at least 🤔)

It worries me this whole tooling has no unit testing at all, and such regexe's could create unattended effects, is not like you've manually tested every link... (or have you?) the Regex is pretty clear, but I'd still probably add an if statement to verify if the string matches the existence of such anchor links and then running the replacement.

My point here is that you're running that regex through all the node.values with .replaceAll, which of course will do nothing if there are no matches, but I'm not sure this is the best of achieving this (performance-wise at least 🤔)

Gotcha. The current implementation only runs over a single node, since that's the only one that matches pre html, and as a result I saw a 2‰ increase in the time taken to generate the docs. With the new implementation, it will run over ~4500 elements. I haven't implemented or tested that yet to measure the performance impact.

is not like you've manually tested every link... (or have you?)

I have. I put the results in the description for #52883 (though, that also includes the changes from f125d7c). I guess I forgot to include that context here. Once I do the new implementation, I'll be sure to post the results.

Right, I wonder what would be more perfomatic:

Multiple iterations on small nodes

One big string search/replacement on the whole HTML

Feel free to do some testing here, but otherwise, I'm fine with this change :)

nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels May 10, 2024

crawford mentioned this pull request May 10, 2024

doc: fix links to ECMAScript module resolution #52883

Closed

ovflowd reviewed May 10, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tools: support rewriting links in code blocks #52938

tools: support rewriting links in code blocks #52938

crawford commented May 10, 2024

nodejs-github-bot commented May 10, 2024

ovflowd May 10, 2024

crawford May 13, 2024

ovflowd May 19, 2024

crawford May 20, 2024

ovflowd May 21, 2024

tools: support rewriting links in code blocks #52938

Are you sure you want to change the base?

tools: support rewriting links in code blocks #52938

Conversation

crawford commented May 10, 2024

nodejs-github-bot commented May 10, 2024

ovflowd May 10, 2024

Choose a reason for hiding this comment

crawford May 13, 2024

Choose a reason for hiding this comment

ovflowd May 19, 2024

Choose a reason for hiding this comment

crawford May 20, 2024

Choose a reason for hiding this comment

ovflowd May 21, 2024

Choose a reason for hiding this comment