Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not catch bad links that use absolute paths to repo root #75

Open
4 tasks done
eatonphil opened this issue Apr 10, 2023 · 5 comments
Open
4 tasks done

Does not catch bad links that use absolute paths to repo root #75

eatonphil opened this issue Apr 10, 2023 · 5 comments
Labels
🤞 phase/open Post is being triaged manually

Comments

@eatonphil
Copy link

Initial checklist

Affected packages and versions

12.1.0

Link to runnable example

No response

Steps to reproduce

I have a repo: https://github.com/eatonphil/docs-link-check-test. If you run ./remark-check.sh it will catch one bad link but not another bad link.

$ cat README.md
[link](/src/test.md)

This should fail
[link](/src/test.md#doesnt-exist)

But only this one actually fails
[link](./src/test.md#doesnt-exist)

Expected behavior

GitHub allows you to use absolute paths starting with / to mean the root directory of the repo. remark-validate-links should handle looking at the repo root to resolve these files.

Actual behavior

Root-absolute paths are not validated.

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Apr 10, 2023
@eatonphil
Copy link
Author

This looks like #57 but I think #57 is talking about the ability to rewrite certain root paths.

I'm not talking about that. I just want to be able to resolve files relative to the root of the same repo. No rewrites.

@wooorm
Copy link
Member

wooorm commented Apr 10, 2023

Hey!

I don’t think this was supported before.
But your repo shows that it does work.
I was under the impressions that they used to be related to github.com, like normal.

See this handling:

// Absolute paths: `/wooorm/test/blob/main/directory/example.md`.
.

A PR to solve it would likely around that line.

And it might need some more testing, to see if /wooorm/markdown-rs/... still points to some other repo on GH, or to a file in the current repo.
And perhaps some testing on Gitlab / Bitbucket?

Interested in working on this?

@eatonphil
Copy link
Author

I've got it working with this diff, but you're right there are some edge cases. I don't really care to test on gitlab/bitbucket since I'm not as familiar with them. So how do you feel about this?

git diff
diff --git a/lib/find/find-references.js b/lib/find/find-references.js
index 953dec3..f930f4d 100644
--- a/lib/find/find-references.js
+++ b/lib/find/find-references.js
@@ -217,14 +217,21 @@ export async function findReferences(ctx) {
  */
 // eslint-disable-next-line complexity
 function urlToPath(value, config, type) {
-  // Absolute paths: `/wooorm/test/blob/main/directory/example.md`.
+  // Absolute paths: `/wooorm/test/blob/main/directory/example.md` or `/directory/example.md`.
   if (value.charAt(0) === slash) {
     if (!config.urlConfig.hostname) {
       return
     }

-    // Create a URL.
-    value = https + slashes + config.urlConfig.hostname + value
+    // A root absolute path without the repo, "blob", and branch name: `/directory/example.md`.
+    // TODO: Figure out how this applies, if at all, to GitLab and Bitbucket.
+    if (!config.urlConfig.prefix.startsWith(value) && config.urlConfig.hostname === 'github.com') {
+      const valueRelativeToRoot = value.slice(1)
+      value = path.resolve(config.root, valueRelativeToRoot)
+    } else {
+      // Create a URL.
+      value = https + slashes + config.urlConfig.hostname + value
+    }
   }

   /** @type {URL|undefined} */

If that's ok I'll open a PR.

@wooorm
Copy link
Member

wooorm commented Apr 14, 2023

I recommend adding an option to urlConfig, similar to lines: boolean, which is then checked for here. Something along the lines of: resolveAbsolutePathsInRepo: boolean? That way, folks can configure whether their hosted git supports this or not.

Furthermore, you are now creating a path inside value, in your if branch.
But the value we are making is a (serialized) URL.
path.resolve and config.root are all about paths. (note: see my last paragraph of this comment later touches on this again)

For reference, I checked the following in a private repo (wooorm/private) on GitHub:

readme.md

[alpha](/remarkjs/remark-validate-links/readme.md)
[bravo](/remarkjs/remark-validate-links/blob/readme.md)
[charlie](/remarkjs/remark-validate-links/blob/main/readme.md)
[delta](/remarkjs/remark-validate-links)
[echo](/remarkjs)
[foxtrot](/x/y/z.md)

x/y/z.md (empty)

Github generates the following HTML for the readme:

<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/readme.md">alpha</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/blob/readme.md">bravo</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/blob/main/readme.md">charlie</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links">delta</a>
<a href="/wooorm/private/blob/main/remarkjs">echo</a>
<a href="/wooorm/private/blob/main/x/y/z.md">foxtrot</a>

That is to say, all absolute paths are prefixed with /wooorm/private/blob/main.
so !config.urlConfig.prefix.startsWith(value) and the current handling are not needed.
The information needed to construct /wooorm/private/blob/main, other than the branch,
is in urlConfig.prefix.
Missing a branch is fine, we don’t use it yet:

value = value.split(slash).slice(1).join(slash)
.

So, I personally would do something like the following pseudocode:

  // Absolute paths: `/path/to/file.md`.
  if (value.charAt(0) === slash) {
    if (!config.urlConfig.hostname) {
      return
    }

    // Create a URL.
    const pathname = config.urlConfig.resolveAbsolutePathsInRepo && config.urlConfig.prefix
      ? config.urlConfig.prefix + 'unknown' + value
      : value.slice(1)
    value = https + slashes + config.urlConfig.hostname + pathname
  }

…where unknown is a temporary value for a branch name, which is dropped later!

Finally, the function as I look at it now is a bit of a mix between path and URL handling.
That’s ambiguous and probably points to some bugs.
It’s probably better to investigate all that and look at it some more some other time though!

@iainelder
Copy link

Just discovered that this is missing! I thought it was working until I renamed the file that was referenced by relative-to-repo-root syntax. Looking forward to a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

No branches or pull requests

3 participants