Link checker fails internal links #1046

Open
klieret opened this issue Jan 18, 2022 · 15 comments · Fixed by #1051
Labels
Framework Everything related to the framework of the homepage and its documentation. help wanted The OP needs help to solve this issue. A call for everyone to take a look.

Comments

@klieret
Member

klieret commented Jan 18, 2022

The link checker fails all links in recent PRs, see e.g. #1045 👍

All the link check failures seem to be the same kind as yesterday and all false positives. Maybe @klieret has an idea - it seems that {{site.baseurl}} is expanding to an empty string and then the URL checker is trying for an absolute path, instead of a relative one.

@klieret
Member Author

klieret commented Jan 18, 2022

We had to slightly tweak the link checker before to resolve internal links (see this change).

I wonder if something goes wrong with the regular expression that is used there to cause this...

@klieret
Member Author

klieret commented Jan 18, 2022

Hmm, I have no idea why this isn't working.

I confirmed, for example with a find statement, that the following file exists:

/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html

or even with file:

/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html: HTML document, UTF-8 Unicode text, with very long lines

But then later the link checker complains that

2022-01-18T17:37:10.3403364Z [✖] /__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'] {
2022-01-18T17:37:10.3403782Z   errno: -2,
2022-01-18T17:37:10.3404010Z   code: 'ENOENT',
2022-01-18T17:37:10.3404232Z   syscall: 'access',
2022-01-18T17:37:10.3404597Z   path: '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'
2022-01-18T17:37:10.3404854Z }

It's as if the markdown link checker can't see the files generated by the previous steps, but that shouldn't be....

@klieret
Member Author

klieret commented Jan 18, 2022

This is in fact the same issue that is reported here: tcort/markdown-link-check#96 (but it doesn't provide much information that we don't know already)

@klieret
Member Author

klieret commented Jan 18, 2022

I'm currently out of ideas @graeme-a-stewart

What we could do as a non-perfect solution is to use the replacement pattern to prefix https://hepsoftwarefoundation.org. This would fail if you add a new markdown file and then directly link to it in the same PR, but should work for all the other links.
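
To make the proposed workaround concrete, here is a minimal Python sketch of what such a replacement pattern would do to a link. It is only an illustration: the regular expression is a Python rendering of the JavaScript-style pattern from the link checker config (braces and the dot are escaped here, and `$1` becomes a group reference), and the helper name `to_published_url` is hypothetical.

```python
import re

# Python rendering of the JavaScript-style pattern from the link checker config.
# Assumption: braces and the dot are escaped here for clarity.
BASEURL_PATTERN = re.compile(r"^\s*\{\{\s*site\.baseurl\s*\}\}/(.*)")

# Prefix proposed in this thread for the published site.
PUBLISHED_PREFIX = "https://hepsoftwarefoundation.org"


def to_published_url(link: str) -> str:
    """Rewrite a {{ site.baseurl }} link to the published site.

    Links that do not start with the baseurl placeholder are returned unchanged.
    """
    return BASEURL_PATTERN.sub(
        lambda match: f"{PUBLISHED_PREFIX}/{match.group(1)}", link
    )


if __name__ == "__main__":
    print(to_published_url("{{ site.baseurl }}/training/curriculum.html"))
```

Because the rewritten link points at the live site, a page that exists only in the PR branch would not be found there yet, which is the limitation described above.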

@klieret klieret added help wanted The OP needs help to solve this issue. A call for everyone to take a look. Framework Everything related to the framework of the homepage and its documentation. labels Jan 18, 2022
@graeme-a-stewart
Member

Just spitballing here, would it help if instead of

    "pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
    "replacement": "/_site/$1"

we made it a relative path, viz.

    "pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
    "replacement": "./_site/$1"

@klieret
Member Author

klieret commented Jan 21, 2022

Nope, that also doesn't work:

[✖] _site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/github/workspace/_workinggroups/_site/training/curriculum.html'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'access',
  path: '/github/workspace/_workinggroups/_site/training/curriculum.html'
}

In fact, it looks like the link checker actually changes into the directory of the file it is currently checking.
I don't think it did that in the past. This is probably what made the previous solution fail.
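
The resolution behaviour suggested by the log can be sketched in Python as follows. This is an assumption inferred from the error message, not a description of the internals of markdown-link-check; the function `resolve_link` and the file name `example.md` are hypothetical.

```python
import posixpath


def resolve_link(markdown_file: str, link: str) -> str:
    """Resolve a link as if relative paths were relative to the checked file.

    Assumption: the checker joins a relative link onto the directory of the
    markdown file being checked, and leaves absolute paths untouched.
    """
    if posixpath.isabs(link):
        return posixpath.normpath(link)
    file_dir = posixpath.dirname(markdown_file)
    return posixpath.normpath(posixpath.join(file_dir, link))


if __name__ == "__main__":
    # Hypothetical markdown file inside the _workinggroups directory.
    checked_file = "/github/workspace/_workinggroups/example.md"
    print(resolve_link(checked_file, "./_site/training/curriculum.html"))
```

Under this assumption, the relative replacement `./_site/$1` lands in `_workinggroups/_site/...` rather than in the `_site` directory at the top of the workspace, which matches the path in the error above.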

@graeme-a-stewart
Member

Thanks @klieret. So do we need to construct the full absolute path then? Slipping a $(pwd) in there somewhere? In the GitHub Actions CI we do know what the absolute path is, right?

@klieret
Member Author

klieret commented Jan 21, 2022

The thing is, I already tried with full absolute paths (the path is static, so we can just hard-code it) and it failed as well. That's what's confusing me.

I just tried with a local installation of markdown-link-check and there it works with absolute paths.

Let me try again on the gh action

@klieret
Member Author

klieret commented Jan 21, 2022

No, absolute paths don't work either. I reproduced the failure from my previous comment.

klieret added a commit that referenced this issue Jan 21, 2022
This is only a half-hearted fix: It will fail if you create a new page
and link to it before it is published.
See #1046 for more
information.
@klieret klieret linked a pull request Jan 21, 2022 that will close this issue
hegner pushed a commit that referenced this issue Jan 25, 2022
* Fix link checker for most cases

This is only a half-hearted fix: It will fail if you create a new page
and link to it before it is published.
See #1046 for more
information.
@klieret klieret reopened this Jan 25, 2022
@klieret
Member Author

klieret commented Jan 25, 2022

(Though linked to this issue, the merged PR is only a partial fix, so I'm keeping this open)

@hegner
Member

hegner commented Feb 14, 2023

Isn't this issue solved now?

@klieret
Member Author

klieret commented Feb 14, 2023

There should still be one loophole, though it doesn't seem to come up often in practice:

This is only a half-hearted fix: It will fail if you create a new page
and link to it before it is published.

(from my notes to #1051)

@klieret
Member Author

klieret commented Feb 14, 2023

Note that this edge case is not triggered by e.g., new GSoC pages, because there the interlinking (to project/organization etc.) is generated from the yaml frontmatter, so the markdown link checker doesn't find anything to check.

@klieret
Member Author

klieret commented Feb 14, 2023

Though looking at this again, I wonder if we could use absolute local paths for the replacement and then set baseURL, projectBaseURL to ensure that they are not with respect to the base directory but really to the root of the file system. I think this might be a setting that we missed previously.

@testgithubsonika

I want to contribute to this issue. Please assign it to me; if you have seen this, let me know.
