Summary
If an HTML file contains many links that refer to anchors in the same (or a different) file, linkchecker wastes a lot of time and bandwidth because it downloads the target file again for every anchor link it checks. This is unnecessary and slows down checking significantly.
For example, if an HTML file contains 1000 anchors and one reference to each of them, linkchecker downloads the file 1000 times to check all anchors. This takes more than 2 minutes.
Steps to reproduce
Create an HTML file that contains 1000 sections with anchors, and a link to each of the anchors. You can use the following script to generate such an HTML file:
echo "<html><head><title>Demo</title></head><body>" > test.html
for i in `seq 1 1000` ; do
  echo "<h1 id=\"anchor$i\">Example $i</h1>" >> test.html
  echo "<a href=\"#anchor$i\">Link to section Example $i.</a>" >> test.html
done
echo "</body></html>" >> test.html
Store the generated HTML file on a web server.
Ensure that the web server sends a LinkChecker header so that linkchecker does not throttle the connection.
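For anyone who wants to reproduce this without a full web server, here is a minimal sketch of a local test server, using only Python's standard library. It adds a `LinkChecker` header to every response and counts how often each path is requested. The header value `"1"` is a placeholder (the report only says that the header must be present), and the names `CountingHandler`, `request_counts` and `make_server` are made up for this example.

```python
import threading
from collections import Counter
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Number of GET requests seen per path, e.g. request_counts["/test.html"].
request_counts = Counter()
_lock = threading.Lock()


class CountingHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory and count GET requests."""

    def do_GET(self):
        with _lock:
            request_counts[self.path] += 1
        super().do_GET()

    def end_headers(self):
        # Assumption: the header only needs to be present; "1" is a placeholder.
        self.send_header("LinkChecker", "1")
        super().end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet; the counts are in request_counts


def make_server(port=0):
    """Create the server on localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), CountingHandler)
```

Start it from the directory that contains test.html with `make_server(8000).serve_forever()`, point linkchecker at it, and inspect `request_counts["/test.html"]` afterwards to see how often the file was fetched.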
Expected result
If multiple referenced anchors are within the same file, it would be much more efficient to download that file only once and perform all anchor checks at once.
For example, suppose linkchecker tests main.html, which contains 100 links to anchors in the same file and 100 links to anchors in external.html. The following procedure would be efficient:
Aggregate all links with anchors by file name (main.html, external.html).
Check all links that refer to anchors in the file itself (<a href="#...">). At this point, the content of main.html is already loaded, because it is the file under test.
Download external.html.
Check all links in main.html that reference an anchor in external.html.
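The steps above could be sketched roughly as follows. This is not linkchecker's actual code or plugin API, just an illustration of the grouping idea with the standard library. `AnchorCollector`, `parse`, `check_anchors` and the `fetch` callback (a function that returns the HTML for a URL) are hypothetical names, and only `id` attributes and `<a name="...">` are treated as anchors.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorCollector(HTMLParser):
    """Collect link targets and anchor names from one HTML document."""

    def __init__(self):
        super().__init__()
        self.anchors = set()  # values of id=... and <a name=...>
        self.links = []       # raw href values of <a> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.anchors.add(attrs["name"])
            if attrs.get("href"):
                self.links.append(attrs["href"])


def parse(html):
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector


def check_anchors(start_url, fetch):
    """Return (broken anchor links, number of documents fetched)."""
    cache = {}  # URL without fragment -> parsed document

    def load(url):
        # Each document is fetched and parsed at most once.
        if url not in cache:
            cache[url] = parse(fetch(url))
        return cache[url]

    # Aggregate all links that carry a fragment by their target document.
    by_document = defaultdict(set)
    for href in load(start_url).links:
        target, fragment = urldefrag(urljoin(start_url, href))
        if fragment:
            by_document[target].add(fragment)

    # Check every fragment of a document against a single parsed copy.
    broken = []
    for target, fragments in by_document.items():
        anchors = load(target).anchors
        broken.extend(f"{target}#{name}" for name in sorted(fragments)
                      if name not in anchors)
    return broken, len(cache)
```

With this approach the number of downloads depends on the number of distinct documents, not on the number of anchor links. In the main.html / external.html example that would be two fetches instead of 200.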
Environment
Operating system: Linux demo 6.6.6-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Dec 11 17:29:08 UTC 2023 x86_64 GNU/Linux
Linkchecker version: 10.4.0
Python version: 3.12.0
Install method: Cloned from git repository
I have also noticed that my documentation takes a very long time to check despite all of the files being local. They have a lot of anchor links, so I suspect this is the root cause.
With all of that being said, thank you so much to the authors of the anchor check plugin. So few link checkers support checking anchors.
Create /tmp/linkcheckerrc with the following content:
Run linkchecker:
Actual result
Linkchecker downloads test.html again for every link to an anchor within that file (1000 times), which is unnecessary.
On the web server, you can also see that the file was downloaded 1000 times: