Make print page (print.html) links link to anchors on the print page #1738

HollowMan6 · 2022-02-02T13:39:09Z

Resolves #1736

Give every anchor id on the print page a path id prefix so that it
can be located.

e.g. bar/foo.md#abc -> #bar-foo-abc

Also insert a dummy div at the start of each original page to make sure
that links to that page without an anchor can also be located.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
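
The mapping in the example above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the helper names `path_to_print_id` and `to_print_anchor` are made up for the example, and the lower-casing step is taken from a later commit message in this thread.

```rust
/// Hypothetical helper: derive the print-page id prefix from a chapter path.
/// Assumes `/` as the path separator and an optional `.md` extension.
fn path_to_print_id(path: &str) -> String {
    let base = path.strip_suffix(".md").unwrap_or(path);
    // Path separators become dashes; the id is lower-cased.
    base.replace('/', "-").to_lowercase()
}

/// Hypothetical helper: rewrite `path#fragment` into an anchor on print.html.
/// A link without a fragment points at the dummy div for that page.
fn to_print_anchor(link: &str) -> String {
    match link.split_once('#') {
        Some((path, fragment)) => format!("#{}-{}", path_to_print_id(path), fragment),
        None => format!("#{}", path_to_print_id(link)),
    }
}
```

With these assumptions, `to_print_anchor("bar/foo.md#abc")` gives `#bar-foo-abc`, and `to_print_anchor("bar/foo.md")` gives `#bar-foo`, which is the id the dummy div would carry.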

HollowMan6 · 2022-02-10T01:44:47Z

Tested on the following Rust Bookshelf books, using the JS code here to check print.html for broken links and unavailable anchors.

With this PR, all the links on the print page are now self-contained. No broken links were found except for those that were already broken originally, or that rely on custom JavaScript that needs adaptation (ferris.js in Rust Programming Language).

| Title | Source | Original Book Online Version |
| --- | --- | --- |
| Cargo Book | Source | HTML |
| Edition Guide | Source | HTML |
| Embedded Rust Book | Source | HTML |
| Mdbook User Guide | Source | HTML |
| Rust Reference | Source | HTML |
| Rust By Example | Source | HTML |
| Rust Programming Language | Source | HTML |
| Rustc Book | Source | HTML |
| Rustdoc Book | Source | HTML |
| Rustonomicon | Source | HTML |
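
For readers who want to repeat this kind of check without the JS snippet, here is a rough Rust sketch of the same idea: collect every `id` on the page, then report each `href="#..."` that has no matching id. It is not the script used for the test above. The function names are made up, and the naive string scanning is an assumption that only holds for double-quoted attributes; a real check would use an HTML parser.

```rust
use std::collections::HashSet;

/// Collects the value of every ` attr="..."` occurrence in `html`.
/// Naive scanning for illustration only (double-quoted attributes assumed).
fn attr_values<'a>(html: &'a str, attr: &str) -> Vec<&'a str> {
    let marker = format!(" {attr}=\"");
    let mut values = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(&marker) {
        let after = &rest[start + marker.len()..];
        match after.find('"') {
            Some(end) => {
                values.push(&after[..end]);
                rest = &after[end..];
            }
            None => break,
        }
    }
    values
}

/// Returns every same-page link target (`href="#..."`) without a matching `id`.
fn missing_anchors(html: &str) -> Vec<&str> {
    let ids: HashSet<&str> = attr_values(html, "id").into_iter().collect();
    attr_values(html, "href")
        .into_iter()
        .filter_map(|href| href.strip_prefix('#'))
        .filter(|target| !target.is_empty() && !ids.contains(target))
        .collect()
}
```

Running `missing_anchors` over the contents of a generated print.html would list the fragments that point nowhere; links to other pages or external URLs are ignored by this sketch.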

Please review. I would appreciate it if this PR could be merged.

cc: @ehuss

dyaso · 2022-07-15T23:16:28Z

This was very helpful for me. It would be nice if this could be merged so I didn't have to build HollowMan6's fork myself in order to generate PDFs whose internal links work.

HollowMan6 · 2022-10-16T14:50:05Z

Hi @ehuss, I've noticed that you added the S-waiting-on-author label to this PR recently, but I can't see any code change requests or even reviews for this PR. Am I missing something, and what should I respond to or clarify?

ehuss

This is just a partial review, there are some other things that I'd like to follow up on.

src/utils/mod.rs

src/renderer/html_handlebars/hbs_renderer.rs

ehuss · 2022-10-17T15:24:44Z

> but I can't see any code change requests or even reviews for this PR. Am I missing something, and what should I respond to or clarify?

I'm so sorry. I had a partial review that I wrote a long time ago, but never finished and didn't click the submit button. But GitHub displays the review on the conversation page as if it was submitted (with a little icon that I overlooked), so I thought I had submitted it.

HollowMan6 · 2022-10-17T17:02:30Z

> This is just a partial review, there are some other things that I'd like to follow up on.

Done! Thanks for reviewing. Would love to see it get merged in the near future.

HollowMan6 · 2023-04-07T13:14:14Z

Hi @ehuss! Any update on this PR?

sjsadowski · 2023-05-09T12:33:21Z

@ehuss Circling back here, recently started using mdBook and merging this would be very helpful.

Thanks in advance!

ehuss · 2023-05-09T14:13:10Z

Unfortunately, this PR looks like it still needs a lot of work. I started a longer review, but I just don't currently have the time to work through all of the issues. There are two classes of issues. One is the code itself: there are various places that aren't written in an idiomatic Rust style. For example, `let closure = |path: Option<&Path>| add_base(path);` doesn't really make much sense as Rust code goes (it does nothing). The other class is correctness in meaning, where I started to notice some issues with how some of the links are rewritten.
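
To make the closure remark concrete, here is a small sketch. The body of `add_base` below is a hypothetical stand-in (the real function is not shown in this thread); only the shape of the closure matters. A closure that merely forwards its argument behaves exactly like calling the function directly, so it can be dropped.

```rust
use std::path::Path;

// Hypothetical stand-in for the PR's `add_base`; the real body is not shown here.
fn add_base(path: Option<&Path>) -> String {
    match path.and_then(Path::parent) {
        Some(parent) => parent.display().to_string(),
        None => String::new(),
    }
}

// The pattern called out in the review: the closure only forwards its argument.
fn via_closure(path: Option<&Path>) -> String {
    let closure = |path: Option<&Path>| add_base(path);
    closure(path)
}

// Equivalent and simpler: call the function directly.
fn direct(path: Option<&Path>) -> String {
    add_base(path)
}
```

Both functions return the same value for any input. If a callable is needed, for instance as an argument to `map`, the function name `add_base` can usually be passed as is.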

Here is a diff of some of the cosmetic cleanup I started on but didn't finish:

diff --git a/src/renderer/html_handlebars/hbs_renderer.rs b/src/renderer/html_handlebars/hbs_renderer.rs
index 010e1dc2c..41866a4a1 100644
--- a/src/renderer/html_handlebars/hbs_renderer.rs
+++ b/src/renderer/html_handlebars/hbs_renderer.rs
@@ -57,7 +57,7 @@ impl HtmlHandlebars {
         let content = ch.content.clone();
         let content = utils::render_markdown(&content, ctx.html_config.curly_quotes);

-        let fixed_content = utils::render_markdown_with_path_and_redirects(
+        let printed_item = utils::render_markdown_with_path_and_redirects(
             &ch.content,
             ctx.html_config.curly_quotes,
             Some(path),
@@ -70,7 +70,7 @@ impl HtmlHandlebars {
             print_content
                 .push_str(r#"<div style="break-before: page; page-break-before: always;"></div>"#);
         }
-        let path_id = {
+        let print_page_id = {
             let mut base = path.display().to_string();
             if base.ends_with(".md") {
                 base.truncate(base.len() - 3);
@@ -84,10 +84,10 @@ impl HtmlHandlebars {
         // We have to build header links in advance so that we can know the ranges
         // for the headers in one page.
         // Insert a dummy div to make sure that we can locate the specific page.
-        print_content.push_str(&(format!(r#"<div id="{}"></div>"#, &path_id)));
+        print_content.push_str(&(format!(r#"<div id="{print_page_id}"></div>"#)));
         print_content.push_str(&build_header_links(
-            &build_print_element_id(&fixed_content, &path_id),
-            Some(path_id),
+            &build_print_element_id(&printed_item, &print_page_id),
+            Some(print_page_id),
         ));

         // Update the context with data for this file
@@ -212,8 +212,11 @@ impl HtmlHandlebars {
         Ok(())
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
-    fn post_process_print(
+    /// Applies some post-processing to the HTML to apply some adjustments.
+    ///
+    /// This common function is used for both normal chapters (via
+    /// `post_process`) and the combined print page.
+    fn post_process_common(
         &self,
         rendered: String,
         playground_config: &Playground,
@@ -225,7 +228,7 @@ impl HtmlHandlebars {
         rendered
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
+    /// Applies some post-processing to the HTML to apply some adjustments.
     fn post_process(
         &self,
         rendered: String,
@@ -233,7 +236,7 @@ impl HtmlHandlebars {
         edition: Option<RustEdition>,
     ) -> String {
         let rendered = build_header_links(&rendered, None);
-        let rendered = self.post_process_print(rendered, &playground_config, edition);
+        let rendered = self.post_process_common(rendered, &playground_config, edition);

         rendered
     }
@@ -599,7 +602,7 @@ impl Renderer for HtmlHandlebars {
             let rendered = handlebars.render("index", &data)?;

             let rendered =
-                self.post_process_print(rendered, &html_config.playground, ctx.config.rust.edition);
+                self.post_process_common(rendered, &html_config.playground, ctx.config.rust.edition);

             utils::fs::write_file(destination, "print.html", rendered.as_bytes())?;
             debug!("Creating print.html ✓");
@@ -802,7 +805,7 @@ fn make_data(

 /// Go through the rendered print page HTML,
 /// add path id prefix to all the elements id as well as footnote links.
-fn build_print_element_id(html: &str, path_id: &str) -> String {
+fn build_print_element_id(html: &str, print_page_id: &str) -> String {
     static ALL_ID: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(<[^>]*?id=")([^"]+?)""#).unwrap());
     static FOOTNOTE_ID: Lazy<Regex> = Lazy::new(|| {
         Regex::new(
@@ -812,19 +815,22 @@ fn build_print_element_id(html: &str, path_id: &str) -> String {
     });

     let temp_html = ALL_ID.replace_all(html, |caps: &Captures<'_>| {
-        format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+        format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
     });

     FOOTNOTE_ID
         .replace_all(&temp_html, |caps: &Captures<'_>| {
-            format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+            format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
         })
         .into_owned()
 }

 /// Goes through the rendered HTML, making sure all header tags have
 /// an anchor respectively so people can link to sections directly.
-fn build_header_links(html: &str, path_id: Option<&str>) -> String {
+///
+/// `print_page_id` should be set to the print page ID prefix when adjusting the
+/// print page.
+fn build_header_links(html: &str, print_page_id: Option<&str>) -> String {
     static BUILD_HEADER_LINKS: Lazy<Regex> =
         Lazy::new(|| Regex::new(r"<h(\d)>(.*?)</h\d>").unwrap());

@@ -836,7 +842,7 @@ fn build_header_links(html: &str, path_id: Option<&str>) -> String {
                 .parse()
                 .expect("Regex should ensure we only ever get numbers here");

-            insert_link_into_header(level, &caps[2], &mut id_counter, path_id)
+            insert_link_into_header(level, &caps[2], &mut id_counter, print_page_id)
         })
         .into_owned()
 }
@@ -849,10 +855,11 @@ fn insert_link_into_header(
     level: usize,
     content: &str,
     id_counter: &mut HashMap<String, usize>,
-    path_id: Option<&str>,
+    print_page_id: Option<&str>,
 ) -> String {
-    let id = if let Some(path_id) = path_id {
-        utils::unique_id_from_content_with_path(content, id_counter, path_id)
+    let id = if let Some(print_page_id) = print_page_id {
+        let with_prefix = format!("{} {}", print_page_id, content);
+        utils::unique_id_from_content(&with_prefix, id_counter)
     } else {
         utils::unique_id_from_content(content, id_counter)
     };
diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index a9e1298e9..cbf170b63 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -83,14 +83,6 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
     unique_id
 }

-pub(crate) fn unique_id_from_content_with_path(
-    content: &str,
-    id_counter: &mut HashMap<String, usize>,
-    path_id: &str,
-) -> String {
-    unique_id_from_content(&format!("{} {}", path_id, content), id_counter)
-}
-
 /// Improve the path to try remove and solve .. token,
 /// This assumes that `a/b/../c` is `a/c`.
 ///
@@ -136,13 +128,8 @@ fn normalize_path_id(mut path: String) -> String {
 ///
 /// This adjusts links, such as turning `.md` extensions to `.html`.
 ///
-/// `path` is the path to the page being rendered relative to the root of the
-/// book. This is used for the `print.html` page so that links on the print
-/// page go to the anchors that has a path id prefix. Normal page rendering
-/// sets `path` to None.
-///
-/// `redirects` is also only for the print page. It's for adjusting links to
-/// a redirected location to go to the correct spot on the `print.html` page.
+/// See [`render_markdown_with_path_and_redirects`] for a description of
+/// `path` and `redirects`.
 fn adjust_links<'a>(
     event: Event<'a>,
     path: Option<&Path>,
@@ -377,7 +364,16 @@ pub fn new_cmark_parser(text: &str, curly_quotes: bool) -> Parser<'_, '_> {
     Parser::new_ext(text, opts)
 }

-pub fn render_markdown_with_path_and_redirects(
+/// Renders markdown to HTML.
+///
+/// `path` is the path to the page being rendered relative to the root of the
+/// book. This is used for the `print.html` page so that links on the print
+/// page go to the anchors that has a path id prefix. Normal page rendering
+/// sets `path` to None.
+///
+/// `redirects` is also only for the print page. It's for adjusting links to
+/// a redirected location to go to the correct spot on the `print.html` page.
+pub(crate) fn render_markdown_with_path_and_redirects(
     text: &str,
     curly_quotes: bool,
     path: Option<&Path>,

If I find the time, I'll try to come back to this. Alternatively, if there is someone who is an experienced Rust developer who could help with the review here, that may help.
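
As a rough illustration of the `insert_link_into_header` change in the diff above, where the print page prefix is joined to the header content before the unique id is computed: the `normalize_id` and `unique_id` functions below are simplified stand-ins written for this example, not copies of the helpers in `src/utils/mod.rs`, so their exact behaviour is an assumption.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an id normalizer (assumed behaviour): keep
/// alphanumerics, `_` and `-`, turn whitespace into `-`, lower-case the rest.
fn normalize_id(content: &str) -> String {
    content
        .chars()
        .filter_map(|ch| {
            if ch.is_alphanumeric() || ch == '_' || ch == '-' {
                Some(ch.to_ascii_lowercase())
            } else if ch.is_whitespace() {
                Some('-')
            } else {
                None
            }
        })
        .collect()
}

/// Simplified stand-in: repeated ids get a numeric suffix.
fn unique_id(content: &str, id_counter: &mut HashMap<String, usize>) -> String {
    let id = normalize_id(content);
    let count = id_counter.entry(id.clone()).or_insert(0);
    let unique = match *count {
        0 => id,
        n => format!("{id}-{n}"),
    };
    *count += 1;
    unique
}

/// Mirrors the shape of the change in the diff: prefix first, then make unique.
fn header_id(
    content: &str,
    id_counter: &mut HashMap<String, usize>,
    print_page_id: Option<&str>,
) -> String {
    match print_page_id {
        Some(prefix) => unique_id(&format!("{prefix} {content}"), id_counter),
        None => unique_id(content, id_counter),
    }
}
```

Under these assumptions, a header "The Rust Representation" on a page whose prefix is `type-layout` gets the id `type-layout-the-rust-representation`, and a second header with the same text on that page gets a `-1` suffix.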

sjsadowski · 2023-05-09T14:24:19Z

@ehuss I wish I could help out, I'm mostly just consuming mdBook - my Rust is amateurish at best. Thank you for the quick response, though, it helps with clarity on where things are!

HollowMan6 · 2023-05-09T16:22:19Z

@ehuss Glad to know that this PR has actually been handled by the maintainer all this time! Feel free to commit code directly to this PR, I always keep "Allow edits by maintainers" on.

tetsushiawano · 2023-10-29T23:41:11Z

I'm using HollowMan6's fork to create PDF files now. Waiting for this branch to move to master.

HollowMan6 · 2024-02-07T17:48:16Z

Hi, @ehuss! It took me some time to resolve the conflicts this time, so I hope this can get merged soon. Are there any plans to continue reviewing?

Dylan-DPC · 2024-02-08T13:00:30Z

Thanks. We will review it when we get the time so don't worry :)

HollowMan6 · 2024-02-08T17:52:12Z

I checked the broken links again with the methods described at: #1738 (comment) and fixed the following issues:

- Fix header links such as https://github.com/rust-lang/reference/blob/5be836c39a8f6a990ce5da17955cb53bac4b18a8/src/type-layout.md?plain=1#L162 (the id should be type-layout-the-rust-representation instead of type-layout--the-rust-representation)
- Fix links to <a name="xxx"></a> anchors (the path id should be added to those as well, see https://github.com/rust-lang/rustc-dev-guide/blob/af8e2fe2f8be94e84d6269563ad71b715eb962fe/src/variance.md?plain=1#L146)
- Fix mailto links (the path ID shouldn't be appended in this case)

Now no other broken links are found with the current version.
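
The mailto case can be pictured with a small guard like the one below. This is only a sketch of the idea and not the check used in this PR: the function names are made up, and treating anything that starts with a URL scheme (`mailto:`, `https:` and so on) as a link to leave alone is an assumption.

```rust
/// Hypothetical check: true when `link` starts with a URL scheme such as
/// `mailto:` or `https:` (a letter, then letters, digits, `+`, `-` or `.`).
fn has_scheme(link: &str) -> bool {
    match link.split_once(':') {
        Some((scheme, _)) => {
            let mut chars = scheme.chars();
            chars.next().map_or(false, |c| c.is_ascii_alphabetic())
                && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        None => false,
    }
}

/// Hypothetical guard: only links without a scheme are candidates for
/// being rewritten to a path-prefixed anchor on print.html.
fn needs_print_rewrite(link: &str) -> bool {
    !has_scheme(link)
}
```

With this guard, `mailto:someone@example.com` is left untouched while `bar/foo.md#abc` is still rewritten.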

Signed-off-by: Hollow Man <hollowman@opensuse.org>

Give every anchor id on the print page a path id prefix so that it can be located.

e.g. bar/foo.md#abc -> #bar-foo-abc

Also insert a dummy div at the start of each original page to make sure that links to that page without an anchor can also be located.

Remove all the `./` in the normalized path id so that "./foo/bar.html#abc" still becomes "#foo-bar-abc".

Add support for redirect link anchors on the print page so that anchors can also be redirected; also handle URL redirect links on the print page.

Add a path prefix to the id of every element, and make the path id all lower case.

Fix print page footnote links by adding the path id prefix.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
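
As an illustration of the "path prefix on every element id" step: the PR does this with regular expressions (see `build_print_element_id` in the diff above). The sketch below swaps in plain string scanning so that it needs no dependencies. The function name `prefix_ids` is made up, and it assumes double-quoted `id` attributes preceded by a space.

```rust
/// Hypothetical, dependency-free sketch: insert `print_page_id-` in front of
/// the value of every ` id="..."` attribute found in `html`.
fn prefix_ids(html: &str, print_page_id: &str) -> String {
    let marker = " id=\"";
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(pos) = rest.find(marker) {
        let end = pos + marker.len();
        // Copy everything up to and including ` id="`, then add the prefix.
        out.push_str(&rest[..end]);
        out.push_str(print_page_id);
        out.push('-');
        rest = &rest[end..];
    }
    out.push_str(rest);
    out
}
```

For example, with the prefix `bar-foo`, `<h2 id="abc">` becomes `<h2 id="bar-foo-abc">`. Unlike the regex in the PR, this sketch does not check that the attribute sits inside a tag.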

HollowMan6 force-pushed the master branch from 1768f71 to 8de5a91 Compare February 2, 2022 13:43

HollowMan6 marked this pull request as draft February 2, 2022 13:53

HollowMan6 force-pushed the master branch 7 times, most recently from 39ffbc0 to e3cca8f Compare February 2, 2022 16:00

HollowMan6 marked this pull request as ready for review February 2, 2022 16:06

HollowMan6 force-pushed the master branch from d06af86 to c25736d Compare April 15, 2022 06:49

HollowMan6 force-pushed the master branch 2 times, most recently from 713c825 to a1bf5cb Compare May 8, 2022 14:59

HollowMan6 force-pushed the master branch from a1bf5cb to 743aed4 Compare July 4, 2022 03:37

HollowMan6 force-pushed the master branch from 743aed4 to f6e7170 Compare July 16, 2022 03:03

HollowMan6 force-pushed the master branch from f6e7170 to 2670796 Compare August 21, 2022 14:48

HollowMan6 force-pushed the master branch 3 times, most recently from f1ceb69 to 38ecc41 Compare September 27, 2022 21:52

HollowMan6 force-pushed the master branch from 38ecc41 to 655f0a8 Compare October 13, 2022 22:15

ehuss added the S-waiting-on-author label (Status: the marked PR is awaiting some action, such as code changes, from the PR author) Oct 13, 2022

HollowMan6 force-pushed the master branch 2 times, most recently from 57db987 to 00d56ab Compare October 16, 2022 08:28

ehuss reviewed Oct 17, 2022

HollowMan6 force-pushed the master branch from 2268e9c to a35b643 Compare October 17, 2022 16:53

HollowMan6 mentioned this pull request Apr 7, 2023

mdbook-pdf-outline doesn't render anything HollowMan6/mdbook-pdf#18

Closed

HollowMan6 force-pushed the master branch from d56513b to 1fbfa21 Compare April 29, 2023 17:58

HollowMan6 mentioned this pull request Jun 9, 2023

Parse TOC from JSON and add PDF metadata HollowMan6/mdbook-pdf#24

Draft

HollowMan6 force-pushed the master branch 4 times, most recently from 2ce21d2 to 0dcdd19 Compare June 11, 2023 20:17

HollowMan6 force-pushed the master branch from 0dcdd19 to 4267afd Compare July 19, 2023 19:22

HollowMan6 force-pushed the master branch from 4267afd to 7725bac Compare August 4, 2023 18:00

HollowMan6 force-pushed the master branch from 7725bac to 91fdcef Compare September 9, 2023 15:19

HollowMan6 force-pushed the master branch from 91fdcef to 311eb7d Compare September 30, 2023 08:11

HollowMan6 force-pushed the master branch from 311eb7d to fceced4 Compare November 30, 2023 21:14

HollowMan6 force-pushed the master branch from fceced4 to 85c5e39 Compare January 26, 2024 20:30

HollowMan6 force-pushed the master branch from 85c5e39 to fae5f79 Compare February 7, 2024 17:43

HollowMan6 force-pushed the master branch 2 times, most recently from c4e9a93 to edf3067 Compare February 8, 2024 17:34

HollowMan6 force-pushed the master branch from edf3067 to 5830c95 Compare February 8, 2024 18:51

HollowMan6 force-pushed the master branch from 5830c95 to 133dc11 Compare March 6, 2024 20:51

HollowMan6 added 2 commits April 27, 2024 20:55

Add Songlin as a contributor

9877a68

Signed-off-by: Hollow Man <hollowman@opensuse.org>

HollowMan6 force-pushed the master branch from 133dc11 to 5b6b5e2 Compare April 27, 2024 17:55
