Make print page (print.html) links link to anchors on the print page #1738

Open
wants to merge 2 commits into
base: master

HollowMan6

Resolves #1736

Give every anchor id on the print page a path-id prefix so that it can
be located.

e.g. bar/foo.md#abc -> #bar-foo-abc

Also insert a dummy div at the start of each original page so that
links to a page without an anchor can still be located.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
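
The mapping in the example above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `print_anchor` is a hypothetical helper name, and the handling of the `.html` extension is an assumption added for illustration.

```rust
/// Builds a print-page anchor from a chapter path and an optional fragment.
/// Hypothetical helper for illustration; not a function from mdBook or this PR.
fn print_anchor(chapter_path: &str, fragment: Option<&str>) -> String {
    // Drop a trailing `.md` (or, by assumption, `.html`) extension.
    let base = chapter_path
        .strip_suffix(".md")
        .or_else(|| chapter_path.strip_suffix(".html"))
        .unwrap_or(chapter_path);
    // Path separators become dashes, so `bar/foo` turns into `bar-foo`.
    let path_id = base.replace('/', "-");
    match fragment {
        // `bar/foo.md#abc` -> `#bar-foo-abc`
        Some(frag) if !frag.is_empty() => format!("#{}-{}", path_id, frag),
        // A link without a fragment targets the dummy div for that page.
        _ => format!("#{}", path_id),
    }
}
```

Under these assumptions, `print_anchor("bar/foo.md", Some("abc"))` yields `#bar-foo-abc`, and a link to `bar/foo.md` with no fragment yields `#bar-foo`, which is the id the dummy div would carry.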

@HollowMan6 HollowMan6 marked this pull request as draft February 2, 2022 13:53
@HollowMan6 HollowMan6 marked this pull request as ready for review February 2, 2022 16:06
@HollowMan6
Author

Tested on the following Rust Bookshelf books, using the JS code here to check print.html for broken links and missing anchors.

With this PR, all the links on the print page are now self-contained. No broken links were found except for those that were already broken, or those that rely on custom JavaScript that needs adaptation (ferris.js in Rust Programming Language).

| Title | Source | Original Book Online Version |
| --- | --- | --- |
| Cargo Book | Source | HTML |
| Edition Guide | Source | HTML |
| Embedded Rust Book | Source | HTML |
| Mdbook User Guide | Source | HTML |
| Rust Reference | Source | HTML |
| Rust By Example | Source | HTML |
| Rust Programming Language | Source | HTML |
| Rustc Book | Source | HTML |
| Rustdoc Book | Source | HTML |
| Rustonomicon | Source | HTML |
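
The check described above (every same-page link on print.html should point at an existing id) can be approximated with a small standalone sketch. This is not the JS code referenced in the comment; `attr_values` and `broken_fragment_links` are hypothetical names, and the scan is a naive substring search rather than a real HTML parser.

```rust
use std::collections::HashSet;

/// Collects the values of every ` attr="..."` occurrence in `html`.
/// Naive substring scan (assumption: attributes are double-quoted and
/// preceded by a space); a real checker would use an HTML parser.
fn attr_values<'a>(html: &'a str, attr: &str) -> Vec<&'a str> {
    let needle = format!(" {}=\"", attr);
    let mut values = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(&needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                values.push(&after[..end]);
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    values
}

/// Returns the fragments of same-page links (`href="#..."`) that have no
/// matching `id` anywhere in the document.
fn broken_fragment_links(html: &str) -> Vec<String> {
    let ids: HashSet<&str> = attr_values(html, "id").into_iter().collect();
    attr_values(html, "href")
        .into_iter()
        .filter_map(|href| href.strip_prefix('#'))
        .filter(|frag| !frag.is_empty() && !ids.contains(*frag))
        .map(|frag| frag.to_string())
        .collect()
}
```

Running `broken_fragment_links` over the contents of a generated print.html would list the fragments that do not resolve. External links are ignored because they do not start with `#`.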

Please review. I would appreciate it if this PR could be merged.

cc: @ehuss

@dyaso

dyaso commented Jul 15, 2022

This was very helpful for me. It would be nice if this could be merged so I didn't have to build HollowMan6's fork myself in order to generate PDFs whose internal links work.

@ehuss ehuss added the S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. label Oct 13, 2022
@HollowMan6
Author

Hi @ehuss, I've noticed that you added the S-waiting-on-author label to this PR recently, but I can't see any code change requests or even reviews for this PR. Am I missing something, and what should I respond to or clarify?

Contributor

@ehuss ehuss left a comment


This is just a partial review, there are some other things that I'd like to follow up on.

src/utils/mod.rs (outdated, resolved)
src/utils/mod.rs (outdated, resolved)
src/renderer/html_handlebars/hbs_renderer.rs (outdated, resolved)
@ehuss
Contributor

ehuss commented Oct 17, 2022

> but I can't see any code change requests and even reviews for this PR. Am I missing something and what should I response/clarify?

I'm so sorry. I had a partial review that I wrote a long time ago, but I never finished it and didn't click the submit button. GitHub displays the review on the conversation page as if it were submitted (with a little icon that I overlooked), so I thought I had submitted it.

@HollowMan6
Author

> This is just a partial review, there are some other things that I'd like to follow up on.

Done! Thanks for reviewing. Would love to see it get merged in the near future.

@HollowMan6
Author

Hi @ehuss ! Any update about this PR?

@sjsadowski

@ehuss Circling back here, recently started using mdBook and merging this would be very helpful.

Thanks in advance!

@ehuss
Contributor

ehuss commented May 9, 2023

Unfortunately this PR looks like it still needs a lot of work. I started a longer review, but I just don't currently have the time to work through all of the issues. There are two classes of issues. One is the code itself: there are various places that aren't written in an idiomatic Rust style. For example, `let closure = |path: Option<&Path>| add_base(path);` doesn't really make much sense as Rust code goes (it does nothing). The other class is correctness in meaning, where I started to notice some issues with how some of the links are rewritten.
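
To illustrate the closure point, here is a small sketch showing that a closure which only forwards its argument behaves exactly like calling the function directly. The body of `add_base` below is a hypothetical stand-in; the PR's real `add_base` is not shown in this thread, and `via_closure` and `direct` are names invented for the comparison.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for the PR's `add_base`; its real signature and
// behavior are not shown in this conversation.
fn add_base(path: Option<&Path>) -> PathBuf {
    match path.and_then(Path::parent) {
        Some(parent) => parent.to_path_buf(),
        None => PathBuf::new(),
    }
}

/// Calls `add_base` through a forwarding closure, as in the quoted snippet.
fn via_closure(path: Option<&Path>) -> PathBuf {
    // The closure adds no captured state and no extra logic.
    let closure = |path: Option<&Path>| add_base(path);
    closure(path)
}

/// Calls `add_base` directly; the result is identical.
fn direct(path: Option<&Path>) -> PathBuf {
    add_base(path)
}
```

Since the closure captures nothing and only forwards its argument, using `add_base` itself (or passing `add_base` as a function value where a callback is needed) is the more idiomatic form.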

Here is a diff of some of the cosmetic cleanup I started on but didn't finish:

diff --git a/src/renderer/html_handlebars/hbs_renderer.rs b/src/renderer/html_handlebars/hbs_renderer.rs
index 010e1dc2c..41866a4a1 100644
--- a/src/renderer/html_handlebars/hbs_renderer.rs
+++ b/src/renderer/html_handlebars/hbs_renderer.rs
@@ -57,7 +57,7 @@ impl HtmlHandlebars {
         let content = ch.content.clone();
         let content = utils::render_markdown(&content, ctx.html_config.curly_quotes);

-        let fixed_content = utils::render_markdown_with_path_and_redirects(
+        let printed_item = utils::render_markdown_with_path_and_redirects(
             &ch.content,
             ctx.html_config.curly_quotes,
             Some(path),
@@ -70,7 +70,7 @@ impl HtmlHandlebars {
             print_content
                 .push_str(r#"<div style="break-before: page; page-break-before: always;"></div>"#);
         }
-        let path_id = {
+        let print_page_id = {
             let mut base = path.display().to_string();
             if base.ends_with(".md") {
                 base.truncate(base.len() - 3);
@@ -84,10 +84,10 @@ impl HtmlHandlebars {
         // We have to build header links in advance so that we can know the ranges
         // for the headers in one page.
         // Insert a dummy div to make sure that we can locate the specific page.
-        print_content.push_str(&(format!(r#"<div id="{}"></div>"#, &path_id)));
+        print_content.push_str(&(format!(r#"<div id="{print_page_id}"></div>"#)));
         print_content.push_str(&build_header_links(
-            &build_print_element_id(&fixed_content, &path_id),
-            Some(path_id),
+            &build_print_element_id(&printed_item, &print_page_id),
+            Some(print_page_id),
         ));

         // Update the context with data for this file
@@ -212,8 +212,11 @@ impl HtmlHandlebars {
         Ok(())
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
-    fn post_process_print(
+    /// Applies some post-processing to the HTML to apply some adjustments.
+    ///
+    /// This common function is used for both normal chapters (via
+    /// `post_process`) and the combined print page.
+    fn post_process_common(
         &self,
         rendered: String,
         playground_config: &Playground,
@@ -225,7 +228,7 @@ impl HtmlHandlebars {
         rendered
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
+    /// Applies some post-processing to the HTML to apply some adjustments.
     fn post_process(
         &self,
         rendered: String,
@@ -233,7 +236,7 @@ impl HtmlHandlebars {
         edition: Option<RustEdition>,
     ) -> String {
         let rendered = build_header_links(&rendered, None);
-        let rendered = self.post_process_print(rendered, &playground_config, edition);
+        let rendered = self.post_process_common(rendered, &playground_config, edition);

         rendered
     }
@@ -599,7 +602,7 @@ impl Renderer for HtmlHandlebars {
             let rendered = handlebars.render("index", &data)?;

             let rendered =
-                self.post_process_print(rendered, &html_config.playground, ctx.config.rust.edition);
+                self.post_process_common(rendered, &html_config.playground, ctx.config.rust.edition);

             utils::fs::write_file(destination, "print.html", rendered.as_bytes())?;
             debug!("Creating print.html ✓");
@@ -802,7 +805,7 @@ fn make_data(

 /// Go through the rendered print page HTML,
 /// add path id prefix to all the elements id as well as footnote links.
-fn build_print_element_id(html: &str, path_id: &str) -> String {
+fn build_print_element_id(html: &str, print_page_id: &str) -> String {
     static ALL_ID: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(<[^>]*?id=")([^"]+?)""#).unwrap());
     static FOOTNOTE_ID: Lazy<Regex> = Lazy::new(|| {
         Regex::new(
@@ -812,19 +815,22 @@ fn build_print_element_id(html: &str, path_id: &str) -> String {
     });

     let temp_html = ALL_ID.replace_all(html, |caps: &Captures<'_>| {
-        format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+        format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
     });

     FOOTNOTE_ID
         .replace_all(&temp_html, |caps: &Captures<'_>| {
-            format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+            format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
         })
         .into_owned()
 }

 /// Goes through the rendered HTML, making sure all header tags have
 /// an anchor respectively so people can link to sections directly.
-fn build_header_links(html: &str, path_id: Option<&str>) -> String {
+///
+/// `print_page_id` should be set to the print page ID prefix when adjusting the
+/// print page.
+fn build_header_links(html: &str, print_page_id: Option<&str>) -> String {
     static BUILD_HEADER_LINKS: Lazy<Regex> =
         Lazy::new(|| Regex::new(r"<h(\d)>(.*?)</h\d>").unwrap());

@@ -836,7 +842,7 @@ fn build_header_links(html: &str, path_id: Option<&str>) -> String {
                 .parse()
                 .expect("Regex should ensure we only ever get numbers here");

-            insert_link_into_header(level, &caps[2], &mut id_counter, path_id)
+            insert_link_into_header(level, &caps[2], &mut id_counter, print_page_id)
         })
         .into_owned()
 }
@@ -849,10 +855,11 @@ fn insert_link_into_header(
     level: usize,
     content: &str,
     id_counter: &mut HashMap<String, usize>,
-    path_id: Option<&str>,
+    print_page_id: Option<&str>,
 ) -> String {
-    let id = if let Some(path_id) = path_id {
-        utils::unique_id_from_content_with_path(content, id_counter, path_id)
+    let id = if let Some(print_page_id) = print_page_id {
+        let with_prefix = format!("{} {}", print_page_id, content);
+        utils::unique_id_from_content(&with_prefix, id_counter)
     } else {
         utils::unique_id_from_content(content, id_counter)
     };
diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index a9e1298e9..cbf170b63 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -83,14 +83,6 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
     unique_id
 }

-pub(crate) fn unique_id_from_content_with_path(
-    content: &str,
-    id_counter: &mut HashMap<String, usize>,
-    path_id: &str,
-) -> String {
-    unique_id_from_content(&format!("{} {}", path_id, content), id_counter)
-}
-
 /// Improve the path to try remove and solve .. token,
 /// This assumes that `a/b/../c` is `a/c`.
 ///
@@ -136,13 +128,8 @@ fn normalize_path_id(mut path: String) -> String {
 ///
 /// This adjusts links, such as turning `.md` extensions to `.html`.
 ///
-/// `path` is the path to the page being rendered relative to the root of the
-/// book. This is used for the `print.html` page so that links on the print
-/// page go to the anchors that has a path id prefix. Normal page rendering
-/// sets `path` to None.
-///
-/// `redirects` is also only for the print page. It's for adjusting links to
-/// a redirected location to go to the correct spot on the `print.html` page.
+/// See [`render_markdown_with_path_and_redirects`] for a description of
+/// `path` and `redirects`.
 fn adjust_links<'a>(
     event: Event<'a>,
     path: Option<&Path>,
@@ -377,7 +364,16 @@ pub fn new_cmark_parser(text: &str, curly_quotes: bool) -> Parser<'_, '_> {
     Parser::new_ext(text, opts)
 }

-pub fn render_markdown_with_path_and_redirects(
+/// Renders markdown to HTML.
+///
+/// `path` is the path to the page being rendered relative to the root of the
+/// book. This is used for the `print.html` page so that links on the print
+/// page go to the anchors that has a path id prefix. Normal page rendering
+/// sets `path` to None.
+///
+/// `redirects` is also only for the print page. It's for adjusting links to
+/// a redirected location to go to the correct spot on the `print.html` page.
+pub(crate) fn render_markdown_with_path_and_redirects(
     text: &str,
     curly_quotes: bool,
     path: Option<&Path>,
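
The `insert_link_into_header` hunk above inlines the removed `unique_id_from_content_with_path` helper: the print page id is prepended to the header text before the usual id derivation. A simplified standalone sketch of that idea follows. The normalization rules and the `-N` suffix scheme are assumptions for illustration, and `normalize_id`, `unique_id`, and `header_id` are stand-ins rather than mdBook's actual functions.

```rust
use std::collections::HashMap;

/// Simplified id normalization (assumption): lower-case ASCII, keep
/// alphanumerics, `_` and `-`, turn whitespace into `-`, drop the rest.
fn normalize_id(content: &str) -> String {
    content
        .chars()
        .filter_map(|ch| {
            if ch.is_alphanumeric() || ch == '_' || ch == '-' {
                Some(ch.to_ascii_lowercase())
            } else if ch.is_whitespace() {
                Some('-')
            } else {
                None
            }
        })
        .collect()
}

/// Returns an id that is unique within `id_counter`, appending `-N` when
/// the same normalized id has been seen N times before.
fn unique_id(content: &str, id_counter: &mut HashMap<String, usize>) -> String {
    let id = normalize_id(content);
    let count = id_counter.entry(id.clone()).or_insert(0);
    let unique = match *count {
        0 => id,
        n => format!("{}-{}", id, n),
    };
    *count += 1;
    unique
}

/// Mirrors the inlined branch: on the print page, prefix the header text
/// with the page id and then derive the id as usual.
fn header_id(
    content: &str,
    print_page_id: Option<&str>,
    id_counter: &mut HashMap<String, usize>,
) -> String {
    match print_page_id {
        Some(prefix) => unique_id(&format!("{} {}", prefix, content), id_counter),
        None => unique_id(content, id_counter),
    }
}
```

With these assumptions, a header "Getting Started" on the page with id `bar-foo` becomes `bar-foo-getting-started`, and a second header with the same text on that page becomes `bar-foo-getting-started-1`.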

If I find the time, I'll try to come back to this. Alternatively, if there is someone who is an experienced Rust developer who could help with the review here, that may help.

@sjsadowski

@ehuss I wish I could help out, but I'm mostly just consuming mdBook, and my Rust is amateurish at best. Thank you for the quick response, though; it helps clarify where things stand!

@HollowMan6
Author

@ehuss Glad to know that this PR has actually been handled by the maintainer all this time! Feel free to commit code directly to this PR; I always keep "Allow edits by maintainers" on.

@tetsushiawano

I'm using HollowMan6's fork to create PDF files now. Waiting for this branch to be merged into master.

@HollowMan6
Author

Hi, @ehuss! It took me some time to resolve the conflicts this time, so I hope this can get merged soon. Are there any plans to continue reviewing?

@Dylan-DPC
Member

Thanks. We will review it when we get the time so don't worry :)

@HollowMan6
Author

HollowMan6 commented Feb 8, 2024

I checked the broken links again with the methods described at: #1738 (comment) and fixed the following issues:

Now no other broken links are found with the current version.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Let all the anchors id on the print page to have a path id prefix to
help locate.

e.g. bar/foo.md#abc -> #bar-foo-abc

Also append a dummy div to the start of the original page to make sure
that original page links without an anchor can also be located.

Fix to remove all the `./` in the normalized path id so that for
"./foo/bar.html#abc" we still get "#foo-bar-abc"

Add support for redirect link anchors in print page so that anchors can
also be redirected, also handle URL redirect links on print page

Handle all the elements id to add a path prefix, also make path id to
all be the lower case

Fix for print page footnote links by adding the path id prefix

Signed-off-by: Hollow Man <hollowman@opensuse.org>
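
The path-id normalization mentioned in the commit message (dropping `./`, resolving `..` so that `a/b/../c` is treated as `a/c`, and lower-casing) can be sketched as below. This is a hypothetical illustration under those stated rules, not mdBook's `normalize_path_id`; the function name `normalized_path_id` is made up for this example.

```rust
/// Hypothetical sketch of path-id normalization for print-page anchors.
fn normalized_path_id(link_path: &str) -> String {
    // Strip the page extension before splitting into segments.
    let base = link_path
        .strip_suffix(".html")
        .or_else(|| link_path.strip_suffix(".md"))
        .unwrap_or(link_path);
    let mut parts: Vec<&str> = Vec::new();
    for segment in base.split('/') {
        match segment {
            // Empty segments and `.` carry no information.
            "" | "." => {}
            // `..` cancels the previous segment, so `a/b/../c` becomes `a/c`.
            ".." => {
                parts.pop();
            }
            other => parts.push(other),
        }
    }
    // Join with dashes and lower-case the whole id.
    parts.join("-").to_lowercase()
}
```

Under these rules, `./foo/bar.html` normalizes to `foo-bar`, so a link to `./foo/bar.html#abc` would resolve to `#foo-bar-abc` once the fragment is appended.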
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Development

Successfully merging this pull request may close these issues.

Convert all the links in the generated print.html for linking inside the book into URL fragment form
7 participants