New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make print page (print.html) links link to anchors on the print page #1738
base: master
Are you sure you want to change the base?
Conversation
39ffbc0
to
e3cca8f
Compare
Tested on the following Rust Bookshelf books with the js code here checking broken links also unavailable anchors on With this PR now all the links on the print page are self-contained, no broken links found except for those who are broken originally, or has self-made JavaScript that needs adaptation (ferris.js in Rust Programming Language).
Please review, Will appreciate if this PR can be merged. cc: @ehuss |
713c825
to
a1bf5cb
Compare
This was very helpful for me, it would be nice if this could be merged so i didn't have to build HollowMan6's fork myself in order to generate PDFs whose internal links work |
f1ceb69
to
38ecc41
Compare
57db987
to
00d56ab
Compare
Hi @ehuss , I've noticed that you added the S-waiting-on-author label to this PR recently, but I can't see any code change requests and even reviews for this PR. Am I missing something and what should I response/clarify? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a partial review, there are some other things that I'd like to follow up on.
I'm so sorry. I had a partial review that I wrote a long time ago, but never finished and didn't click the submit button. But GitHub displays the review on the conversation page as-if it was submitted (with a little icon that I overlooked), so I thought I had submitted it. |
Done! Thanks for reviewing. Would love to see it get merged in the near future. |
Hi @ehuss ! Any update about this PR? |
@ehuss Circling back here, recently started using mdBook and merging this would be very helpful. Thanks in advance! |
Unfortunately this PR looks like it still needs a lot of work. I started a longer review, but I just don't currently have the time to work through all of them. There are two classes of issues. One is the code itself, there are various places that aren't written in an idiomatic Rust style. For example, Here is a diff of some of the cosmetic cleanup I started on but didn't finish: diff --git a/src/renderer/html_handlebars/hbs_renderer.rs b/src/renderer/html_handlebars/hbs_renderer.rs
index 010e1dc2c..41866a4a1 100644
--- a/src/renderer/html_handlebars/hbs_renderer.rs
+++ b/src/renderer/html_handlebars/hbs_renderer.rs
@@ -57,7 +57,7 @@ impl HtmlHandlebars {
let content = ch.content.clone();
let content = utils::render_markdown(&content, ctx.html_config.curly_quotes);
- let fixed_content = utils::render_markdown_with_path_and_redirects(
+ let printed_item = utils::render_markdown_with_path_and_redirects(
&ch.content,
ctx.html_config.curly_quotes,
Some(path),
@@ -70,7 +70,7 @@ impl HtmlHandlebars {
print_content
.push_str(r#"<div style="break-before: page; page-break-before: always;"></div>"#);
}
- let path_id = {
+ let print_page_id = {
let mut base = path.display().to_string();
if base.ends_with(".md") {
base.truncate(base.len() - 3);
@@ -84,10 +84,10 @@ impl HtmlHandlebars {
// We have to build header links in advance so that we can know the ranges
// for the headers in one page.
// Insert a dummy div to make sure that we can locate the specific page.
- print_content.push_str(&(format!(r#"<div id="{}"></div>"#, &path_id)));
+ print_content.push_str(&(format!(r#"<div id="{print_page_id}"></div>"#)));
print_content.push_str(&build_header_links(
- &build_print_element_id(&fixed_content, &path_id),
- Some(path_id),
+ &build_print_element_id(&printed_item, &print_page_id),
+ Some(print_page_id),
));
// Update the context with data for this file
@@ -212,8 +212,11 @@ impl HtmlHandlebars {
Ok(())
}
- #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
- fn post_process_print(
+ /// Applies some post-processing to the HTML to apply some adjustments.
+ ///
+ /// This common function is used for both normal chapters (via
+ /// `post_process`) and the combined print page.
+ fn post_process_common(
&self,
rendered: String,
playground_config: &Playground,
@@ -225,7 +228,7 @@ impl HtmlHandlebars {
rendered
}
- #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
+ /// Applies some post-processing to the HTML to apply some adjustments.
fn post_process(
&self,
rendered: String,
@@ -233,7 +236,7 @@ impl HtmlHandlebars {
edition: Option<RustEdition>,
) -> String {
let rendered = build_header_links(&rendered, None);
- let rendered = self.post_process_print(rendered, &playground_config, edition);
+ let rendered = self.post_process_common(rendered, &playground_config, edition);
rendered
}
@@ -599,7 +602,7 @@ impl Renderer for HtmlHandlebars {
let rendered = handlebars.render("index", &data)?;
let rendered =
- self.post_process_print(rendered, &html_config.playground, ctx.config.rust.edition);
+ self.post_process_common(rendered, &html_config.playground, ctx.config.rust.edition);
utils::fs::write_file(destination, "print.html", rendered.as_bytes())?;
debug!("Creating print.html ✓");
@@ -802,7 +805,7 @@ fn make_data(
/// Go through the rendered print page HTML,
/// add path id prefix to all the elements id as well as footnote links.
-fn build_print_element_id(html: &str, path_id: &str) -> String {
+fn build_print_element_id(html: &str, print_page_id: &str) -> String {
static ALL_ID: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(<[^>]*?id=")([^"]+?)""#).unwrap());
static FOOTNOTE_ID: Lazy<Regex> = Lazy::new(|| {
Regex::new(
@@ -812,19 +815,22 @@ fn build_print_element_id(html: &str, path_id: &str) -> String {
});
let temp_html = ALL_ID.replace_all(html, |caps: &Captures<'_>| {
- format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+ format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
});
FOOTNOTE_ID
.replace_all(&temp_html, |caps: &Captures<'_>| {
- format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+ format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
})
.into_owned()
}
/// Goes through the rendered HTML, making sure all header tags have
/// an anchor respectively so people can link to sections directly.
-fn build_header_links(html: &str, path_id: Option<&str>) -> String {
+///
+/// `print_page_id` should be set to the print page ID prefix when adjusting the
+/// print page.
+fn build_header_links(html: &str, print_page_id: Option<&str>) -> String {
static BUILD_HEADER_LINKS: Lazy<Regex> =
Lazy::new(|| Regex::new(r"<h(\d)>(.*?)</h\d>").unwrap());
@@ -836,7 +842,7 @@ fn build_header_links(html: &str, path_id: Option<&str>) -> String {
.parse()
.expect("Regex should ensure we only ever get numbers here");
- insert_link_into_header(level, &caps[2], &mut id_counter, path_id)
+ insert_link_into_header(level, &caps[2], &mut id_counter, print_page_id)
})
.into_owned()
}
@@ -849,10 +855,11 @@ fn insert_link_into_header(
level: usize,
content: &str,
id_counter: &mut HashMap<String, usize>,
- path_id: Option<&str>,
+ print_page_id: Option<&str>,
) -> String {
- let id = if let Some(path_id) = path_id {
- utils::unique_id_from_content_with_path(content, id_counter, path_id)
+ let id = if let Some(print_page_id) = print_page_id {
+ let with_prefix = format!("{} {}", print_page_id, content);
+ utils::unique_id_from_content(&with_prefix, id_counter)
} else {
utils::unique_id_from_content(content, id_counter)
};
diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index a9e1298e9..cbf170b63 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -83,14 +83,6 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
unique_id
}
-pub(crate) fn unique_id_from_content_with_path(
- content: &str,
- id_counter: &mut HashMap<String, usize>,
- path_id: &str,
-) -> String {
- unique_id_from_content(&format!("{} {}", path_id, content), id_counter)
-}
-
/// Improve the path to try remove and solve .. token,
/// This assumes that `a/b/../c` is `a/c`.
///
@@ -136,13 +128,8 @@ fn normalize_path_id(mut path: String) -> String {
///
/// This adjusts links, such as turning `.md` extensions to `.html`.
///
-/// `path` is the path to the page being rendered relative to the root of the
-/// book. This is used for the `print.html` page so that links on the print
-/// page go to the anchors that has a path id prefix. Normal page rendering
-/// sets `path` to None.
-///
-/// `redirects` is also only for the print page. It's for adjusting links to
-/// a redirected location to go to the correct spot on the `print.html` page.
+/// See [`render_markdown_with_path_and_redirects`] for a description of
+/// `path` and `redirects`.
fn adjust_links<'a>(
event: Event<'a>,
path: Option<&Path>,
@@ -377,7 +364,16 @@ pub fn new_cmark_parser(text: &str, curly_quotes: bool) -> Parser<'_, '_> {
Parser::new_ext(text, opts)
}
-pub fn render_markdown_with_path_and_redirects(
+/// Renders markdown to HTML.
+///
+/// `path` is the path to the page being rendered relative to the root of the
+/// book. This is used for the `print.html` page so that links on the print
+/// page go to the anchors that has a path id prefix. Normal page rendering
+/// sets `path` to None.
+///
+/// `redirects` is also only for the print page. It's for adjusting links to
+/// a redirected location to go to the correct spot on the `print.html` page.
+pub(crate) fn render_markdown_with_path_and_redirects(
text: &str,
curly_quotes: bool,
path: Option<&Path>, If I find the time, I'll try to come back to this. Alternatively, if there is someone who is an experienced Rust developer who could help with the review here, that may help. |
@ehuss I wish I could help out, I'm mostly just consuming mdBook - my rust is amateurish at best. Thank you for the quick response, though, it helps with clarity on where things are! |
@ehuss Glad to know that this PR is actually been handled by the maintainer all the time! Feel free to commit code directly to this PR, I always keep "Allow edits by maintainers" on. |
2ce21d2
to
0dcdd19
Compare
i'm using hallowman fork to create pdf files now. waiting for this branch to move to master. |
Hi, @ehuss! It took me some time to resolve the conflicts this time, so I hope this can get merged soon, any plans for continue reviewing? |
Thanks. We will review it when we get the time so don't worry :) |
c4e9a93
to
edf3067
Compare
I checked the broken links again with the methods described at: #1738 (comment) and fixed the following issues:
Now no other broken links are found with the current version. |
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Let all the anchors id on the print page to have a path id prefix to help locate. e.g. bar/foo.md#abc -> #bar-foo-abc Also append a dummy div to the start of the original page to make sure that original page links without an anchor can also be located. Fix to remove all the `./` in the normalized path id so that for "./foo/bar.html#abc" we still get "#foo-bar-abc" Add support for redirect link anchors in print page so that anchors can also be redirected, also handle URL redirect links on print page Handle all the elements id to add a path prefix, also make path id to all be the lower case Fix for print page footnote links by adding the path id prefix Signed-off-by: Hollow Man <hollowman@opensuse.org>
Resolves #1736
Let all the anchors id on the print page to have a path id prefix to
help locate.
e.g. bar/foo.md#abc -> #bar-foo-abc
Also append a dummy div to the start of the original page to make sure
that original page links without an anchor can also be located.
Signed-off-by: Hollow Man hollowman@opensuse.org