feature request: 'pdf_toc()' returns more pdf bookmarks information #117

Open
trevorld opened this issue Sep 27, 2022 · 3 comments

Comments

@trevorld

  • Currently I observe that pdf_toc() only returns the bookmark titles and the nesting hierarchy of bookmarks.
  • It would be nice if it could also return more bookmark attributes in addition to the title.
  • In particular it would be nice if we could get the page number that each bookmark goes to (when the bookmark action is to go to a page number). Currently I need to use a wrapper around the command-line tool pdftk to get that information.
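For reference, a minimal sketch of what such a pdftk wrapper might look like. The function names `parse_pdftk_bookmarks()` and `get_bookmarks_pdftk()` are hypothetical, and the sketch assumes that `pdftk <file> dump_data_utf8` emits a `BookmarkTitle`, `BookmarkLevel`, and `BookmarkPageNumber` line for every bookmark:

```r
# Parse the bookmark records found in the text emitted by
# `pdftk <file> dump_data_utf8`.
# Assumption: every bookmark record carries all three fields.
parse_pdftk_bookmarks <- function(lines) {
  field <- function(key) {
    prefix <- paste0("^", key, ": ")
    sub(prefix, "", grep(prefix, lines, value = TRUE))
  }
  data.frame(title = field("BookmarkTitle"),
             level = as.integer(field("BookmarkLevel")),
             page = as.integer(field("BookmarkPageNumber")),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: needs the 'pdftk' executable on the PATH.
get_bookmarks_pdftk <- function(file) {
  lines <- system2("pdftk", c(shQuote(file), "dump_data_utf8"), stdout = TRUE)
  parse_pdftk_bookmarks(lines)
}
```

Keeping the parsing separate from the `system2()` call means the parser can be checked on captured text without pdftk installed.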
@trevorld
Author

trevorld commented Sep 30, 2022

In case it is helpful, here is a minimal PDF document with the following PDF bookmark features present:

  • Bookmarks starting open and bookmarks starting closed (integer count positive versus negative)
  • Bookmarks with different styles (i.e. plain, bold, italic, bold-italic)
  • Bookmarks with different colors

PDF attachment: bookmarks.pdf

Note that many open-source PDF viewers quietly ignore some of these features. Foxit Reader is an example of a cross-platform (but proprietary) PDF reader that supports all of them.

Here is the R code to create the minimal pdf with pdf bookmarks:

library("grid")
library("grDevices")
library("xmpdf") # remotes::install_github("trevorld/r-xmpdf")

stopifnot(supports_gs()) # needs 'ghostscript'

# Create two-page pdf
pdf("bookmarks.pdf", onefile = TRUE)
grid.text("Page 1")
grid.newpage()
grid.text("Page 2")
invisible(dev.off())

# Add bookmarks
bookmarks <- data.frame(title = c("Front", "Page 1", "Page 2"),
                        page = c(1L, 1L, 2L),
                        count = c(2L, -1L, 0L),
                        fontface = c("italic", "bold", "bold.italic"),
                        color = c("black", "red", "blue"))
set_bookmarks_gs(bookmarks, "bookmarks.pdf")

Currently pdf_toc() seems to ignore most of this information:

pdftools::pdf_toc("bookmarks.pdf")
$title
[1] ""

$children
$children[[1]]
$children[[1]]$title
[1] "Front"

$children[[1]]$children
$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$title
[1] "Page 1"

$children[[1]]$children[[1]]$children
$children[[1]]$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$children[[1]]$title
[1] "Page 2"

$children[[1]]$children[[1]]$children[[1]]$children
list()
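Since titles and nesting are the only things returned right now, the nested list can at least be flattened into a table of titles and depths. This is a rough sketch, not part of pdftools: `flatten_toc()` is a hypothetical helper, and it assumes every node is a list with a `title` and a `children` element, as in the output above.

```r
# Flatten the nested list returned by pdftools::pdf_toc() into a
# data frame with one row per bookmark (title plus nesting level).
# The untitled root node (level 0) is skipped.
flatten_toc <- function(node, level = 0L) {
  rows <- data.frame(title = character(0), level = integer(0),
                     stringsAsFactors = FALSE)
  if (level > 0L) {
    rows <- data.frame(title = node$title, level = level,
                       stringsAsFactors = FALSE)
  }
  # Depth-first walk so rows keep the order of the bookmarks
  for (child in node$children) {
    rows <- rbind(rows, flatten_toc(child, level + 1L))
  }
  rows
}

# Usage (requires pdftools and the example file from above):
# flatten_toc(pdftools::pdf_toc("bookmarks.pdf"))
```

The result is in the same shape as the `title` column of the `bookmarks` data frame used above, which makes it easier to see which attributes (page, count, fontface, color) are missing.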

@jeroen
Member

jeroen commented Oct 4, 2022

I don't think poppler supports this right now, at least I can't find it in the API. I found this old post but it looks like it was never followed up on.

@trevorld
Author

trevorld commented Oct 4, 2022

Thanks for the explanation!

Looking at the poppler API documentation, I guess that besides the bookmark's title, the only other information the API makes available is whether the bookmark should start open or closed in the TOC (i.e. is_open). No bookmark color, style, or page number (or other action) seems to be supported currently.

Feel free to close this issue, but I'll leave it open since it seems pdf_toc() could still return the is_open data. pdftk currently doesn't return that bookmark info...
