
docx_extract_all_cmnts(..., include_text = TRUE) failing on edge case #31

Open
conig opened this issue Jan 18, 2022 · 0 comments

conig commented Jan 18, 2022

First off, thank you for this package, it's really useful.

I've run into an interesting scenario where the argument include_text = TRUE fails for a Word document.

Here are two nearly identical Word documents:
works.docx
does not work.docx

Both contain only the text "Manuscript text" with the comment "Comment text".

However, include_text = TRUE fails for "does not work.docx", apparently because of a tab character introduced into the document: word_src comes back empty.

"does not work.docx" |> 
  docxtractr::read_docx() |> 
  docxtractr::docx_extract_all_cmnts(include_text = TRUE)
#> # A tibble: 1 x 6
#>   id    author          date                 initials comment_text word_src
#> * <chr> <chr>           <chr>                <chr>    <chr>        <chr>   
#> 1 0     James Conigrave 2022-01-18T02:08:00Z ""       Comment text ""

"works.docx" |> 
  docxtractr::read_docx() |> 
  docxtractr::docx_extract_all_cmnts(include_text = TRUE)
#> # A tibble: 1 x 6
#>   id    author          date                 initials comment_text word_src     
#> * <chr> <chr>           <chr>                <chr>    <chr>        <chr>        
#> 1 0     James Conigrave 2022-01-18T02:08:00Z ""       Comment text Manuscript t~

It appears that "does not work.docx" contains small changes to the XML that break this functionality. I'm not quite sure what caused them, but I would love a fix if you have time!
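
One way to narrow this down might be to compare the text that sits between the comment range markers in word/document.xml for each file. Below is a minimal sketch using xml2 on a hypothetical fragment. The XML is an assumption about what a tab inside a commented range could look like, not a copy of the attached files, and commented_text() is an illustrative helper, not part of docxtractr.

```r
library(xml2)

# Hypothetical fragment of word/document.xml (an assumption, not taken from
# the attached files): the commented run starts with a <w:tab/> element.
doc_xml <- '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:commentRangeStart w:id="0"/>
      <w:r><w:tab/><w:t>Manuscript text</w:t></w:r>
      <w:commentRangeEnd w:id="0"/>
      <w:r><w:commentReference w:id="0"/></w:r>
    </w:p>
  </w:body>
</w:document>'

doc <- read_xml(doc_xml)
ns <- xml_ns(doc)

# Illustrative helper: collect every <w:t> node located after the range start
# and before the range end of the given comment id, then join their text.
commented_text <- function(doc, id, ns) {
  xpath <- sprintf(
    "//w:t[preceding::w:commentRangeStart[@w:id='%s'] and following::w:commentRangeEnd[@w:id='%s']]",
    id, id
  )
  paste(xml_text(xml_find_all(doc, xpath, ns)), collapse = "")
}

commented_text(doc, "0", ns)
```

To try this on a real file, word/document.xml can be extracted first with unzip("does not work.docx", files = "word/document.xml", exdir = tempdir()) and then read with read_xml(). Running the helper on both files should show whether the text is still present between the markers in the failing document.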
