Feature Request - Splitting and recombining an inputted document #449

Open
DragonflyStats opened this issue Sep 15, 2022 · 1 comment

DragonflyStats commented Sep 15, 2022

Hi there, my query relates to splitting a document at a specified point in the document.

Suppose I have an existing Word document that consists of three parts (Part 1, Part 2 and Part 3).
I would like to be able to separate the document into these three parts and then recombine Parts 1 and 3 with a new Part 2.

Here is some pseudo-code that hopefully expresses the idea:



```r
library(officer)

# my_New_Part_2 has already been created with {officer}.
# doc_split() and doc_combine() are the proposed functions; they do not exist yet.

myDoc <- read_docx("my_existing_doc.docx")

# When start or end is left blank, it defaults to the start or end of the input document.
my_Part_1 <- myDoc %>% doc_split(start = "", end = "piece of text that identifies the end of Part 1")

my_Part_2 <- myDoc %>% doc_split(
  start = "piece of text that identifies the start of Part 2",
  end = "piece of text that identifies the end of Part 2"
)

my_Part_3 <- myDoc %>% doc_split(
  start = "piece of text that identifies the start of Part 3",
  end = ""
)

myDoc <- doc_combine(my_Part_1, my_New_Part_2, my_Part_3)

print(myDoc)
```

The rationale is that Part 2 of the document is too complex to edit with body_replace_all_text(), or may contain images that need to be updated. In addition, there is no way of telling where in the document (in terms of page number) Part 2 starts.
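A minimal sketch of the text-marker idea from the pseudo-code, in plain R. It assumes the document has already been reduced to one string per body element (for instance the `text` column returned by `officer::docx_summary()`), and `split_by_markers()` is a hypothetical helper, not an officer function. It only shows that the part boundaries can be found from marker text alone, without knowing page numbers; it does not split an actual .docx file.

```r
# Hypothetical helper (not part of officer): split a vector of paragraph
# texts into three parts using two text markers.
split_by_markers <- function(texts, end_part1, end_part2) {
  # index of the first paragraph containing each marker (fixed substring match)
  i1 <- match(TRUE, grepl(end_part1, texts, fixed = TRUE))
  i2 <- match(TRUE, grepl(end_part2, texts, fixed = TRUE))
  if (is.na(i1) || is.na(i2) || i2 <= i1) {
    stop("markers not found, or found in the wrong order")
  }
  n <- length(texts)
  list(
    part1 = texts[seq_len(i1)],          # start .. end-of-Part-1 marker
    part2 = texts[(i1 + 1):i2],          # after Part 1 .. end-of-Part-2 marker
    part3 = texts[seq_len(n - i2) + i2]  # everything after Part 2 (may be empty)
  )
}

# Toy input standing in for the paragraphs of "my_existing_doc.docx"
paras <- c("Intro", "End of Part 1", "Old table", "End of Part 2", "Appendix")
parts <- split_by_markers(paras, "End of Part 1", "End of Part 2")
```

Because the markers are matched as fixed substrings and the first hit wins, they should be pieces of text that occur only once in the document.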

Update 1

I think this code from Stack Overflow might be able to extract my_Part_3:
https://stackoverflow.com/questions/71811129/how-to-subset-text-from-a-word-docx-after-a-matching-phrase/72018891#72018891
If I can also get my_Part_1, I should have a working solution.

Update 2

I am trying the inverse of the solution above to extract my_Part_1.
It is not working. I think the issue is that the error argument needs to be something else.


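```r
# Attempt: delete every element after the cursor.
# Each call moves the cursor forward one element, removes that element and
# recurses; the first error raised by either call stops the recursion and
# the document as it is at that point is returned.
body_remove_after_cursor <- function(x) {
  tryCatch(
    {
      x <- officer::cursor_forward(x)
      x <- officer::body_remove(x)
      body_remove_after_cursor(x)
    },
    error = function(e) {
      return(x)
    }
  )
}
```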
DragonflyStats changed the title from "Splitting and recombining an inputted document" to "Feature Request - Splitting and recombining an inputted document" on Sep 15, 2022
davidgohel (Owner)

Hello @DragonflyStats

Sorry for the lack of feedback. We will try, but not sure how :)

I think it is indeed necessary to use a technique based on the cursors and body_remove(). But I doubt we can support a subset of a document as in your proposal (we are using R6, and this would probably mean refactoring the whole package).

This is what I have in mind:

  • my_New_Part_2 could be made with block_list()
  • my_New_Part_2 would replace from "piece of text that identifies the end of Part 1" to "piece of text that identifies the end of Part 2"
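The replacement step proposed above can be sketched on a plain character vector that stands in for the document's body elements (for instance the `text` column of `officer::docx_summary()`). `replace_between()` is a hypothetical helper, not an officer function. It assumes the end-of-Part-1 paragraph belongs to Part 1 and is kept, while the end-of-Part-2 paragraph belongs to the old Part 2 and is replaced. In a real document the removal would go through the cursor functions and `body_remove()`, and the new `block_list()` would be added with something like `body_add_blocks()`; this sketch does not attempt that.

```r
# Hypothetical helper (not part of officer): replace the old Part 2 with
# new content in a vector of paragraph texts.
replace_between <- function(texts, start_marker, end_marker, new_part) {
  # index of the first paragraph containing each marker (fixed substring match)
  i_start <- match(TRUE, grepl(start_marker, texts, fixed = TRUE))
  i_end <- match(TRUE, grepl(end_marker, texts, fixed = TRUE))
  if (is.na(i_start) || is.na(i_end) || i_end <= i_start) {
    stop("markers not found, or found in the wrong order")
  }
  before <- texts[seq_len(i_start)]                       # Part 1, marker kept
  after <- texts[seq_len(length(texts) - i_end) + i_end]  # Part 3, may be empty
  c(before, new_part, after)                              # old Part 2 dropped
}

# Toy input standing in for the paragraphs of the existing document
paras <- c("Intro", "End of Part 1", "Old table", "End of Part 2", "Appendix")
new_doc <- replace_between(paras, "End of Part 1", "End of Part 2",
                           new_part = c("New table", "New chart"))
```

Doing the replacement in one step avoids ever holding Part 1 and Part 3 as separate documents, which fits the constraint that officer documents cannot easily be subset.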
