Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminating build part way through book leaves many intermediate files lying around #986

Open
hadley opened this issue Oct 19, 2020 · 17 comments · May be fixed by #1079
Open

Terminating build part way through book leaves many intermediate files lying around #986

hadley opened this issue Oct 19, 2020 · 17 comments · May be fixed by #1079
Labels
bug an unexpected problem or unintended behavior

Comments

@hadley
Copy link
Member

hadley commented Oct 19, 2020

e.g. all of the .md files of the chapters built so far, and a number of render*.rds

@cderv
Copy link
Collaborator

cderv commented Oct 19, 2020

By Terminating build part part way through book, you are meaning stopping the build ? Or after an error occurs ?

I'll need to try reproduce this in order to see what is happening.

BTW there is a bookdown::clean_book() function available - but this may not be for the files you are seing

@hadley
Copy link
Member Author

hadley commented Oct 19, 2020

Generally, I mean pressing cmd + c. I suspect the problem is that the cleanup code is just run after the book is successfully built, rather than being wrapped in an on.exit().

@cderv
Copy link
Collaborator

cderv commented Oct 19, 2020

There are a bunch of on.exit() call already. For example, the one for the rds file:

bookdown/R/render.R

Lines 168 to 169 in 7ace59f

render_args = tempfile('render', '.', '.rds')
on.exit(file.remove(render_args), add = TRUE)

I'll try to reproduce and see why the cleaning is not happening

@yihui
Copy link
Member

yihui commented Oct 19, 2020

In theory, these files should be move/cleaned since I did use on.exit() to do that. Unless R crashed very badly, it shouldn't leave the intermediate files in the root dir.

@hadley
Copy link
Member Author

hadley commented Oct 19, 2020

Hmmm, weird. I double checked that you always used add = TRUE (since that's another easy mistake) but it looked ok to me.

Another thing worth checking if it you're using relative paths and the working directory changes at some point.

@cderv
Copy link
Collaborator

cderv commented Oct 19, 2020

Yes in theory they should be removed. However, while investigating #920 , I stumble upon this issue with intermediary md files still present. If there is an error in the book (using new_session: true), the rendering will stop but there still be the .md file present in the directory. They are not deleted.

Reprex:

dir.create(temp_proj <- tempfile())
old <- setwd(temp_proj)
bookdown:::bookdown_skeleton('.')
#> [1] TRUE
files1 <- fs::dir_ls()
files1
#> 01-intro.Rmd       02-literature.Rmd  03-method.Rmd      04-application.Rmd 
#> 05-summary.Rmd     06-references.Rmd  book.bib           index.Rmd          
#> preamble.tex       README.md          style.css          _bookdown.yml      
#> _output.yml
content <- xfun::read_utf8("02-literature.Rmd")
content <- c(content, "```{r}", "stop('Will be an error')", '```')
xfun::write_utf8(content, "02-literature.Rmd")
yml <- yaml::read_yaml("_bookdown.yml")
yml <- c(yml, list(new_session = TRUE))
yaml::write_yaml(yml, "_bookdown.yml")
bookdown::render_book("index.Rmd", "bookdown::gitbook")
#> Error in Rscript_render(f, render_args, render_meta, add1, add2): Failed to compile 02-literature.Rmd
files2 <- fs::dir_ls()
setdiff(files2, files1)
#> [1] "01-intro.md"           "01-intro.utf8.md"      "02-literature.knit.md"
#> [4] "index.md"              "index.utf8.md"         "packages.bib"         
#> [7] "_bookdown_files"
setwd(old)
unlink(temp_proj)

However, the .rds file is not present so I guess there is some place where this does not go well and other (like which rds) which are ok.

I'll double check this tomorrow, specificaly the relative path.

@jtbayly
Copy link
Contributor

jtbayly commented Nov 6, 2020

I often have extra files left when the build fails because of an error, fwiw. (I make a lot of errors. lol)

@hadley
Copy link
Member Author

hadley commented Feb 17, 2021

The problem is that the intermediate files are left behind by:

for (f in files[rerun]) Rscript_render(f, render_args, render_meta, add1, add2)

And those keep intermediate files because clean is explicitly set to FALSE:

saveRDS(
    list(output_format = output_format, ..., clean = FALSE, envir = envir),
    render_args
  )

hadley added a commit that referenced this issue Feb 17, 2021
* Save render args in tempdir()
* Pass clean argument to chapter render
* Delete intermediate .md files

Fixes #986
@hadley hadley linked a pull request Feb 17, 2021 that will close this issue
@yihui
Copy link
Member

yihui commented Feb 26, 2021

  1. The intermediate file render*.rds is cleaned up at
    on.exit(file.remove(render_args), add = TRUE)
    If R is terminated by clicking the red Stop button in RStudio, for some reason this on.exit() is not called. @hadley I've verified that even with your fix in Improved intermediate cleanup #1079, the .rds will still be left inside the tempdir(). We need to figure out why the Stop button in RStudio kills R so brutally.
  2. For the intermediate *.md files, they are left there if a certain Rmd fails to compile, but they will be cleaned up again the next time after you fix the error in the failing Rmd document and recompile it:

    bookdown/R/render.R

    Lines 222 to 223 in ca0145f

    intermediates = unlist(lapply(meta, attr, 'intermediates', exact = TRUE))
    if (clean) on.exit(unlink(intermediates, recursive = TRUE), add = TRUE)

@juliasilge
Copy link
Member

I find that I am left with many intermediate files (.md files, render.rds files, etc) when a bookdown render fails due to an error and that I have to manually delete them, or else the next time I try to render() it fails again. I have assumed this is because of being in an uncomfortable cached situation.

I am not clicking the red 🛑 RStudio button; the render is erroring out part of the way through.

@yihui
Copy link
Member

yihui commented Mar 16, 2021

I find that I am left with many intermediate files (.md files, render.rds files, etc) when a bookdown render fails due to an error and that I have to manually delete them, or else the next time I try to render() it fails again.

Hmm... After you fix the error in the Rmd, bookdown should clean up the intermediate files the next time you build the book. I wasn't able to reproduce this problem the last time, but let me try again.

@yihui
Copy link
Member

yihui commented Mar 24, 2021

Unfortunately I still can't reproduce the problem. Here I add an error chunk to index.Rmd:

d = tempfile()
dir.create(d)
owd = setwd(d)
bookdown:::bookdown_skeleton('test')
setwd('test')

cat('new_session: true\n', file = '_bookdown.yml', append = TRUE)
cat('```{r}
stop("Error")
```
', file = 'index.Rmd', append = TRUE)

bookdown::render_book()

Rendering index.Rmd will fail, and leave an intermediate index.knit.md:

cat(list.files(), sep = '\n')
_bookdown.yml
_output.yml
01-intro.Rmd
02-literature.Rmd
03-method.Rmd
04-application.Rmd
05-summary.Rmd
06-references.Rmd
book.bib
index.knit.md  # <- the intermediate file
index.Rmd
packages.bib
preamble.tex
README.md
style.css

Then I remove the error chunk and re-render:

xfun::process_file('index.Rmd', function(x) head(x, -3))
bookdown::render_book()
cat(list.files(), sep = '\n')
_book
_bookdown_files
_bookdown.yml
_output.yml
01-intro.Rmd
02-literature.Rmd
03-method.Rmd
04-application.Rmd
05-summary.Rmd
06-references.Rmd
book.bib
index.Rmd
packages.bib
preamble.tex
README.md
style.css
test.rds

The intermediate .knit.md file is gone.

Same thing when building the book with the RStudio IDE.

So I still need a reproducible example...

@juliasilge
Copy link
Member

Let me see if I can put one together soon; I think the problem may be related to how caching is working when some chunks depend on something that has errored.

@cderv cderv added the bug an unexpected problem or unintended behavior label Mar 30, 2021
@cderv
Copy link
Collaborator

cderv commented Mar 30, 2021

To help with this, @apreshill suggested to look into https://github.com/rstudio-education/stat545 where this problem was also encountered while collaborating on the book.

@juliasilge
Copy link
Member

Well, I spent some time today trying to generate this type of problem in a small bookdown repo, and I was not successful. 😔 The small bookdown would always build successfully after I removed the error (even playing with caching and chunk dependencies).

Unfortunately I can't offer a small reproducible example of the problems I experienced for now, but I will update you all in the future if/when I come upon it again!

@yihui
Copy link
Member

yihui commented Apr 7, 2021

Thank you anyway! When we resolve the rmarkdown issue rstudio/rmarkdown#2086, this bookdown issue should also be resolved.

@cderv
Copy link
Collaborator

cderv commented Apr 8, 2021

@yihui I have took some time to look into this issue.

A reminder: there was a weird interaction between rlang and bookdown in #920
It required a fix in rlang and it is adviced to use rlang 0.4.9+ if you set options(error = rlang::entrace) in . The initial issue was causing wrong exit and, even if I am not sure it could be linked, it worth checking the rlang version if you have a fixed one because using renv.

Based on your examples, and a more complex project like mentioned in #986 (comment), I went further.
What I did is I added an error in the first chunk of one of the file, not the first. The file with the error has a .knit.md leftover where others have *.ut8.md leftover. It means each rendering with new_session is not cleaned up as it should for this file. About, *.utf8.md it is solved by rstudio/rmarkdown#2086 and this file won't exist anymore. Initially I think those files were removed because the clean call has not been made and on.exit handler for those has not been registered yet.

But there will still be the intermediates .md files as the merged has not happened yet (in knit and merge approach)

d = tempfile()
dir.create(d)
owd = setwd(d)
bookdown:::bookdown_skeleton('test')
setwd('test')

cat('new_session: true\n', file = '_bookdown.yml', append = TRUE)
cat('```{r}
stop("Error")
```
', file = '05-summary.Rmd', append = TRUE)

bookdown::render_book('.')

xfun::raw_string(list.files())
_bookdown.yml
_bookdown_files
_output.yml
01-intro.md <-- unmerged md file
01-intro.Rmd
01-intro.utf8.md
02-literature.md <-- unmerged md file
02-literature.Rmd
02-literature.utf8.md
03-method.md <-- unmerged md file
03-method.Rmd
03-method.utf8.md
04-application.md <-- unmerged md file
04-application.Rmd
04-application.utf8.md
05-summary.knit.md <-- error during knitting
05-summary.Rmd
06-references.Rmd
book.bib
index.md <-- unmerged md file
index.Rmd
index.utf8.md
packages.bib
preamble.tex
README.md
style.css
test.rds <-- is this expected ?

It seems currently those .md file are moved to the output folder when no error.

xfun::process_file('05-summary.Rmd', function(x) head(x, -3))
bookdown::render_book(".")
> xfun::raw_string(list.files())
_book
_bookdown.yml
_bookdown_files
_output.yml
01-intro.Rmd
02-literature.Rmd
03-method.Rmd
04-application.Rmd
05-summary.Rmd
06-references.Rmd
book.bib
index.Rmd
packages.bib
preamble.tex
README.md
style.css
test.rds
> xfun::raw_string(list.files("_book"))
01-intro.md <-- moved
01-intro_files
02-literature.md <-- moved
03-method.md <-- moved
04-application.md <-- moved
05-summary.md <-- moved
06-references.md <-- moved
applications.html
final-words.html
index.html
index.md
intro.html
libs
literature.html
methods.html
reference-keys.txt
references.html
search_index.json
style.css

I believe on suggestion in #1079 is to remove the .md files on error during the process left because the on.exit clause where not called. Same as utf8.md, the on.exit() handler has not yet been registered for those file to be moved.

Why are those .md files moved to output dir by the way ? I am not sure to follow the purpose.

If an error occurs during Rscript_render(), it seems ok to remove all those unused file which currently does not happen. As suggested in https://github.com/rstudio/bookdown/pull/1079/files#diff-0d68ae244e4a8c5ddc391c48f5399a91f18bc2a955f098e0e66069c607b444f4R210
Would that be ok ? It could depend on why we move them.

render*.rds files are from render_args save and should be clean when render_new_session() exists. I can't yet reproduce this file to still be present on error. If something cause this, it is during the main render().

I believe most of the issue here is to be sure the correct on.exit() handler will be register as soon as possible, or should be moved closed to where the intermediates happens. Currently we need to do a lot of cleaning because we use clean = FALSE in the call to render() using Rscript_render() - is calling clean = TRUE here an issue ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants