Download only necessary packages #112

Open
vnijs opened this issue Jun 12, 2018 · 11 comments

vnijs commented Jun 12, 2018

When I run the lines below, all dependencies of the packages in the pkgs_src character vector are downloaded from CRAN, even those that are already up to date in the local miniCRAN directory. This can take a long time if there are many dependencies. Is there already a function in the miniCRAN package to check this? I looked around but could not find one. Did I miss something?

pkgList <- pkgDep(pkgs_src, repos = "a-cran-repo", type = "source", suggests = FALSE)
dl <- makeRepo(pkgList, path = pth, type = "source")

achubaty commented

This probably duplicates #56. The issue is that dependencies may have changed (new dependency packages) and/or dependency versions may have changed, so I think downloading all dependencies is probably the correct thing to do here. Of course it would be better if there were a way to only update packages that need to be updated, but that's more complicated to determine.

Do you have any thoughts on how to examine/determine the update dependency graph?

vnijs commented Jun 12, 2018

I agree you want to check all dependencies as well. I'm not sure how addPackage addresses this ... unless I didn't get the reference.

How-to: Once you have the list of all required packages from pkgDep, compare the version of each package available on CRAN with the version available in the local repo. If a package is missing from the local repo (e.g., because a dependency has changed), download it. If the version on CRAN is newer than the local one, download it. If the version on CRAN is the same as the one already available locally, don't download it.
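
The check described above can be sketched in base R. This is only a minimal sketch, not miniCRAN's implementation: `needs_download()` is a hypothetical helper, and the two named character vectors are made-up stand-ins for the "Version" column that pkgAvail() returns for CRAN and for the local repo.

```r
# Hypothetical helper: decide which required packages must be downloaded.
# `required`: character vector of package names (e.g. the output of pkgDep()).
# `cran`, `local`: named character vectors of versions; names are package names.
needs_download <- function(required, cran, local) {
  # Required packages that are absent from the local repo
  missing_pkgs <- setdiff(required, names(local))

  # Required packages present in both places: compare versions pairwise
  common <- intersect(required, intersect(names(cran), names(local)))
  newer <- vapply(
    common,
    function(p) utils::compareVersion(cran[[p]], local[[p]]) == 1,
    logical(1)
  )

  c(missing_pkgs, common[newer])
}

# Made-up version numbers, for illustration only
cran  <- c(dplyr = "0.7.6", plyr = "1.8.4", Rcpp = "0.12.17", rlang = "0.2.1")
local <- c(dplyr = "0.7.5", plyr = "1.8.4", Rcpp = "0.12.17")

to_fetch <- needs_download(c("dplyr", "plyr", "Rcpp", "rlang"), cran, local)
print(to_fetch)
```

With these inputs, only the package missing locally (rlang) and the one with a newer CRAN version (dplyr) are selected; plyr and Rcpp are skipped.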

achubaty commented

You're correct that it currently doesn't address this issue. We hadn't yet taken the time to work through a solution. Presumably, package dependencies that are no longer required should be removed from the repo, along with adding new package deps.
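
A rough sketch of how the packages to remove could be identified, assuming the local versions are available as a named character vector. `stale_packages()` is a hypothetical helper, the version numbers are made up, and the file names assume source tarballs only.

```r
# Hypothetical helper: packages in the local repo that are no longer required.
# `required`: character vector of package names still needed.
# `local`: named character vector of versions in the local repo.
stale_packages <- function(required, local) {
  local[setdiff(names(local), required)]
}

# Made-up local repo contents, for illustration only
local <- c(dplyr = "0.7.5", plyr = "1.8.4", Rcpp = "0.12.17")
stale <- stale_packages(c("dplyr", "Rcpp"), local)

# Source tarballs follow the <package>_<version>.tar.gz naming convention
stale_files <- paste0(names(stale), "_", stale, ".tar.gz")
print(stale_files)
```

After deleting such files, the repo index would also need to be rebuilt (for example with tools::write_PACKAGES()), otherwise PACKAGES would still list the removed packages.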

vnijs commented Jun 27, 2018

Below is a draft of a function that only downloads packages that are not already in an (online) repo or for which a newer version is available on CRAN. It seems to work and saves a good amount of time for a reasonably big repo (https://github.com/radiant-rstats/minicran). The function also returns any packages that could be removed, but doesn't actually do anything with that information yet. Looking forward to hearing your comments.

For an example script that uses the function, see: https://github.com/radiant-rstats/minicran/blob/gh-pages/minicran.R. This script also has a (very) crude function to remove older-version files from the repo. If you have ideas on how to improve that, I'd also be interested.

selMakeRepo <- function(
  pkgs, path, minicran, repos = getOption("repos"),
  type = "source", Rversion = R.version, ...
) {

  minicran_avail <- miniCRAN::pkgAvail(repos = minicran, type = type, Rversion = Rversion)[, "Version"]
  cran_avail <- miniCRAN::pkgAvail(repos = repos, type = type, Rversion = Rversion)[, "Version"]

  ## in dependent pkgs but not in miniCRAN repo
  to_fetch <- pkgs[!pkgs %in% names(minicran_avail)]

  ## not in dependent pkgs but in miniCRAN repo
  to_remove <- minicran_avail[!names(minicran_avail) %in% pkgs]

  ## which packages should be updated
  to_compare <- intersect(names(cran_avail), names(minicran_avail))

  pkgs_comp <- data.frame(
    compare = to_compare,
    pkgs = cran_avail[to_compare],
    minicran = minicran_avail[to_compare],
    stringsAsFactors = FALSE
  )

  to_update <- apply(pkgs_comp, 1, function(x) compareVersion(x[2], x[3]))
  to_update <- names(to_update[to_update == 1])

  to_fetch <- c(to_update, to_fetch)

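  ## selective set of packages to download and add to repo
  ## (repos is passed through so a non-default CRAN mirror is also used for the download)
  dwnload <- miniCRAN::makeRepo(to_fetch, path = path, repos = repos, type = type, Rversion = Rversion, ...)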

  ## returning packages to remove
  invisible(to_remove)
}

andrie changed the title from "Is there already an option to only download the packages that need to be updated?" to "Download only necessary packages" on Jun 27, 2018

andrie commented Jun 27, 2018

Thanks for contributing the code. I'm taking a look, in a new branch 112-download-necessary-only.

My thought is to make this part of makeRepo(), and to add an argument makeRepo(..., download_if_exists = TRUE).

(I'm open to suggestions for what this argument should be called.)

vnijs commented Jun 27, 2018

Note that my function also takes the URL of the remote host of the miniCRAN repo (the minicran argument). You could use the path argument to point to local files instead, but then you would need to extract package names and versions from the file names.

andrie commented Jun 27, 2018

Good pointer, thank you.

I think we can extract the package versions from a local path with something along these lines:

pkg_versions <- function(path){
  file_ptn <- "\\.tar\\.gz|zip|tgz"
  p <- basename(list.files(path, pattern = file_ptn, recursive = TRUE))
  z <- strsplit(p, "_")
  pkg <- sapply(z, "[[", 1)
  version <- gsub(file_ptn, "", sapply(z, "[[", 2))
  names(version) <- pkg
  version
}
pkg_versions(path)
##     dplyr      plyr      Rcpp 
##   "0.7.5"   "1.8.4" "0.12.17" 

vnijs commented Jun 27, 2018

Nice @andrie! I like this better than my approach of using the remote miniCRAN repo. Once you have the local path, you should also be able to remove deps that are no longer needed, as @achubaty suggested.

Note: This runs through the entire repo, right? So if my repo has macOS binaries for both R 3.4 and R 3.5, which version will be used to check whether new files should be downloaded? CRAN only very slowly updates the R 3.4 binaries for macOS (if at all). So if the 3.5 binaries are up to date, this function would never update the 3.4 packages, right? The same issue might come up with source files, which are always available first. Once you have the source file in your repo, the binaries may not get updated. Perhaps the search could be done per type and Rversion?

Minor tweak: file_ptn <- "\\.(tar\\.gz|zip|tgz)$"
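
To illustrate why the anchored pattern matters, here is a small base R comparison of the two patterns. The file names are made up for illustration; "zip_notes.txt" is a hypothetical stray file that is not a package archive.

```r
old_ptn <- "\\.tar\\.gz|zip|tgz"      # original: bare "zip" and "tgz" alternatives
new_ptn <- "\\.(tar\\.gz|zip|tgz)$"   # tweak: grouped alternatives, anchored at the end

files <- c("dplyr_0.7.5.tar.gz", "plyr_1.8.4.zip", "Rcpp_0.12.17.tgz", "zip_notes.txt")

# The original pattern also matches a file that merely contains "zip"
old_match <- grepl(old_ptn, files)
new_match <- grepl(new_ptn, files)

# Strip the package name, then the extension, from the three real archives
version_part <- sub("^[^_]+_", "", files[1:3])
old_versions <- gsub(old_ptn, "", version_part)
new_versions <- gsub(new_ptn, "", version_part)

print(old_versions)
print(new_versions)
```

With the original pattern, the versions parsed from the .zip and .tgz files keep a trailing dot, because only the .tar.gz alternative includes the leading dot. The anchored pattern removes the full extension in all three cases.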

andrie commented Jun 28, 2018

We don't need any of this complication, since pkgAvail() already works on a local path, and supports type and R version:

pkgAvail(path, type = "source", Rversion = "3.5.0")[, "Version"]
    dplyr      plyr      Rcpp 
  "0.7.5"   "1.8.4" "0.12.17" 

vnijs commented Jun 28, 2018

Even better :)

andrie commented Mar 24, 2024

Sadly, I no longer seem to have any trace of the branch I referenced.
It also seems I never merged it back into main.
😭
