Error about missing scran when run with single-cell reference #170

Open
heathergeiger opened this issue Jan 9, 2021 · 12 comments

Comments

@heathergeiger

I am currently trying to run SingleR vs. the counts and labels available here:

http://geschwindlab.dgsom.ucla.edu/pages/codexviewer

Here is my code to get a log-normalized expression matrix and labels in the appropriate format for SingleR.

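library(Seurat)   # CreateSeuratObject(), NormalizeData(), GetAssayData()
library(SingleR)  # SingleR()

load("raw_counts_mat.rdata")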

metadata <- read.csv("cell_metadata.csv",header=TRUE,row.names=1)
metadata <- metadata[,1:2]
no_metadata_cells <- setdiff(colnames(raw_counts_mat),rownames(metadata))
no_metadata_n <- length(no_metadata_cells)
dummy_metadata_for_no_metadata_cells <- data.frame(
    Cluster = rep("None", times = no_metadata_n),
    Subcluster = rep("None", times = no_metadata_n),
    row.names = no_metadata_cells)
metadata <- rbind(metadata,dummy_metadata_for_no_metadata_cells)
metadata <- metadata[colnames(raw_counts_mat),]

seurat.obj <- CreateSeuratObject(counts=raw_counts_mat,min.cells=3)
seurat.obj <- NormalizeData(seurat.obj)
seurat.obj$Cluster <- metadata$Cluster
seurat.obj <- subset(seurat.obj,Cluster != "None")

ref_norm_counts <- GetAssayData(seurat.obj,assay="RNA",slot="data")
ref_labels <- as.vector(seurat.obj$Cluster)
rm(seurat.obj)

I then ran SingleR like so, where "norm_counts" is the result of running GetAssayData with slot="data" on the Seurat object containing the test data.

predictions <- SingleR(test = norm_counts,
                       ref = ref_norm_counts, labels = ref_labels,
                       de.method = "wilcox")

But I am getting the following error: "Error in loadNamespace(name) : there is no package called ‘scran’".

Any idea what is going on here? My sessionInfo() result is below. SingleR worked fine with a bulk reference, so the issue appears to be specific to when I use a single-cell reference with the appropriate change to "de.method".

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /nfs/sw/R/R-4.0.0/lib64/R/lib/libRblas.so
LAPACK: /nfs/sw/R/R-4.0.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] Seurat_3.2.1                SingleR_1.2.4              
 [3] SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
 [5] matrixStats_0.57.0          Biobase_2.48.0             
 [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
 [9] IRanges_2.22.2              S4Vectors_0.26.1           
[11] BiocGenerics_0.34.0         GeneBook_1.0               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                    colorspace_1.4-1             
  [3] deldir_0.1-28                 ellipsis_0.3.1               
  [5] ggridges_0.5.2                XVector_0.28.0               
  [7] BiocNeighbors_1.7.0           spatstat.data_1.4-3          
  [9] leiden_0.3.3                  listenv_0.8.0                
 [11] ggrepel_0.8.2                 bit64_4.0.5                  
 [13] interactiveDisplayBase_1.27.5 AnnotationDbi_1.51.3         
 [15] codetools_0.2-16              splines_4.0.0                
 [17] polyclip_1.10-0               jsonlite_1.7.1               
 [19] ica_1.0-2                     cluster_2.1.0                
 [21] dbplyr_1.4.4                  png_0.1-7                    
 [23] uwot_0.1.8                    shiny_1.5.0                  
 [25] sctransform_0.2.1             BiocManager_1.30.10          
 [27] compiler_4.0.0                httr_1.4.2                   
 [29] assertthat_0.2.1              Matrix_1.2-18                
 [31] fastmap_1.0.1                 lazyeval_0.2.2               
 [33] later_1.1.0.1                 BiocSingular_1.5.0           
 [35] htmltools_0.5.0               tools_4.0.0                  
 [37] rsvd_1.0.3                    igraph_1.2.5                 
 [39] gtable_0.3.0                  glue_1.4.2                   
 [41] GenomeInfoDbData_1.2.3        RANN_2.6.1                   
 [43] reshape2_1.4.4                dplyr_1.0.2                  
 [45] rappdirs_0.3.1                spatstat_1.64-1              
 [47] Rcpp_1.0.5                    vctrs_0.3.4                  
 [49] nlme_3.1-149                  ExperimentHub_1.14.2         
 [51] DelayedMatrixStats_1.11.1     lmtest_0.9-37                
 [53] stringr_1.4.0                 globals_0.12.5               
 [55] mime_0.9                      miniUI_0.1.1.1               
 [57] lifecycle_0.2.0               irlba_2.3.3                  
 [59] goftest_1.2-2                 future_1.18.0                
 [61] AnnotationHub_2.21.5          zlibbioc_1.34.0              
 [63] MASS_7.3-52                   zoo_1.8-8                    
 [65] scales_1.1.1                  spatstat.utils_1.17-0        
 [67] promises_1.1.1                RColorBrewer_1.1-2           
 [69] yaml_2.2.1                    curl_4.3                     
 [71] gridExtra_2.3                 memoise_1.1.0                
 [73] reticulate_1.16               pbapply_1.4-3                
 [75] ggplot2_3.3.2                 rpart_4.1-15                 
 [77] stringi_1.4.6                 RSQLite_2.2.0                
 [79] BiocVersion_3.11.1            BiocParallel_1.22.0          
 [81] rlang_0.4.7                   pkgconfig_2.0.3              
 [83] bitops_1.0-6                  lattice_0.20-41              
 [85] tensor_1.5                    ROCR_1.0-11                  
 [87] purrr_0.3.4                   patchwork_1.0.1              
 [89] htmlwidgets_1.5.1             cowplot_1.1.0                
 [91] bit_4.0.4                     tidyselect_1.1.0             
 [93] RcppAnnoy_0.0.16              plyr_1.8.6                   
 [95] magrittr_1.5                  R6_2.4.1                     
 [97] generics_0.0.2                DBI_1.1.0                    
 [99] mgcv_1.8-33                   pillar_1.4.6                 
[101] fitdistrplus_1.1-1            abind_1.4-5                  
[103] survival_3.2-3                RCurl_1.98-1.2               
[105] tibble_3.0.3                  future.apply_1.6.0           
[107] crayon_1.3.4                  KernSmooth_2.23-17           
[109] BiocFileCache_1.13.1          plotly_4.9.2.1               
[111] grid_4.0.0                    data.table_1.13.0            
[113] blob_1.2.1                    digest_0.6.25                
[115] xtable_1.8-4                  tidyr_1.1.2                  
[117] httpuv_1.5.4                  munsell_0.5.0                
[119] viridisLite_0.3.0            
@LTLA
Collaborator

LTLA commented Jan 9, 2021

Nothing too complicated. When de.method="wilcox" or "t", the package uses scran's functions to perform the pairwise t-tests or Wilcoxon tests in an efficient manner; so to use that functionality, you'll need scran installed, as the error message suggests. It's not installed by default to keep SingleR's dependencies low, given that the default method on the default bulk references doesn't require scran.

So just BiocManager::install('scran') and you'll be good to go.
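A minimal sketch of checking for scran up front, before calling SingleR() with de.method="wilcox" or "t". The variable name `has_scran` is only illustrative; `requireNamespace()` loads the namespace without attaching the package.

```r
# Check whether scran can be loaded, without attaching it to the search path.
has_scran <- requireNamespace("scran", quietly = TRUE)

if (!has_scran) {
    # Only a hint is printed here; installation is left to the user.
    message("scran is not installed; run BiocManager::install('scran') ",
            "before using de.method = \"wilcox\" or \"t\".")
}
```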

@dtm2451
Collaborator

dtm2451 commented Jan 15, 2021

Perhaps we should add an if (!require("scran")) { stop('scran package is required for de.method="wilcox" or "t"') }? I believe this require() conditional method is the recommendation from Bioconductor's developer guidelines, but I'm curious about your thoughts, @LTLA, as there is the downside of then loading that entire package in all de.method="wilcox" or "t" cases!

@LTLA
Collaborator

LTLA commented Jan 16, 2021

Hm. Traditionally I have always considered the error message out of :: to be satisfactory. Also it was a pain to have to write these protective clauses every time I used a Suggested package.

The best of both worlds would be to write a little getter function along the lines of:

checkForPackage <- function(pkg) {
    if (!requireNamespace(pkg, quietly=TRUE)) {
         # Perhaps have some smarter checks about whether something is
         # a Bioconductor package, but we could also just trust the developer here.
         stop(pkg, " is not installed, run BiocManager::install('", pkg, "')")
    }
}

which avoids the need to write all this crap every time we use :: for a Suggested method. This also avoids attaching packages on the search path, only loading their namespaces instead.

Would be nice if we can get it to live in some core package, then I could use it for all my packages.
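As a rough sketch of how such a getter might be used, the check would sit directly in front of the `::` call. The wrapper name `wilcoxWithCheck` is hypothetical and not part of SingleR; `scran::pairwiseWilcox()` is used only as an example of a function from a Suggested package.

```r
# Same getter as above, repeated so this snippet runs on its own.
checkForPackage <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop(pkg, " is not installed, run BiocManager::install('", pkg, "')")
    }
}

# Hypothetical wrapper: verify the Suggested package first, then call into it.
wilcoxWithCheck <- function(x, groups) {
    checkForPackage("scran")
    scran::pairwiseWilcox(x, groups)
}

# A package that is present passes silently.
checkForPackage("stats")

# A package that is absent produces the friendlier message.
# ("notARealPackage12345" is a made-up name used for illustration.)
msg <- tryCatch(
    checkForPackage("notARealPackage12345"),
    error = function(e) conditionMessage(e)
)
print(msg)
```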

@dtm2451
Collaborator

dtm2451 commented Jan 16, 2021

Such a base function sounds pretty good to me! Would allow me to remove the 5 similar, though each manually made more specific, functions from dittoSeq.

Such a function could potentially also take in multiple pkgs for cases when 2 or more are actually needed for the specific action.

@dtm2451
Collaborator

dtm2451 commented Jan 16, 2021

Also, yes, I forgot about it, but I totally meant requireNamespace()!

@mtmorgan

Probably a useful utility, although it sort of seems like one is patching an imperfect error message, with a better solution being a better error message?

One thing about the above is that it doesn't distinguish between types of errors (e.g., when a package fails to load because the installation has become corrupted somehow). One could be more clever, since the error is actually classed:

> x = tryCatch(foo::bar(), error = identity)
> x
<packageNotFoundError in loadNamespace(x): there is no package called 'foo'>

So something like

tryCatch({
    foo::bar()
}, packageNotFoundError = function(e) {
    pkg <- e$package
    stop(
        "package '", pkg, "' not found; ",
        'install with `BiocManager::install("', pkg, '")`',
        call. = FALSE
    )
})

which also works for loadNamespace("foo") but not requireNamespace("foo").
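A small sketch of that difference, assuming a recent R version in which `loadNamespace()` signals a classed `packageNotFoundError` (the R 4.0.0 session in this thread does). The package name is made up for illustration.

```r
# A package name assumed not to exist on this system.
missing_pkg <- "notARealPackage12345"

# loadNamespace() signals a classed error that can be caught and inspected.
err <- tryCatch(loadNamespace(missing_pkg), error = identity)
is_not_found <- inherits(err, "packageNotFoundError")
reported_pkg <- err$package

# requireNamespace() catches the error internally and just returns FALSE,
# so there is no classed condition left for tryCatch() to handle.
ok <- requireNamespace(missing_pkg, quietly = TRUE)
```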

Candidate locations are in BiocManager or maybe BiocGenerics; it's currently unusual for a package to Depend: or Import: BiocManager.

@LTLA
Collaborator

LTLA commented Jan 16, 2021

BiocManager seems like the best place for this to live. The package has minimal dependencies and it must be installed by default before SingleR anyway, so I wouldn't consider it a real +1 to my dependency count.

@kasperdanielhansen

But BiocManager is really for managing installations. Personally, I don't have BiocManager loaded when I do analysis, but I would want this fix to be available in that case.

Having a set of utility functions for dealing with Suggested packages seems worthwhile. I know this is suggesting something slightly different from what is being suggested here.

@hpages

hpages commented Jan 19, 2021

So basically the proposal is to replace the "there is no package called ‘scran’" error message with the more user-friendly "you don't have package 'scran'; install it with blah blah".

Personally I think that the specific error message suggested by @dtm2451 (scran package is required for de.method="wilcox" or "t") still has more value because it explains why the package is suddenly needed. It's always a bit of an annoyance to discover that you are missing a package in the middle of an analysis, so it's nice to understand why this happens.

@mtmorgan

Any thoughts @hpages on a home for this? I'm not sure, as Kasper notes, that BiocManager is the right place for it.

@dtm2451
Collaborator

dtm2451 commented Jan 19, 2021

I sometimes have tasks requiring multiple suggested packages, so would definitely vote for something which can check a set of packages. Perhaps the algorithm framework could be something like this:

suggested_pkgs_check <- function(pkgs, fxnality_message = "this functionality") {

    # For each package, record whether it is missing (TRUE) or available (FALSE).
    # This uses a `requireNamespace` check; Martin's `tryCatch` suggestion,
    # modified to allow multiple packages, could be used here instead.
    pkgs_missing <- vapply(
        pkgs, function(pkg) {
            !requireNamespace(pkg, quietly = TRUE)
        }, FUN.VALUE = logical(1)
    )

    if (any(pkgs_missing)) {
        stop(
            "Package(s) ", paste0(pkgs[pkgs_missing], collapse = ", "),
            " unavailable, but required for ", fxnality_message,
            ". Install with `BiocManager::install(c('",
            paste0(pkgs[pkgs_missing], collapse = "', '"),
            "'))`.",
            call. = FALSE
        )
    }
}

Then, 1) multiple packages could be checked (so users don't install a single package and start rerunning their pipeline, only to get an error at the same point due to a different package needed for the same step!), and 2) my specific message suggestion can be accommodated (yet even if a developer doesn't bother to add their own custom fxnality_message here, the idea that new packages are needed for the specific, currently requested functionality is still given!).

I wonder if we need to distinguish between reasons that a function may be inaccessible? The path forward if a package has become corrupted is still to reinstall, no?
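For reference, a sketch of how the two cases could be told apart if that ever turned out to matter. The helper name `pkg_status` and its return values are hypothetical; it relies on `loadNamespace()` signalling a `packageNotFoundError` for absent packages and some other error when loading fails for a different reason.

```r
# Classify a package as "ok", "missing" (not installed), or "broken"
# (installed, but loading its namespace failed for some other reason).
pkg_status <- function(pkg) {
    tryCatch({
        loadNamespace(pkg)
        "ok"
    },
    # Handlers are tried in order, so the more specific class goes first.
    packageNotFoundError = function(e) "missing",
    error = function(e) "broken")
}

status_present <- pkg_status("stats")
status_absent <- pkg_status("notARealPackage12345")  # made-up package name
```

In both the "missing" and "broken" cases the advice to the user would likely still be to (re)install, which is the point raised above.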

Re the home for this function: I don't have anything to add that hasn't already been said.

@hpages

hpages commented Jan 20, 2021

It's about installing missing packages (and the error message explicitly instructs the user to use BiocManager to do so), which makes BiocManager kind of a natural place for it.


6 participants