sample_frac function changes the cell order #58

Open
aodainic7 opened this issue Apr 27, 2023 · 2 comments
Comments

@aodainic7

Hey,
when I subset a large Seurat object with sample_frac() to reduce computing time, the cell order changes and subsequent Seurat functions no longer work. To reproduce the error, run:

library(Seurat)      # RunPCA()
library(tidyseurat)  # sample_frac() method for Seurat objects

pbmc_small <- SeuratObject::pbmc_small

pbmc_small_subset <- pbmc_small |> sample_frac(0.9)

pbmc_small_subset <- RunPCA(pbmc_small_subset, reduction.name = 'pca', assay = "RNA")

The error I'm getting is:
Error in validObject(object = x) :
invalid class “Seurat” object: 1: all cells in assays must be in the same order as the Seurat object
invalid class “Seurat” object: 2: all cells in reductions must be in the same order as the Seurat object
invalid class “Seurat” object: 3: all cells in graphs must be in the same order as the Seurat object (offending: RNA_snn)
invalid class “Seurat” object: 4: 'active.idents' must be named with cell names

Thanks!

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods base

other attached packages:
[1] cbmc.SeuratData_3.1.4 SeuratData_0.2.2 scclusteval_0.0.0.9000 SingleCellExperiment_1.18.1 SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0
[8] GenomeInfoDb_1.32.4 IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.1 matrixStats_0.63.0 tidyseurat_0.5.9
[15] ttservice_0.2.2 RColorBrewer_1.1-3 patchwork_1.1.2 Seurat_4.9.9.9042 SeuratObject_4.9.9.9084 sp_1.6-0 lubridate_1.9.2
[22] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[29] ggplot2_3.4.2 tidyverse_2.0.0

loaded via a namespace (and not attached):
[1] spam_2.9-1 systemfonts_1.0.4 plyr_1.8.8 igraph_1.4.2 lazyeval_0.2.2 splines_4.2.0 RcppHNSW_0.4.1 listenv_0.9.0
[9] scattermore_0.8 digest_0.6.31 htmltools_0.5.5 fansi_1.0.4 magrittr_2.0.3 tensor_1.5 cluster_2.1.4 ROCR_1.0-11
[17] limma_3.52.4 tzdb_0.3.0 globals_0.16.2 timechange_0.2.0 spatstat.sparse_3.0-1 colorspace_2.1-0 rappdirs_0.3.3 ggrepel_0.9.3
[25] textshaping_0.3.6 xfun_0.39 crayon_1.5.2 RCurl_1.98-1.12 jsonlite_1.8.4 progressr_0.13.0 spatstat.data_3.0-1 survival_3.3-1
[33] zoo_1.8-12 glue_1.6.2 polyclip_1.10-4 gtable_0.3.3 zlibbioc_1.42.0 XVector_0.36.0 leiden_0.4.3 DelayedArray_0.22.0
[41] future.apply_1.10.0 abind_1.4-5 scales_1.2.1 spatstat.random_3.1-4 miniUI_0.1.1.1 Rcpp_1.0.10 viridisLite_0.4.1 xtable_1.8-4
[49] reticulate_1.28 dotCall64_1.0-2 htmlwidgets_1.6.2 httr_1.4.5 ellipsis_0.3.2 ica_1.0-3 farver_2.1.1 pkgconfig_2.0.3
[57] sass_0.4.5 uwot_0.1.14 deldir_1.0-6 utf8_1.2.3 here_1.0.1 labeling_0.4.2 tidyselect_1.2.0 rlang_1.1.0
[65] reshape2_1.4.4 later_1.3.0 cachem_1.0.7 munsell_0.5.0 tools_4.2.0 cli_3.6.1 generics_0.1.3 ggridges_0.5.4
[73] evaluate_0.20 fastmap_1.1.1 yaml_2.3.7 ragg_1.2.5 goftest_1.2-3 knitr_1.42 fitdistrplus_1.1-11 RANN_2.6.1
[81] pbapply_1.7-0 future_1.32.0 nlme_3.1-160 mime_0.12 compiler_4.2.0 rstudioapi_0.14 plotly_4.10.1 png_0.1-8
[89] spatstat.utils_3.0-2 bslib_0.4.2 stringi_1.7.12 RSpectra_0.16-1 lattice_0.20-45 Matrix_1.5-1 vctrs_0.6.2 pillar_1.9.0
[97] lifecycle_1.0.3 BiocManager_1.30.20 jquerylib_0.1.4 spatstat.geom_3.1-0 lmtest_0.9-40 RcppAnnoy_0.0.20 data.table_1.14.8 cowplot_1.1.1
[105] bitops_1.0-7 irlba_2.3.5.1 httpuv_1.6.9 R6_2.5.1 promises_1.2.0.1 renv_0.17.3 KernSmooth_2.23-20 gridExtra_2.3
[113] parallelly_1.35.0 codetools_0.2-18 fastDummies_1.6.3 MASS_7.3-58.1 rprojroot_2.0.3 withr_2.5.0 sctransform_0.3.5 GenomeInfoDbData_1.2.8
[121] parallel_4.2.0 hms_1.1.3 grid_4.2.0 rmarkdown_2.21 Rtsne_0.16 spatstat.explore_3.1-0 shiny_1.7.4

@william-hutchison
Collaborator

Hello,

I believe changing the row order is the intended behaviour of sample_frac() in dplyr, so it makes sense for the function to change the cell order in tidyseurat as well.

You can see how sample_frac() changes the row order in this example: https://dplyr.tidyverse.org/reference/sample_n.html

@stemangiola
Owner

stemangiola commented Jun 4, 2023

Yes, it randomizes the cells, but it should not break the object. I think the cells are somehow being reordered in some assays or slots but not in all of them.

For reference, this is the Seurat function that does the subsampling:

subset <- SubsetData(object, max.cells.per.ident = n.cells, random.seed = NULL)

from

satijalab/seurat#243

possibly related

satijalab/seurat#5329
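
Until the reordering is fixed, a possible workaround is to sample the cell names and keep them in their original order before subsetting. The sketch below is only an illustration under assumptions: `sample_cells_in_order()` is a hypothetical helper, not part of Seurat or tidyseurat, and it assumes that `subset()` with the `cells` argument leaves the object valid when the cell names are passed in their original order.

```r
# Hypothetical helper (not part of Seurat or tidyseurat): pick a random
# fraction of the cell names, but return them in their original order.
sample_cells_in_order <- function(cell_names, frac, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_keep <- round(frac * length(cell_names))
  # sort() restores the original ordering of the sampled positions
  keep_idx <- sort(sample.int(length(cell_names), size = n_keep))
  cell_names[keep_idx]
}

# Usage with the example object from this issue (assumes SeuratObject is installed).
if (requireNamespace("SeuratObject", quietly = TRUE)) {
  pbmc_small <- SeuratObject::pbmc_small
  cells_keep <- sample_cells_in_order(colnames(pbmc_small), frac = 0.9, seed = 42)
  pbmc_small_subset <- subset(pbmc_small, cells = cells_keep)
}
```

Because the kept cells stay in the same relative order as in the full object, the assays, reductions and graphs should stay aligned with the cell names, which is what the validity check in the error above complains about.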
