Default random seed in WhichCells prevents random sampling with [] #62

Open
nicolaromano opened this issue Oct 18, 2022 · 1 comment

nicolaromano commented Oct 18, 2022

Problem

To randomly subset a Seurat object, one could do:

library(Seurat)

set.seed(12345)
for (i in 1:5)
{
    # Draw 5 random cell names and subset the object with them
    five_cells <- pbmc_small[, sample(Cells(pbmc_small), 5)]

    print(Cells(five_cells))
}

This prints

[1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG"
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG"
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG"
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG"
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG"

Expected behaviour

Each of the five rounds should return a different set of cells. Instead, every round after the first returns the same cells.

Issue

Credit to wurli on Stack Overflow for figuring out the problem.

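Seurat overloads [] to call subset.Seurat(), which in turn calls WhichCells(), which has a default seed parameter of 1.
In the first round, sample() is evaluated while our seed of 12345 is still in effect (hence the first sample being different), but the subsetting call then resets the seed to 1. Every later sample() call therefore starts from the same RNG state and returns the same cells.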
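
The mechanism can be reproduced without Seurat. The following is a minimal base-R sketch; `reseeding_subset()` is a hypothetical stand-in that only mimics the seed-resetting side effect described above, and is not Seurat's actual implementation.

```r
# Hypothetical stand-in: subsets x, but resets the global RNG seed first,
# mimicking a function with a default `seed = 1` argument.
reseeding_subset <- function(x, idx, seed = 1) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  x[idx]
}

items <- 1:80

run_rounds <- function(seed) {
  set.seed(12345)
  draws <- vector("list", 5)
  for (i in 1:5) {
    picked <- sample(items, 5)  # uses whatever RNG state is current
    draws[[i]] <- reseeding_subset(items, picked, seed = seed)
  }
  draws
}

with_default_seed <- run_rounds(seed = 1)    # rounds 2 to 5 repeat
without_seed <- run_rounds(seed = NULL)      # user seed is left alone

print(with_default_seed)
print(without_seed)
```

With `seed = 1`, only the first draw depends on the user's seed; passing `seed = NULL` skips the reset, so the user's RNG stream continues undisturbed.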

Workaround

library(Seurat)

set.seed(12345)
for (i in 1:5)
{
    # seed=NULL stops the subsetting call from resetting the RNG seed
    five_cells <- pbmc_small[, sample(Cells(pbmc_small), 5), seed=NULL]

    print(Cells(five_cells))
}

This prints

[1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG"
[1] "CATGGCCTGTGCAT" "TTACGTACGTTCAG" "ACAGGTACTGGTGT" "AATGTTGACAGTCA" "GATAGAGAAGGGTG"
[1] "CATTACACCAACTG" "GGCATATGCTTATC" "ACAGGTACTGGTGT" "CATCAGGATGCACA" "ATGCCAGAACGACT"
[1] "GAGTTGTGGTAGCT" "GGCATATGGGGAGT" "AGAGATGATCTCGC" "GAACCTGATGAACC" "GATATAACACGCAT"
[1] "CATGAGACACGGGA" "GGGTAACTCTAGTG" "TTTAGCTGTACTCT" "TACATCACGCTAAC" "CTAAACCTGTGCAT"

Proposed solution

I am not sure whether removing the default seed will break other things, but this should at least be better documented: calling WhichCells() overwrites the user's seed, which is definitely unwanted behaviour.
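
One possible direction, if a default seed needs to stay for reproducibility, would be to set it only locally and restore the caller's RNG state afterwards. The sketch below is base R only; `with_local_seed()` is a hypothetical helper and not an existing Seurat function, so treat it as an illustration of the idea rather than a proposed patch.

```r
# Hypothetical helper: evaluate `expr` under a fixed seed, then restore
# the RNG state the caller had before the call.
with_local_seed <- function(seed, expr) {
  has_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_state) {
    old_state <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (has_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  force(expr)  # expr is lazy, so it is evaluated here, after set.seed()
}

set.seed(12345)
a <- rnorm(1)

set.seed(12345)
inner <- with_local_seed(1, rnorm(1))  # drawn under seed 1
b <- rnorm(1)                          # drawn from the user's stream

print(identical(a, b))
```

Here `b` matches `a` because the user's seed survives the inner call, while `inner` is still reproducible under its own seed.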

@nicolaromano
Author

Another example of this behaviour:

library(Seurat)

set.seed(12345)
print(rnorm(1)) # 0.5855288

set.seed(12345)
print(rnorm(1)) # 0.5855288

set.seed(12345)
id2 <- WhichCells(pbmc_small, idents = 2)
print(rnorm(1)) # -0.6264538

set.seed(12345)
pbmc1 <- subset(pbmc_small, idents = 1)
print(rnorm(1)) # -0.6264538

set.seed(1)
print(rnorm(1)) # -0.6264538

# Other Seurat functions behave as expected
set.seed(12345)
DimPlot(pbmc_small)
print(rnorm(1)) # 0.5855288
