The mixhvg package selects highly variable genes. It includes popular publicly available methods as well as mixtures of multiple highly variable gene selection methods. A mixture can combine the advantages captured by each individual method.
The function FindVariableFeaturesMix inherits from the FindVariableFeatures function of the Seurat package and can be used in the same way. It also accepts a dense or sparse matrix as input.
Install the mixhvg package from CRAN:
install.packages("mixhvg")
To get the most recent version of the package, install it from the GitHub repository:
devtools::install_github("RuzhangZhao/mixhvg")
FindVariableFeaturesMix accepts two kinds of input: a Seurat object, or a dense or sparse matrix.
The example data comes from 10x Genomics. You may use this link to download the processed data, which is named pbmc3k_rna.rds.
pbmc<-readRDS("pbmc3k_rna.rds")
library(Seurat)
library(mixhvg)
object<-CreateSeuratObject(pbmc)
object<-FindVariableFeaturesMix(object,nfeatures=2000)
head(VariableFeatures(object))
# [1] "CD74" "LYZ" "IGLC3" "IGKC" "RPS29" "IGHA1"
One may use FindVariableFeaturesMix to replace the FindVariableFeatures function in the analysis pipeline of Seurat. For example,
object<-CreateSeuratObject(pbmc)
object<-FindVariableFeaturesMix(object)
object<-NormalizeData(object,verbose=FALSE)
object<-ScaleData(object,verbose = FALSE)
object<-RunPCA(object,npcs=30,verbose=FALSE)
object<-RunUMAP(object,dims=1:30,verbose = FALSE)
When a matrix is passed instead of a Seurat object, the function returns the selected features directly:
pbmc<-readRDS("pbmc3k_rna.rds")
library(mixhvg)
pbmc_hvg<-FindVariableFeaturesMix(pbmc,nfeatures=2000)
head(pbmc_hvg)
# [1] "CD74" "LYZ" "IGLC3" "IGKC" "RPS29" "IGHA1"
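Since the function is described as accepting either dense or sparse matrices, it can help to see how the two representations relate. The following is a small sketch using the Matrix package on a toy matrix; the gene and cell names are made up for illustration, and the final call is guarded so it only runs if mixhvg and the example file are available.

```r
library(Matrix)

# Toy count matrix: genes in rows, cells in columns (names are illustrative).
dense <- matrix(c(0, 2, 0,
                  0, 1, 0,
                  3, 0, 0),
                nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Convert the dense matrix to a compressed sparse column matrix.
sparse <- Matrix(dense, sparse = TRUE)

# Both representations hold the same values.
same_values <- all(as.matrix(sparse) == dense)

# With the real data, either form could be passed to FindVariableFeaturesMix.
if (requireNamespace("mixhvg", quietly = TRUE) && file.exists("pbmc3k_rna.rds")) {
  pbmc <- readRDS("pbmc3k_rna.rds")
  pbmc_hvg <- mixhvg::FindVariableFeaturesMix(pbmc, nfeatures = 2000)
}
```

Single-cell count matrices are mostly zeros, so the sparse form is usually the more memory-efficient choice.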
The method.names argument can take a single method or multiple methods to mix.
pbmc_hvg<-FindVariableFeaturesMix(pbmc,method.names="seuratv3")
pbmc_hvg<-FindVariableFeaturesMix(pbmc,method.names="scran")
pbmc_hvg<-FindVariableFeaturesMix(pbmc,
method.names=c("scran","scran_pos","seuratv1"))
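To see how much two selections differ, one option is to compare the returned gene sets. The sketch below defines a hypothetical helper, overlap_fraction (not part of mixhvg), that computes the Jaccard overlap of two character vectors, and then applies it to two single-method selections. The mixhvg part is guarded so that it only runs when the package and the example file are present.

```r
# Hypothetical helper: fraction of shared genes among all genes in either set.
overlap_fraction <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Toy check: 2 shared genes out of 4 distinct genes.
toy <- overlap_fraction(c("CD74", "LYZ", "IGKC"), c("LYZ", "IGKC", "RPS29"))
print(toy)  # 0.5

# Compare two single-method selections on the example data, if available.
if (requireNamespace("mixhvg", quietly = TRUE) && file.exists("pbmc3k_rna.rds")) {
  pbmc <- readRDS("pbmc3k_rna.rds")
  hvg_v3 <- mixhvg::FindVariableFeaturesMix(pbmc, method.names = "seuratv3",
                                            nfeatures = 2000)
  hvg_scran <- mixhvg::FindVariableFeaturesMix(pbmc, method.names = "scran",
                                               nfeatures = 2000)
  print(overlap_fraction(hvg_v3, hvg_scran))
}
```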
The following methods can be chosen, and any mixture of them is acceptable. For example, the default is c("scran","seuratv1","mv_PFlogPF","scran_pos")
- scran: Use mean-variance curve adjustment on the lognormalized count matrix, which is scran modelGeneVar.
- mv_ct: Use mean-variance curve adjustment on the count matrix, inherited from scran modelGeneVar.
- mv_nc: Use mean-variance curve adjustment on the normalized count matrix, inherited from scran modelGeneVar.
- mv_lognc: The same as scran.
- mv_PFlogPF: Use mean-variance curve adjustment on the PFlog1pPF matrix, inherited from scran modelGeneVar.
- scran_pos: Use the Poisson version of scran, modelGeneVarByPoisson.
- seuratv3: Use logmean-logvariance curve adjustment on the count matrix, which is vst in the Seurat FindVariableFeatures function (https://satijalab.org/seurat/reference/findvariablefeatures).
- logmv_ct: The same as seuratv3.
- logmv_nc: Use logmean-logvariance curve adjustment on normalized count matrix, inherited from seuratv3(vst).
- logmv_lognc: Use logmean-logvariance curve adjustment on lognormalized count matrix, inherited from seuratv3(vst).
- logmv_PFlogPF: Use logmean-logvariance curve adjustment on PFlog1pPF matrix, inherited from seuratv3(vst).
- seuratv1: Use dispersion on the lognormalized count matrix, which is dispersion (disp) in the Seurat FindVariableFeatures function (https://satijalab.org/seurat/reference/findvariablefeatures).
- disp_lognc: The same as seuratv1.
- disp_PFlogPF: Use dispersion on PFlog1pPF matrix, inherited from seuratv1(disp).
- mean_max_ct: Highly expressed features with respect to the count matrix.
- mean_max_nc: Highly expressed features with respect to the normalized count matrix.
- mean_max_lognc: Highly expressed features with respect to the lognormalized count matrix.
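Several entries above refer to a PFlog1pPF matrix. The following is a minimal base-R sketch of that transformation, assuming PFlog1pPF means proportional fitting (scaling every cell to the mean total count), then log1p, then proportional fitting again. The function name pf_log1p_pf is hypothetical, and the implementation inside mixhvg may differ.

```r
# Hypothetical helper: PFlog1pPF under the assumption stated above.
# counts: numeric matrix with genes in rows and cells in columns.
pf_log1p_pf <- function(counts) {
  depth <- colSums(counts)
  # Proportional fitting: rescale each cell to the mean depth.
  pf <- sweep(counts, 2, mean(depth) / depth, "*")
  # Log-transform the rescaled counts.
  lg <- log1p(pf)
  # Second proportional fitting on the log-transformed values.
  depth2 <- colSums(lg)
  sweep(lg, 2, mean(depth2) / depth2, "*")
}

# Toy data: 10 genes by 5 cells of Poisson counts.
set.seed(1)
counts <- matrix(rpois(50, lambda = 3), nrow = 10, ncol = 5)
out <- pf_log1p_pf(counts)

# After the final step every cell has the same column total.
print(colSums(out))
```

This sketch assumes every cell has at least one nonzero count; a cell with zero depth would need to be filtered out first.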
The table below describes the combinations of data format and mean adjustment.
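The combinations can also be tabulated directly in R. The sketch below is reconstructed from the method names in the list above, not copied from the original table; the column names adjustment and data are illustrative. It splits each name into an adjustment prefix and a data-format suffix, and records the three aliases that the list describes as identical to another entry.

```r
# Method names that follow the <adjustment>_<data format> pattern.
methods <- data.frame(
  name = c("mv_ct", "mv_nc", "mv_lognc", "mv_PFlogPF",
           "logmv_ct", "logmv_nc", "logmv_lognc", "logmv_PFlogPF",
           "disp_lognc", "disp_PFlogPF",
           "mean_max_ct", "mean_max_nc", "mean_max_lognc"),
  stringsAsFactors = FALSE
)

# Split each name at its last underscore.
methods$adjustment <- sub("_[^_]+$", "", methods$name)
methods$data <- sub("^.*_", "", methods$name)

# Aliases stated in the list; scran_pos stands apart from this naming pattern.
aliases <- c(scran = "mv_lognc", seuratv3 = "logmv_ct", seuratv1 = "disp_lognc")

# Cross-tabulate which adjustment is available for which data format.
print(table(methods$adjustment, methods$data))
```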
The following figure shows how the different methods perform, including both single highly variable gene selection methods and mixtures. The mixture 1mvn3pos4dis works best; it is the default setting: c("scran","scran_pos","seuratv1").