The package mixhvg uses mixture of highly variable gene selection methods to improve gene selection.

RuzhangZhao/mixhvg

mixhvg: Mixture of Highly Variable Gene Selection

Overview

The mixhvg package performs highly variable gene selection. It includes popular publicly available methods as well as mixtures of multiple highly variable gene selection methods. A mixture can combine the advantages captured by each individual method.

The function FindVariableFeaturesMix inherits from the FindVariableFeatures function of the Seurat package and can be used in the same way. It also accepts a dense or sparse matrix as input.

Download

Please download the mixhvg package from CRAN.

install.packages("mixhvg")

To get the most recently updated version, install the package from the GitHub repository.

devtools::install_github("RuzhangZhao/mixhvg")

Usage

The FindVariableFeaturesMix function accepts two kinds of input: a Seurat object or a matrix.

The example data comes from 10x Genomics. You may use this link to download the processed data, which is named pbmc3k_rna.rds.

Seurat Object as Input

pbmc<-readRDS("pbmc3k_rna.rds")
library(Seurat)
library(mixhvg)
object<-CreateSeuratObject(pbmc)
object<-FindVariableFeaturesMix(object,nfeatures=2000)
head(VariableFeatures(object))
# [1] "CD74"  "LYZ"   "IGLC3" "IGKC"  "RPS29" "IGHA1"

One may use FindVariableFeaturesMix to replace the FindVariableFeatures function in the analysis pipeline of Seurat. For example,

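# Assumes Seurat and mixhvg are loaded and pbmc has been read as above
object<-CreateSeuratObject(pbmc)
# Drop-in replacement for FindVariableFeatures
object<-FindVariableFeaturesMix(object)
object<-NormalizeData(object,verbose=FALSE)
object<-ScaleData(object,verbose=FALSE)
# Compute 30 principal components and use all of them for UMAP
object<-RunPCA(object,npcs=30,verbose=FALSE)
object<-RunUMAP(object,dims=1:30,verbose=FALSE)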

Matrix as Input

pbmc<-readRDS("pbmc3k_rna.rds")
library(mixhvg)
pbmc_hvg<-FindVariableFeaturesMix(pbmc,nfeatures=2000)
head(pbmc_hvg)
# [1] "CD74"  "LYZ"   "IGLC3" "IGKC"  "RPS29" "IGHA1"
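
If pbmc3k_rna.rds is not at hand, the matrix interface can be tried on simulated counts. The sketch below is only an illustration: the toy data, the variable names, and the choice of nfeatures=50 are assumptions made here, not part of the package. The call to FindVariableFeaturesMix runs only when mixhvg is installed, and its behavior on such small simulated data has not been verified.

```r
library(Matrix)  # provides sparse matrix classes

set.seed(1)
n_genes <- 500
n_cells <- 200

# Toy count matrix: genes in rows, cells in columns (simulated, not real data)
gene_means <- rgamma(n_genes, shape = 0.5, rate = 0.5)
counts_dense <- matrix(
  rpois(n_genes * n_cells, lambda = rep(gene_means, times = n_cells)),
  nrow = n_genes, ncol = n_cells,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Drop genes that were never observed so every row carries some signal
counts_dense <- counts_dense[rowSums(counts_dense) > 0, , drop = FALSE]

# The same data stored as a sparse matrix
counts_sparse <- Matrix(counts_dense, sparse = TRUE)

if (requireNamespace("mixhvg", quietly = TRUE)) {
  library(mixhvg)
  hvg_dense <- FindVariableFeaturesMix(counts_dense, nfeatures = 50)
  hvg_sparse <- FindVariableFeaturesMix(counts_sparse, nfeatures = 50)
  # Number of selected genes shared by the two input types
  print(length(intersect(hvg_dense, hvg_sparse)))
}
```

Both calls receive the same counts, so comparing the two results is a quick check that the dense and sparse paths behave alike on your installation.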

Different Methods

The method.names argument takes either a single method or several methods to be mixed.

pbmc_hvg<-FindVariableFeaturesMix(pbmc,method.names="seuratv3")
pbmc_hvg<-FindVariableFeaturesMix(pbmc,method.names="scran")
pbmc_hvg<-FindVariableFeaturesMix(pbmc,
          method.names=c("scran","scran_pos","seuratv1"))
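
To see how much the gene sets chosen by different methods agree, one option is a pairwise Jaccard index. The helpers below, jaccard and pairwise_jaccard, are hypothetical names introduced for this sketch and are not part of mixhvg. The toy gene sets only demonstrate the computation.

```r
# Jaccard index of two gene sets: shared genes divided by all genes seen.
# Returns NaN when both sets are empty.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise Jaccard matrix for a named list of gene sets
pairwise_jaccard <- function(sets) {
  n <- length(sets)
  out <- matrix(NA_real_, n, n, dimnames = list(names(sets), names(sets)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- jaccard(sets[[i]], sets[[j]])
    }
  }
  out
}

# Toy gene sets standing in for the output of three methods
toy_sets <- list(
  methodA = c("CD74", "LYZ", "IGKC"),
  methodB = c("LYZ", "IGKC", "RPS29"),
  methodC = c("CD74", "LYZ", "IGKC")
)
toy_overlap <- pairwise_jaccard(toy_sets)
print(toy_overlap)

# With real data (requires mixhvg and the pbmc matrix loaded above):
# sets <- lapply(c(seuratv3 = "seuratv3", scran = "scran"),
#                function(m) FindVariableFeaturesMix(pbmc, method.names = m))
# pairwise_jaccard(sets)
```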

Method Choices

The following methods are available, and any mixture of them is acceptable. For example, the default is c("scran","seuratv1","mv_PFlogPF","scran_pos").

  • scran: Use mean-variance curve adjustment on the lognormalized count matrix, which is scran modelGeneVar.
  • mv_ct: Use mean-variance curve adjustment on the count matrix, inherited from scran modelGeneVar.
  • mv_nc: Use mean-variance curve adjustment on the normalized count matrix, inherited from scran modelGeneVar.
  • mv_lognc: The same as scran.
  • mv_PFlogPF: Use mean-variance curve adjustment on the PFlog1pPF matrix, inherited from scran modelGeneVar.
  • scran_pos: Use the Poisson version of scran, modelGeneVarByPoisson.
  • seuratv3: Use logmean-logvariance curve adjustment on the count matrix, which is vst in the Seurat FindVariableFeatures function (https://satijalab.org/seurat/reference/findvariablefeatures).
  • logmv_ct: The same as seuratv3.
  • logmv_nc: Use logmean-logvariance curve adjustment on the normalized count matrix, inherited from seuratv3 (vst).
  • logmv_lognc: Use logmean-logvariance curve adjustment on the lognormalized count matrix, inherited from seuratv3 (vst).
  • logmv_PFlogPF: Use logmean-logvariance curve adjustment on the PFlog1pPF matrix, inherited from seuratv3 (vst).
  • seuratv1: Use dispersion on the lognormalized count matrix, which is dispersion (disp) in the Seurat FindVariableFeatures function (https://satijalab.org/seurat/reference/findvariablefeatures).
  • disp_lognc: The same as seuratv1.
  • disp_PFlogPF: Use dispersion on the PFlog1pPF matrix, inherited from seuratv1 (disp).
  • mean_max_ct: Highly expressed features with respect to the count matrix.
  • mean_max_nc: Highly expressed features with respect to the normalized count matrix.
  • mean_max_lognc: Highly expressed features with respect to the lognormalized count matrix.
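
Because several names in the list above are aliases (scran for mv_lognc, seuratv3 for logmv_ct, seuratv1 for disp_lognc), it can help to normalize a method.names vector before comparing runs. The sketch below encodes only what the list states. The helper canonicalize_methods is a hypothetical name introduced here and is not a mixhvg function, and the package itself may handle aliases differently.

```r
# Aliases as stated in the method list
method_aliases <- c(
  scran    = "mv_lognc",
  seuratv3 = "logmv_ct",
  seuratv1 = "disp_lognc"
)

# Remaining names, built from the adjustment and data format they combine
canonical_methods <- c(
  paste0("mv_", c("ct", "nc", "lognc", "PFlogPF")),
  "scran_pos",
  paste0("logmv_", c("ct", "nc", "lognc", "PFlogPF")),
  paste0("disp_", c("lognc", "PFlogPF")),
  paste0("mean_max_", c("ct", "nc", "lognc"))
)

# Hypothetical helper: map aliases to canonical names, reject unknown
# names, and drop duplicates that arise after alias resolution
canonicalize_methods <- function(method.names) {
  is_alias <- method.names %in% names(method_aliases)
  resolved <- method.names
  resolved[is_alias] <- method_aliases[method.names[is_alias]]
  unknown <- setdiff(resolved, canonical_methods)
  if (length(unknown) > 0) {
    stop("Unknown method name(s): ", paste(unknown, collapse = ", "))
  }
  unique(unname(resolved))
}

resolved_example <- canonicalize_methods(c("scran", "scran_pos", "seuratv1"))
print(resolved_example)

# The default of the installed version can be inspected directly:
# formals(mixhvg::FindVariableFeaturesMix)$method.names
```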

The table below summarizes the combinations of data format and mean adjustment.

Fig2
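
The four data formats that appear in the method names (ct, nc, lognc, PFlogPF) can be sketched in base R. This is an illustration under assumptions, not the package's internal code: nc is taken here to be counts scaled to a total of 10000 per cell (the default scale factor of Seurat's NormalizeData), lognc is log1p of nc, and PFlog1pPF is taken to mean proportional fitting, then log1p, then proportional fitting again. The function names to_nc, to_lognc, pf and to_PFlogPF are hypothetical.

```r
# Toy count matrix (ct): 3 genes in rows, 4 cells in columns
ct <- matrix(
  c(1, 0, 3,
    2, 2, 0,
    0, 5, 5,
    4, 1, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# nc: scale each cell to the same total (assumed scale factor 1e4)
to_nc <- function(counts, scale_factor = 1e4) {
  sweep(counts, 2, colSums(counts), "/") * scale_factor
}

# lognc: log1p of the normalized counts
to_lognc <- function(counts, scale_factor = 1e4) {
  log1p(to_nc(counts, scale_factor))
}

# Proportional fitting: rescale each cell to the mean depth across cells.
# Assumes every cell has a nonzero total.
pf <- function(m) {
  depth <- colSums(m)
  sweep(m, 2, mean(depth) / depth, "*")
}

# PFlog1pPF: proportional fitting, log1p, proportional fitting again
to_PFlogPF <- function(counts) {
  pf(log1p(pf(counts)))
}

nc <- to_nc(ct)
lognc <- to_lognc(ct)
pflogpf <- to_PFlogPF(ct)
print(round(pflogpf, 3))
```

After the final proportional fitting step, every cell has the same total, which is the property this transformation is meant to provide.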

Benchmark Highly Variable Gene Selection Methods

The following figure shows how different methods perform. It includes both single highly variable gene selection methods and mixtures. We notice that 1mvn3pos4dis works best, which is the default setting: c("scran","scran_pos","seuratv1").

Fig2
