Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support exclusion of specific markers and marker combinations from the Polyfunctionality Score calculation #62

Open
gfinak opened this issue Aug 14, 2018 · 2 comments
Assignees

Comments

@gfinak
Copy link
Member

gfinak commented Aug 14, 2018

The lab wants to exclude IL4 single positive, GzB single positive and IL14/GzB double positive cell combinations from the Polyfunctionality Score calculations due to artefacts detected with those antibodies.

  • Is this currently supported?
  • Can we add an argument to support exclusion of cell subsets from the PFS calculation?
@gfinak gfinak self-assigned this Aug 14, 2018
@gfinak
Copy link
Member Author

gfinak commented Aug 14, 2018

Here's a reprex of how to do this currently:

The first part is boilerplate to construct an actual fitted COMPASSResult object.
The latter part shows how to exclude the IL17A and IL17F single positive and double positive cell subsets from the calculation of the Polyfunctionality score manually.

library(COMPASS)
library(readxl)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(assertthat)
f <- system.file("extdata","SimpleCOMPASSExample.xlsx",package="COMPASS")
data <- read_excel(f)

# tidy the input count matrix
# rename the subsets, keep only the booleans, ensure consistent ordering
tidy_data <- data %>% 
  gather(Population, Count, -c(1:3)) %>%
  mutate(Population = gsub("S/Avid-/L/CD3\\+/CD4\\+","CD4",Population)) %>%
  mutate(Population = gsub(",Count","",Population)) %>%
  mutate(Population = gsub("22","IL22",Population)) %>%
  mutate(Population = gsub("10","IL10",Population)) %>%
  mutate(Population = gsub("17A","IL17A",Population)) %>%
  mutate(Population = gsub("17F","IL17F",Population)) %>%
  mutate(Population = gsub("MIP","MIP1B",Population)) %>%
  mutate(Population = gsub("TNF","TNFa",Population)) %>%
  mutate(Population = translate_marker_names(Population)) %>%
  mutate(Population = gsub("([&!])2&","\\1IL2&",Population)) %>%
  mutate(Population = gsub("CD4/","",Population)) %>%
  filter(Population != "CD4") %>% 
  mutate(Parent = "CD4") %>%
  spread(Population, Count) %>% 
  arrange(`PATIENT ID`,Stimulation, GROUP, Parent) %>% 
  mutate(Stimulation = toupper(Stimulation))

# get the stim and unstim data into separate data frames
nu = tidy_data %>% filter(Stimulation == "UNSTIMULATED") %>% arrange(`PATIENT ID`)

ns = tidy_data %>% filter(Stimulation != "UNSTIMULATED") %>% arrange(`PATIENT ID`)

# order of subject ids should be the same
assertthat::are_equal(nu$`PATIENT ID`,ns$`PATIENT ID`)
#> [1] TRUE

# extract the metadata and the stim and unstim count matrices
metadata <- ns %>% select(`PATIENT ID`,GROUP,Stimulation,Parent)
nu <- nu %>% select(-`PATIENT ID`,-GROUP,-Stimulation,-Parent) %>% as.matrix
ns <- ns %>% select(-`PATIENT ID`,-GROUP,-Stimulation,-Parent) %>% as.matrix

# reverse the order of the cell subsets, so that the all-negative is the last column
nu <- nu[,rev(colnames(nu))]
ns <- ns[,rev(colnames(ns))]

# rownames must be set
rownames(nu) <- metadata$`PATIENT ID`
rownames(ns) <- metadata$`PATIENT ID`
metadata <- as.data.frame(metadata)
rownames(metadata) = metadata$`PATIENT ID`

# quick fit via SimpleCOMPASS interface
simplefit <- SimpleCOMPASS(n_s = ns, n_u = nu, meta = metadata, individual_id = "PATIENT ID",iterations = 1000, replications = 1)
#> Initializing parameters...
#> Computing initial parameter estimates...
#> Iteration 1000 of 1000.
#> Fitting model with 1 replications.
#> Running replication 1 of 1...
#> Iteration 1000 of 1000.
#> Done!

# Now compute the polyfunctionality score excluding IL17A and IL17F and IL17A/IL17F

## first get indices of the three cell subsets from the categories matrix
il17a_single <- as.data.frame(simplefit$data$categories)$IL17A == 1 & as.data.frame(simplefit$data$categories)$Counts == 1

il17f_single <- as.data.frame(simplefit$data$categories)$IL17F == 1 & as.data.frame(simplefit$data$categories)$Counts == 1

il17af_double <- as.data.frame(simplefit$data$categories)$IL17A == 1 & 
  as.data.frame(simplefit$data$categories)$IL17F == 1 &
  as.data.frame(simplefit$data$categories)$Counts == 2

indices <- il17a_single|il17af_double|il17f_single

## next extract the Gamma matrix from the fit

gamma_mat <- COMPASS:::Gamma(simplefit)
dim(gamma_mat)
#> [1]   21  256 1000
# Third dimension is the iterations
# second dimension is the cell subsets
# first dimension is the subjects
# we want to average across the iterations to compute the mean_gamma matrix
# from which we will construct a PFS score.

# compute the mean_gamma matrix averaging over iterations
mean_gamma <- apply(gamma_mat[,!indices,],1:2,mean)

# get the degree of each subset in the matrix.
degree <- simplefit$data$categories[!indices,"Counts"]

## Compute the reduced score.
reduced_score <- PolyfunctionalityScore(x = mean_gamma, degree = degree, n = log(256,2))

#combine with the usual scores
merged <- scores(simplefit) %>% 
  arrange(`PATIENT ID`) %>% 
  bind_cols(reduced_PFS = reduced_score)

# do these make sense
all(merged$reduced_PFS <= merged$PFS)
#> [1] TRUE

Created on 2018-08-14 by the reprex
package
(v0.2.0).

@gfinak
Copy link
Member Author

gfinak commented Oct 12, 2018

this can be partly accomplished with the FS and PFS APIs passing in markers= to specify markers to include in the calculation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant