Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WarpSET:Error in contrast #34

Open
bioguy2018 opened this issue Jul 21, 2020 · 9 comments
Open

WarpSET:Error in contrast #34

bioguy2018 opened this issue Jul 21, 2020 · 9 comments

Comments

@bioguy2018
Copy link

bioguy2018 commented Jul 21, 2020

Dear All, I have been trying to apply the warpset() for normalization to correct the batch effect among my channels but unfortunately I am getting the following error:

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

I tried to run the function using the ITN dataset as guided through the documentation and I realized I get the same error when I skip the following chunk as I don't want to do any pregating and just want the normalization to be applied on all data

lg <- lymphGate(dat, channels=c("CD3", "SSC"), preselection="CD4",scale=1.5) dat <- Subset(dat, lg$n2gate)

I couldn't fine any specific differneces between the datasets after and before running the above code! can someone help me what's wrong with the contrast?

Thanks a lot!

@jacobpwagner
Copy link
Member

jacobpwagner commented Jul 22, 2020

Can you provide an example of your code that is failing (ideally with available data). I can see the changes from the example in the doc even after removing the two lines you point out:

library(flowCore)
library(flowStats)


data(ITN)
dat <- transform(ITN, "CD4"=asinh(CD4), "CD3"=asinh(CD3), "CD8"=asinh(CD8))
# lg <- lymphGate(dat, channels=c("CD3", "SSC"), preselection="CD4",scale=1.5)
# dat <- Subset(dat, lg)
datr <- warpSet(dat, "CD8", grouping="GroupID", monwrd=TRUE)
if(require(flowViz)){
  d1 <- densityplot(~CD8, dat, main="original", filter=curv1Filter("CD8"))
  d2 <- densityplot(~CD8, datr, main="normalized", filter=curv1Filter("CD8"))
  plot(d1, split=c(1,1,2,1))
  plot(d2, split=c(2,1,2,1), newpage=FALSE)
}

plot_zoom_png

Additionally, flowStats makes no direct uses of stats::contrasts. It's definitely possible that indirect usage is still getting triggered by flowStats code, but I suspect your problem may lie outside of flowStats.

@bioguy2018
Copy link
Author

bioguy2018 commented Jul 22, 2020

Dear Jacob,
Thanks for your reply, sorry that I forgot to mention! so in the above example if you skip the transformation part which is something I do because my data is already transformed using read.fcs() function, you will get the error ! So if I run the following :

data(ITN)
#dat <- transform(ITN, "CD4"=asinh(CD4), "CD3"=asinh(CD3), "CD8"=asinh(CD8))
# lg <- lymphGate(dat, channels=c("CD3", "SSC"), preselection="CD4",scale=1.5)
# dat <- Subset(dat, lg)
datr <- warpSet(ITN, "CD8", grouping="GroupID", monwrd=TRUE)

I get this error which I can't really understand why it's happening.

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

@jacobpwagner
Copy link
Member

Thanks, now I can see the issue. The error is coming from lm here:

anv[i] <- anova(lm(landmarks[,i] ~ grps))$Pr[1]

because whole groups do not have an estimate for the landmark. Those NAs get imputed to the column means below:

flowStats/R/warpSet.R

Lines 722 to 723 in 336f98f

nar <- is.na(landmarks[,n])
landmarks[nar,n] <- mean(landmarks[,n], na.rm=TRUE)

For example, single NA's are tolerable here:

Browse[3]> grps
 [1] Group 1 Group 1 Group 1 Group 1 Group 1 Group 5 Group 5 Group 5 Group 5 Group 5 Group 6 Group 6 Group 6 Group 6 Group 6
Levels: Group 1 Group 5 Group 6
Browse[3]> landmarks[,2]
 sample01  sample02  sample03  sample04  sample05  sample06  sample07  sample08  sample09  sample10  sample11  sample12  sample13  sample14  sample15 
 7.855511  8.999192  8.999203  8.999189  8.999206  8.999176  8.999194  8.999194  8.999193  8.999182 17.275582  8.999199  8.999180        NA  7.481021 
Browse[3]> lm(landmarks[,2] ~ grps)

Call:
lm(formula = landmarks[, 2] ~ grps)

Coefficients:
(Intercept)  grpsGroup 5  grpsGroup 6  
     8.7705       0.2287       1.9183  

But not NA for every member of a group (as then lm will appropriately throw this error):

Browse[3]> landmarks[,1]
sample01 sample02 sample03 sample04 sample05 sample06 sample07 sample08 sample09 sample10 sample11 sample12 sample13 sample14 sample15 
      NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA 5.305356       NA 
Browse[3]> lm(landmarks[,1] ~ grps)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

However, I just need to add a guard to that earlier block checking the within-group variance vs between-group variance so it doesn't pass down a broken formula to lm. I will also add a warning, because this is going to mean that those landmarks for the groups entirely missing a landmark will have to be imputed from the others.

@jacobpwagner
Copy link
Member

jacobpwagner commented Jul 22, 2020

@bioguy2018 , after 96eaf10, this case should be handled so you should not get the obscure contrasts error. If you hit this weird case where a landmark is only present in one of the groups, you will just get a warning like the following:

Warning messages:
1: In warpSet.flowSet(ITN, "CD8", grouping = "GroupID", monwrd = TRUE) :
  The following landmark is only present in a single group --
stain: CD8
mean value: 5.30535567890338

As was the case before, that landmark value will be imputed to the mean value based on all samples here:

flowStats/R/warpSet.R

Lines 722 to 723 in 336f98f

nar <- is.na(landmarks[,n])
landmarks[nar,n] <- mean(landmarks[,n], na.rm=TRUE)

Let me know if you still run in to further problems.

@bioguy2018
Copy link
Author

Dear Jacob,
Thank you so much for your comprehensive answer and taking care of this issue! I will re-install the package and try it again! I just have a technical question! Would it be okay to use original values for the normalization and do the transformation after normalizing ? also I read in one of the topics here that you do not support warpset function anymore and people should use other methods like cytonorm! is it really the case and may I know why did you decide to not support it any further?

@jacobpwagner
Copy link
Member

jacobpwagner commented Jul 22, 2020

To your first question, while you could technically normalize with warpSet before further transformation, that further transformation could disrupt the resulting alignment of high-density regions.

To your second question, we are just not actively working on warpSet anymore and there are more modern methods for normalization. It will continue to be available via flowStats and if true bugs like this arise I can still take care of them (assuming the time investment is worth it). We just have limited bandwidth to split between development and maintenance across all packages.

@bioguy2018
Copy link
Author

Dear Jacob,
Thanks a lot for your answers! So is the change already reflected in the package? cause I uninstalled and installed the package again but I still get the same error!

@jacobpwagner
Copy link
Member

Did you install from GitHub or Bioconductor? The change I made is only available from GitHub so far:

remotes::install_github("RGLab/flowStats")

I'll be merging it to Bioconductor today so it should appear in their 3.12 development builds in a few days.

@bioguy2018
Copy link
Author

Dear Jacob,
Thank you! I did upgrade the package using github and the function works with ITN database but for my own dataset I get the following error:

Error in chol.default(Asym) :
the leading minor of order 4 is not positive definite

I have read it can happen due to the noisiness of data, i also tried to transform and truncate my data but still the same error! Do you have any suggestion in that regards?
Also I would like to ask you about the design that should be used for normalization. I have longitudinal data at different time points with replicates that have been done on different days. These data are consisted of ctrl and treatment samples. What kind of design should I use for normalizing my data? do I need to group them based on each and every experiment or can I normalize them based on the different time points?

Sorry for the big issue and I really appreciate your kind help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants