Pool Estimation - Zero variance metabolites #75

Open
dprymidis opened this issue Nov 20, 2023 · 6 comments

@dprymidis
Collaborator

We need to add a zero variance check to the pool estimation, because metabolites with zero variance can cause an error in the PCA in certain scenarios.

@dprymidis added the "Intermediate priority" label (Implementation needs to be prioritised) on Nov 20, 2023
@dprymidis self-assigned this on Nov 20, 2023
@ChristinaSchmidt1
Collaborator

Hi, as you probably noticed, I am just going through some issues to close them, and I am writing to you where I do not remember the progress.
For this one, did you already start anything? I remember that you also wanted to implement the zero variance test in the pre-processing. Maybe you could let me know the latest status and whether we can reuse something you already added to another function.

@dprymidis
Collaborator Author

dprymidis commented Jan 30, 2024

No, this was not done, but I did write the zero variance check as a function that can be used wherever it's needed. I paste it here:

ZeroVarCheck <- function(Input_data){
  #Expects samples in rows and numeric metabolite columns; needs the dplyr package (called via :: below)

  #Check metabolite variance
  metabolite_var <- apply(Input_data, 2, function(x) var(x, na.rm = TRUE)) #calculate each metabolite's variance (named vector)
  metabolite_zero_var_list <- names(metabolite_var)[which(metabolite_var == 0)] #names of metabolites with zero variance; which() skips NA variances

  #Print a message if zero variance metabolites were identified
  if(length(metabolite_zero_var_list) > 0){
    message("Metabolites with zero variance have been identified in the data.")
  }

  #Remove the zero variance metabolites
  Input_data_filtered <- dplyr::select(Input_data, -dplyr::all_of(metabolite_zero_var_list))

  #Save resulting table (still disabled: zero_var_metab_export_df and Results_folder_Preprocessing_folder are not defined inside this function)
  #write.table(zero_var_metab_export_df, row.names = FALSE, file =  paste(Results_folder_Preprocessing_folder,"/Zero_variance_metabolites",".csv",sep =  "")) #save zero var metabolite list

  return(list("Input_data_filtered" = Input_data_filtered, "ZeroVarMetabolites" = metabolite_zero_var_list))
}
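
For anyone who wants to try the idea quickly, here is a minimal usage sketch. It assumes samples in rows and metabolites in columns. The helper name `zero_var_check_base` and the toy data are invented for illustration; it is a base-R stand-in for the function above, not the code that will end up in the package.

```r
# Hypothetical base-R equivalent of ZeroVarCheck (no dplyr needed)
zero_var_check_base <- function(input_data) {
  # Variance per metabolite (column), ignoring missing values
  metabolite_var <- vapply(input_data, function(x) var(x, na.rm = TRUE), numeric(1))
  # which() drops NA variances (e.g. columns with a single non-missing value)
  zero_var <- names(metabolite_var)[which(metabolite_var == 0)]
  if (length(zero_var) > 0) {
    message("Metabolites with zero variance have been identified in the data.")
  }
  list(
    Input_data_filtered = input_data[, setdiff(names(input_data), zero_var), drop = FALSE],
    ZeroVarMetabolites = zero_var
  )
}

# Invented toy data: 4 samples, 3 metabolites; "citrate" is constant
toy <- data.frame(
  glucose = c(1.2, 0.9, 1.1, 1.4),
  citrate = c(5, 5, 5, 5),
  lactate = c(2.0, 2.3, NA, 1.9)
)

res <- zero_var_check_base(toy)
res$ZeroVarMetabolites              # "citrate"
colnames(res$Input_data_filtered)   # "glucose" "lactate"
```

Returning both the filtered table and the list of removed metabolites keeps the caller free to decide whether to warn, export the list, or stop.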

@ChristinaSchmidt1
Collaborator

Ok, thanks :)

To sum this up, you planned to add this check to the pool estimation function prior to calculating the CV?
So basically we would add to MetaProViz::Pool_Estimation:

  1. Zero variance check
  2. Shapiro test (see other issue we just wrote about)
  3. calculate CV

1 and 2 would basically result in messages/warnings and refer to cases (= metabolites) where we cannot calculate the CV, either because we have zero variance or because the distribution is not normal.

Were there other cases where you would have added the zero variance function? I reckon we should probably do this prior to the Shapiro test, as that would also be impacted by zero variance (?).
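
The 1-2-3 order above could be sketched roughly as follows. This is only an illustration under assumptions: `pool_qc_sketch`, the `alpha` threshold and the toy pool data are hypothetical and not part of MetaProViz::Pool_Estimation. The order does matter for step 2, because `shapiro.test()` stops with an error when all values of a metabolite are identical.

```r
# Hypothetical sketch of the planned order: zero variance check, Shapiro test, CV
pool_qc_sketch <- function(pool_data, alpha = 0.05) {
  # 1. Zero variance check: find and drop constant metabolites
  v <- vapply(pool_data, function(x) var(x, na.rm = TRUE), numeric(1))
  zero_var <- names(v)[which(v == 0)]
  if (length(zero_var) > 0) {
    message("Zero variance metabolites removed: ", paste(zero_var, collapse = ", "))
  }
  kept <- pool_data[, setdiff(names(pool_data), zero_var), drop = FALSE]

  # 2. Shapiro-Wilk test per metabolite (needs at least 3 non-missing values)
  shapiro_p <- vapply(kept, function(x) {
    x <- x[!is.na(x)]
    if (length(x) < 3) return(NA_real_)
    shapiro.test(x)$p.value
  }, numeric(1))
  non_normal <- names(shapiro_p)[which(shapiro_p < alpha)]
  if (length(non_normal) > 0) {
    message("Metabolites deviating from normality: ", paste(non_normal, collapse = ", "))
  }

  # 3. Coefficient of variation in percent
  cv <- vapply(kept, function(x) 100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE), numeric(1))

  list(ZeroVarMetabolites = zero_var, ShapiroP = shapiro_p, CV = cv)
}

# Invented pool samples: 5 injections, "citrate" is constant
pool <- data.frame(
  glucose = c(1.2, 0.9, 1.1, 1.4, 1.0),
  citrate = rep(5, 5),
  lactate = c(2.0, 2.3, 2.1, 1.9, 2.2)
)
qc <- pool_qc_sketch(pool)
qc$ZeroVarMetabolites   # "citrate"
names(qc$CV)            # "glucose" "lactate"
```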

@dprymidis
Collaborator Author

Yes, steps 1 to 3 are correct. I would add the check wherever we use the variance or SD, for example before the CV calculation and also before PCAs.

@ChristinaSchmidt1
Collaborator

Thanks, then I will do the above :)

About PCA:
So prior to PCA you would also remove the zero variance metabolites. How does this impact the compression in PCA?

Do we also need to check for normality prior to PCA? I ask because within PCA we recommend scaling=TRUE.

@dprymidis
Collaborator Author

PCA with scaling simply does not work if you input features with zero variance, because scaling divides by a standard deviation of zero. You have to remove them before running it. About normality: no need to check before PCA, it shouldn't play a role.
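
A small sketch of this behaviour with base R's `prcomp()`, using invented toy data (which PCA function the package uses internally is not stated here, so `prcomp()` is an assumption). It also touches on the compression question: a constant column carries no variance, so in the unscaled case dropping it should leave the component standard deviations unchanged.

```r
# Invented toy data: "citrate" has zero variance
toy <- data.frame(
  glucose = c(1.2, 0.9, 1.1, 1.4),
  citrate = c(5, 5, 5, 5),
  lactate = c(2.0, 2.3, 2.1, 1.9)
)

# With scaling, prcomp() stops on the constant column; capture the error message
scaled_attempt <- tryCatch(
  prcomp(toy, scale. = TRUE),
  error = function(e) conditionMessage(e)
)
is.character(scaled_attempt)  # an error message was captured instead of a PCA

# Drop zero variance columns, then the scaled PCA runs
v <- vapply(toy, var, numeric(1))
toy_filtered <- toy[, v > 0, drop = FALSE]
pca_scaled <- prcomp(toy_filtered, scale. = TRUE)

# Unscaled comparison: the constant column adds no variance to the components
sdev_with <- prcomp(toy)$sdev[1:2]
sdev_without <- prcomp(toy_filtered)$sdev
```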
