Added comment for CV as mentioned in the issue #63

saezlab · Sep 15, 2023 · b762ddb
1 parent 56b84bd
Showing 1 changed file with 15 additions and 5 deletions.
diff --git a/vignettes/Standard Metabolomics.Rmd b/vignettes/Standard Metabolomics.Rmd
@@ -146,6 +146,7 @@ The parameter `MVI` refers to Missing Value Imputation (MVI) and if `MVI = TRUE`
 Lastly, the function `Preprocessing()` performs outlier detection and adds a column "Outliers" into the DF, which can be used to remove outliers. The parameter `HotellinsConfidence` can be used to choose the confidence interval that should be used for the Hotellins T2 outlier test [@Hotelling1931].\
 \
 Since our example data contains pool samples, we will do `Pool_Estimation()` before applying the `Preprocessing()` function. This is important, since one should remove the features (=metabolites) that are too variable prior to performing any data transformations such as TIC as part of the `Preprocessing()` function.\
+It is worth mentioning that the coefficient of variation (CV) is calculated by dividing the standard deviation (SD) by the mean. Hence the CV depends on the SD, which in turn is a suitable measure of spread mainly for normally distributed data.\
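The CV definition above (SD divided by the mean) can be sketched in a few lines of base R. This is only an illustration, not the internal implementation of `Pool_Estimation()`: the data frame `pool`, its values, and the cutoff `cv_cutoff` are hypothetical and chosen so that one feature is flagged.

```r
# Hypothetical pool-sample intensities (rows = pool samples, columns = metabolites)
pool <- data.frame(
  valine  = c(100, 102, 98, 101),
  lactate = c(50, 150, 20, 180)
)

# CV = SD / mean, computed separately for each metabolite
cv <- sapply(pool, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))

# Flag features whose CV exceeds an illustrative cutoff (assumption, not a package default)
cv_cutoff <- 0.5
high_var <- names(cv)[cv > cv_cutoff]
```

Here `valine` varies little across the pool samples, while `lactate` has a large SD relative to its mean and is therefore the only entry in `high_var`.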
 ```{r}
 #Get the Pool data
 PoolData <- MetaProViz::toy_data(data="Standard") %>%
@@ -159,7 +160,7 @@ MetaProViz::Pool_Estimation(Input_data = PoolData,
                             Input_SettingsFile = NULL,
                             Input_SettingsInfo = NULL,
                             Unstable_feature_remove = FALSE, 
-                            Therhold_cv = 1)
+                            Threshold_cv = 1)
 ```
 ```{r, echo=FALSE}
 # Check how our data looks like:
@@ -169,7 +170,7 @@ Pool_Estimation_result[1:5,]%>%
   #kableExtra::scroll_box(width = "100%", height = "200px")
 ```
 \
-The results from the `Pool_Estimation()` is a table that has the Coefficient of variation (CV). If there is a high variability, one should consider to remove those features from the data. For the example data nothing needs to be removed. If you have used internal standard in your experiment you should specifically check their CV as this would indicate technical issues (here valine-d8 and hippuric acid-d5).\
+The result of `Pool_Estimation()` is a table containing the CV. If the variability is high, one should consider removing those features from the data. For the example data nothing needs to be removed. If you have used internal standards in your experiment, you should specifically check their CV, as a high value would indicate technical issues (here valine-d8 and hippuric acid-d5).\
 
 ```{r, eval=FALSE}
 #Test out QC plots:
@@ -277,9 +278,11 @@ As input we will use the  pre-processed data we have generated using the `Prepro
 `1.` If all values of the replicates of one condition are NA/0 for a feature (=metabolite): Log2FC= Inf/-Inf and the statistics will be NA\
 `2.` If some values  of the replicates of one condition are NA/0 for a feature (=metabolite): Log2FC= positive or negative value, but the statistics will be NA\
 \
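The first case above can be reproduced with plain arithmetic. The sketch below assumes the Log2FC is the log2 of the ratio of condition means, which may differ from what `DMA()` does internally; the replicate values are hypothetical.

```r
# Hypothetical replicate intensities for one metabolite in two conditions
numerator   <- c(0, 0, 0)     # all replicates are 0 in the numerator condition
denominator <- c(12, 15, 11)

# Assumed definition: log2 of the ratio of the condition means
log2fc <- log2(mean(numerator) / mean(denominator))
log2fc          # -Inf, because the numerator mean is 0

# With the roles swapped, the ratio is a division by zero and the result is Inf
log2fc_swapped <- log2(mean(denominator) / mean(numerator))
log2fc_swapped  # Inf
```

A test statistic cannot be estimated from a condition without usable values, which is consistent with the statistics being reported as NA in this case.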
+Discuss the potential use of data transformation prior to DMA (e.g. log transformation) --> Is this even possible in the DMA, given that we would compute the Log2FC of log-transformed data?
+\
 In the example data we have four different cell lines, healthy (HK2) and cancer (ccRCC: 786-M1A, 786-M2A and 786-O) and hence we can perform multiple different comparisons. The results are automatically saved and returned into the global environment. If parameter Plot=TRUE, an overview Volcano plot is generated and saved.\
 ```{r}
-`DMA_786-O_vs_HK2` <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], #we need to remove columns that do not include metabolite measurements
+DMA_786O_vs_HK2 <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], #we need to remove columns that do not include metabolite measurements
                         Input_SettingsFile=Intra_Preprocessed[,c(1:2)],#only maintain the information about condition and replicates
                         Input_SettingsInfo = c(conditions="Conditions", numerator="786-O", denominator = "HK2"),
                         STAT_pval ="t.test",#ProDA is another test! --> estimates missing information
@@ -289,8 +292,10 @@ In the example data we have four different cell lines, healthy (HK2) and cancer
                         CoRe=FALSE, 
                         Plot = TRUE)
 
+`DMA_786-O_vs_HK2` <- DMA_786O_vs_HK2[["DMA_Results"]] #Get the DMA table
+
 #Perform the other comparisons:
-`DMA_786-M1A_vs_HK2` <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], 
+DMA_786M1A_vs_HK2 <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], 
                                    Input_SettingsFile=Intra_Preprocessed[,c(1:2)],
                                    Input_SettingsInfo = c(conditions="Conditions", numerator="786-M1A", denominator = "HK2"),
                                    STAT_pval ="t.test",
@@ -300,7 +305,9 @@ In the example data we have four different cell lines, healthy (HK2) and cancer
                                    CoRe=FALSE,
                                    Plot = TRUE)
 
-`DMA_786-M2A_vs_HK2` <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], 
+`DMA_786-M1A_vs_HK2` <- DMA_786M1A_vs_HK2[["DMA_Results"]] #Get the DMA table
+
+DMA_786M2A_vs_HK2 <- MetaProViz::DMA(Input_data=Intra_Preprocessed[,-c(1:4)], 
                                    Input_SettingsFile=Intra_Preprocessed[,c(1:2)],
                                    Input_SettingsInfo = c(conditions="Conditions", numerator="786-M2A", denominator = "HK2"),
                                    STAT_pval ="t.test",
@@ -310,6 +317,9 @@ In the example data we have four different cell lines, healthy (HK2) and cancer
                                    CoRe=FALSE, 
                                    Plot = TRUE)
 
+
+`DMA_786-M2A_vs_HK2` <- DMA_786M2A_vs_HK2[["DMA_Results"]] #Get the DMA table
+
 ```
 ```{r, echo=FALSE}
 # Check how our data looks like: