-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/refactor #4
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job on making the touched parts much more readable :)!
Some general points besides the specific comments:
- Remove debug comments / paths.
- I see both
=
and<-
being used to define variables. Try to be consistent and chose a style. (I think<-
is preferred in the R community, strictly speaking they do something else but the result is mostly the same). - Don't create variables with same name and a number.
- make_plots.R is very hard to interpret, does indeed need a refactor
# Feb 2023: code refactored to run as last step of DIMS pipeline on HPC | ||
# setwd("/Users/mraves2/Metabolomics/DIMS_Test_run"); outdir <- getwd() # test env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not necessary comment, code hsitory can be found using git/github.
- Remove debug comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment removed.
library(openxlsx) # for opening Excel file | ||
library(ggplot2) # for plotting | ||
library(gghighlight) | ||
library(sys) | ||
library(tidyr) | ||
library(gridExtra) | ||
library(grid) | ||
library(testthat) # for unit testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the comments after the library loading necessary or debugging/testing comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing; I want to know which libraries are essential and which can be omitted. Comments will be removed in final version.
|
||
# define parameters - check after addition to run.sh | ||
cmd_args <- commandArgs(trailingOnly = TRUE) | ||
for (arg in cmd_args) cat(" ", arg, "\n", sep = "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the paste0 function looks better and do you need the spaces (or tabs?) in front of the cmd args?
cat(paste0("\t", cmd_args), sep ="\n")
If your really would like to use for loops make sure to write them completely, makes it way easier to understand what is happening:
for (arg in cmd_args) {
cat(" ", arg, "\n", sep = "")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I literally copied this from another of the pipeline scripts. I agree the coding can be made clearer. This is something that can only be tested when the run.sh file is modified to include the violin plots script.
|
||
split <- TRUE | ||
shorter <- 0 # shorter list of patients | ||
w <- 0.01 # seconds that system waits in between steps. Remove on HPC? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember reading about the wait parameter, but cannot find it again. I don't see why you would need a wait step; I will remove it.
source("/hpc/dbg_mz/production/DIMS/pipeline/scripts/AddOnFunctions/other_isobaric.R") | ||
source("/hpc/dbg_mz/production/DIMS/pipeline/scripts/AddOnFunctions/make_plots.R") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this to the top of this file (below library loads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, moved.
# extract the top 20 highest and top 10 lowest scoring metabolites | ||
if (startsWith(stoftest, "top") & ptcount > 1) { | ||
if (endsWith(stoftest, "hoogst")){ | ||
topX <- unique(summed[pt]) %>% slice_max(unique(summed[pt]),n = 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is X?
@@ -0,0 +1,304 @@ | |||
make_plots <- function(metab.list, stoftest, pt, zscore_cutoff, xaxis_cutoff, ThisProbScore, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- function make_plots works, but structure is too complex. Split into prepare_data and violin_plot functions
Totally agree, there is a lot of data prep going on, not just making plots :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stopped with comments on this function, since I think you will go over it again in the future. Which will probably solve a lot of comments.
moi <- joined[,-2] | ||
j <- joined[,-1] | ||
jm <- reshape2::melt(j, id.vars = "HMDB_name") | ||
#jmo <- jm[ rev(order(match(jm$HMDB_name, joined$HMDB_name))), ] | ||
jma <- aggregate(HMDB_name ~ value+variable, jm, paste0, collapse = "_") | ||
jmas <- jma %>% separate(HMDB_name, into = c("HMDB_name", "isobar"), sep="_",extra = "merge", fill = "right") | ||
jmaso <- jmas[ rev(order(match(jmas$HMDB_name, joined$HMDB_name))), ] | ||
jmao <- jmaso %>% unite(HMDB_name,c("HMDB_name","isobar"),sep="_", na.rm = TRUE) | ||
#enters <- max(lengths(regmatches(jmao$HMDB_name, gregexpr(";|_", jma$HMDB_name)))) | ||
jmao$HMDB_name <- gsub(';|_','\n',jmao$HMDB_name) | ||
jmao$HMDB_name <- factor(jmao$HMDB_name, levels=unique(jmao$HMDB_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand these variable names.
@@ -0,0 +1,22 @@ | |||
other_isobaric <- function(Combined, metab.list, infile, check.lists) { | |||
# not sure what this function does, find metabolites with same mass? Isomeric, not isobaric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think anti_join finds differences between two tables -> result = table with different masses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to test this function separately from the other code to find out what it exactly does.
if (ncol(metab.list) > 2) { | ||
cat("alarmvalues") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No return -> what happens?
No description provided.