Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/refactor #4

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Feature/refactor #4

wants to merge 2 commits into from

Conversation

mraves2
Copy link

@mraves2 mraves2 commented Feb 24, 2023

No description provided.

@rernst rernst self-requested a review March 2, 2023 09:23
Copy link
Member

@rernst rernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on making the touched parts much more readable :)!

Some general points besides the specific comments:

  • Remove debug comments / paths.
  • I see both = and <- being used to define variables. Try to be consistent and chose a style. (I think <- is preferred in the R community, strictly speaking they do something else but the result is mostly the same).
  • Don't create variables with same name and a number.
  • make_plots.R is very hard to interpret, does indeed need a refactor

Comment on lines +10 to +11
# Feb 2023: code refactored to run as last step of DIMS pipeline on HPC
# setwd("/Users/mraves2/Metabolomics/DIMS_Test_run"); outdir <- getwd() # test env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Not necessary comment, code hsitory can be found using git/github.
  • Remove debug comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment removed.

Comment on lines +16 to +23
library(openxlsx) # for opening Excel file
library(ggplot2) # for plotting
library(gghighlight)
library(sys)
library(tidyr)
library(gridExtra)
library(grid)
library(testthat) # for unit testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the comments after the library loading necessary or debugging/testing comments?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing; I want to know which libraries are essential and which can be omitted. Comments will be removed in final version.


# define parameters - check after addition to run.sh
cmd_args <- commandArgs(trailingOnly = TRUE)
for (arg in cmd_args) cat(" ", arg, "\n", sep = "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the paste0 function looks better and do you need the spaces (or tabs?) in front of the cmd args?

cat(paste0("\t", cmd_args), sep ="\n")

If your really would like to use for loops make sure to write them completely, makes it way easier to understand what is happening:

for (arg in cmd_args) {
    cat("  ", arg, "\n", sep = "")
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I literally copied this from another of the pipeline scripts. I agree the coding can be made clearer. This is something that can only be tested when the run.sh file is modified to include the violin plots script.


split <- TRUE
shorter <- 0 # shorter list of patients
w <- 0.01 # seconds that system waits in between steps. Remove on HPC?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember reading about the wait parameter, but cannot find it again. I don't see why you would need a wait step; I will remove it.

Comment on lines +58 to +59
source("/hpc/dbg_mz/production/DIMS/pipeline/scripts/AddOnFunctions/other_isobaric.R")
source("/hpc/dbg_mz/production/DIMS/pipeline/scripts/AddOnFunctions/make_plots.R")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to the top of this file (below library loads).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, moved.

# extract the top 20 highest and top 10 lowest scoring metabolites
if (startsWith(stoftest, "top") & ptcount > 1) {
if (endsWith(stoftest, "hoogst")){
topX <- unique(summed[pt]) %>% slice_max(unique(summed[pt]),n = 20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is X?

@@ -0,0 +1,304 @@
make_plots <- function(metab.list, stoftest, pt, zscore_cutoff, xaxis_cutoff, ThisProbScore,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • function make_plots works, but structure is too complex. Split into prepare_data and violin_plot functions

Totally agree, there is a lot of data prep going on, not just making plots :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stopped with comments on this function, since I think you will go over it again in the future. Which will probably solve a lot of comments.

Comment on lines +46 to +56
moi <- joined[,-2]
j <- joined[,-1]
jm <- reshape2::melt(j, id.vars = "HMDB_name")
#jmo <- jm[ rev(order(match(jm$HMDB_name, joined$HMDB_name))), ]
jma <- aggregate(HMDB_name ~ value+variable, jm, paste0, collapse = "_")
jmas <- jma %>% separate(HMDB_name, into = c("HMDB_name", "isobar"), sep="_",extra = "merge", fill = "right")
jmaso <- jmas[ rev(order(match(jmas$HMDB_name, joined$HMDB_name))), ]
jmao <- jmaso %>% unite(HMDB_name,c("HMDB_name","isobar"),sep="_", na.rm = TRUE)
#enters <- max(lengths(regmatches(jmao$HMDB_name, gregexpr(";|_", jma$HMDB_name))))
jmao$HMDB_name <- gsub(';|_','\n',jmao$HMDB_name)
jmao$HMDB_name <- factor(jmao$HMDB_name, levels=unique(jmao$HMDB_name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand these variable names.

@@ -0,0 +1,22 @@
other_isobaric <- function(Combined, metab.list, infile, check.lists) {
# not sure what this function does, find metabolites with same mass? Isomeric, not isobaric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think anti_join finds differences between two tables -> result = table with different masses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to test this function separately from the other code to find out what it exactly does.

Comment on lines +6 to +7
if (ncol(metab.list) > 2) {
cat("alarmvalues")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No return -> what happens?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants