findChromPeaks finds duplicate peaks #695

Open
Pascallio opened this issue Oct 16, 2023 · 1 comment


Pascallio commented Oct 16, 2023

Hi,

I've been following the tutorial on Bioconductor (link), but after peak picking I noticed that findChromPeaks returns duplicate peaks:

```r
library(xcms)
library(MSnbase)  # provides readMSData() and the NAnnotatedDataFrame class

## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)[c(1, 2, 5, 6, 7, 8, 11, 12)]

## Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 4), rep("WT", 4)),
                 stringsAsFactors = FALSE)

# Read the raw data
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

# Filter for a smaller subset
raw_data <- filterRt(raw_data, c(2500, 3500))

# Set parameters for Peak Picking
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000,
                     prefilter = c(6, 5000))

# Perform peak picking and save results
data <- findChromPeaks(raw_data, param = cwp)

# Retrieve peak data as a data.frame
peaks <- as.data.frame(chromPeaks(data))

# Build a key from the m/z, retention time and sample index of each peak
uniqueComb <- paste(peaks$mz, peaks$rt, peaks$sample)

# Flag peaks whose key already occurred earlier
isDuplicated <- duplicated(uniqueComb)

# Keys that occur more than once
duplicates <- uniqueComb[isDuplicated]

# Get all peaks (first and later occurrences) whose key is duplicated
duplicatePeaks <- peaks[uniqueComb %in% duplicates, ]

# Print the duplicate peaks
duplicatePeaks
```

Here's a sample of the output:

[screenshot of the sample output]

Interestingly, while mz, rt and into are identical, intb differs. I've tested this with the xcms version on Bioconductor (3.22) and also with the GitHub version (3.99.5).

Best,
Pascal
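Since only intb differs between the duplicated rows, it can help to check which columns vary within each duplicate group. The following is a minimal base-R sketch; the helper names `find_duplicate_peaks()` and `differing_columns()` and the toy values are hypothetical, made up for illustration and not taken from this issue. It calls `duplicated()` twice (once with `fromLast = TRUE`) so that every member of a duplicate group is returned, not only the later copies.

```r
## Hypothetical helper: all rows whose (mz, rt, sample) combination occurs
## more than once, including the first occurrence of each group
find_duplicate_peaks <- function(pks, keys = c("mz", "rt", "sample")) {
  key_df <- pks[, keys, drop = FALSE]
  # duplicated() on a data.frame compares whole rows; scanning from both
  # ends flags the first member of a group as well as the later ones
  dup <- duplicated(key_df) | duplicated(key_df, fromLast = TRUE)
  pks[dup, , drop = FALSE]
}

## Hypothetical helper: names of non-key columns whose values are not
## identical within at least one duplicate group
differing_columns <- function(dups, keys = c("mz", "rt", "sample")) {
  # one factor level per (mz, rt, sample) combination
  group <- interaction(dups[, keys, drop = FALSE], drop = TRUE)
  other <- setdiff(names(dups), keys)
  varies <- vapply(other, function(col) {
    any(tapply(dups[[col]], group, function(v) length(unique(v)) > 1))
  }, logical(1))
  other[varies]
}

## Made-up toy input mimicking the reported pattern: rows 1 and 2 share
## mz, rt, sample and into, but have different intb values
pks <- data.frame(
  mz     = c(200.1, 200.1, 305.2, 410.0),
  rt     = c(2800, 2800, 3100, 3300),
  sample = c(1, 1, 2, 3),
  into   = c(5e5, 5e5, 2e5, 1e5),
  intb   = c(4e5, 3.9e5, 1.8e5, 0.9e5)
)
dups <- find_duplicate_peaks(pks)
differing_columns(dups)
```

Applied to the `peaks` data.frame from the issue, `differing_columns(find_duplicate_peaks(peaks))` would list whichever chromPeaks columns are not identical within the duplicate groups.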

jorainer (Collaborator) commented:

Hi, yes, that is a known issue with centWave; it happens for some data sets. However, I haven't yet figured out where in the code this happens. What I usually do is run refineChromPeaks with MergeNeighboringPeaksParam after peak detection, which removes or fuses the duplicated peaks. Maybe also have a look at the xcmsTutorials for a description.
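A minimal sketch of this workaround, assuming xcms, MSnbase and the faahKO example data are installed. It repeats the peak detection from the issue (without the phenodata, for brevity) and then calls `refineChromPeaks()` with `MergeNeighboringPeaksParam()` at its default settings. The `count_dups()` helper is hypothetical and only counts peaks that still share mz, rt and sample. Whether every duplicate disappears may depend on the merging parameters (`expandRt`, `expandMz`, `ppm`, `minProp`), so treat the defaults as a starting point rather than a recommendation.

```r
library(xcms)
library(MSnbase)  # readMSData() comes from MSnbase

## Repeat the peak detection from the issue (phenodata omitted)
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)[c(1, 2, 5, 6, 7, 8, 11, 12)]
raw_data <- readMSData(files = cdfs, mode = "onDisk")
raw_data <- filterRt(raw_data, c(2500, 3500))
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000,
                     prefilter = c(6, 5000))
data <- findChromPeaks(raw_data, param = cwp)

## Merge overlapping/neighboring peaks using the default settings;
## see ?MergeNeighboringPeaksParam for expandRt, expandMz, ppm and minProp
mpp <- MergeNeighboringPeaksParam()
data_merged <- refineChromPeaks(data, param = mpp)

## Hypothetical helper: number of peaks whose (mz, rt, sample) repeats an
## earlier peak
count_dups <- function(x) {
  pk <- as.data.frame(chromPeaks(x))
  sum(duplicated(pk[, c("mz", "rt", "sample")]))
}

## Compare peak counts and duplicate counts before and after merging
c(before = nrow(chromPeaks(data)), after = nrow(chromPeaks(data_merged)))
c(before = count_dups(data), after = count_dups(data_merged))
```

Merging can only combine existing peaks, so the total number of peaks after `refineChromPeaks()` is never larger than before.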
