Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do_findChromPeaks_centWave returns wrong maxo and int* values #694

Open
Pascallio opened this issue Oct 14, 2023 · 3 comments
Open

do_findChromPeaks_centWave returns wrong maxo and int* values #694

Pascallio opened this issue Oct 14, 2023 · 3 comments

Comments

@Pascallio
Copy link

Pascallio commented Oct 14, 2023

Hi, I've noticed that do_findChromPeaks_centWave returns an incorrect maxo value and likely incorrect into and intb values. According to the documentation, maxo represents the maximum value of the peak, but in the following example it returns almost double that value (120881.1 versus 61141.14).

# Load library
library(xcms)

# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))

# Select a chromatogram
chrom <- data@.Data[[5]]

# Set the variables needed for findChromPeaks
intensities <- chrom@intensity  # Chromatogram intensities
mzs <- rep(1, length(intensities)) # Fixed MZ to eliminate ppm issues
rts <- chrom@rtime * 60 # Transform minutes to seconds
valsPerSpect <- table(rts) # 1 data point per rtime

# Run findChromPeaks with default values, should not matter for this example
xc <- as.data.frame(do_findChromPeaks_centWave(
  mz = mzs,
  int = intensities,
  scantime = rts,
  valsPerSpect = valsPerSpect
))

assertthat::are_equal(xc$maxo, max(intensities)) # FALSE
xc$maxo # 120888.1
max(intensities) # 61141.14

The chromatogram in question:
image

I've tried to narrow the issue down. The maxo variable is defined here:

peaks[p, "maxo"] <- max(pd)

and into and intb are defined here:

peaks[p, "into"] <- pwid * sum(pd)
db <- pd - baseline
peaks[p, "intb"] <- pwid * sum(db[db > 0])

Since both are depended on the pd variable (which is a subset of the d variable), this issue probably affects both the maxo and int* values. The d variable represents the intensities of the found EIC:

eic <- .Call("getEIC", mz, int, scanindex, as.double(mzrange),
as.integer(sr), as.integer(length(scanindex)),
PACKAGE = "xcms")
## eic <- rawEIC(object,mzrange=mzrange,scanrange=sr)
d <- eic$intensity

I've confirmed that the issue occurs in the getEIC function, but I am unable to narrow the issue further due to my lack of knowledge on C.

I've tried version 3.22 from Bioconductor and version 3.99.5 from GitHub and the issue exists in both versions.

Thanks,
Pascal

@Pascallio
Copy link
Author

I dove a little bit deeper into the code and it's likely an indexing issue. I've extracted the values that go into getEIC in order to extract the result from the function. Using the variables from the previous code gives us the following plot (black = original chromatogram, red = EIC chromatogram):

scanindex <- 1:length(rts) - 1
mzrange <- c(1, 1)
sr <- c(1, length(rts))

eic <- .Call("getEIC", mzs, intensities, as.integer(scanindex), as.double(mzrange),
             as.integer(sr), as.integer(length(scanindex)),
             PACKAGE = "xcms"
)

plot(intensities, type = "l", ylim = c(0, 2e5))
lines(eic$intensity, col = "red")

image

It seems that the getEIC function adds the previous scan intensity to the current scan intensity. A one-liner using dplyr confirms this:

points(intensities + dplyr::lag(intensities, n = 1), col = "green")

image

I haven't found the solution yet, but it's either in the loop in getScanEIC, or in the lowerBound or upperBound functions:

xcms/src/mzROI.c

Lines 440 to 447 in db80f2c

int idx1b = lowerBound(from, pmz, idx1-1, idx2-idx1);
int idx2b = upperBound(to, pmz, idx1b, idx2-idx1b);
for (idx=idx1b;idx <= idx2b; idx++)
{
double mzval = pmz[idx-1];
if ((mzval <= to) && (mzval >= from)) sum += pintensity[idx-1];
}

@sneumann
Copy link
Owner

Hi, thanks for the report, we should turn this into a unit test once we've figured exactly what's going on.
Yours, Steffeb

@Pascallio
Copy link
Author

Thanks Steffen,

Interestingly, this issue does not occur in the S4 method findChromPeaks. It specifically occurs in the do_* version of the method.

# Load library
library(xcms)

# Select MRM mzML from msdata package
data <- readSRMData(system.file("proteomics", "MRM-standmix-5.mzML.gz", package = "msdata"))

# Use 0.1 - 1 due to RT being in minutes
data <- findChromPeaks(data[5, ], CentWaveParam(peakwidth = c(0.1, 1)))
chromPeaks(data)

Returns the top peak with the correct maxo of 61141.1367

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants