Ways to extract all MS2 peaks? #737

Open
jzhou19 opened this issue Apr 5, 2024 · 3 comments
Comments

@jzhou19

jzhou19 commented Apr 5, 2024

Hello,

I have QE data from which I want to extract MS2 information. I tried featureSpectra(xdata), where xdata is an XcmsExperiment object I obtained by going through the LC-MS preprocessing steps, including peak detection, alignment, and correspondence. However, the object returned by featureSpectra contained columns like "basePeakMZ", "lowMZ", and "highMZ". Is there a way to extract all MS2 peaks instead of only the base peak information?

Thank you!

@jorainer
Collaborator

jorainer commented Apr 8, 2024

The object returned by featureSpectra() should be a Spectra object. If that's the case, you can extract the individual peaks data with the peaksData() function, or alternatively with the mz() and intensity() functions.

@jzhou19
Author

jzhou19 commented Apr 10, 2024

Hi Johannes,

Thank you for your response. I should've stated this more clearly. Yes, the object returned by featureSpectra is a Spectra, but I extracted the information and converted it to a data frame with as.data.frame(ms2_spectra@backend@spectraData@listData). That's why I referred to columns, which might have been confusing.

I tried peaksData on the Spectra object and got all peaks information in a SimpleList object. However, I noticed some discrepancies between the data frame I created and the SimpleList. I attached two screenshots below. For example, in the data frame, for the first spectrum, I have 80 peaks with a base peak intensity of 11350.406, a low m/z of 70.29, and a high m/z of 545.44. When I inspected the peaks data, the lowest m/z was 78.34 and the highest was 530.06. The most abundant peak is the same, but its intensity is slightly different: 10921.4326. Could you explain why I am observing these differences?

[Screenshot: df_firstrow_JZ]

[Screenshot: peaksdata_JZ]

Thanks a lot!

@jorainer
Collaborator

First, please use the dedicated accessor functions to extract data from a Spectra object and don't access slots (@) directly! To explain: Spectra can use different backends (MsBackend classes) to hold the data, and each one stores it in a different way, so the code you used will only work for one type of backend. More importantly, Spectra uses a lazy processing queue for many data manipulation operations, which means that the original data (m/z and intensity values) are not modified when you call such an operation. The modifications are applied only when you access the data (using mz(), intensity(), spectraData() or peaksData()). By reading the slots directly you will always get the original, unmodified, unfiltered data.
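To illustrate the lazy processing queue described above, here is a minimal sketch. It assumes the Bioconductor Spectra package is installed and uses one made-up spectrum (the object names and all numeric values are illustrative, not taken from the data in this thread). A filter is queued with filterIntensity(), and the accessor functions then return the filtered peaks while the original object is left as it was.

```r
library(Spectra)  # Bioconductor package; also attaches S4Vectors (DataFrame)

# One made-up MS2 spectrum with three peaks (illustrative values only).
spd <- DataFrame(msLevel = 2L)
spd$mz <- list(c(100.1, 200.2, 300.3))
spd$intensity <- list(c(5, 100, 20))
s <- Spectra(spd)

# Queue a filter keeping peaks with intensity >= 10. The stored data is not
# changed; the filter is applied when the peaks are accessed.
s_filt <- filterIntensity(s, intensity = c(10, Inf))

n_before <- lengths(s)        # number of peaks in the original object
n_after <- lengths(s_filt)    # number of peaks after the queued filter
pk <- peaksData(s_filt)[[1]]  # matrix with columns "mz" and "intensity"
```

Accessors such as peaksData(), mz() and intensity() go through the queue, which is why they can disagree with values read directly from the backend slots.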

For your question: "totIonCurrent", "basePeakMZ" and "basePeakIntensity" are all spectra variables that are extracted from the original data file (the mzML file). This is the information provided there as the header info for each spectrum, and it is usually put there by the MS manufacturer's software. Even if you convert the raw data files to mzML (e.g. using ProteoWizard), this information usually does not get changed. So if the data/file was processed in any form (e.g. centroiding, filtering etc.), you will start seeing differences between these header values and the actual peaks. The spectra variables thus always represent what is provided by the original data files. The same is true for the m/z and intensity values, unless you did any data processing/filtering within Spectra/xcms.
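To check how far the header values are from the actual peaks, one could recompute the same summaries from a peak matrix. The following is a minimal base R sketch. The helper name summarise_peaks is hypothetical (it is not part of Spectra or xcms), the toy values are made up, and it assumes a non-empty two-column matrix with columns "mz" and "intensity", as found in each element of peaksData().

```r
# Hypothetical helper (not part of Spectra/xcms): summarise one peak matrix
# with columns "mz" and "intensity". Assumes at least one peak.
summarise_peaks <- function(pk) {
    top <- which.max(pk[, "intensity"])  # row of the most intense peak
    data.frame(
        n_peaks = nrow(pk),
        lowMZ = min(pk[, "mz"]),
        highMZ = max(pk[, "mz"]),
        basePeakMZ = pk[top, "mz"],
        basePeakIntensity = pk[top, "intensity"],
        totIonCurrent = sum(pk[, "intensity"])
    )
}

# Toy peak matrix standing in for peaksData(s)[[1]] (made-up values).
pk <- cbind(mz = c(100.1, 200.2, 300.3),
            intensity = c(5, 100, 20))
res <- summarise_peaks(pk)
res
```

With a real Spectra object s, something like do.call(rbind, lapply(as.list(peaksData(s)), summarise_peaks)) should give one row per spectrum, which can then be placed next to the corresponding columns of spectraData(s). Empty spectra would need separate handling.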

Now, some maybe useful lines of code, if you want to extract peaks data and work with a single data.frame:

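library(Spectra)
# s is the Spectra object, e.g. the result of featureSpectra(xdata)
pd <- peaksData(s)
# spectrum index for each peak: index i is repeated once per peak of spectrum i
s_index <- rep(seq_along(s), vapply(pd, nrow, integer(1)))
pd_df <- data.frame(spectrum_index = s_index, do.call(rbind, pd))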

That way you'll have one (veeery long) data.frame with the m/z and intensity values, plus one additional column that tells you which spectrum each peak comes from.
