Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update retention times to alkane retention time indices for GC-MS #731

Open
CLUES-Emory opened this issue Mar 12, 2024 · 5 comments
Open

Comments

@CLUES-Emory
Copy link

Hello!
We are running large studies (>1000 samples) on GC-HRMS. Peak detection has been working beautifully. Over the course of the run, we have drifts in retention times due to column aging and periodic column changes. To help account for this effect, we run alkane mixes within each batch (96 samples). These are used to both monitor retention time changes, and calculate retention time indices for comparison to spectral databases. To improve alignment of our data, I would like to loop each batch and replace the raw retention times with retention time indices calculated from the alkane mix in that batch. Once updated, we will extract and align peaks using the traditional XCMS workflow.

My question is what is the most appropriate way to do this? I could think of a few ways, with varying efficiency.

  1. Read in files for each batch using readMSData(). Replace raw retention time with rtime(raw_data)<- Calc_RI. Write to new mzML file using writeMSData(). Obviously this would be very slow and inefficient.

  2. Read in files for all samples readMSData(). Replace raw retention time OR adjusted retention time with rtime(raw_data)<- Calc_RI seperately for each batch. Continue with data extraction. Adjusted retention time would be preferred so the original retention time could be retained

  3. Read in files for all samples readMSData(). Detect peaks using findChromPeaks. Replace peak retention times with Calc_RI. Continue with data extraction. However, I could not get this to work, since it would not let me assign adjusted retention times manually.

Any ideas on how to best do this? One concern I had is how functions that call back to the mzML file (such as fillChromPeaks()) would handle the very different retention time indices. Another issue I'm running into is it won't let me subset XCMSnExp by samples using the [] command. Based on the documentation, I thought this was possible.

Thanks!

SessionInfo() is below.

R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] Spectra_1.12.0 microbenchmark_1.4.10 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 xcms_4.0.2
[7] MSnbase_2.28.1 ProtGenerics_1.34.0 S4Vectors_0.40.2 mzR_2.36.0 Rcpp_1.0.12 Biobase_2.62.0
[13] BiocGenerics_0.48.1 BiocParallel_1.36.0

loaded via a namespace (and not attached):
[1] bitops_1.0-7 rlang_1.1.3 magrittr_2.0.3 clue_0.3-65 MassSpecWavelet_1.68.0
[6] matrixStats_1.2.0 compiler_4.3.3 vctrs_0.6.5 pkgconfig_2.0.3 MetaboCoreUtils_1.10.0
[11] crayon_1.5.2 XVector_0.42.0 utf8_1.2.4 preprocessCore_1.64.0 MultiAssayExperiment_1.28.0
[16] zlibbioc_1.48.0 GenomeInfoDb_1.38.7 progress_1.2.3 DelayedArray_0.28.0 prettyunits_1.2.0
[21] cluster_2.1.6 R6_2.5.1 RColorBrewer_1.1-3 limma_3.58.1 GenomicRanges_1.54.1
[26] SummarizedExperiment_1.32.0 snow_0.4-4 IRanges_2.36.0 Matrix_1.6-5 splines_4.3.3
[31] igraph_2.0.2 tidyselect_1.2.0 abind_1.4-5 codetools_0.2-19 affy_1.80.0
[36] lattice_0.22-5 tibble_3.2.1 plyr_1.8.9 survival_3.5-8 pillar_1.9.0
[41] affyio_1.72.0 BiocManager_1.30.22 MatrixGenerics_1.14.0 MALDIquant_1.22.2 ncdf4_1.22
[46] generics_0.1.3 RCurl_1.98-1.14 hms_1.1.3 ggplot2_3.5.0 munsell_0.5.0
[51] scales_1.3.0 MsExperiment_1.4.0 glue_1.7.0 MsFeatures_1.10.0 lazyeval_0.2.2
[56] tools_4.3.3 mzID_1.40.0 robustbase_0.99-2 QFeatures_1.12.0 vsn_3.70.0
[61] RANN_2.6.1 fs_1.6.3 XML_3.99-0.16.1 grid_4.3.3 impute_1.76.0
[66] MsCoreUtils_1.14.1 colorspace_2.1-0 GenomeInfoDbData_1.2.11 cli_3.6.2 fansi_1.0.6
[71] S4Arrays_1.2.1 dplyr_1.1.4 AnnotationFilter_1.26.0 pcaMethods_1.94.0 gtable_0.3.4
[76] DEoptimR_1.1-3 digest_0.6.34 SparseArray_1.2.4 multtest_2.58.0 lifecycle_1.0.4
[81] statmod_1.5.0 MASS_7.3-60.0.1

@CLUES-Emory
Copy link
Author

I was able to implement the second option; however, am now receiving the following error when I try to do my peak picking:

Error: BiocParallel errors
4 remote errors, element index: 1, 6, 11, 16
16 unevaluated and other errors
first remote error:
Error in findChromPeaks_Spectrum_list(x = spectra(object, BPPARAM = SerialParam()), : Spectra are not ordered by retention time!

@sneumann
Copy link
Owner

Hi, the function expects strictly ordered RT for spectra. Could it be that you output integers as RT, and that two consecutive spectra have the same value ? Can you crank up the decimals, even if they were useless for a human observer ? Yours, Steffen

@sneumann
Copy link
Owner

Hi, I just realised that possibly the PR #732 by @philouail might solve a similar use case to yours ? Yours, Steffen

@CLUES-Emory
Copy link
Author

Thanks Steffen! PR #732 is a potential approach that could work, although I'm not sure if our alkane mixes were run frequently enough to be used for alignment. I've actually gotten my code to work after increasing the number of decimals, and it seems to be greatly improving our alignment (less missing values) so far. Thanks for all your help.

@jorainer
Copy link
Collaborator

Note: we are currently adding the function to force retention times to be sorted to MsCoreUtils: rformassspectrometry/MsCoreUtils#123 - so, in future you could use that official force_sorted() function directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants