Properly serialize an xcms result object to disk/files #693

Open
jorainer opened this issue Oct 12, 2023 · 21 comments

@jorainer
Collaborator

In addition to exporting/saving an xcms object in RData format, it would be good to also support import/export from/to plain-text files.

Idea:

  • re-create an xcms result object from plain text files.
  • export the xcms result to text files.

Why? Mostly for Galaxy-based workflows, to enable use of the result objects across tools.

@jorainer
Collaborator Author

@hechth

@sneumann
Owner

How much of that could be handled by mzTab-M? The SMF Feature table corresponds nicely to the (grouped) xcms list. Ungrouped data is technically also possible, but requires a lot of repetition. Metadata works as well. Bonus: we'd be able to read results from e.g. MS-Dial for further analysis.

Alternatively there are some functions by @PayamEmami in the MetaboIgniter workflow
https://github.com/nf-core/metaboigniter/blob/master/bin/consensusXMLToXcms.r
https://github.com/nf-core/metaboigniter/blob/master/bin/featureXMLToXcms.r

An earlier version was even implemented in Galaxy to integrate OpenMS and R based tools.

The main question is whether true round-tripping (export xcms object1 and re-import it as object2, guaranteeing object1==object2) is a requirement.
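
To make the round-trip requirement concrete, here is a minimal sketch. It is written in Python purely as a neutral illustration (xcms itself is R), and the helpers `export_table`, `import_table` and `round_trips`, as well as the three column names, are hypothetical and not part of xcms or mzTab-M. It writes a small peak table to a tab-separated text file, reads it back, and checks that the re-imported table equals the original.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column layout; a real xcms peak matrix has more columns.
COLUMNS = ("mz", "rt", "into")


def export_table(rows, path):
    """Write rows (tuples of floats) as tab-separated text with a header."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(COLUMNS)
        # repr() gives the shortest string that parses back to the same float.
        writer.writerows([repr(v) for v in row] for row in rows)


def import_table(path):
    """Read a file written by export_table back into a list of float tuples."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = tuple(next(reader))
        if header != COLUMNS:
            raise ValueError(f"unexpected header: {header}")
        return [tuple(float(v) for v in row) for row in reader]


def round_trips(rows):
    """Return True if export followed by import reproduces rows exactly."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "chrom_peaks.txt"
        export_table(rows, path)
        return import_table(path) == rows


object1 = [(301.1234567891, 12.5, 1e5 / 3), (150.05, 0.1 + 0.2, 42.0)]
print(round_trips(object1))
```

Exact equality only holds in this sketch because every value is written at full precision; rounding numbers on export would turn the check into an approximate comparison.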

Yours, Steffen

@jorainer
Collaborator Author

Agree that it would be great to use standard file formats as much as possible! Maybe we could export the feature data as mzTab-M and the rest as other files (all in one folder). I would like to be able to export a full xcms result object so that it can be completely restored later (including process history, adjusted retention times, etc).

@jorainer
Collaborator Author

@sneumann, do you have a definition somewhere of what the mzTab-M format should actually look like?

@sneumann
Owner

Hi, the official docs are in
https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html
and @nilshoffmann wrote an R package to read/write mzTab-M at https://github.com/lifs-tools/rmzTab-m/
I need to find some snippets I have for exporting the old xcms stuff, which could become a starting point,
and we'd need to figure out how to handle the dependencies. Does that sound like an MsBackendMzTabM?
Yours,
Steffen

@jorainer
Collaborator Author

Had a quick look at the rmzTab-m - but am not particularly happy with the R6 objects. Would want to avoid dependency of xcms on such objects as much as possible.

@sneumann
Owner

sneumann commented Oct 18, 2023

+1 for not having all that as dependency.
Still, there could be an XcmsExperiment-BackendMzTabM in the mid-term,
and in the short-term we have some code snippets that demonstrate import / export of (r)mzTab-M
as started in https://gist.github.com/sneumann/0f0d22027eda4db8ab28175de06b77f2
Yours, Steffen

@sneumann
Owner

+1 for not adding these dependencies.
That's why I suggested a kind of XcmsExperiment-BackendMzTabM in the mid-term.
rmzTab-M was initially generated through code generation from an OpenAPI specification. Maybe someone someday writes a code generator without using R6.
Until then we can work on snippets to demonstrate the export, such as this one:
https://gist.github.com/sneumann/0f0d22027eda4db8ab28175de06b77f2
Yours, Steffen

@hechth
Contributor

hechth commented Oct 21, 2023

Hi - I agree that this would not need to be in the XCMS package, to avoid further increasing its dependencies. I also think the XCMS object is quite stable, so I don't expect many API-breaking changes.

I agree that exporting to and importing from mzTab-M would also be an option - then the question is, how well does mzTab-M work for peaks? I always thought it was designed more to capture the final output with compound concentrations etc.?

@jorainer
Collaborator Author

I would then suggest defining two methods:

  • storeResults (or exportResults): takes an xcms result object and saves it in some format.
  • restoreResults: restores the result object from the exported file(s).

Implementations of these methods could then be defined for different param objects:

  • RDataParam: simply export in RData format (same as save/load).
  • PlainTextParam: export data as files in (custom) plain text format.
  • MzTabMParam: export data in mzTab-M format.

The first two could be implemented in xcms as they don't add any dependencies. The last one could be implemented in a separate package.

While it's not exactly what we have with the backend concept in Spectra, it is still similar. Would that be a workable solution, @sneumann? With that we would add proper support for mzTab-M and in addition would support @hechth's request to export/import to/from text files, which would make Galaxy integration easier (if I understood it correctly).
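
The proposed design (one pair of generic methods, with the storage format selected by a param object) can be sketched roughly as follows. This is a language-neutral illustration in Python, not xcms code: in xcms this would presumably be done with S4 generics, and `BinaryParam`, `store_results`, `restore_results` and the dictionary used as the "result object" are hypothetical stand-ins. Because `functools.singledispatch` dispatches on the first argument, the param comes first here.

```python
import json
import pickle
import tempfile
from dataclasses import dataclass
from functools import singledispatch
from pathlib import Path


@dataclass
class BinaryParam:
    """Stand-in for the proposed RDataParam: one binary file."""
    path: Path


@dataclass
class PlainTextParam:
    """Stand-in for the proposed PlainTextParam: a folder of text files."""
    folder: Path


@singledispatch
def store_results(param, result):
    # Fallback for param types without a registered implementation.
    raise NotImplementedError(f"no export for {type(param).__name__}")


@singledispatch
def restore_results(param):
    raise NotImplementedError(f"no import for {type(param).__name__}")


@store_results.register
def _(param: BinaryParam, result):
    param.path.write_bytes(pickle.dumps(result))


@restore_results.register
def _(param: BinaryParam):
    return pickle.loads(param.path.read_bytes())


@store_results.register
def _(param: PlainTextParam, result):
    # One JSON text file per component of the result.
    param.folder.mkdir(parents=True, exist_ok=True)
    for name, content in result.items():
        (param.folder / f"{name}.json").write_text(json.dumps(content, indent=2))


@restore_results.register
def _(param: PlainTextParam):
    files = sorted(param.folder.glob("*.json"))
    return {f.stem: json.loads(f.read_text()) for f in files}


def demo():
    """Store and restore a toy result with each param; report equality."""
    result = {
        "chrom_peaks": [[301.12, 12.5, 1000.0]],
        "process_history": ["peak detection", "retention time adjustment"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        params = [BinaryParam(Path(tmp) / "result.bin"),
                  PlainTextParam(Path(tmp) / "text")]
        outcomes = []
        for param in params:
            store_results(param, result)
            outcomes.append(restore_results(param) == result)
        return outcomes


print(demo())
```

A separate package could add an mzTab-M implementation by registering its own param class on the same two functions, without the core package depending on it.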

@nilshoffmann

Hi - I agree that this would not need to be in the XCMS package, to avoid further increasing its dependencies. I also think the XCMS object is quite stable, so I don't expect many API-breaking changes.

I agree that exporting to and importing from mzTab-M would also be an option - then the question is, how well does mzTab-M work for peaks? I always thought it was designed more to capture the final output with compound concentrations etc.?

mzTab-M, in contrast to mzTab 1.0, has the feature table, which allows you to report start and end time of aligned "m/z features", area etc. The output of final compound concentrations (or whatever quantity you define to report) is reported in the summary table. The evidence table further allows you to report identification evidence for your features.
Also, anything that is missing in a particular table can be added as an optional column.

@nilshoffmann

+1 for not adding these dependencies. That's why I suggested a kind of XcmsExperiment-BackendMzTabM in the mid-term. rmzTab-M was initially generated through code generation from an OpenAPI specification. Maybe someone someday writes a code generator without using R6. Until then we can work on snippets to demonstrate the export, such as this one: https://gist.github.com/sneumann/0f0d22027eda4db8ab28175de06b77f2 Yours, Steffen

Additionally, there are low level IO functions available in the package already for reading:
https://github.com/lifs-tools/rmzTab-m/blob/master/R/read_mz_tab.R

The writing part is currently tied to R6, as Steffen said, because I autogenerated the models and the REST API implementation for the online validator web service from the OpenAPI specification.

https://github.com/lifs-tools/rmzTab-m/blob/master/R/write_mz_tab.R

However, under the hood most objects are converted to and from tables in one way or another, with heavy use of jsonlite. So the options are mapping this or reimplementing it with another object model. @jorainer, which one would you prefer?

@nilshoffmann

Each object already has toDataFrame and fromDataFrame methods available, see the Instrument class as an example:

https://github.com/lifs-tools/rmzTab-m/blob/master/R/instrument.R#L173
https://github.com/lifs-tools/rmzTab-m/blob/master/R/instrument.R#L218

@nilshoffmann

So converting this to S4 reference classes should be doable. Or what alternative class type would you recommend?

@jorainer
Collaborator Author

S4 would be ideal - we could however also use the R6 implementation and hide all the R6 stuff from the user through the functions (e.g. the storeResults method) above, so that should also be doable.

@jorainer
Collaborator Author

Maybe @philouail can have a look into this. I'm assigning the issue to her.

@philouail self-assigned this Oct 25, 2023

@hechth
Contributor

hechth commented Oct 25, 2023

That sounds great, thanks for getting started on this! Feel free to reach out if you have any more questions or you need some example data for the plain text export!

@hechth
Contributor

hechth commented Oct 25, 2023

@jorainer do you think it would also be possible to create an import from mzTab?

@hechth
Contributor

hechth commented Oct 25, 2023

also xref lifs-tools/rmzTab-m#3

@jorainer
Collaborator Author

Yes, sure @hechth, we will also work on an mzTab-M importer - but let's first get started with the "simpler" text export/import.

@hechth
Contributor

hechth commented Oct 25, 2023

Cool - let me know if I can help with this. The mzTab-M importer would be great for importing results from OpenMS peak picking, which exports mzTab. We could then build an XCMS object and, for example, run it through RAMClustR etc.
