FAIRer rose, clarifying the semantics of data matrices
A FAIRification project: Rose scent metabolite profiles - Nature Genetics, June 2018 and Science, July 2015
The FAIRification process follows a principled approach rooted in the design of experiments. It relies on the Statistics Ontology (STATO) to specify the semantics of the data matrices by identifying independent and dependent variables, as well as the quantitation types (sample mean and standard error) held in the original documents.
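To make the idea of explicit matrix semantics concrete, here is a minimal Python sketch, assuming a hypothetical "wide" supplementary table with one sample-mean column and one standard-error column per condition. The column names, condition labels and values are illustrative placeholders, not data from either publication, and the plain-text labels stand in for the STATO terms the project actually uses. Reshaping to one row per observation makes the independent variable (condition), the measured analyte and the quantitation type explicit in every record.

```python
import csv
import io

# Hypothetical wide table: one row per metabolite, one mean column and one
# standard-error column per condition. Placeholder values, not real data.
WIDE_CSV = """metabolite,condition_A_mean,condition_A_sem,condition_B_mean,condition_B_sem
metabolite_1,1.0,0.1,2.0,0.2
metabolite_2,3.0,0.3,4.0,0.4
"""

# Hypothetical mapping from column suffix to a quantitation type label
# (the project expresses these roles with STATO terms instead).
QUANTITATION_TYPES = {"mean": "sample mean", "sem": "standard error"}


def wide_to_long(text):
    """Return one record per (metabolite, condition, quantitation type)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column == "metabolite":
                continue
            # e.g. "condition_A_mean" -> ("condition_A", "mean")
            condition, suffix = column.rsplit("_", 1)
            records.append({
                "condition": condition,           # independent variable
                "metabolite": row["metabolite"],  # analyte whose abundance is the dependent variable
                "quantitation_type": QUANTITATION_TYPES[suffix],
                "value": float(value),
            })
    return records


long_records = wide_to_long(WIDE_CSV)
print(len(long_records))  # 8
```

Each record now carries its own meaning, so no reader has to infer from a column header whether a number is a mean or a standard error.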
- A first dataset was extracted from a supplementary material table published alongside the Nature Genetics manuscript in June 2018. This dataset is used to demonstrate how to make data Findable, Accessible, Interoperable and Reusable (in short, FAIR) and how Frictionless Tabular Data Package representations can easily be mobilised for re-analysis and data science.
- A second dataset was extracted from another GC-MS profile produced by the same team and published in a Science manuscript in July 2015. These data were originally made available as PDF tables in the supplementary documents.
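As a rough illustration of a Tabular Data Package, the sketch below writes a `datapackage.json` descriptor next to a CSV resource and reads it back, casting each column according to its Table Schema type. It uses only the Python standard library; the package name, file name, field names and values are hypothetical, and in practice a dedicated Frictionless library would handle validation and reading.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical long-format rows; placeholder values only.
ROWS = [
    {"condition": "condition_A", "metabolite": "metabolite_1",
     "quantitation_type": "sample mean", "value": 1.0},
    {"condition": "condition_A", "metabolite": "metabolite_1",
     "quantitation_type": "standard error", "value": 0.1},
]

# Descriptor following the Tabular Data Package / Table Schema layout.
DESCRIPTOR = {
    "name": "rose-metabolites-example",
    "profile": "tabular-data-package",
    "resources": [{
        "name": "measurements",
        "path": "measurements.csv",
        "profile": "tabular-data-resource",
        "schema": {"fields": [
            {"name": "condition", "type": "string"},
            {"name": "metabolite", "type": "string"},
            {"name": "quantitation_type", "type": "string"},
            {"name": "value", "type": "number"},
        ]},
    }],
}

# Minimal casting for the Table Schema types used above.
CASTS = {"string": str, "number": float, "integer": int}


def write_package(directory):
    """Write datapackage.json plus the CSV resource it describes."""
    resource = DESCRIPTOR["resources"][0]
    names = [field["name"] for field in resource["schema"]["fields"]]
    with open(directory / resource["path"], "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=names)
        writer.writeheader()
        writer.writerows(ROWS)
    (directory / "datapackage.json").write_text(json.dumps(DESCRIPTOR, indent=2))


def read_package(directory):
    """Read the first resource back, casting each field per its schema type."""
    descriptor = json.loads((directory / "datapackage.json").read_text())
    resource = descriptor["resources"][0]
    types = {f["name"]: CASTS[f["type"]] for f in resource["schema"]["fields"]}
    with open(directory / resource["path"], newline="") as fh:
        return [{k: types[k](v) for k, v in row.items()}
                for row in csv.DictReader(fh)]


with tempfile.TemporaryDirectory() as tmp:
    write_package(Path(tmp))
    rows = read_package(Path(tmp))

print(rows == ROWS)  # True
```

Because the column types travel with the data in the descriptor, the numeric `value` column comes back as numbers rather than strings without any dataset-specific parsing code.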
Both datasets were used to showcase how data can be compared efficiently once the data matrices have been made FAIR.
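Once both matrices share identifiers and quantitation types, comparison largely reduces to set operations on keys. This minimal sketch assumes each FAIRified dataset has been loaded as long-format records with a hypothetical `metabolite_id` field (standing in for a shared ontology identifier such as a ChEBI ID) and a `quantitation_type` field; the records are placeholders, not values from either study.

```python
from collections import defaultdict

# Placeholder records (hypothetical identifiers and values, not real data).
dataset_2015 = [
    {"metabolite_id": "metabolite_1", "quantitation_type": "sample mean", "value": 1.0},
    {"metabolite_id": "metabolite_1", "quantitation_type": "standard error", "value": 0.1},
    {"metabolite_id": "metabolite_2", "quantitation_type": "sample mean", "value": 2.0},
]
dataset_2018 = [
    {"metabolite_id": "metabolite_2", "quantitation_type": "sample mean", "value": 2.5},
    {"metabolite_id": "metabolite_3", "quantitation_type": "sample mean", "value": 3.0},
]


def index_means(records):
    """Group sample-mean values by metabolite identifier, ignoring other quantitation types."""
    index = defaultdict(list)
    for record in records:
        if record["quantitation_type"] == "sample mean":
            index[record["metabolite_id"]].append(record["value"])
    return index


def compare(first, second):
    """Report which metabolites are measured in both datasets or in only one."""
    a, b = index_means(first), index_means(second)
    return {
        "shared": sorted(a.keys() & b.keys()),
        "only_first": sorted(a.keys() - b.keys()),
        "only_second": sorted(b.keys() - a.keys()),
    }


print(compare(dataset_2015, dataset_2018))
# {'shared': ['metabolite_2'], 'only_first': ['metabolite_1'], 'only_second': ['metabolite_3']}
```

Filtering on the quantitation type first ensures that a sample mean in one study is never paired with a standard error in the other.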
This data science project is available from GitHub at https://github.com/proccaserra/rose2018ng-notebook, with all necessary information, code and Jupyter notebooks, organised according to the Cookiecutter Data Science template.
This release is related to the following documents:
- Original Excel Table:
Available as supplementary material and now made available via Zenodo.
- Frictionless Tabular Data Package:
Resulting from the transformation of the Excel document into a Tabular Data Package, available via Zenodo.
- RDF Linked Data graph:
Resulting from the conversion of the Frictionless Tabular Data Package into a semantic model using [OBO Foundry resources](http://www.obofoundry.org/) such as STATO, ChEBI and the Plant Ontology, as well as the NCBI Organismal Taxonomy, available via Zenodo.
- Dataset comparison as Frictionless Data Package:
Metabolites measured in two distinct experiments, published in Science (2015) and Nature Genetics (2018), made available as a Tabular Data Package via Zenodo.
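To give a flavour of the RDF conversion, the following sketch serialises one long-format record as N-Triples using only the standard library. The `http://example.org/rose/` namespace and the term IRIs are hypothetical stand-ins: the published graph uses STATO, ChEBI, Plant Ontology and NCBI Taxonomy terms, whose identifiers are not reproduced here. Only the `rdf:type`, `rdf:value` and `xsd:double` IRIs are standard W3C vocabulary.

```python
# Standard W3C vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Hypothetical namespace and term IRIs standing in for STATO/ChEBI terms.
EX = "http://example.org/rose/"
QUANTITATION_TERMS = {
    "sample mean": EX + "term/sample_mean",
    "standard error": EX + "term/standard_error",
}


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def record_to_ntriples(index, record):
    """Serialise one long-format record as a list of N-Triples lines."""
    node = iri(f"{EX}measurement/{index}")
    value = repr(float(record["value"]))
    return [
        f"{node} {iri(RDF_TYPE)} {iri(QUANTITATION_TERMS[record['quantitation_type']])} .",
        f"{node} {iri(EX + 'analyte')} {iri(EX + 'metabolite/' + record['metabolite'])} .",
        f"{node} {iri(EX + 'condition')} {iri(EX + 'condition/' + record['condition'])} .",
        f'{node} {iri(RDF_VALUE)} "{value}"^^{iri(XSD_DOUBLE)} .',
    ]


# Placeholder record, not real data.
record = {"condition": "condition_A", "metabolite": "metabolite_1",
          "quantitation_type": "sample mean", "value": 1.0}
lines = record_to_ntriples(0, record)
print("\n".join(lines))
```

A library such as rdflib would be the more robust choice for building and serialising a real graph; plain string formatting is used here only to keep the example dependency-free, and it does not escape arbitrary string literals.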