Annotating models with the genome identifier #96

Open
draeger opened this issue Apr 2, 2020 · 3 comments
Labels
feature Issues that aim to introduce new feature in ModelPolisher.
Comments

@draeger
Member

draeger commented Apr 2, 2020

@Midnighter requests at SBRG/bigg_models#368:

Many models in BiGG are currently annotated with a taxonomic identifier and a reference to the model itself, for example, as shown below.

    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
        <rdf:Description rdf:about="#iML1515">
          <bqbiol:hasTaxon>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/taxonomy/511145" />
            </rdf:Bag>
          </bqbiol:hasTaxon>
          <bqmodel:is>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/bigg.model/iML1515" />
            </rdf:Bag>
          </bqmodel:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
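
To make the nesting in this annotation explicit (rdf:Description, then one element per qualifier, then an rdf:Bag of rdf:li resources), here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It reads the block above and lists each qualifier with its resource URI. The helper name `list_cv_terms` is hypothetical and is not part of ModelPolisher, JSBML, or BiGG.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the BiGG annotation above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqmodel": "http://biomodels.net/model-qualifiers/",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}
# Reverse lookup (namespace URI -> prefix) to print qualifiers as "bqbiol:hasTaxon".
PREFIXES = {uri: prefix for prefix, uri in NS.items()}
RDF_DESCRIPTION = f"{{{NS['rdf']}}}Description"
RDF_LI = f"{{{NS['rdf']}}}li"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

ANNOTATION = """<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#iML1515">
      <bqbiol:hasTaxon>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/taxonomy/511145" />
        </rdf:Bag>
      </bqbiol:hasTaxon>
      <bqmodel:is>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/bigg.model/iML1515" />
        </rdf:Bag>
      </bqmodel:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>"""


def list_cv_terms(annotation_xml):
    """Return (qualifier, resource URI) pairs found in an SBML RDF annotation."""
    root = ET.fromstring(annotation_xml)
    terms = []
    for description in root.iter(RDF_DESCRIPTION):
        # Each child of rdf:Description is one qualifier element.
        for qualifier in description:
            # ElementTree tags look like "{namespace-uri}localName".
            ns_uri, local_name = qualifier.tag[1:].split("}", 1)
            name = f"{PREFIXES.get(ns_uri, ns_uri)}:{local_name}"
            for li in qualifier.iter(RDF_LI):
                terms.append((name, li.get(RDF_RESOURCE)))
    return terms


for qualifier, uri in list_cv_terms(ANNOTATION):
    print(qualifier, uri)
```

Running it prints one line per qualifier: `bqbiol:hasTaxon` with the taxonomy URI and `bqmodel:is` with the BiGG model URI.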

On the website, BiGG also provides a link to the genome sequence that was used to create the model, see, for example, http://bigg.ucsd.edu/models/iML1515.

Where possible, it would be great to also create MIRIAM-compliant annotations of the genome on the model, using the identifier from the genome database or RefSeq namespaces as defined at Identifiers.org.

Is this a task for ModelPolisher?

@mephenor
Collaborator

There is an ncbi_assembly id column in the genome table of the BiGG DB; however, it appears to be empty.

Additionally, there are accession_type and accession_value columns, where accession_type is currently either ncbi_accession or ncbi_assembly.
BiGG resolves the Genome link to a list of models and chromosomes. The ncbi_accession values can be resolved directly by appending them as IDs to https://www.ncbi.nlm.nih.gov/nuccore/. The ncbi_assembly entries are resolved to a list of chromosomes; however, I don't know exactly how this is done.
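
The resolution just described can be sketched as a small mapping from one (accession_type, accession_value) row to a link. This is a hypothetical helper for illustration, not BiGG's or ModelPolisher's code; since it is unknown how BiGG expands ncbi_assembly entries, the sketch simply returns no link for them. The accession values in the usage line are illustrative, not taken from BiGG's table.

```python
from typing import Optional

# Base URL to which BiGG appends ncbi_accession values, as described above.
NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/"


def genome_link(accession_type: str, accession_value: str) -> Optional[str]:
    """Map one row of BiGG's genome table to a link (hypothetical helper)."""
    if accession_type == "ncbi_accession":
        # Nucleotide accessions resolve directly on NCBI Nucleotide (nuccore).
        return NUCCORE_URL + accession_value
    if accession_type == "ncbi_assembly":
        # BiGG expands assemblies into a list of chromosomes, but how it does
        # so is unknown here, so this sketch deliberately returns no link.
        return None
    raise ValueError(f"unexpected accession_type: {accession_type!r}")


# Illustrative accession value.
print(genome_link("ncbi_accession", "NC_000913.3"))
```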

From what I've gathered from BiGG, neither these accessions nor the taxon IDs appear anywhere else, so retrieving RefSeq annotations would likely require fetching the corresponding entry from GenBank.
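
One possible way to fetch such an entry, which this thread does not prescribe, is NCBI's E-utilities `efetch` endpoint: with `db=nuccore`, `rettype=gb` and `retmode=text` it returns the GenBank flat-file record for a nucleotide accession. The minimal sketch below only builds the request URL; the helper name `genbank_efetch_url` is hypothetical, and downloading and parsing the record are left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_efetch_url(accession: str) -> str:
    """Build an EFetch URL returning the GenBank flat file for a nucleotide accession."""
    query = urlencode({
        "db": "nuccore",   # NCBI Nucleotide database
        "id": accession,
        "rettype": "gb",   # GenBank flat-file format
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


url = genbank_efetch_url("NC_000913.3")  # illustrative accession
print(url)
# To download (network access required):
#   from urllib.request import urlopen
#   record = urlopen(url).read().decode()
```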

@mephenor
Collaborator

mephenor commented May 13, 2020

I had another look at the data BiGG provides, and this is actually easy to do, albeit with some issues regarding MIRIAM compliance. I must have been half asleep when I looked at the issue last time...

All accessions starting with NC_ or NZ_ can be converted to MIRIAM-compliant URIs.
All GCF_ entries should match the pattern of the genome assembly database; there just seems to be a problem with resolution.
When used as the ID in https://identifiers.org/insdc.gca:{$id}, it is resolved to https://www.ebi.ac.uk/ena/data/view/{$id}, where no entry is available for that ID.
Using the NCBI resource, however, the ID is resolved correctly, so for now we could create a non-MIRIAM annotation this way.
It might be worth inquiring about this, since resources that differ in their ability to resolve the same identifier contradict my understanding of how the resolution process works.
All other accessions could be added as non-MIRIAM annotations in the same way as on the BiGG Models website, i.e. https://www.ncbi.nlm.nih.gov/nuccore/{$id}.
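
The rules above can be sketched as a function that maps one accession to a URI plus a flag saying whether that URI is MIRIAM-compliant. Assumptions: the `https://identifiers.org/refseq:{id}` compact form for RefSeq is inferred from the `insdc.gca:{$id}` form quoted above, and `https://www.ncbi.nlm.nih.gov/assembly/{id}` stands in for "the NCBI resource", which the comment does not spell out. The function name is hypothetical; this is not the implementation in the 2.1 branch.

```python
from typing import Tuple

# URL templates. Only the insdc.gca compact form and the nuccore URL appear in
# the comment above; the refseq form and the NCBI Assembly URL are assumptions.
IDENTIFIERS_ORG = "https://identifiers.org/"
NCBI_ASSEMBLY_URL = "https://www.ncbi.nlm.nih.gov/assembly/"
NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/"


def genome_annotation(accession: str) -> Tuple[str, bool]:
    """Return (URI, is_miriam_compliant) for one genome accession."""
    if accession.startswith(("NC_", "NZ_")):
        # RefSeq accessions: MIRIAM-compliant identifiers.org URI.
        return f"{IDENTIFIERS_ORG}refseq:{accession}", True
    if accession.startswith("GCF_"):
        # Fits the insdc.gca pattern, but identifiers.org resolves it to ENA,
        # which has no entry; link to NCBI directly (non-MIRIAM) for now.
        return NCBI_ASSEMBLY_URL + accession, False
    # Any other accession: the same nuccore link the BiGG Models website uses.
    return NUCCORE_URL + accession, False


# Illustrative accessions, one per rule.
for acc in ("NC_000913.3", "GCF_000005845.2", "U00096.3"):
    print(acc, *genome_annotation(acc))
```

Returning the compliance flag alongside the URI leaves the decision of whether to write non-MIRIAM links to the caller.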

Do we want to add only the MIRIAM-compliant annotations, or all of them?

Edit: I just realized we have an INCLUDE_ANY_URI flag we could use here.
What is the appropriate qualifier for these annotations, BQB_IS_VERSION_OF?
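
The two open points here, the INCLUDE_ANY_URI flag and the choice of qualifier, can be illustrated with a small sketch that builds the genome part of an RDF annotation from (URI, is-MIRIAM-compliant) pairs. Assumptions: `include_any_uri` is a hypothetical parameter that only mimics what the flag is used for in this thread (also writing non-MIRIAM URIs), not ModelPolisher's actual option handling; `bqbiol:isVersionOf`, the RDF form of BQB_IS_VERSION_OF, is only a placeholder default because the qualifier is still undecided; the URIs are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"
# Register prefixes so serialized output uses rdf:/bqbiol: instead of ns0:/ns1:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("bqbiol", BQBIOL)


def genome_rdf(
    model_id: str,
    annotations: Iterable[Tuple[str, bool]],
    include_any_uri: bool,
    qualifier: str = "isVersionOf",  # placeholder; the qualifier is still open
) -> ET.Element:
    """Build an rdf:RDF element listing genome URIs under one bqbiol qualifier."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": f"#{model_id}"}
    )
    qualifier_element = ET.SubElement(description, f"{{{BQBIOL}}}{qualifier}")
    bag = ET.SubElement(qualifier_element, f"{{{RDF}}}Bag")
    for uri, is_miriam in annotations:
        # Hypothetical INCLUDE_ANY_URI behaviour: keep non-MIRIAM URIs only if set.
        if is_miriam or include_any_uri:
            ET.SubElement(bag, f"{{{RDF}}}li", {f"{{{RDF}}}resource": uri})
    return rdf


pairs = [
    ("https://identifiers.org/refseq:NC_000913.3", True),
    ("https://www.ncbi.nlm.nih.gov/assembly/GCF_000005845.2", False),
]
print(ET.tostring(genome_rdf("iML1515", pairs, include_any_uri=False), encoding="unicode"))
```

Keeping the qualifier as a parameter means the annotation code does not need to change once the qualifier question is settled.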

@mephenor
Collaborator

Implemented as described above in the 2.1 branch.
Leaving open to discuss the correct qualifier.

mephenor added this to Backlog in Release 2.1 via automation May 13, 2020
mephenor added the feature label May 13, 2020
mephenor self-assigned this May 13, 2020