simply saving an OWL/XML (*.owl) ontology with ROBOT 1.9.2 is insufficient to canonicalize it #1090

Open
jclerman opened this issue Feb 11, 2023 · 11 comments

Comments

@jclerman
Contributor

jclerman commented Feb 11, 2023

The release notes for ROBOT 1.9.2 recommend that you:

save your ontology with ROBOT 1.9.2 or Protégé 5.6.0 without introducing any changes to the logic or annotations, and commit the resulting ontology files

In my experience, that wasn't quite enough: my ontology was not fully canonicalized until I round-tripped it through OWL functional syntax. Without that step, some lines in the XML output were still reordered on a later round-trip.

What worked for me (other variants might work too; I haven't tested them):

robot convert -i my-protege-5.5.0-ontology.owl -o my-ontology.ofn
robot convert -i my-ontology.ofn -o my-canonicalized-ontology.owl
@matentzn
Contributor

Surprising! What happened when you tried without the ofn intermediary?

@jclerman
Contributor Author

Hi @matentzn. When I just did:

robot convert -i my-original-ontology.owl -o my-attempted-canonicalized-ontology.owl

I found that annotation values were not sorted in the output. After round-tripping through ofn, I got a stable result (including sorted annotation values).

Here's a fragment of a diff of my-attempted-canonicalized-ontology.owl against what I get after round-tripping:

*** 104805,104816 ****
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym"/>
          <owl:annotatedTarget>wing zeugopod skeleton</owl:annotatedTarget>
-         <oboInOwl:hasDbXref>OBOL:automatic</oboInOwl:hasDbXref>
          <oboInOwl:hasDbXref>NCBITaxon:8782</oboInOwl:hasDbXref>
          <oboInOwl:hasSynonymType rdf:resource="http://purl.obolibrary.org/obo/uberon/core#SENSU"/>
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
--- 104805,104816 ----
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym"/>
          <owl:annotatedTarget>wing zeugopod skeleton</owl:annotatedTarget>
          <oboInOwl:hasDbXref>NCBITaxon:8782</oboInOwl:hasDbXref>
+         <oboInOwl:hasDbXref>OBOL:automatic</oboInOwl:hasDbXref>
          <oboInOwl:hasSynonymType rdf:resource="http://purl.obolibrary.org/obo/uberon/core#SENSU"/>
      </owl:Axiom>
      <owl:Axiom>
          <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010703"/>
          <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
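
For anyone who wants to reproduce a comparison like the one above, here is a minimal sketch. It assumes `robot` is on your PATH; the function name `compare_roundtrip` and the temporary file names are hypothetical, not part of ROBOT.

```shell
# Hypothetical helper: compare a direct same-format convert against an
# .ofn round-trip of the same input. Assumes `robot` is on PATH.
# Returns 0 if both outputs are identical, 1 if they differ, 2 on error.
compare_roundtrip() {
    input=$1
    workdir=$(mktemp -d) || return 2

    if robot convert -i "$input" -o "$workdir/direct.owl" &&
       robot convert -i "$input" -o "$workdir/via.ofn" &&
       robot convert -i "$workdir/via.ofn" -o "$workdir/roundtrip.owl"
    then
        # Context diff, the same style as the fragment shown above.
        diff -c "$workdir/direct.owl" "$workdir/roundtrip.owl"
        diff_rc=$?
    else
        diff_rc=2
    fi

    rm -rf "$workdir"
    return "$diff_rc"
}
```

Calling `compare_roundtrip my-original-ontology.owl` prints nothing when the two routes agree, and a context diff when they do not.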

@matentzn
Contributor

This is very important for us to know; thank you for taking the time to report it. Apart from us knowing about this, is there anything you think should be done here in terms of a fix? It seems we basically have to live with this (short of someone working on the OWX parser in the OWL API itself).

@jclerman
Contributor Author

It doesn't seem like there is much that ROBOT could do. I can imagine internal workarounds, like having ROBOT do an ofn round-trip internally when asked for a no-op conversion from/to the same format, but I'm not sure that's a good idea.

My only real suggestion would be to update the 1.9.2 release notes to tell people that they might need to do an ofn round-trip to achieve canonicalization. That would help users avoid getting bitten by this issue.

@jamesaoverton
Member

Thanks @jclerman! I added a mention of this issue to the 1.9.2 release notes. Once #1088 and #1089 are resolved and everything is updated, we'll make a bigger push to get everyone to update, and we'll keep this in mind.

@CarMoreno

I am running into this behaviour when using the robot template command to generate a .owl file from a .tsv file. Unfortunately, the workaround using .ofn does not work. :(

@jamesaoverton
Member

@CarMoreno What doesn't work? I think the suggestion in this thread is to use robot template to create an .ofn file, then robot convert to .owl (RDF/XML).

@CarMoreno

@jamesaoverton That's exactly what I am doing. I generated the dummy.ofn file from the template, and then generated dummy.owl from that dummy.ofn:

robot template --template dummy_template.csv --output dummy.ofn
robot convert --input dummy.ofn --output dummy.owl

The axioms remain unsorted.

@allenbaron
Contributor

I have been exploring this somewhat with ROBOT 1.9.4 and Protege 5.6.2 (starting files were built with ROBOT 1.8.3 and Protege 5.5.0). Based on my exploration, any file needs two convert operations to reach a stable serialization, but it doesn't matter which format the file is converted to. For example, to make doid.owl stable, either of the following works, and both end up with the same result:

robot convert -i doid.owl -o doid1.owl
robot convert -i doid1.owl -o doid.owl

OR

robot convert -i doid.owl -o doid.ofn
robot convert -i doid.ofn -o doid.owl

Stabilizing the doid-edit.owl file, which is actually in OWL functional syntax, also requires two filetype-agnostic converts. Protege behaves similarly: the first edit and save sorts by language tag, then alphabetically (the same as the first convert), and the second edit, if made after closing and re-opening the file, produces the final sort order.

For some reason the first convert operation sorts by presence/absence of language tag before sorting strings alphabetically, while the second sorts alphabetically first and language tag second.
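
The "keep converting until nothing changes" idea above can be sketched as a small loop. This is only an illustration under assumptions: `robot` is on your PATH, the function name `convert_until_stable` and the pass limit are hypothetical, and "stable" here simply means that one more convert produces a byte-identical file.

```shell
# Hypothetical helper: re-run `robot convert` on a file until the output
# stops changing. Assumes `robot` is on PATH.
#   $1  file to stabilize (rewritten in place)
#   $2  extension used for the temporary output, so ROBOT picks the
#       intended format (e.g. ofn for a doid-edit.owl that holds OFN)
#   $3  maximum number of rewriting passes (arbitrary default: 5)
convert_until_stable() {
    target=$1
    ext=$2
    max_passes=${3:-5}
    workdir=$(mktemp -d) || return 2
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        if ! robot convert -i "$target" -o "$workdir/next.$ext"; then
            rm -rf "$workdir"
            return 2
        fi
        # Byte-identical output means the serialization has settled.
        if cmp -s "$target" "$workdir/next.$ext"; then
            rm -rf "$workdir"
            echo "stable after $((pass - 1)) rewriting pass(es)"
            return 0
        fi
        mv "$workdir/next.$ext" "$target"
        pass=$((pass + 1))
    done
    rm -rf "$workdir"
    echo "still changing after $max_passes passes" >&2
    return 1
}
```

For example, `convert_until_stable doid.owl owl` would rewrite doid.owl in place and report how many passes changed the file.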

[Image: comparison of doid-edit.owl (ROBOT 1.8.3/Protege 5.5.0) with doid-edit.owl after one convert with ROBOT 1.9.4]

[Image: comparison of the first convert of doid-edit.owl with the second, both ROBOT 1.9.4]

ROBOT template tests

I have only tested using ROBOT template to add axioms to an existing file, e.g. robot template -i doid-edit.ofn --template template.tsv --merge-before -o doid-edit.ofn, and to me it appears that the added axioms are sorted correctly. As long as the file serialization is already stable, nothing needs to be done; if it isn't, one ROBOT convert is needed.

@matentzn
Contributor

matentzn commented Jun 6, 2023

@allenbaron super useful analysis, thank you!

@allenbaron
Contributor

Just noting that after stabilizing the serialization of an .ofn file, if I run a robot query --update command, the ordering of lines in the output file (from .ofn to .ofn, in my case) changes, much like after the single robot convert I described above. I have to run another, non-chained robot convert to get the ordering back.

Full command to maintain stable ordering (and prefixes):

robot --add-prefixes build/doid-edit_prefixes.json \
    query -i src/ontology/doid-edit.owl \
    --update ../../DO_dev/sparql/update/DO-def_format_gene.ru \
    -o tmp.ofn && \
robot convert -i tmp.ofn -o tmp2.ofn && \
mv tmp2.ofn src/ontology/doid-edit.owl && \
rm tmp.ofn

Serialization is stable when I run chained reason & annotate going from .ofn to .owl (i.e. another robot convert on the resulting .owl file has no effect).
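
A read-only way to confirm that kind of "another convert has no effect" claim, for example before committing, might look like the sketch below. It assumes `robot` is on your PATH; the function name `is_serialization_stable` is hypothetical, and the check only tests for byte-identical output.

```shell
# Hypothetical read-only check: succeeds only if one more convert is a no-op.
# Assumes `robot` is on PATH. $2 is the extension that selects the output
# format for the temporary copy. The input file is never modified.
# Returns 0 if stable, 1 if a convert would change the file, 2 on error.
is_serialization_stable() {
    file=$1
    ext=$2
    checkdir=$(mktemp -d) || return 2
    if robot convert -i "$file" -o "$checkdir/check.$ext"; then
        cmp -s "$file" "$checkdir/check.$ext"
        result=$?
    else
        result=2
    fi
    rm -rf "$checkdir"
    return "$result"
}
```

A usage such as `is_serialization_stable doid.owl owl || echo "run robot convert again"` leaves the decision about rewriting the file to the caller.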
