Map all annotations to BiGG #20

Open
draeger opened this issue Mar 1, 2017 · 7 comments

draeger (Member) commented Mar 1, 2017

Suggested enhancement by @tpfau: Go through all existing annotations and map every annotation that can be mapped to BiGG. For instance, if a compound carries a KEGG compound annotation, it will be assigned its corresponding BiGG id along with all other annotations available in BiGG. Since that annotation data is already present in the BiGG Models Database, this would make ModelPolisher much more useful.

As long as ModelPolisher relies only on BiGG ids as input, users must either manually match the original ids to BiGG ids or assume that the model used BiGG ids to begin with. It would be much better to make the mapping database-dependent.
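
As a rough illustration of the lookup proposed here, the sketch below splits an identifiers.org URI into its `prefix/accession` part and resolves it against an in-memory table. The class `ForeignIdLookup`, its method names, and the single table entry are hypothetical stand-ins for a query against the BiGG database; this is not ModelPolisher's actual code, and it only handles the path-style URIs used in this thread.

```java
import java.util.Map;
import java.util.Optional;

final class ForeignIdLookup {
    // Hypothetical stand-in for the BiGG synonym table: "prefix/accession" -> BiGG id.
    private static final Map<String, String> SYNONYMS =
        Map.of("kegg.compound/C00002", "atp");

    private static final String BASE = "identifiers.org/";

    /** Returns "prefix/accession" for a path-style identifiers.org URI, else empty. */
    static Optional<String> keyOf(String uri) {
        int start = uri.indexOf(BASE);
        if (start < 0) {
            return Optional.empty();
        }
        String rest = uri.substring(start + BASE.length());
        int slash = rest.indexOf('/');
        // Require both a non-empty prefix and a non-empty accession.
        if (slash <= 0 || slash == rest.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(rest);
    }

    /** Resolves a foreign annotation URI to a BiGG id, if the table knows it. */
    static Optional<String> biggIdFor(String uri) {
        // Optional.map turns a null lookup result into an empty Optional.
        return keyOf(uri).map(SYNONYMS::get);
    }

    public static void main(String[] args) {
        System.out.println(biggIdFor("http://identifiers.org/kegg.compound/C00002"));
        System.out.println(biggIdFor("http://identifiers.org/kegg.compound/C99999"));
    }
}
```

Returning an `Optional` keeps the "no mapping known" case explicit, so callers can leave the original id untouched when nothing is found.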

@mephenor mephenor added this to Close open issues in Release 2.1 Nov 7, 2019
mephenor (Collaborator) commented

While this was implemented during GSoC19, the feature has not been properly tested yet.
As discussed, some models containing annotations from BioModels could be used for initial manual testing and later converted into test cases, after validating that 1) additional annotations are obtained and 2) those annotations are in fact accurate.

@mephenor mephenor moved this from Close open issues to Backlog in Release 2.1 Jan 31, 2020
@mephenor mephenor moved this from Backlog to Started in Release 2.1 Jan 31, 2020
@mephenor mephenor moved this from Started to Mostly finished in Release 2.1 Jan 31, 2020
mephenor (Collaborator) commented

Finding a good BioModels subset is a task in itself, so this should likely be done differently.
Polishing one model with BiGG ids twice, once with the correct ids and once with a scrambled variant, should be a valid test for this functionality.
As discussed, setting up a database for this testing procedure is currently the problem.
This will be done after the beta release.
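
The "polish twice" idea above could look roughly like the following. This is only a sketch of the test design: `ScrambledIdCheck`, `polish`, `inferFrom`, and the in-memory fixture are all hypothetical and stand in for ModelPolisher and a real BiGG database. The point is that a scrambled id with an intact foreign annotation should end up with the same annotations as the correct id.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ScrambledIdCheck {
    // Hypothetical fixture: annotations known for one BiGG id (illustrative data only).
    static final Map<String, Set<String>> BY_BIGG_ID = Map.of(
        "atp", Set.of("kegg.compound/C00002", "chebi/CHEBI:15422"));

    /** Toy "polisher": uses the id if it is a known BiGG id, else infers one from annotations. */
    static Set<String> polish(String id, Set<String> existing) {
        Set<String> result = new TreeSet<>(existing);
        String biggId = BY_BIGG_ID.containsKey(id) ? id : inferFrom(existing);
        if (biggId != null) {
            result.addAll(BY_BIGG_ID.get(biggId));
        }
        return result;
    }

    /** Returns the first BiGG id that shares an annotation with the input, or null. */
    static String inferFrom(Set<String> existing) {
        for (var entry : BY_BIGG_ID.entrySet()) {
            for (String uri : existing) {
                if (entry.getValue().contains(uri)) {
                    return entry.getKey();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> annotations = Set.of("kegg.compound/C00002");
        Set<String> fromCorrectId = polish("atp", annotations);
        Set<String> fromScrambledId = polish("scrambled_id", annotations);
        // The test passes when both runs yield identical annotation sets.
        System.out.println(fromCorrectId.equals(fromScrambledId));
    }
}
```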

@Schmoho Schmoho closed this as completed May 3, 2022
@Schmoho Schmoho reopened this May 3, 2022
Release 2.1 automation moved this from Mostly finished to Backlog May 3, 2022
@Schmoho Schmoho moved this from Backlog to Mostly finished in Release 2.1 May 3, 2022
@Schmoho Schmoho moved this from Mostly finished to Started in Release 2.1 May 10, 2022
@Schmoho Schmoho added feature Issues that aim to introduce new feature in ModelPolisher. and removed enhancement labels May 10, 2022
@Schmoho Schmoho moved this from In Progress to Todo in Release 2.1 Jul 26, 2022
Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho (Collaborator) commented Aug 2, 2022

For species this seems to work as expected:

@Test
public void unknownMetaboliteCanBeInferredFromCV() {
    var m = new Model(3, 2);
    var s = m.createSpecies("big_chungus");
    var cvTerm = new CVTerm();
    cvTerm.setQualifier(CVTerm.Qualifier.BQB_IS);
    cvTerm.addResource("http://identifiers.org/reactome.compound/113592");
    s.addCVTerm(cvTerm);
    var annotator = new SpeciesAnnotation(s);
    annotator.annotate();
    // The id is left untouched; name, SBO term, and annotations come from BiGG.
    assertEquals("big_chungus", s.getId());
    assertEquals("ATP C10H12N5O13P3", s.getName());
    assertEquals("SBO:0000240", s.getSBOTermID());
    assertEquals(1, s.getCVTermCount());
    assertEquals(30, s.getCVTerm(0).getNumResources());
    assertCVTermIsPresent(s,
        CVTerm.Type.BIOLOGICAL_QUALIFIER,
        CVTerm.Qualifier.BQB_IS,
        "http://identifiers.org/reactome.compound/113592");
    assertCVTermsArePresent(s,
        CVTerm.Type.BIOLOGICAL_QUALIFIER,
        CVTerm.Qualifier.BQB_IS,
        expectedATPAnnotations,
        "Expected uris are not present.");
}

Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho (Collaborator) commented Aug 2, 2022

For reactions it also mostly works as expected. However, there is an issue with foreign IDs that map to more than one BiGG ID: those are discarded.

@Test
public void getBiGGIdFromResourcesTest() {
    initParameters();
    var m = new Model("iJO1366", 3, 2);
    var r1 = m.createReaction("some_name");
    var r2 = m.createReaction("some_other_name");
    var r3 = m.createReaction("some_third_name");
    r1.addCVTerm(new CVTerm(
        CVTerm.Type.BIOLOGICAL_QUALIFIER,
        CVTerm.Qualifier.BQB_IS,
        "http://identifiers.org/biocyc/META:ACETATEKIN-RXN"));
    r2.addCVTerm(new CVTerm(
        CVTerm.Type.BIOLOGICAL_QUALIFIER,
        CVTerm.Qualifier.BQB_IS,
        "http://identifiers.org/metanetx.reaction/MNXR103371"));
    r3.addCVTerm(new CVTerm(
        CVTerm.Type.BIOLOGICAL_QUALIFIER,
        CVTerm.Qualifier.BQB_IS,
        "http://identifiers.org/kegg.reaction/R00299"));
    var gPlugin = (GroupsModelPlugin) m.getPlugin(GroupsConstants.shortLabel);
    assertEquals(0, gPlugin.getGroupCount());
    new ReactionAnnotation(r1).annotate();
    new ReactionAnnotation(r2).annotate();
    new ReactionAnnotation(r3).annotate();
    var r1FbcPlugin = (FBCReactionPlugin) r1.getPlugin(FBCConstants.shortLabel);
    var gpa1 = r1FbcPlugin.getGeneProductAssociation();
    assertNull(gpa1);
    assertEquals(false, r1.isSetCompartment());
    assertEquals("", r1.getName());
    assertEquals(1, r1.getCVTermCount());
    assertEquals(1, r1.getCVTerm(0).getNumResources());
    assertEquals(1, r2.getCVTermCount());
    assertEquals(1, r2.getCVTerm(0).getNumResources());
    var r3FbcPlugin = (FBCReactionPlugin) r3.getPlugin(FBCConstants.shortLabel);
    var gpa3 = r3FbcPlugin.getGeneProductAssociation();
    assertNotNull(gpa3);
    assertEquals("G_b2388", ((GeneProductRef) gpa3.getAssociation()).getGeneProduct());
    assertEquals(false, r1.isSetCompartment());
    assertEquals("", r1.getName());
    assertEquals(1, r3.getCVTermCount());
    assertEquals(11, r3.getCVTerm(0).getNumResources());
    assertEquals(1, gPlugin.getGroupCount());
    assertEquals("glycolysis/gluconeogenesis", gPlugin.getGroup(0).getName());
    assertEquals(Set.of("some_third_name"), gPlugin.getGroup(0)
        .getListOfMembers().stream().map(Member::getIdRef).collect(Collectors.toSet()));
    assertFalse(r3.isSetListOfReactants());
    assertFalse(r3.isSetListOfProducts());
}

Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho (Collaborator) commented Aug 2, 2022

Running

select distinct r.bigg_id as reaction_bigg_id, c.bigg_id as compartment_bigg_id, c.name as compartment_name
from reaction_matrix rm, compartmentalized_component cc, compartment c, reaction r
where rm.reaction_id in (select ome_id
                         from synonym
                         where synonym ilike '%ACETATEKIN-RXN%')
  and rm.compartmentalized_component_id = cc.id
  and cc.compartment_id = c.id
  and rm.reaction_id = r.id;

yields

"reaction_bigg_id"    "compartment_bigg_id"    "compartment_name"
"ACKr"                "c"                      "cytosol"
"ACKrh"               "h"                      "chloroplast"
"ACKrm"               "m"                      "mitochondria"

Schmoho (Collaborator) commented Aug 2, 2022

The offending code is here:

results = results.stream()
    .filter(biggId -> biggId != null && !biggId.isEmpty())
    .collect(Collectors.toSet());
if (results.size() == 1) {
    return Optional.of(results.iterator().next());
} else {
    return Optional.empty();
}

Unfortunately this is somewhat deep in the stack and embedded in creative attempts at code deduplication.

getBiggIdFromParts:329, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
lambda$getBiGGIdFromResources$1:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
apply:-1, 28318221 (edu.ucsd.sbrg.bigg.annotation.BiGGAnnotation$$Lambda$607)
flatMap:294, Optional (java.util)
getBiGGIdFromResources:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
checkId:91, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
annotate:58, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
getBiGGIdFromResourcesTest:50, ReactionAnnotationTest (edu.ucsd.sbrg.bigg.annotation)

Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho added a commit that referenced this issue Aug 2, 2022
Schmoho (Collaborator) commented Aug 2, 2022

The last commit introduced a change to the reaction annotations.
We now consider all potential reaction hits from foreign IDs and filter on matching compartment.
That is, even if a foreign ID (e.g. a KEGG ID) is associated with multiple BiGG IDs, we only discard those that don't match the compartment of the reaction.
On the flip side, a reaction will no longer be annotated if there is only a single hit but no matching compartment.

var query = "SELECT R.BIGG_ID AS REACTION_BIGG_ID, "
    + "C.BIGG_ID AS COMPARTMENT_BIGG_ID, "
    + "C.NAME AS COMPARTMENT_NAME "
    + "FROM REACTION R "
    + "left join REACTION_MATRIX RM "
    + "on RM.REACTION_ID = R.ID "
    + "left join COMPARTMENTALIZED_COMPONENT CC "
    + "on RM.COMPARTMENTALIZED_COMPONENT_ID = CC.ID "
    + "left join COMPARTMENT C "
    + "on CC.COMPARTMENT_ID = C.ID "
    + "join synonym s "
    + "on synonym = ? and r.id = s.ome_id "
    + "join data_source d "
    + "on s.data_source_id = d.id and d.bigg_id = ?";
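
The compartment filter described in this comment can be sketched as follows. The class `CompartmentFilter`, the `Hit` type, and the method name are hypothetical and only illustrate the idea; the actual change lives in the referenced commit. The sample rows are the ACETATEKIN-RXN results from the query output earlier in this thread.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class CompartmentFilter {
    /** One query row: a candidate BiGG reaction and a compartment it occurs in. */
    static final class Hit {
        final String reactionBiggId;
        final String compartmentBiggId;

        Hit(String reactionBiggId, String compartmentBiggId) {
            this.reactionBiggId = reactionBiggId;
            this.compartmentBiggId = compartmentBiggId;
        }
    }

    /** Keeps only candidates in the given compartment; may yield zero, one, or several ids. */
    static Set<String> matching(List<Hit> hits, String compartment) {
        return hits.stream()
            .filter(h -> compartment != null && compartment.equals(h.compartmentBiggId))
            .map(h -> h.reactionBiggId)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("ACKr", "c"),
            new Hit("ACKrh", "h"),
            new Hit("ACKrm", "m"));
        // Cytosolic reaction: one candidate survives the filter.
        System.out.println(matching(hits, "c"));
        // No candidate in compartment "e": nothing is annotated (the flip side noted above).
        System.out.println(matching(hits, "e"));
    }
}
```

Unlike the earlier `results.size() == 1` check, multiple hits for a foreign ID are not discarded outright; only those in a different compartment are.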
