Converting GP identifiers from UniProt to mod ID's #4

Open
dustine32 opened this issue Jul 6, 2018 · 6 comments
@dustine32

For geneontology/go-site#617

This ticket is to update the gene product identifiers in SynGO models to use the mod-specific identifier. For example:

http://identifiers.org/uniprot/Q9JIR3 -> http://identifiers.org/rgd/628762

The prefixes will be sourced from here:
https://github.com/geneontology/minerva/blob/master/minerva-core/src/main/resources/go_context.jsonld

With consistent identifiers for GPs, we can aggregate gene information across models, maintain label data, and merge models more easily.

To run this conversion, I have a script that grabs MOD IDs (if present) from UniProt's web service and appends this data to the SynGO annotation JSON that gets consumed by David OS' model code. I can add this code to my syngo2lego fork.
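A minimal sketch of the kind of URI rewrite described above, not the actual syngo2lego or resolver code. The prefix map and the lookup table are assumptions for illustration: the prefixes are modeled on the URIs quoted in this thread rather than read from go_context.jsonld, and the cross-reference table stands in for data that would be fetched from UniProt ahead of time. The names `to_mod_uri`, `MOD_URI_PREFIXES`, and `UNIPROT_TO_MOD` are hypothetical.

```python
# Illustrative only: the prefix map and lookup table are assumptions,
# not the contents of go_context.jsonld or of the real resolver script.
UNIPROT_PREFIX = "http://identifiers.org/uniprot/"

# Hypothetical subset of MOD URI prefixes, shaped after the URIs in this thread.
MOD_URI_PREFIXES = {
    "RGD": "http://identifiers.org/rgd/",
    "MGI": "http://identifiers.org/mgi/MGI:",
}

# Hypothetical pre-fetched cross-references: UniProt accession -> (MOD, local ID).
UNIPROT_TO_MOD = {
    "Q9JIR3": ("RGD", "628762"),
}


def to_mod_uri(uri, xrefs=UNIPROT_TO_MOD, prefixes=MOD_URI_PREFIXES):
    """Return the MOD URI for a UniProt URI, or the input unchanged if unmapped."""
    if not uri.startswith(UNIPROT_PREFIX):
        return uri  # not a UniProt URI, leave it alone
    accession = uri[len(UNIPROT_PREFIX):]
    hit = xrefs.get(accession)
    if hit is None:
        return uri  # no MOD ID known for this accession
    mod, local_id = hit
    prefix = prefixes.get(mod)
    if prefix is None:
        return uri  # MOD without a known URI prefix
    return prefix + local_id


if __name__ == "__main__":
    print(to_mod_uri("http://identifiers.org/uniprot/Q9JIR3"))
```

Falling back to the original URI when no mapping exists keeps the "if present" behavior: gene products without a MOD cross-reference simply keep their UniProt identifier.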

@ftwkoopmans I know we discussed this in the past. Will using non-UniProt IDs cause any issues for SynGO? Tagging @lpalbou @cmungall and @thomaspd for their opinions too.

@cmungall
Member

cmungall commented Jul 6, 2018

Just to clarify: no conversion is needed for human at the moment as GO currently treats UniProt as canonical for human. But this may change in future.

@lpalbou

lpalbou commented Jul 6, 2018

The URIs used by NEO and GO-CAMs for mouse GPs take the form http://identifiers.org/mgi/MGI:1926134, for instance.

If we do not follow this syntax, we have no access to the information in NEO that describes this gene, and because it is not linked with other models, it is as if we were describing a totally different GP.

We need data model consistency, with the same rules applied to all models and annotations. As an example, because of this misalignment between GP URIs (this time between the way we reference GPs in GO-CAMs and the way they are referenced in SynGO), we have no access to the recommended name of the gene, nor to other meta information.

@dustine32
Author

@lpalbou In case you're interested, I have the UniProt-to-MOD-ID conversion code here:
https://github.com/dustine32/uniprot_wrapper/blob/master/syngo_uniprot_resolver.py

@lpalbou

lpalbou commented Jul 6, 2018

@dustine32 thanks, but this is not a viable solution for live queries. Converting 2k+ (and hopefully soon 20k+) IDs on the fly would take too much time for the website. It also wouldn't solve the data model consistency problem, nor would it help when looking for overlaps between annotations and GO-CAMs, or when merging them into larger GO-CAMs: we should all use the same URIs for the same entities.

@dustine32
Author

@lpalbou Yep, agreed! This is just the pre-data-loading, data-massaging step for generating the models.

@dustine32
Author

@cmungall Just making sure: Is this the right URL to cross-reference prefixes for the SynGO model UniProt IDs?
https://github.com/geneontology/minerva/blob/master/minerva-core/src/main/resources/go_context.jsonld

And to confirm your earlier clarification, I'm not converting the human gene IDs to HGNC and am just going with the UniProt IDs provided by SynGO. Only mouse, rat, fly and worm IDs are being converted. Thanks!
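The per-species rule stated above could be sketched roughly as follows. This is an illustration under assumptions, not code from the SynGO pipeline: the function name `needs_mod_id` is hypothetical, and "fly" and "worm" are assumed to mean Drosophila melanogaster and Caenorhabditis elegans, keyed here by their NCBI Taxonomy IDs.

```python
# Illustrative policy table; names are hypothetical, not from the real scripts.
# Keys are NCBI Taxonomy IDs. "fly" and "worm" are assumed to mean
# D. melanogaster and C. elegans.
CONVERT_TAXA = {
    10090: "mouse",
    10116: "rat",
    7227: "fly",
    6239: "worm",
}

# Human gene products keep their UniProt IDs (no HGNC conversion for now).
HUMAN_TAXON = 9606


def needs_mod_id(taxon_id):
    """Return True if gene products of this taxon should be rewritten to MOD IDs."""
    return taxon_id in CONVERT_TAXA


if __name__ == "__main__":
    for taxon in (HUMAN_TAXON, 10090, 10116):
        print(taxon, needs_mod_id(taxon))
```

Keeping the rule in a single table means a later policy change, such as moving human to a different canonical identifier, would touch one place.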


3 participants