Add hgnc family info into MyGene.info #73

Open
kevinxin90 opened this issue Sep 18, 2019 · 8 comments
Comments

@kevinxin90
Contributor

HGNC contains gene group info, for example:
https://www.genenames.org/data/genegroup/#!/group/567

@newgene newgene moved this from Gene-level data src to Milestone group 1 in BioThings/SmartAPI Translator Service Provider Milestones Nov 15, 2019
@newgene newgene moved this from Milestone group 1 - New KP APIs to Milestone group 1 - Year 2 in BioThings/SmartAPI Translator Service Provider Milestones Dec 14, 2020
@andrewsu
Member

This file has the gene-to-family links: http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/csv/genefamily_db_tables/gene_has_family.csv

| hgnc_id | family_id |
|---------|-----------|
| 11148   | 3         |
| 3960    | 3         |
| 3961    | 3         |
| 3477    | 1963      |
| 4621    | 1963      |
| 4622    | 1963      |
| 9962    | 1963      |
| 16719   | 1963      |

This file has the name and metadata for each HGNC family: http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/csv/genefamily_db_tables/family.csv

id abbreviation name external_note pubmed_ids desc_comment desc_label desc_source desc_go typical_gene
1296   TIR domain containing NULL NULL NULL NULL NULL TIRAP
75 ZDBF Zinc fingers DBF-type NULL NULL NULL NULL NULL ZDBF2
302 CLCN Chloride voltage-gated channels NULL NULL NULL NULL NULL CLCN1
228 HCRTR Hypocretin receptors NULL NULL NULL NULL NULL HCRTR1

The combination of these two files should be what we initially add to mygene.info records for each human gene.
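A minimal sketch of how the two files could be combined, assuming both are comma-separated with a header row and that `pubmed_ids` holds comma-separated integers (neither is confirmed in this thread). The function names `load_families` and `join_gene_families` are hypothetical. The family row is assembled from the Fascin example record posted in this issue, with `NULL` placeholders for the columns that record does not show. The output is keyed by hgnc_id; translating that to a MyGene.info `_id` would be a separate step.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-ins for gene_has_family.csv and family.csv (assumed layout).
GENE_HAS_FAMILY_CSV = """hgnc_id,family_id
11148,3
3960,3
3961,3
3477,1963
"""

# Illustrative row only: NULL columns are placeholders, not real file content.
FAMILY_CSV = """id,abbreviation,name,external_note,pubmed_ids,desc_comment,desc_label,desc_source,desc_go,typical_gene
3,FSCN,Fascin family,NULL,21618240,NULL,NULL,NULL,NULL,FSCN1
"""


def _clean(value):
    """Treat empty strings and the literal NULL as missing."""
    return None if value in ("", "NULL") else value


def load_families(family_csv):
    """Return {family_id: metadata dict} from the family table."""
    families = {}
    for row in csv.DictReader(io.StringIO(family_csv)):
        pubmed = _clean(row["pubmed_ids"])
        families[row["id"]] = {
            "id": row["id"],
            "abbr": _clean(row["abbreviation"]),
            "name": _clean(row["name"]),
            # Assumption: multiple PubMed IDs are comma-separated.
            "pubmed": [int(p) for p in pubmed.split(",")] if pubmed else [],
            "typical_gene": _clean(row["typical_gene"]),
        }
    return families


def join_gene_families(gene_has_family_csv, family_csv):
    """Return {hgnc_id: [family metadata, ...]} by joining the two tables."""
    families = load_families(family_csv)
    by_gene = defaultdict(list)
    for row in csv.DictReader(io.StringIO(gene_has_family_csv)):
        # Fall back to a bare id when the family has no metadata row.
        family = families.get(row["family_id"], {"id": row["family_id"]})
        by_gene[row["hgnc_id"]].append(family)
    return dict(by_gene)


gene_families = join_gene_families(GENE_HAS_FAMILY_CSV, FAMILY_CSV)
```

Keeping the family list per gene as a list leaves room for genes that belong to more than one family.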

@colleenXu

It looks like there's already some gene group info in MyGene.info, which is shown as a node attribute in BTE.

BTE brings in interpro info using this code.

You can see that some gene family info is included when you look at that field in MyGene.info, along with what may be info for specific domains of the protein: https://mygene.info/v3/query?q=CDK2&fields=interpro.desc%2C%20type_of_gene
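For reference, the query URL above can be rebuilt programmatically. This is a small sketch using only the Python standard library; `build_query_url` is a hypothetical helper, and it only constructs the URL without sending a request.

```python
from urllib.parse import quote, urlencode

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_query_url(symbol, fields):
    """Build a MyGene.info v3 query URL for one gene symbol and a list of fields."""
    params = {"q": symbol, "fields": ", ".join(fields)}
    # quote_via=quote percent-encodes the space as %20 (instead of "+"),
    # matching the URL form used in this thread.
    return MYGENE_QUERY_URL + "?" + urlencode(params, quote_via=quote)


url = build_query_url("CDK2", ["interpro.desc", "type_of_gene"])
print(url)
```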

@andrewsu andrewsu added this to Intern / volunteer / help wanted in Translator project management (old) Jun 9, 2021
@jal347
Contributor

jal347 commented Nov 19, 2021

I made the plugin for hgnc_family. The main branch contains the manifest and parser; the v2 branch contains the advanced plugin. If we use the advanced plugin, can someone check whether I did the mapping correctly? Thanks. https://github.com/jal347/hgnc_family

@jal347
Contributor

jal347 commented Nov 22, 2021

This is a quick summary of the current HGNC mapping. The total number of hgnc_id data points is 29872, covering 24952 unique hgnc_ids. Of those 24952 hgnc_ids, 24895 were mapped and 57 could not be queried in mygene.info. 21100 hgnc_ids map to exactly one family_id (1-1) and 3852 map to more than one (1-n); the most family_ids for a single hgnc_id is 7. An example record is shown below, followed by more detailed information on the 1-n mappings.

{
    "_id": "6624",
    "hgnc_genegroup": [
        {
            "id": "3",
            "abbr": "FSCN",
            "name": "Fascin family",
            "comments": "",
            "pubmed": [
                21618240
            ],
            "typical_gene": "FSCN1"
        }
    ]
}

[Image: detailed breakdown of the 1-n hgnc_id to family_id mappings]
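Counts like the ones in this summary could be reproduced from the gene-to-family pairs roughly as follows. This is a sketch, not the script actually used; `summarize_mapping` is a hypothetical name, and the last sample pair is made up so that the tiny sample contains a 1-n case.

```python
from collections import Counter


def summarize_mapping(pairs):
    """Summarize (hgnc_id, family_id) pairs: totals, 1-1 vs 1-n, and max fan-out."""
    # Count distinct families per gene; set() drops exact duplicate pairs.
    per_gene = Counter(hgnc_id for hgnc_id, _family_id in set(pairs))
    return {
        "data_points": len(pairs),
        "unique_hgnc_ids": len(per_gene),
        "one_to_one": sum(1 for n in per_gene.values() if n == 1),
        "one_to_many": sum(1 for n in per_gene.values() if n > 1),
        "max_families_per_gene": max(per_gene.values(), default=0),
    }


pairs = [
    (11148, 3),
    (3960, 3),
    (3961, 3),
    (3477, 1963),
    (11148, 1963),  # hypothetical pair, added only to illustrate a 1-n mapping
]
summary = summarize_mapping(pairs)
```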

@zcqian
Contributor

zcqian commented Nov 22, 2021

(I remember commenting on this yesterday; where did it go...)

added here: https://github.com/biothings/mygene.info/tree/add_hgnc_family/src/plugins/hgnc_family

@newgene should PubMed ID be of type long and not indexed? This is what we have in other sources in MyGene.

@newgene
Member

newgene commented Nov 22, 2021

@zcqian (RE: pubmed) good catch. Let's keep this field the same as other sources then.

@zcqian
Contributor

zcqian commented Nov 23, 2021

@newgene should we index the PubMed ID field?

@newgene
Member

newgene commented Nov 23, 2021

Not for now; we can change it later if we do need to query it.
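The outcome of this exchange (PubMed ID stored as `long`, not indexed) might look like the following in the plugin's mapping. This is a sketch assuming an Elasticsearch-style mapping dict and the `hgnc_genegroup.pubmed` field name from the example record above; `get_hgnc_genegroup_mapping` is a hypothetical function name, and the other subfields are left out because their types were not discussed here.

```python
def get_hgnc_genegroup_mapping():
    """Return an Elasticsearch-style mapping fragment for the PubMed ID subfield."""
    return {
        "hgnc_genegroup": {
            "properties": {
                # Stored as a long but not indexed, so it is returned
                # with the record yet cannot be searched on.
                "pubmed": {"type": "long", "index": False},
            }
        }
    }


mapping = get_hgnc_genegroup_mapping()
```

If querying by PubMed ID is needed later, flipping `index` to `True` and reindexing would be the change to make.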

6 participants