Add frequency to PomBase gene to phenotype transform #647

Open
kevinschaper opened this issue Mar 20, 2024 · 2 comments

Comments

@kevinschaper
Member

Column 15 in the PHAF format is described as:

Penetrance describes the proportion of a population that shows a cell-level phenotype. Penetrance data are represented as percents or entries from the in-house FYPO_EXT ontology (FYPO_EXT:0000001 = high; FYPO_EXT:0000002 = medium; FYPO_EXT:0000003 = low; FYPO_EXT:0000004 = full).

(the numbers preceding values below are counts)

The mapping to FYPO_EXT looks fairly clear here for these qualifier names:

5424 high
1391 medium
 991 low
 153 complete

Less clear for these:

   1 medium,high
   1 high,20
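A minimal sketch of the name-to-term lookup, using the four FYPO_EXT IDs from the column 15 description quoted above. Treating "complete" as a synonym for "full" is an assumption based on the counts listed here, and `penetrance_term` is a hypothetical helper name, not existing ingest code. Combined values such as "medium,high" deliberately return `None` so they can be reviewed rather than guessed at.

```python
# FYPO_EXT IDs as given in the PHAF column 15 description.
FYPO_EXT_BY_NAME = {
    "high": "FYPO_EXT:0000001",
    "medium": "FYPO_EXT:0000002",
    "low": "FYPO_EXT:0000003",
    "full": "FYPO_EXT:0000004",
    # Assumption: "complete" in the data means the same as "full".
    "complete": "FYPO_EXT:0000004",
}


def penetrance_term(value: str):
    """Return the FYPO_EXT ID for a plain qualifier name, or None if unmapped."""
    return FYPO_EXT_BY_NAME.get(value.strip().lower())


for raw in ["high", "complete", "medium,high", "high,20"]:
    print(raw, "->", penetrance_term(raw))
```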

The FYPO_EXT definitions themselves don't give frequency ranges. For HPO frequency qualifiers, our sorting function takes the low value of the defined ranges, but I'm not sure how I would map these FYPO_EXT terms to numeric values for sorting.

There are numerical ranges defined as well, some examples:

  10 60-70
   9 30-40
   6 5-30
   6 10-20
   5 70-80

For consistency with HPO range qualifier behavior, I assume these would sort on the low value.
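Sorting on the low value could look roughly like this. It is only a sketch: `range_low` is a hypothetical name, and it assumes every value is either a single number or a "low-high" pair separated by an ASCII hyphen.

```python
def range_low(value: str) -> float:
    """Return the low bound of an 'N' or 'N-M' percent string."""
    low, _sep, _high = value.strip().partition("-")
    return float(low)


ranges = ["60-70", "30-40", "5-30", "10-20", "70-80"]
print(sorted(ranges, key=range_low))
# ['5-30', '10-20', '30-40', '60-70', '70-80']
```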

For sorting approximate frequencies, I would probably just strip the ~ and continue sorting on the low value:

   1 ~8
   1 ~7580
   1 ~75
   1 ~70
   1 ~7
   1 ~66
   1 ~65
   1 ~60-70
   1 ~58
   1 ~52

(~7580 looks like it's meant to be ~75-80?)
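A sketch of stripping the ~ before taking the low value. The 0-100 range check is an addition that is not part of the proposal above: it assumes the values are percents, and would flag an entry like ~7580 instead of silently sorting it. `approx_low` is a hypothetical name.

```python
def approx_low(value: str) -> float:
    """Strip a leading '~' and return the low bound as a float."""
    text = value.strip().lstrip("~")
    low = float(text.split("-", 1)[0])
    # Assumption: penetrance is a percent, so values outside 0-100 are suspect.
    if not 0 <= low <= 100:
        raise ValueError(f"penetrance out of range: {value!r}")
    return low


values = ["~8", "~75", "~60-70", "~7", "~52"]
print(sorted(values, key=approx_low))
# ['~7', '~8', '~52', '~60-70', '~75']
```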

Finally, there are greater-than and less-than values. I assume that, for the sake of sorting, we would just want to strip the > or < and alter the value slightly so that ">80" would sort above "80".
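One way to get that ordering, sketched under the assumption that the only prefixes are a single > or <. Instead of adding a small offset to the number, this variant sorts on a (value, rank) tuple, which gives the same ordering without picking an arbitrary epsilon. `bounded_sort_key` is a hypothetical name.

```python
def bounded_sort_key(value: str) -> tuple:
    """Return (number, rank) so that '<80' < '80' < '>80' when sorted."""
    text = value.strip()
    if text.startswith(">"):
        return (float(text[1:]), 1)
    if text.startswith("<"):
        return (float(text[1:]), -1)
    return (float(text), 0)


values = [">80", "80", "<80", "75"]
print(sorted(values, key=bounded_sort_key))
# ['75', '<80', '80', '>80']
```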

cc:@ValWood

kevinschaper added this to the 2024-05 Release milestone Mar 20, 2024

ValWood commented Mar 22, 2024

Hi @kevinschaper

I don't think it is worth you including the fission yeast penetrance and specificity extensions in Monarch. These are probably only really useful to fission yeast researchers working on these genes.

I misunderstood what the frequencies referred to. I thought that multiple annotations to the same phenotype were going to be collapsed and a "frequency" assigned like a "tally".

For instance, cdc2 has 387 phenotypes, but many of the annotations are identical (from different sources), e.g.:
[Screenshot 2024-03-22 at 16 32 08]

Is the "frequency" column intended to represent the frequency in a population? If so, it might be better to call it penetrance to be unambiguous?

The extensions in column 17 with the "assayed using" qualifier might be more useful because these link to the other gene entities that the mutant affects (making connections between other entities in the knowledge graph). A biological example might be that gene A, when mutated, affects the localization, transcript level, or modification of gene B. These could be useful for networks because >70% of fission yeast genes have human orthologs.

I once sent an e-mail describing the aspects of fission yeast phenotypes data that would be most useful for informing human biology and hence for display in Monarch. I will see if I can find it.

I'm happy to meet up and discuss what might be most useful for Monarch with you @cmungall @monicacecilia

Sorry, this ticket is now about multiple things!


ValWood commented Mar 22, 2024

Anyway, if you do decide to use penetrance, these 3 will be fixed in tomorrow's export file:

high,20 (fixed to 20 (%))
7580 (fixed; I used a non-ASCII dash which got stripped, we will add a check for that)
medium,high (fixed to high)
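For illustration only, a check for non-ASCII dashes might look something like the sketch below. It is not the PomBase export code. The function names are hypothetical, and it assumes that any non-ASCII character in Unicode category Pd (dash punctuation), plus the minus sign U+2212, should be treated as a plain hyphen.

```python
import unicodedata

MINUS_SIGN = "\u2212"  # category Sm, so it is checked explicitly


def non_ascii_dashes(value: str) -> list:
    """Return any non-ASCII dash-like characters found in value."""
    return [
        ch for ch in value
        if ord(ch) > 127
        and (unicodedata.category(ch) == "Pd" or ch == MINUS_SIGN)
    ]


def normalize_dashes(value: str) -> str:
    """Replace non-ASCII dash-like characters with an ASCII hyphen."""
    bad = set(non_ascii_dashes(value))
    return "".join("-" if ch in bad else ch for ch in value)


sample = "~75\u201380"  # en dash between 75 and 80
print(non_ascii_dashes(sample), normalize_dashes(sample))
```

Running the check before any step that strips non-ASCII characters would keep a range like 75-80 from collapsing into 7580.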
