Display of modified forms #880

Open
ValWood opened this issue Apr 23, 2024 · 29 comments

@ValWood

ValWood commented Apr 23, 2024

In this model
rec8 protection

Screenshot 2024-04-23 at 16 45 55

The phosphorylation status of rec8 is important (in this case, Phos:(S450),UnPhos(S412)).
In PRO this has a short label
PRO-Short-label: EXACT: Spom-rec8/Phos:2

Because most of our models will likely have modified forms, and often multiple different modified forms (i.e. acting as regulatory switches with different outcomes), I wonder if it would be possible to display the "Phos:2" part of the PRO label on the Noctua entity (it currently displays only "rec8 Spom", whichever form is selected).
This will help us to navigate the different forms in the model.

I was really impressed that I could select and use modified forms!

@pgaudet

pgaudet commented Apr 23, 2024

I think this is driven by the information provided in your GPI file --
@kltm Is this right?

@kltm
Member

kltm commented Apr 23, 2024

@pgaudet If I'm following and am guessing right about what's happening, the answer is "yes".

@pgaudet

pgaudet commented May 6, 2024

@ValWood

The PRO information is NOT in your GPI file (https://www.pombase.org/data/annotations/Gene_ontology/pombase.gpi.gz). The line for that ID (PR:000050512) is:

PR:000050512 rec8 meiotic cohesin complex kleisin subunit Rec8 SO:0001217 NCBITaxon:4896 PomBase:SPBC29A10.14 PomBase:SPBC29A10.14.1.pep UniProtKB:P36626 go-annotation-summary=meiotic cohesin complex kleisin subunit Rec8

However, based on the PRO entry, it looks like you should put the PRO-Short-label: | EXACT in column 2 of your GPI:


https://proconsortium.org/cgi-bin/entry_pro?id=PR%3A000050512&retrieve.x=11&retrieve.y=14 image

Is this possible?
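
For reference, here is a minimal Python sketch of what "column 2" means in practice. It assumes the GPI 2.0 column order (ID, symbol, name, synonyms, type, taxon, encoded-by, parent protein, complex members, xrefs, properties) and tab-separated fields. The tabs in the line pasted above were lost, so the field boundaries below are a hand reconstruction, and the function names are made up for illustration. The replacement symbol is the PRO short label quoted in the first comment.

```python
# Assumed GPI 2.0 column order; a reading of the spec, not a reference copy.
GPI_COLUMNS = [
    "id", "symbol", "name", "synonyms", "type", "taxon",
    "encoded_by", "parent_protein", "complex_members", "xrefs", "properties",
]


def parse_gpi_line(line):
    """Split one tab-separated GPI line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad short lines so every column name is present.
    fields += [""] * (len(GPI_COLUMNS) - len(fields))
    return dict(zip(GPI_COLUMNS, fields))


def with_symbol(line, new_symbol):
    """Return the same line with column 2 (the symbol) replaced."""
    fields = line.rstrip("\n").split("\t")
    fields[1] = new_symbol
    return "\t".join(fields)


# The rec8 entry quoted above, with tab positions reconstructed by hand.
REC8_LINE = "\t".join([
    "PR:000050512",
    "rec8",
    "meiotic cohesin complex kleisin subunit Rec8",
    "",
    "SO:0001217",
    "NCBITaxon:4896",
    "PomBase:SPBC29A10.14",
    "PomBase:SPBC29A10.14.1.pep",
    "",
    "UniProtKB:P36626",
    "go-annotation-summary=meiotic cohesin complex kleisin subunit Rec8",
])

entry = parse_gpi_line(REC8_LINE)
print(entry["symbol"])  # the value Noctua would currently show

relabelled = with_symbol(REC8_LINE, "Spom-rec8/Phos:2")
print(parse_gpi_line(relabelled)["symbol"])
```

Only column 2 changes in the relabelled line; the ID and every other field are left untouched.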

@ValWood
Author

ValWood commented May 7, 2024

We can do that.
But we also have a display label which makes more sense biologically.
Can we include this label for display in Noctua?
The PRO labels are numbered in numerical order: rec8/Phos:1, rec8/Phos:2, rec8/Phos:3, etc.
So for this modified form we display Phos:(S450),UnPhos(S412).

The numbered labels can get very confusing and aren't very meaningful to users, especially if there are many modified forms, as for rpb1:
https://www.pombase.org/gene/SPBC28F2.12

@kimrutherford can you look into getting these 2 labels into the GPI?

@pgaudet

pgaudet commented May 7, 2024

AFAIK whatever is in column 2 of the GPI gets displayed, so you'd need to make sure the information you want is there.

@ValWood
Author

ValWood commented May 7, 2024

Great thanks. Are Complex Portal entities in Noctua read from the GPI file? We were discussing this morning and decided they probably were, but I have a note to check...

@vanaukenk

@ValWood
Yes, Complex Portal entities are also read from the GPI file. SGD has some of those in their file, for example.
Basically, any entity that you want to have available for annotation in Noctua needs to be in the PomBase gpi file and, as Pascale said, what you want for a display label needs to be in Column 2, at least for now.
One thing we might want to consider for a future gpi format, though, is having a specific column for 'Display name', in case that differs from the other systematic names that exist, which I think we would still want to capture.

@ValWood
Author

ValWood commented May 7, 2024

Hi @vanaukenk - So do we see these IDs in Noctua after the next GO update, or is this on a different cycle?

@vanaukenk

@ValWood
New entities in your gpi file are available for curation with each Noctua maintenance outage. The next one is tomorrow, May 9th, if you want to check.

@ValWood
Author

ValWood commented May 8, 2024

great!

@ValWood
Author

ValWood commented May 9, 2024

I can't locate a complex that is in our GPI file: CPX-555.

@kltm
Member

kltm commented May 9, 2024

@ValWood how long ago did you add that to your file? If it does not show up after today's update, let's look into it.

@kltm
Member

kltm commented May 9, 2024

@ValWood I'm guessing that it was added during this last period. It now seems to be available: http://noctua-amigo.berkeleybop.org/amigo/term/ComplexPortal:CPX-555 .

@vanaukenk

@ValWood
I've checked on the three Noctua workbenches and can autocomplete on CPX-555 after this maintenance outage.
Please let us know if you can't find it.

Note that the maintenance outages happen from ~4-6pm PDT on the Thursdays when we have them. We put the outages on the GO calendar too (and send out an email notice), just in case you were unclear about the timing of the updates.

Thx.
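
To make the timing concrete, here is a small Python sketch converting the ~4-6pm PDT window on Thursday, May 9th into another time zone. British Summer Time (UTC+1) is used only as an assumed example of a reader's local zone, and fixed UTC offsets are used to keep the snippet self-contained.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time, UTC-7
BST = timezone(timedelta(hours=1), "BST")   # assumed local zone of the reader

# Approximate outage window on Thursday, May 9th, 2024.
start = datetime(2024, 5, 9, 16, 0, tzinfo=PDT)
end = datetime(2024, 5, 9, 18, 0, tzinfo=PDT)

for label, moment in (("start", start), ("end", end)):
    local = moment.astimezone(BST)
    print(label, local.strftime("%Y-%m-%d %H:%M"), local.tzname())
```

Under that assumption the window falls in the early hours of Friday local time, so new entities would only be visible from Friday morning.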

@ValWood
Author

ValWood commented May 10, 2024

Oddly I can't find it:

Screenshot 2024-05-10 at 09 38 54

The entire list had only SGD IDs. If I remove digits to autocomplete I get other entities but not this one.

@ValWood
Author

ValWood commented May 10, 2024

ignore me, I get it, I will check on Friday.

@ValWood
Author

ValWood commented May 10, 2024

...It is Friday.......

@ValWood
Author

ValWood commented May 10, 2024

We had another look. We can find the complex when searching for entities in the activity unit, but not in the "protein complex" tool. Maybe we have something wrong in our GPI file?

@ValWood
Author

ValWood commented May 10, 2024

Maybe we used the incorrect term for "complex" and it is defaulting to gene? (GPI v2.0)

@vanaukenk

@ValWood - let me do some systematic testing/investigating to see if I can understand why it's not showing up in the Protein Complex Form part of the VPE.

I'll probably also move this issue to a separate ticket in the VPE tracker, but will link it here.

@vanaukenk

@ValWood - this does indeed seem to be an issue with how the S. pombe protein-containing complexes are being typed in Noctua, although you've used the correct type (GO:0032991) for the gpi2.0 file format.

Looking at the entries for an S. pombe vs S. cerevisiae protein-containing complex in noctua-amigo I see the parentage differences:

image

image

The difference between the S. pombe and S. cerevisiae gpi files is the format; SGD is still using gpi1.2.

@kltm @balhoff - could the incorrect typing of S. pombe complexes in NEO be the result of a different input file format?

@ValWood
Author

ValWood commented May 10, 2024

Note that this isn't critical for us right now if you are busy. We can manage without complexes in the short term, but we want to keep moving with this so we can power through later.
...of course if it is at our end we will prioritise a fix.

@vanaukenk

@ValWood - can you still make the annotations to complexes that you need using the gene product field in the Activity Unit interface?

Let me know if you want to conference.

@kltm
Member

kltm commented May 10, 2024

The logic for GPI seems to be:

default:  'CHEBI:33695 ! information biomacromolecule';
if 'protein': 'CHEBI:36080 ! protein';
if 'transcript': 'CHEBI:33697 ! ribonucleic acid';
if 'protein_complex': 'GO:0032991 ! macromolecular complex';

The same is mostly true in the GAF processor, except the if 'protein': 'CHEBI:36080 ! protein'; bit is removed with the comment:
# note some groups incorrectly classify their genes as proteins
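
As a rough Python transcription of the GPI rules quoted above (not the actual NEO code: the names `GPI_TYPE_PARENTS` and `parent_class` are invented for this sketch, and literal string matching on the type column is an assumption):

```python
DEFAULT_PARENT = "CHEBI:33695 ! information biomacromolecule"

# Type label (as matched by the loader) -> parent class, per the rules above.
GPI_TYPE_PARENTS = {
    "protein": "CHEBI:36080 ! protein",
    "transcript": "CHEBI:33697 ! ribonucleic acid",
    "protein_complex": "GO:0032991 ! macromolecular complex",
}


def parent_class(entity_type):
    """Return the parent class for a GPI type value, else the default."""
    return GPI_TYPE_PARENTS.get(entity_type, DEFAULT_PARENT)


print(parent_class("protein_complex"))
print(parent_class("GO:0032991"))  # not one of the matched labels
```

Under that assumption, a type column holding the CURIE `GO:0032991` rather than the label `protein_complex` would fall through to the default parent.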

@vanaukenk

@kltm - so, if a group's input file is gpi2.0 and they're using GO:0032991 for protein-containing complex type, does this mean the type field is ignored and everything defaults to 'information biomacromolecule'?

@kltm
Member

kltm commented May 10, 2024

@vanaukenk Well, assuming there are no bugs elsewhere, the logic is basically to pull the "type" info from a column in the line and match as outlined above. I'm not quite sure what you mean by "using GO:0032991". If you want to hop on a Zoom, we can sort this out real quick.

kltm added a commit to geneontology/neo that referenced this issue May 10, 2024
@kltm
Member

kltm commented May 10, 2024

In discussion with @vanaukenk we have modified the parser to follow the GPI 2.0 spec a little better. Re-running load to see if there are improvements.

@vanaukenk

@ValWood
Following up on testing for S. pombe complexes in the VPE (and elsewhere).
I checked all of the different workbenches on production Noctua and things look okay. CPX-555 (and other pombe complexes) are now available in the Protein Complex widget of the VPE and in the other gene product autocompletes, so I think we're good.
Please double-check and let us know if this is working as expected for you now.

@kltm

@ValWood
Author

ValWood commented May 15, 2024

Yes that works, thanks!
