adding annotation on a gff file results in "uniqueID offset-1968619510" #1556

Open
Jungal10 opened this issue Nov 12, 2020 · 9 comments

@Jungal10

Jungal10 commented Nov 12, 2020

I have a gff file that contains some info on the annotation of my genome. E.g:

##gff-version 3
##annot-version v2.1
##species Amaranthus hypochondriacus
Contig115|quiver|       phytozomev12    gene    5405    8135    .       -       .       ID=AH023784.v2.1;Name=AH023784
Contig115|quiver|       phytozomev12    mRNA    5405    8135    .       -       .       ID=AH023784-RA.v2.1;Name=AH023784-RA;pacid=38173840;longest=1;Parent=AH023784.v2.1
Contig115|quiver|       phytozomev12    CDS     8115    8135    .       -       0       ID=AH023784-RA.v2.1.CDS.1;Parent=AH023784-RA.v2.1;pacid=38173840

I have another file with extra annotation information, which I parsed and appended to the 9th column of the GFF file. I followed the same pattern as the existing attributes: attribute name, equals sign, value, and semicolon, e.g. "Panther=PTHR10668,PTHR10668:SF80;"
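The pattern described above can be sketched in Python. In GFF3, column 9 is a semicolon-separated list of tag=value pairs, multiple values for one tag are comma-separated, and reserved characters inside a value are percent-encoded. The function name `append_attributes` and the shape of the `extra` dictionary are hypothetical, chosen only for illustration; this is a minimal sketch under those assumptions, not the method used in this thread.

```python
from urllib.parse import quote


def append_attributes(gff_line, extra):
    """Append tag=value pairs to column 9 of one tab-delimited GFF3 feature line.

    `extra` maps a tag name to a list of values (hypothetical input shape).
    """
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")

    added = []
    for tag, values in extra.items():
        values = [v for v in values if v]  # drop empty values such as "KOG="
        if not values:
            continue
        # Percent-encode reserved characters (; = , % tab ...) inside each value;
        # spaces and colons are left readable.
        escaped = ",".join(quote(v, safe=" :") for v in values)
        added.append(f"{tag}={escaped}")

    if not added:
        return "\t".join(cols)

    # Join old and new attributes with ";" so no pair is separated by a space.
    existing = cols[8].strip().rstrip(";")
    parts = ([existing] if existing not in ("", ".") else []) + added
    cols[8] = ";".join(parts)
    return "\t".join(cols)
```

Joining with ";" and skipping empty values keeps every entry in column 9 a well-formed tag=value pair, and escaping means a description containing ";" or "=" cannot be mistaken for a new attribute.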

The new GFF looks like this:

##species Amaranthus hypochondriacus    V2      V3      V4      V5      V6      V7      V8      V9
Contig177|quiver|       phytozomev12    CDS     231594  231692  .       +       0       ID=AH023665-RA.v2.1.CDS.7;Parent=AH023665-RA.v2.1;pacid=38166878 locus_name=AH018775;transcriptName=AH018775-RA;Pfam=PF05199,PF00732;Panther=PTHR10668,PTHR10668:SF80;KOG=;EC=1.1.3.20;KO=K17756;GO=GO:0055114,GO:0016614,GO:0050660;Best-hit-arabi-name=AT4G19380.1;arabi-symbol=;arabi-defline=Long-chain fatty alcohol dehydrogenase family protein locus_name=AH018775;transcriptName=AH018775-RA;Pfam=PF05199,PF00732;Panther=PTHR10668,PTHR10668:SF80;KOG=;EC=1.1.3.20;KO=K17756;GO=GO:0055114,GO:0016614,GO:0050660;Best-hit-arabi-name=AT4G19380.1;arabi-symbol=;arabi-defline=Long-chain fatty alcohol dehydrogenase family protein
Contig177|quiver|       phytozomev12    three_prime_UTR 231693  231975  .       +       .       ID=AH023665-RA.v2.1.three_prime_UTR.1;Parent=AH023665-RA.v2.1;pacid=38166878 locus_name=AH018775;transcriptName=AH018775-RA;Pfam=PF05199,PF00732;Panther=PTHR10668,PTHR10668:SF80;KOG=;EC=1.1.3.20;KO=K17756;GO=GO:0055114,GO:0016614,GO:0050660;Best-hit-arabi-name=AT4G19380.1;arabi-symbol=;arabi-defline=Long-chain fatty alcohol dehydrogenase family protein locus_name=AH018775;transcriptName=AH018775-RA;Pfam=PF05199,PF00732;Panther=PTHR10668,PTHR10668:SF80;KOG=;EC=1.1.3.20;KO=K17756;GO=GO:0055114,GO:0016614,GO:0050660;Best-hit-arabi-name=AT4G19380.1;arabi-symbol=;arabi-defline=Long-chain fatty alcohol dehydrogenase family protein

When I tried to use bin/flatfile-to-json.pl I got several error messages, so I opted to load the annotation in the indexed file format instead.

My problem now is that instead of showing all the attributes that I added to the GFF file column, the feature only has an attribute stating "uniqueID offset-1968619510" <img width="276" alt="Screenshot 2020-11-12 at 23 38 15" src="https://user-images.githubusercontent.com/49553532/99004909-25f6d480-2540-11eb-9354-8e02840fdee4.png">

Can you help me with this issue, please? Thank you

@cmdcolin
Contributor

What is the issue? It's not clear to me.

The small GFF fragment that is pasted can't be loaded because it refers to parent features that don't exist, and it does seem that flatfile-to-json fails on contigs whose names contain a pipe symbol. Not sure if there is anything we can do about that, though.

@Jungal10
Author

This is the error message when trying to run flatfile-to-json:

 at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/Bio/JBrowse/ConfigurationManager.pm line 7.
	Bio::JBrowse::ConfigurationManager::__ANON__("\x{a}GFF3 parse error: some features reference other features tha"...) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 195
	Bio::GFF3::LowLevel::Parser::_buffer_all_under_construction_features(Bio::GFF3::LowLevel::Parser=HASH(0x7f879e42b7c0)) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 168
	Bio::GFF3::LowLevel::Parser::_buffer_items(Bio::GFF3::LowLevel::Parser=HASH(0x7f879e42b7c0)) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/../../extlib/lib/perl5/Bio/GFF3/LowLevel/Parser.pm line 73
	Bio::GFF3::LowLevel::Parser::next_item(Bio::GFF3::LowLevel::Parser=HASH(0x7f879e42b7c0)) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/Bio/JBrowse/FeatureStream/GFF3_LowLevel.pm line 15
	Bio::JBrowse::FeatureStream::GFF3_LowLevel::next_items(Bio::JBrowse::FeatureStream::GFF3_LowLevel=HASH(0x7f879d83c348)) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/Bio/JBrowse/Cmd/NCFormatter.pm line 52
	Bio::JBrowse::Cmd::NCFormatter::_format(Bio::JBrowse::Cmd::FlatFileToJson=HASH(0x7f879d01cea8), "trackConfig", HASH(0x7f879d1e70f8), "featureStream", Bio::JBrowse::FeatureStream::GFF3_LowLevel=HASH(0x7f879d83c348), "featureFilter", CODE(0x7f879d1f3318), "trackLabel", ...) called at /Users/josedias/Documents/PhD_Cologne/projects/PopAmaranth/popamabrowser/bin/../src/perl5/Bio/JBrowse/Cmd/FlatFileToJson.pm line 128
	Bio::JBrowse::Cmd::FlatFileToJson::run(Bio::JBrowse::Cmd::FlatFileToJson=HASH(0x7f879d01cea8)) called at bin/flatfile-to-json.pl line

> what is the issue? not clear

The issue is that I want to have all the attributes I manually added to the GFF file (Panther, Pfam, KO, ...), but when I load the indexed file, instead of showing these attributes, it only adds an attribute called uniqueID with an offset and a number (like the example in the picture, with uniqueID offset-1968619510).

> the small gff fragment that is pasted can't be loaded because it refers to parent features that don't exist and it does seem that flat-file-to-json fails on the contigs containing a pipe symbol in them...not sure if there is anything we can do about that though

With the original GFF file, it works even with the pipes.
I only appended new items to the already existing attributes column. I did not add any parents or change any IDs, so I am not sure how parents can point to non-existent IDs.

Alternatively, do you know any other method to add more attributes to a GFF file without breaking it?

Thanks for your help, super fast as always

@cmdcolin
Contributor

Can you share the whole GFF? I think the error is that a Parent= attribute in column 9 refers to another feature's ID that is not in the file.
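To check a file for this kind of dangling reference before loading it, something like the following Python sketch could be used. It is not part of JBrowse or of the Perl parser in the stack trace; the function name `find_orphan_parents` is hypothetical, and the sketch assumes a plain tab-delimited GFF3 where column 9 holds semicolon-separated tag=value pairs.

```python
def find_orphan_parents(lines):
    """Return {parent_id: [line numbers]} for Parent= values with no matching ID=.

    `lines` is any iterable of GFF3 text lines (for example an open file).
    """
    seen_ids = set()
    parent_refs = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line; this sketch ignores it
        for field in cols[8].split(";"):
            tag, sep, value = field.strip().partition("=")
            if not sep:
                continue
            if tag == "ID":
                seen_ids.add(value)
            elif tag == "Parent":
                # A feature may list several parents, separated by commas.
                for parent in value.split(","):
                    parent_refs.setdefault(parent, []).append(lineno)
    # IDs are compared only after the whole input is read, so the result
    # does not depend on whether parents come before or after children.
    return {p: nums for p, nums in parent_refs.items() if p not in seen_ids}
```

An empty result means every Parent= value matches some ID= in the same input; otherwise the keys are the missing IDs and the values are the line numbers that refer to them.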

@Jungal10
Author

Jungal10 commented Nov 13, 2020

Ahypochondriacus_459_v2.1.gene.gff3.gz

another_attempt.gff.sorted.gff.gz
Ahypochondriacus_459_v2.1.annotation_info.txt

The ‘annotation_info’ is the file having extra attributes that I want to add.
The Ahypochondriacus_459_v2.1 is the original fully functional gff.
And the another_attempt is the file I created that is not working.
Thank you

@cmdcolin
Contributor

@Jungal10

In what way is the another_attempt file not working?

@cmdcolin
Contributor

When I load the another_attempt.gff.sorted.gff.gz file, it appears to look ok in the browser

[Screenshot: localhost_jbrowse__data=g1 loc=Contig115%7Cquiver%7C%3A1 37393 tracks=test highlight=]

If the issue is strictly about the uniqueID appearing in the feature details then that is probably a small issue, but otherwise I am not sure what the issue is?

My track config is just this

[tracks.test]
urlTemplate=another_attempt.gff.sorted.gff.gz

@Jungal10
Author

I have now tried the simple configuration you showed, and it worked.

Before, I had it configured this way, which resulted in none of the new attributes being shown:

[tracks.Gene annotation A hypochondriacus v2 complete]
urlTemplate=another_attempt.gff.sorted.gff.gz
storeClass=JBrowse/Store/SeqFeature/GFF3Tabix
type=CanvasFeatures
category=Annotation

So I just had a bad track definition, then.

Any idea why the uniqueID still insists on appearing?

Thank you one more time for your help!

@cmdcolin
Contributor

Good to know. This branch might fix the uniqueID appearing.... https://github.com/GMOD/jbrowse/tree/uniqueID_remove_gff3tabix

If you have any interest feel free to test...can probably get merged and released later on too

@cmdcolin
Contributor

Also not sure, that track definition looks ok, so it's surprising that mine helped at all! Maybe just some weird luck or something? Hopefully I didn't just gloss over a bug, but the another_attempt file looked like it worked ok

@rbuels rbuels closed this as completed Mar 25, 2021
@rbuels rbuels reopened this Mar 25, 2021