Input data used for CACAO annotation analysis. Code used to generate resulting figures for sharing.

community-biocuration/cacao-annotations-analysis

Annotation Analysis

Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO) Figure Generation Code

The input files and code used to generate the graphical figures in the CACAO manuscript are provided here.

Requirements

  • requirements.txt lists the versioned Python packages used to generate the figures. Conda was used as the package manager.

Data

  • The cacao_expanded_info.dat file is a modified GPAD file and a precursor to the final quality-checked file sent to GO. Taxon information and various CACAO-specific fields are added as extra columns. Like a GPAD file, it is tab-delimited.
    • The taxon information was retrieved using ete3.
  • The cacao_dcnt-tinfo.txt and uniprot_dcnt-tinfo.txt files are results from the GOATOOLS analysis. The descendant count (dcnt) values for GO terms used in CACAO were calculated in that analysis.
  • The goa_uniprot_all_noiea_20200101.gaf file is provided here and is also available from the GO Data Archive.
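
Because cacao_expanded_info.dat is tab-delimited like a GPAD file, it can be read with the standard library alone. The following is a minimal sketch, not the repository's own loader: the function name `read_tab_delimited` and the sample rows are hypothetical, and the real column layout (including the added taxon and CACAO-specific columns) is not described here, so rows are returned as plain lists.

```python
import csv
import io


def read_tab_delimited(handle):
    """Yield each data row as a list of column values.

    Blank lines and lines starting with "!" are skipped, because GPAD and
    GAF files use that prefix for header and comment lines.
    """
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("!")
    )
    # QUOTE_NONE keeps any quote characters in a field as literal text.
    yield from csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)


# Made-up sample rows for illustration only; not taken from the real file.
sample = io.StringIO(
    "!gpa-version: 1.1\n"
    "UniProtKB\tEXAMPLE1\tenables\tGO:0000001\n"
    "UniProtKB\tEXAMPLE2\tinvolved_in\tGO:0000002\n"
)
rows = list(read_tab_delimited(sample))

# With the real file, the same function would be used as:
# with open("cacao_expanded_info.dat", newline="") as handle:
#     rows = list(read_tab_delimited(handle))
```

Returning lists rather than named fields avoids guessing at column names that this README does not specify.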

Pie Charts

  • cacao_taxon_pie.py generates the taxonomy pie chart.
  • cacao_go_pie.py generates the GO aspect pie chart.
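
As a rough illustration of what a GO aspect pie chart involves, the sketch below tallies one-letter aspect codes into percentage shares and, optionally, draws them. It is not the code in cacao_go_pie.py: the function names, the input list, and the use of matplotlib are assumptions made for this example.

```python
from collections import Counter

# One-letter aspect codes as used in GAF files.
ASPECT_NAMES = {
    "P": "Biological Process",
    "F": "Molecular Function",
    "C": "Cellular Component",
}


def aspect_shares(aspect_codes):
    """Return {aspect name: percentage of annotations} for the given codes."""
    counts = Counter(aspect_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        ASPECT_NAMES.get(code, code): 100.0 * n / total
        for code, n in counts.items()
    }


def draw_pie(shares, out_path):
    """Save the shares as a pie chart. Assumes matplotlib is installed."""
    # Imported here so the tallying above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares.keys()), autopct="%1.1f%%")
    ax.set_title("GO aspect")
    fig.savefig(out_path)
    plt.close(fig)


# Hypothetical input: one aspect code per annotation.
shares = aspect_shares(["P", "P", "F", "C"])
```

Calling `draw_pie(shares, "go_aspect_pie.png")` would write the chart to a file; the output file name is likewise only an example.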

Descendant Count

  • cacao_dcnt.py generates the descendant count (dcnt) box plot comparison.
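
A box plot comparison reduces each set of dcnt values to a five-number summary. The sketch below computes that summary with the standard library for two groups; it is not the code in cacao_dcnt.py, the function name `box_summary` is hypothetical, and the numbers are invented rather than read from the tinfo files.

```python
import statistics


def box_summary(values):
    """Return the five-number summary that a box plot displays."""
    data = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population and interpolates linearly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Invented dcnt values for illustration only.
cacao_dcnt = [0, 1, 2, 3, 4]
uniprot_dcnt = [0, 10, 20, 30, 40]

cacao_summary = box_summary(cacao_dcnt)
uniprot_summary = box_summary(uniprot_dcnt)
```

Comparing the two summaries side by side gives the same information the box plot shows graphically: the median and spread of descendant counts in each group.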

Notes

  • Code was formatted using Black.
