
biovcnet/bvcn-binder-kegg-koala

 
 


Binder for BVCN Functional Annotation lesson 6

Initially forked from here. Thank you to the awesome binder team!


Part of the Bioinformatics Virtual Coordination Network :)

Visit the KEGG-Decoder GitHub page for more information.

Walkthrough

There are 4 example files from the various KEGG KOALA tools: blastkoala.txt, ghostkoala.txt, kofamkoala.txt, and NORP_subset.txt. Each one formats its data differently, so some work with KEGG-Decoder as they are and others need editing first.

blastkoala.txt is a single genome formatted correctly

BAFMBKGE_00001  K03596
BAFMBKGE_00002  K03100
BAFMBKGE_00003  K03685
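The layout shown above (a gene ID, then a KO number) can be checked before running KEGG-Decoder. The following is a minimal sketch, not part of the lesson: `check_koala_format` is a hypothetical helper, and it assumes that columns are whitespace-delimited, that KO identifiers are the letter K followed by five digits, and that a line with only a gene ID is a gene without a KO assignment.

```shell
# Hypothetical helper: report lines whose second column does not look like a KO number.
# Assumes whitespace-delimited columns: gene ID first, KO number second.
check_koala_format() {
    awk '
        NF < 2 { next }    # blank line, or a gene with no KO assigned
        $2 !~ /^K[0-9][0-9][0-9][0-9][0-9]$/ {
            printf "line %d: unexpected KO field: %s\n", NR, $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }    # non-zero exit status if any line was flagged
    ' "$1"
}

# Example call (commented out so the block runs without the lesson files):
# check_koala_format blastkoala.txt && echo "format looks fine"
```

The function prints nothing and returns 0 when every annotated line passes, so it can be chained with `&&` in front of a KEGG-decoder call.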

Because the file holds a single genome, KEGG-Decoder can only produce a static visualization; the interactive visualization requires at least 3 genomes.

KEGG-decoder -i blastkoala.txt -o blastkoala.ko -v static

The blastkoala.svg file contains a heatmap comparing the genome results to the KEGG-Decoder pathways. The redder a heatmap cell, the more complete the pathway.

ghostkoala.txt has a single genome that is formatted incorrectly

NC_015736.1_1   K01703
NC_015736.1_2   K01704

KEGG-Decoder only accepts gene IDs with a single _. It uses everything before the first _ as the genome name for each row of the output and heatmap, so genomes whose IDs share that prefix are merged into one row. The IDs in ghostkoala.txt contain two underscores, so the first command below removes the one after NC.

sed 's/NC_/NC/g' ghostkoala.txt > ghostkoala-mod.txt
KEGG-decoder -i ghostkoala-mod.txt -o ghostkoala.ko -v static
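To see how the "text before the first _" rule applies to a file, the implied genome names can be previewed before running KEGG-Decoder. This is a sketch under stated assumptions rather than anything KEGG-Decoder provides: `genome_names` and `multi_underscore_ids` are hypothetical helpers, and both assume the gene ID is the first whitespace-delimited column.

```shell
# Hypothetical helper: list each implied genome name with its number of gene IDs.
genome_names() {
    awk 'NF >= 1 { split($1, parts, "_"); count[parts[1]]++ }
         END { for (name in count) print name "\t" count[name] }' "$1" | sort
}

# Hypothetical helper: print gene IDs that contain more than one underscore.
multi_underscore_ids() {
    # split() returns the number of pieces; more than 2 pieces means 2+ underscores
    awk 'NF >= 1 && split($1, parts, "_") > 2 { print $1 }' "$1"
}

# Example calls (commented out so the block runs without the lesson files):
# genome_names ghostkoala.txt
# multi_underscore_ids ghostkoala.txt
```

If `multi_underscore_ids` prints anything, the file needs an edit such as the sed command above. If `genome_names` shows one name where several genomes were expected, those genomes would be merged.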

kofamkoala.txt is a complex table that requires some pre-processing

sed 's/ \+ /\t/g' kofamkoala.txt | cut -f1,2 | sed 's/\* //g' |  grep -v "#" > kofamkoala-mod.txt
KEGG-decoder -i kofamkoala-mod.txt -o kofamkoala.ko -v static

The genome name for the output will be WP as it is the string prior to the first _.
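The kofamkoala.txt pipeline above is easier to follow one stage at a time. The sketch below wraps the same four commands in a hypothetical function, `kofam_to_decoder`, with a comment per stage. The two sample lines are invented for illustration and only assume a layout the pipeline implies: a `#` header, a leading `* ` marker, and columns separated by two or more spaces. Note that `\+` and `\t` in the sed expression are GNU sed syntax, so this assumes GNU sed, as on most Linux systems.

```shell
# Same stages as the one-liner, reading from standard input.
kofam_to_decoder() {
    sed 's/ \+ /\t/g' |   # each run of 2+ spaces becomes one tab (GNU sed syntax)
    cut -f1,2 |           # keep the first two tab-separated columns: gene name and KO
    sed 's/\* //g' |      # drop the leading "* " marker
    grep -v "#"           # drop header and comment lines
}

# Invented sample input, not taken from kofamkoala.txt:
kofam_to_decoder <<'EOF'
# gene name      KO      thrshld  score
* WP_000001.1    K00001  100.00   250.0
EOF
```

With the real file, the equivalent call would be `kofam_to_decoder < kofamkoala.txt > kofamkoala-mod.txt`.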

NORP_subset.txt contains KOALA results for 4 genomes. Each KEGG-Decoder run below writes the NORP.ko file, and its contents are identical between runs: it holds the numerical values used to create the heatmap. Because there are 3+ genomes, we can make an interactive visualization as well as a static one.

KEGG-decoder -i NORP_subset.txt -o NORP.ko -v static
KEGG-decoder -i NORP_subset.txt -o NORP.ko -v interactive
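The choice between the two runs above follows from the number of genomes in the input. As a rough sketch, that count can be taken from the gene ID prefixes and used to pick the visualization modes. `count_genomes` and `viz_modes` are hypothetical helpers; they assume the gene ID is the first whitespace-delimited column and reuse the 3-genome threshold stated earlier in this walkthrough.

```shell
# Hypothetical helper: count distinct genome names (text before the first "_").
count_genomes() {
    awk 'NF >= 1 { split($1, parts, "_"); seen[parts[1]] = 1 }
         END { n = 0; for (name in seen) n++; print n }' "$1"
}

# Hypothetical helper: print the visualization modes the input supports.
viz_modes() {
    if [ "$(count_genomes "$1")" -ge 3 ]; then
        echo "static interactive"   # 3+ genomes: both modes are available
    else
        echo "static"               # fewer than 3 genomes: static only
    fi
}

# Example use (commented out so the block runs without KEGG-Decoder installed):
# for mode in $(viz_modes NORP_subset.txt); do
#     KEGG-decoder -i NORP_subset.txt -o NORP.ko -v "$mode"
# done
```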

About

Lesson and tutorial on the KEGG database, and annotating with Blast/Ghost/KofamKOALA and KEGGDecoder
