Scripts to extract information about Carbohydrate Binding Modules (CBM) from the CAZy Database

crfield18/CAZy-Database-Parser

CAZy Database Parser

This is a series of scripts that parse and extract information about each carbohydrate binding module (CBM) family from the Carbohydrate Active enZYme Database.


What is the Carbohydrate Active enZYme Database?

CAZy is an online database created in 1998 that holds genomic, structural and biochemical information about Carbohydrate-Active Enzymes (CAZymes) and their associated modules.

These include:

  • Glycoside Hydrolases (GH)
  • GlycosylTransferases (GT)
  • Polysaccharide Lyases (PL)
  • Carbohydrate Esterases (CE)
  • Auxiliary Activities (AA)
  • Carbohydrate Binding Modules (CBM)

Included Scripts

cazy_parse.py

Consolidates the 'Activity in Family' information from each CBM page into a single Excel file. If a page's 'Activity in Family' field is empty, its 'Note' information is used instead.
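
The fallback rule described above can be sketched as follows. This is a minimal illustration, not the code in cazy_parse.py: the function names `pick_description` and `consolidate`, the dictionary layout and the sample text are all hypothetical, and it uses plain Python structures rather than pandas so that it runs on its own.

```python
def pick_description(record):
    """Return 'Activity in Family' if populated, otherwise fall back to 'Note'."""
    activity = (record.get("Activity in Family") or "").strip()
    if activity:
        return activity
    return (record.get("Note") or "").strip()


def consolidate(pages):
    """Build one row per CBM family from a {family: fields} mapping (hypothetical layout)."""
    rows = []
    for family, record in sorted(pages.items()):
        rows.append({"Family": family, "Description": pick_description(record)})
    return rows


# Made-up sample data standing in for fields scraped from two CBM pages.
pages = {
    "CBM1": {"Activity in Family": "Example activity text", "Note": "Example note"},
    "CBM2": {"Activity in Family": "", "Note": "Example note used as fallback"},
}
rows = consolidate(pages)
```

A list of rows like this could then be passed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_excel`, assuming an Excel writer engine such as openpyxl is installed.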

database_trim.py

Downloads and extracts each CBM listed in the CAZy database across bacteria, archaea, viruses and eukaryota.
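
As a rough sketch of the filtering step, the example below keeps only CBM entries from tab-separated text and groups them by kingdom. It makes several assumptions: the three-column layout (family, kingdom, organism), the function name `trim_to_cbm` and the sample rows are hypothetical and are not taken from database_trim.py or from the actual CAZy download format. The download step itself is omitted.

```python
import csv
import io
from collections import defaultdict

# Kingdoms named in the script description above.
KINGDOMS = {"Bacteria", "Archaea", "Viruses", "Eukaryota"}


def trim_to_cbm(lines):
    """Group CBM rows by kingdom from tab-separated lines (assumed column layout)."""
    grouped = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        family, kingdom, organism = row[0], row[1], row[2]
        if family.startswith("CBM") and kingdom in KINGDOMS:
            grouped[kingdom].append((family, organism))
    return dict(grouped)


# Made-up sample standing in for a downloaded database dump.
sample = io.StringIO(
    "CBM1\tEukaryota\tExample organism A\n"
    "GH5\tBacteria\tExample organism B\n"
    "CBM2\tBacteria\tExample organism C\n"
)
result = trim_to_cbm(sample)
```

Filtering on the family prefix keeps the sketch independent of how many CBM families exist; the real script may select families differently.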

cazy_functions.py

Generic functions used across the other scripts.


Dependencies

  • pandas 1.5.2
  • python3-wget 3.2

Written in Python 3.10.9
