arbasher/prepBioCyc

Preprocess BioCyc files

BioCyc data preprocessing steps

Basic Description

prepBioCyc is a collection of modules for

  • preprocessing the BioCyc database collection, including MetaCyc (step 1)
  • building association matrices among pathways, ECs, and compounds (step 2)
  • extracting EC and pathway properties while building mapping files for the downstream PathoLogic and MinPath prediction algorithms (steps 3-4)
  • constructing pairwise similarities among pathways (step 5)
  • building gene, EC, and pathway graphs (step 6)
  • creating synthetic and golden datasets with features (steps 7-8)
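As a rough illustration of steps 2 and 5, the sketch below builds a binary pathway-by-EC indicator matrix and then computes pairwise similarities between pathways. The pathway IDs, EC assignments, function names, and the choice of cosine similarity are all assumptions made for illustration; they are not taken from the prepBioCyc code, which may represent and compare pathways differently.

```python
import math

# Hypothetical toy data: pathway ID -> set of EC numbers (not real MetaCyc content).
pathway_to_ecs = {
    "PWY-A": {"1.1.1.1", "2.7.1.1"},
    "PWY-B": {"2.7.1.1", "4.1.2.13"},
    "PWY-C": {"3.5.1.5"},
}


def build_indicator(mapping):
    """Return (pathways, ecs, matrix); matrix[i][j] is 1 if pathway i uses EC j."""
    pathways = sorted(mapping)
    ecs = sorted({ec for members in mapping.values() for ec in members})
    matrix = [[1 if ec in mapping[p] else 0 for ec in ecs] for p in pathways]
    return pathways, ecs, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(matrix):
    """Symmetric pathway-by-pathway similarity matrix built from indicator rows."""
    n = len(matrix)
    return [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


pathways, ecs, indicator = build_indicator(pathway_to_ecs)
similarity = pairwise_similarity(indicator)
```

In this toy example, two pathways that share one of their two ECs score higher than pathways with no EC in common, which is the kind of signal a pathway similarity matrix is meant to capture.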

Dependencies

The codebase is tested under Python 3.8. To install the necessary requirements, run the following command:

pip install -r requirements.txt

prepBioCyc requires the packages listed in requirements.txt.

Installation and Basic Usage

Run the following command to clone the repository to an appropriate location:

git clone https://github.com/hallamlab/prepBioCyc.git

For all experiments, navigate to the src folder, then run the commands of your choice. For example, to display prepBioCyc's running options, use: python main.py --help. The help output should be self-explanatory. For general usage, execute the following command:

python main.py --build-biocyc-object --build-indicator --build-pathway-properties --build-ec-properties --build-pathway-similarities --build-graph --constraint-kb 'metacyc' --build-synset --ex-features-from-synset --build-golden-dataset --ex-features-from-golden-dataset --build-pathologic-input --build-minpath-dataset --minpath-map --kbpath "[path to database]" --ospath "[path to the object files (e.g. 'biocyc.pkl')]" --dspath "[path to dataset and to store results]" --display-interval -1 --num-jobs 2
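Because the general-usage command is long, it can help to assemble it programmatically. The following is a minimal sketch that rebuilds the same argument list in Python, keeping the flags and their order exactly as shown above. The function name build_command and the example paths are hypothetical, and the sketch assumes that every flag not followed by a value in the command above is a simple on/off switch.

```python
import shlex
import sys


def build_command(kbpath, ospath, dspath, num_jobs=2, display_interval=-1):
    """Assemble the general-usage invocation of main.py as an argument list."""
    return [
        sys.executable, "main.py",
        "--build-biocyc-object",
        "--build-indicator",
        "--build-pathway-properties",
        "--build-ec-properties",
        "--build-pathway-similarities",
        "--build-graph",
        "--constraint-kb", "metacyc",
        "--build-synset",
        "--ex-features-from-synset",
        "--build-golden-dataset",
        "--ex-features-from-golden-dataset",
        "--build-pathologic-input",
        "--build-minpath-dataset",
        "--minpath-map",
        "--kbpath", kbpath,      # path to database
        "--ospath", ospath,      # path to the object files (e.g. 'biocyc.pkl')
        "--dspath", dspath,      # path to dataset and to store results
        "--display-interval", str(display_interval),
        "--num-jobs", str(num_jobs),
    ]


if __name__ == "__main__":
    # Hypothetical placeholder paths; replace them with real locations.
    cmd = build_command("/data/kb", "/data/objects", "/data/datasets")
    print(shlex.join(cmd))
```

To actually launch the run, the list could be passed to subprocess.run(cmd, cwd="src", check=True) from the repository root. Passing the arguments as a list avoids shell-quoting problems when paths contain spaces.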

Please obtain MetaCyc and other databases from BioCyc.

Citing

If you find prepBioCyc useful in your research, please consider citing this repo and the following papers:

Contact

For any inquiries, please contact: arbasher@student.ubc.ca