Generation of consensus metabolic reconstructions and community-dependent gap-filling using COMMIT for communites sampled from Arabidopsis thaliana.

Description

This repository contains all files used for community-dependent gap-filling and subsequent analysis of consensus genome-scale metabolic reconstructions for two soil communities (Bulgarelli et al. 2012, Schlaeppi et al. 2014).

The consensus reconstructions were generated by merging draft reconstructions from four different approaches:

KBase (Arkin et al. 2018)
RAVEN 2.0 (Wang et al. 2018)
CarveMe (Machado et al. 2018)
AuReMe (Aite et al. 2018) / Pathway Tools (Karp et al. 2016)

Requirements for COMMIT

Matlab (tested with versions R2017b and R2020b)
COBRA toolbox v3.0
LP solver for the COBRA toolbox, tested with
- IBM CPLEX v12.9
- Gurobi v9.1.1
- not recommended: glpk solver
HMMER (tested with v3.2.1)
R packages: pheatmap, wesanderson, scales, plotrix, igraph, ape

Installation

Clone the GitHub repository using

git clone https://github.com/pwendering/COMMIT

Adjust settings and paths in the configuration script at

code/matlab/options.m

The configuration script will also add all COMMIT functions to the Matlab search path. For this run the following commands within Matlab:

run('[path_to_commit]/code/matlab/options.m')

Finally, add bash scripts at code/bash/ to your PATH variable.

(optional) update reaction database

The reaction database for gap-filling and the tables used for reaction and metabolite ID translation depend on the MetaNetX database.

If you want to update the gap-filling database run (might have to be adjusted for more current version of MNXref) the following steps:

download the data files of the MNXref database from here
- reac_prop.tsv
- reac_xref.tsv
- chem_prop.tsv
- chem_xref.tsv
Within Matlab run:

% update the reaction_MNXref_balanced.lst file
mnxref_dir = 'data/tables/MNXref';
createMnxRefReactionDbFile(mnxref_dir)

database_file = 'data/tables/MNXref/reaction_MNXref_balanced.lst';
compartments = {'c', 'e'};
% list of IDs that should not enter the gap-filling database
black_list = {...
	'BIOMASS',...  % Biomass
    'MNXM517',...  % Light/hnu/photon
    'MNXM639',...  % peptide
    };
dbModel_MNXref_balanced = prepareFastGapFilling(database_file, compartments, black_list)
dbModel_MNXref_balanced = addNamesPermeabilityToGfDb(dbModel_MNXref_balanced, mnxref_dir);
save('data/gap-filling/database/Universal-model-MNXref-balanced.mat',...
    'dbModel_MNXref_balanced')

If you want to update the tables for ID translation run within Matlab (this step takes some time):

mnxref_dir = 'data/tables/MNXref';
createMnxrefTranslationTables(mnxref_dir)

Generation of consensus models

Preparation

create draft reconstructions with different approaches for different habitats

save the draft reconstructions as Matlab workspaces ('.mat') under data/models/habitat/method for each habitat and reconstruction approach

for this, read individual models using the COBRA toolbox function readCbModel, collect them in a cell array and save it as a Matlab workspace
The following code may serve as a blueprint

 % specify model file names
 file_names = {'model_1.xml', 'model_2.xml', 'model_3.xml'};
 
 % initialize cell array
 models = cell(1, numel(file_names));
 
 % read sbml files and add them to the cell array
 for i = 1:numel(models)
 	model = readCbModel(file_names{i});
 	models{i} = model;
 end
 
 % save cell array as workspace
 save('models', 'models')

Merge metabolic reconstructions

To generate consensus models from multiple reconstruction approaches use the script

code/model_generation/merge_metabolic_models.m

Adjust the variables to specify the location of reconstructions from single approaches (see point Preparation, above):

habitats: habitats where samples were obtained

methods: reconstruction approaches used

The consensus models will be saved under data/models/consensus.

(optional) Reconcile gene IDs

Some reconstruction approaches use their own annotation pipeline, starting from the genome sequence (e.g. KBase), while others require a multi-fasta file (i.e. structural annotation). As the structural annotation can differ and different pipelines assign different gene IDs, the resulting reconstructions from different approaches can contain different gene IDs. For this publication, I mapped gene IDs using BLAST to be able to compare the gene sets that are included in the reconstructions obtained from the different approaches.

Create a BLAST database for reference gene identifiers
- use your own structural annotation or a structural annotation obtained from one of the reconstruction approaches as a reference
- use the script code/bash/batch-makeblastdb.sh to generate BLAST databases for each amino acid multi-fasta file
- use code/bash/batch-makeblastdb-nucl.sh if you have nucleic acid sequences
If not already available, create multi-fasta files from .gff files
- e.g. in KBase, you can export .gff files and use them together with your genome sequence to create a multi-fasta file
- use the script code/bash/convert-gff-to-fasta.sh
- this script required bedtools to be installed
Run blastp or blastx to find corresponding genes or proteins
- The scripts code/bash/batch-blastp.sh and code/bash/batch-blastx.sh loop after all multi-fasta files in a directory and perform BLAST searches for all proteins/genes in a fasta file
- only one hit per protein is returned to obtain a one-to-one mapping
- this is not ideal because sequences might be split or lumped between structural annotations
The Matlab function code/matlab/translateGeneIDs.m can then be used to map gene IDs in the reconstructions, such that GPR rules can be combined and compared

Workflow for the publication results

the script COMMIT/code/matlab/commit.m contains the workflow for the publication
the script COMMIT/code/matlab/gap_filling/run_iterative_gap_filling.m performs the conditional gap filling procedure of COMMIT
the script COMMIT/code/matlab/gap_filling/gap_fill_individual_models.m performs gap-filling on the individual reconstructions without considering the community composition

Reference

Wendering P, Nikoloski Z (2022) COMMIT: Consideration of metabolite leakage and community composition improves microbial community reconstructions. PLOS Computational Biology 18(3): e1009906. https://doi.org/10.1371/journal.pcbi.1009906

Name		Name	Last commit message	Last commit date
Latest commit History 362 Commits
code		code
data		data
figures		figures
logs		logs
smetana-analysis/results		smetana-analysis/results
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

code

code

data

data

figures

figures

logs

logs

smetana-analysis/results

smetana-analysis/results

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Generation of consensus metabolic reconstructions and community-dependent gap-filling using COMMIT for communites sampled from Arabidopsis thaliana.

Description

Requirements for COMMIT

Installation

(optional) update reaction database

Generation of consensus models

Preparation

Merge metabolic reconstructions

(optional) Reconcile gene IDs

Workflow for the publication results

Reference

About

Releases 2

Packages

Languages

License

pwendering/COMMIT

Folders and files

Latest commit

History

Repository files navigation

Generation of consensus metabolic reconstructions and community-dependent gap-filling using COMMIT for communites sampled from Arabidopsis thaliana.

Description

Requirements for COMMIT

Installation

(optional) update reaction database

Generation of consensus models

Preparation

Merge metabolic reconstructions

(optional) Reconcile gene IDs

Workflow for the publication results

Reference

About

Topics

Resources

License

Stars

Watchers

Forks

Languages