An aaRS Engineering Pipeline 🧬

A Scalable Pipeline for the Discovery of aaRS Mutants to Aid in Genetic Code Expansion.

Introduction

Aminoacyl-tRNA Synthetases (aaRSs) are a class of enzymes central to translation, each attaching a tRNA to its cognate canonical amino acid (CAA). To expand the natural chemical toolbox to include non-canonical amino acids (NCAAs), new enzymes of this class have to be engineered to accept the alternative chemistries offered by NCAAs while remaining inert to pre-existing CAAs.

Workflow

The engineering workflow predicts a structure for every combination of mutations in the defined mutant landscape using Rosetta Cartesian energy minimization, then performs blind docking of the native and target (exogenous) amino acids with CB-Dock. Mutants are then scored for fitness using the Delta and RMSD metrics.

Scoring

Delta:

Measures how strongly the engineered enzyme favours the NCAA, calculated as: NCAA Dock Score - Native Dock Score
The lower the score, the greater the engineered affinity for the NCAA over the native substrate

RMSD:

Estimates whether the mutant produces a productive docking pose with the target NCAA
Calculated as the Root-Mean-Square Deviation of the exogenously docked NCAA from the crystal-structure-derived position of the native amino acid
The lower the RMSD, the more closely the mutant appears to dock the NCAA in a productive binding pose.
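
The two metrics above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline's own code: the function names are hypothetical, the coordinates and scores are made-up values, and the RMSD assumes both poses supply the same number of corresponding atoms in the same order (how the pipeline matches atoms between the NCAA and the native amino acid is not stated here).

```python
import math


def delta_score(ncaa_score, native_score):
    """Delta = NCAA dock score - native dock score (lower favours the NCAA)."""
    return ncaa_score - native_score


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical values for illustration only.
native_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ncaa_pose = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
print(delta_score(-6.2, -5.0))
print(rmsd(native_pose, ncaa_pose))
```

A negative Delta together with a low RMSD is the combination the scoring scheme rewards.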

Prerequisites

This pipeline uses the Rosetta 3 cartesian_ddg script for structure prediction. The Rosetta suite is very large (~18 GB compressed), so it can't be packaged within this repository. Download it and move the uncompressed directory with binaries (or compile it yourself) to the /resources/ directory.
(Version rosetta_bin_linux_2020.08.61146_bundle)

Quick Start

This is a Nextflow pipeline and, as such, can only be run on a POSIX OS. If you are using Windows, I'd recommend the Windows Subsystem for Linux (WSL).

git clone https://github.com/J-E-J-S/aaRS-pipeline
cd aaRS-pipeline
conda env create -f environment.yml
conda activate aaRS-pipeline
./aaRS-pipeline.sh -i -m -r

Usage

Inputs

Create a mutations.txt file in the /inputs/ directory
It must take the form:

X99 MNQ
Y100 MNQ
Z101 MNQ

Where:

X is the single-letter ID for the wild-type residue to be mutated
The number '99' is the residue number of the mutable position
MNQ is the pool of residues (single-letter IDs) to substitute in at this position
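
To make the format concrete, here is a minimal Python sketch that parses lines of this form and enumerates every combination of substitutions across the listed positions. It is not the pipeline's own parser: the function names, the regular expression and the example residues (A99, S100) are hypothetical, and whether the pipeline also keeps the wild-type residue at a position when building combinations is not stated here, so this sketch does not.

```python
import itertools
import re

# One line per position: wild-type letter, residue number, pool of residues.
LINE_PATTERN = re.compile(r"^([A-Z])(\d+)\s+([A-Z]+)$")


def parse_mutations(text):
    """Return a list of (wild_type, residue_number, pool) tuples."""
    positions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"malformed line: {line!r}")
        wild_type, number, pool = match.groups()
        positions.append((wild_type, int(number), pool))
    return positions


def enumerate_mutants(positions):
    """Return one tuple of labels (e.g. 'A99M') per combination of substitutions."""
    per_position = [
        [f"{wt}{num}{res}" for res in pool] for wt, num, pool in positions
    ]
    return list(itertools.product(*per_position))


# Hypothetical example: two positions with three candidate residues each.
positions = parse_mutations("A99 MNQ\nS100 MNQ\n")
mutants = enumerate_mutants(positions)
print(len(mutants))
print(mutants[0])
```

The number of combinations is the product of the pool sizes, so it grows quickly as positions are added.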

Add the native and exogenous (target) amino acids to the /inputs/ directory

The amino acids have to be named nativeLigand.mol2 and exogenousLigand.mol2 respectively
The amino acids must be in .mol2 format
The native amino acid must be taken from the known crystal structure of the template enzyme
- This allows the RMSD calculation to estimate the productivity of the NCAA docking pose

Add the template enzyme .pdb file to the /inputs/ directory
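
Before running the pipeline it can help to confirm that the inputs listed above are in place. The following Python sketch is a hypothetical helper, not part of the repository: the function name is made up, and because the template enzyme's file name is not fixed by this README, it only checks that at least one .pdb file is present.

```python
from pathlib import Path

REQUIRED_FILES = ("mutations.txt", "nativeLigand.mol2", "exogenousLigand.mol2")


def missing_inputs(inputs_dir):
    """Return a list describing the expected inputs that are absent."""
    inputs = Path(inputs_dir)
    missing = [name for name in REQUIRED_FILES if not (inputs / name).is_file()]
    # The template name is not specified, so accept any .pdb file.
    if not any(inputs.glob("*.pdb")):
        missing.append("template enzyme .pdb")
    return missing


# Example call; the path assumes the script is run from the repository root.
print(missing_inputs("inputs"))
```

An empty list means all four inputs were found.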

Running the Pipeline

To install the prerequisite scripts, run the shell script with the -i option (this only has to be performed once)
./aaRS-pipeline.sh -i
To prepare the mutational file system to begin pipeline flow, run the shell script with the -m option (this has to be performed for every new template enzyme or mutational context)
./aaRS-pipeline.sh -m
To begin the pipeline, run the shell script with the -r option
./aaRS-pipeline.sh -r
or run the command
nextflow run main.nf
To run the pipeline from beginning to end as a new user, combine all options
./aaRS-pipeline.sh -i -m -r

Results

Results are compiled into a JSON object that can be found in the /output/ directory, in the form:

{
    mutantID:
                {
                    'exogenousScores': [],
                    'nativeScore': float,
                    'Delta': [],
                    'RMSD' : [],
                    'structurePath': string,
                    'dockingPath': string
                }
}

This file stores all the data for the pipeline run.
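
As an illustration of how this structure can be consumed, the Python sketch below ranks mutants by their best Delta among poses within an RMSD cut-off. It is not the repository's queryResults.py. It assumes that the Delta and RMSD lists are index-aligned (one entry per docking pose), that a pose is acceptable when its RMSD is at or below the cut-off, and it uses made-up mutant IDs and scores with placeholder paths; the function name rank_mutants is hypothetical.

```python
import json


def rank_mutants(results, rmsd_cutoff=2.0):
    """Return (mutant_id, delta, rmsd) tuples sorted by ascending Delta.

    For each mutant, only poses with RMSD <= rmsd_cutoff are considered,
    and the pose with the lowest Delta is kept.
    """
    ranked = []
    for mutant_id, entry in results.items():
        poses = [
            (delta, rmsd)
            for delta, rmsd in zip(entry["Delta"], entry["RMSD"])
            if rmsd <= rmsd_cutoff
        ]
        if poses:
            ranked.append((mutant_id, *min(poses)))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical results for illustration; a real run would load
# the pipeline-generated results.json with json.load instead.
example = json.loads("""
{
  "A99M": {"exogenousScores": [-6.0, -5.5], "nativeScore": -5.0,
           "Delta": [-1.0, -0.5], "RMSD": [3.1, 1.2],
           "structurePath": "placeholder", "dockingPath": "placeholder"},
  "A99N": {"exogenousScores": [-5.8], "nativeScore": -5.0,
           "Delta": [-0.8], "RMSD": [0.9],
           "structurePath": "placeholder", "dockingPath": "placeholder"}
}
""")
for row in rank_mutants(example):
    print(row)
```

In this example the lowest Delta for A99M belongs to a pose that fails the RMSD cut-off, so its next-best pose is used instead.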

Querying Results

From the /bin/ directory, run the script: python queryResults.py <mutantQn> <rmsdCutOff> <./../output/results.json>
Where:

mutantQn = number of mutants desired to filter down to
rmsdCutOff = the maximum accepted RMSD value (2 Å recommended)
/output/results.json = path to the pipeline generated results.json
The script creates individual summary directories for mutants, each containing the structure.pdb, docking_results.mol2 and a summary .fasta file, as well as an overall summary .fasta file.

Manually inspect the results in PyMOL against the wild-type template

Look for similarity between the binding mode of the target NCAA and that of the native CAA
The mutant structure .pdb and NCAA docking results .mol2 are collected in the directory generated by the query
