OriC_Finder is no longer under active development here. It has moved to: https://github.com/ZoyavanMeel/ORCA/
Python scripts that predict and plot the location of the origin of replication (oriC) of circular bacterial genomes based on Z-curve and GC-skew analysis.
The scripts are meant to be executed in a specific order, but each one also works independently, so every script can serve as a checkpoint.
The DoriC data can be downloaded from http://tubic.tju.edu.cn/doric/public/index.php as a .RAR. Unpack this however you want and you'll be left with a CSV-file.
data_prep_doric.py
: Creates three new CSV-files (only _concat.csv works for now), each of which orders the relevant DoriC data slightly differently.
These scripts prepare the NCBI data for analysis. Each script has docstrings with further information.
ncbi_download.py
: Use this script to download a dataset of your choice. Documentation for the ncbi-genome-download package can be found here: https://github.com/kblin/ncbi-genome-download
ncbi_to_fasta.py
: Unzips and extracts the downloaded FASTA-files from the dataset. Multiple cleaning/filtering options available.
fasta_to_oriC_csv.py
: Predicts the oriC(s) for the whole dataset.
Once both the DoriC and NCBI datasets have been processed, they can be compared. This is done with oriC_comparison.py.
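When comparing a predicted oriC with a DoriC entry, keep in mind that positions live on a circular genome, so two positions near opposite ends of the sequence can be close together. The snippet below is a minimal sketch of that idea and is not taken from oriC_comparison.py. The function names and the `max_fraction` threshold are hypothetical and chosen only for illustration.

```python
def circular_distance(pos_a, pos_b, genome_length):
    """Shortest distance in bp between two positions on a circular genome."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    diff = abs(pos_a - pos_b) % genome_length
    # Going the other way around the circle may be shorter.
    return min(diff, genome_length - diff)


def is_match(predicted, reference, genome_length, max_fraction=0.025):
    """True if the prediction lies within max_fraction of the genome length
    of the reference position (hypothetical threshold, adjust as needed)."""
    distance = circular_distance(predicted, reference, genome_length)
    return distance <= max_fraction * genome_length


if __name__ == "__main__":
    # Positions 10 and 4,999,990 are only 20 bp apart on a 5 Mbp circle.
    print(circular_distance(10, 4_999_990, 5_000_000))
    print(is_match(10, 4_999_990, 5_000_000))
```

Using a fraction of the genome length rather than a fixed number of base pairs keeps the criterion comparable across genomes of different sizes.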
This script predicts the origin of replication for circular bacterial DNA. It makes use of a combination of Z-curve and GC-skew analysis. You can load the required FASTA files yourself, or simply provide an accession and NCBI-account email and the find_ori function will fetch them.
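To make the two analyses concrete, here is a minimal sketch of how the cumulative Z-curve components and the cumulative GC-skew can be computed from a sequence. It is not the implementation behind find_ori: the function names are hypothetical, plain lists are used instead of np.array to keep the example dependency-free, and taking the minimum of the GC-skew as an oriC candidate is only the common textbook heuristic, whereas the script combines several signals.

```python
from itertools import accumulate

# Per-base steps for the three Z-curve components:
# x: purine (A, G) vs pyrimidine (C, T)
# y: amino (A, C) vs keto (G, T)
# z: weak (A, T) vs strong (C, G) hydrogen bonding
Z_STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative x, y and z components as three lists."""
    # Ambiguous bases such as N contribute nothing in this sketch.
    steps = [Z_STEPS.get(base, (0, 0, 0)) for base in seq.upper()]
    x = list(accumulate(step[0] for step in steps))
    y = list(accumulate(step[1] for step in steps))
    z = list(accumulate(step[2] for step in steps))
    return x, y, z


def gc_skew(seq):
    """Return the cumulative G minus C count along the sequence."""
    return list(accumulate((b == "G") - (b == "C") for b in seq.upper()))


def argmin(values):
    """Index of the first minimum in a non-empty list."""
    return min(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    toy = "CCCCGGGG"  # toy sequence, far too short for real analysis
    skew = gc_skew(toy)
    print(skew, "candidate index:", argmin(skew))
```

Note that the cumulative GC-skew equals (x - y) / 2 at every position, so it can also be derived from the Z-curve components instead of being counted separately.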
Required packages:
There are three general functions in this file that can be used to plot any generic 1D np.array. To use these functions, make sure matplotlib is installed.
plot_Z_curve_3D
: Makes a 3D-plot of the Z-curve.
plot_Z_curve_2D
: Can plot a maximum of four axes in a single 2D-plot. This one is useful for plotting single or multiple Z-curve or GC-skew components against each other.
plot_GC_skew
: Does the same as plot_Z_curve_2D, except it only takes one array.