Markonv: a novel convolutional layer for identifying inter-positional correlations effectively and efficiently

This is the repository for reproducing figures and tables in the paper.

Requirements

Python environment

conda env create -f env_markonv.yaml
conda activate markonv
pip3 install git+https://github.com/rrwick/Porechop.git
cd scripts/bonito
python setup.py develop

R packages

ggpubr
data.table
magrittr
foreach

Reproduce results related to simulations

Generate simulation datasets and plot Markov transition matrix (Reproduce Appendix C)

cd external/simulation
python3 GeneRateMarcov.py

The plotted Markov transition matrix is saved in external/simulation/motif/.
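For readers unfamiliar with the term, the following is a minimal sketch of how a sequence can be sampled from position-specific first-order Markov transition matrices. It is an illustration only and is not taken from GeneRateMarcov.py; the function name `sample_sequence`, the `ACGT` alphabet order, and the toy matrices are all assumptions.

```python
import random

ALPHABET = "ACGT"  # assumed base order, for illustration only


def sample_sequence(initial, transitions, rng):
    """Sample one sequence from position-specific transition matrices.

    initial: 4 probabilities for the first base.
    transitions: list of 4x4 matrices; transitions[k][i][j] is the
        probability of base j at position k+1 given base i at position k.
    """
    state = rng.choices(range(4), weights=initial)[0]
    seq = [ALPHABET[state]]
    for matrix in transitions:
        # The next base depends only on the current base (first-order Markov).
        state = rng.choices(range(4), weights=matrix[state])[0]
        seq.append(ALPHABET[state])
    return "".join(seq)


# Toy deterministic chain: A -> C -> G -> T -> A
cycle = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
rng = random.Random(0)
motif_instance = sample_sequence([1, 0, 0, 0], [cycle] * 4, rng)
print(motif_instance)  # ACGTA
```

With non-degenerate probabilities, each call draws a different motif instance, and the base at each position is correlated with the base before it.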

Training and evaluation

cd ../../scripts/simulation
python3 torch_trainMarkonv.py

Plot AUROC (Reproduce Figure 2)

python3 Compare.py
Rscript generate_fig_2.R

The figure generated is saved in result/simulation.auroc.png.
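As a reminder of what the plotted metric measures, here is a minimal sketch of AUROC computed directly from its pairwise definition: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The function `auroc` is a hypothetical helper and is not necessarily how Compare.py computes the value.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, for illustration only.
value = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)  # 0.75
```

This quadratic version is only meant for small inputs; rank-based implementations give the same value more efficiently.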

Recover motifs (Reproduce Figure 3 and Appendix G)

python3 geneRatemotifs.py
python3 kernel2motif.py
cd ../../

The recovered Markov transition matrices are saved in result/simulation/Motifs/*.png.

You can then compare the recovered motif with the real motif. The recovered motif may be shifted relative to the real motif; the offset is saved in result/simulation/Motifs/kernel_offset.txt.
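To make the notion of an offset concrete, the sketch below searches for the shift that best aligns a recovered motif with a real one by minimising the mean squared difference over the overlapping positions. It is an assumption-laden illustration, not the method used to produce kernel_offset.txt: `best_offset` is a hypothetical helper, and each motif position is represented as a flat list of numbers (for a transition matrix, the flattened 4x4 entries would play this role).

```python
def best_offset(real, recovered, min_overlap=2):
    """Return the shift s for which recovered[i] best matches real[i + s]."""
    best = None
    for shift in range(-(len(recovered) - min_overlap),
                       len(real) - min_overlap + 1):
        # Pair up positions that overlap under this shift.
        pairs = [(real[i + shift], recovered[i])
                 for i in range(len(recovered))
                 if 0 <= i + shift < len(real)]
        if len(pairs) < min_overlap:
            continue
        err = sum((a - b) ** 2
                  for r, c in pairs
                  for a, b in zip(r, c)) / len(pairs)
        if best is None or err < best[0]:
            best = (err, shift)
    return best[1]


# Toy example: the recovered motif is positions 1..3 of the real motif.
real_motif = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
              [0, 0, 0, 1], [0.25, 0.25, 0.25, 0.25]]
recovered_motif = real_motif[1:4]
offset = best_offset(real_motif, recovered_motif)
print(offset)  # 1
```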

Reproduce results related to HOCNNLB (Figure 4)

Download the training and test datasets from https://github.com/NWPU-903PR/HOCNNLB/blob/master/lncRBPdata.zip and put them into ./external/HOCNNLB.

cd external/HOCNNLB 
unzip lncRBPdata.zip
mv RBPdata1201 fasta
cd ../code/
python generateHDF5.py

cd ../../../scripts/HOCNNLB
python torch_trainMarkonv.py

python Compare.py

cd ../../

Reproduce results related to Bonito (Table 2)

Download training and test sets

Get training set

cd scripts/bonito
bonito download --training

Get test set

mkdir -p ../../external/bonito/Klebsiella_pneumoniae_INF032_fast5s
cd ../../external/bonito/Klebsiella_pneumoniae_INF032_fast5s
wget https://bridges.monash.edu/ndownloader/files/15188573
mv 15188573 Klebsiella_pneumoniae_INF032_fast5s.tar.gz
tar -xzf Klebsiella_pneumoniae_INF032_fast5s.tar.gz

cd ../../../scripts/bonito/test
wget https://bridges.monash.edu/ndownloader/files/14260223
mv 14260223 Klebsiella_pneumoniae_INF032_reference.fasta.gz
gzip -d Klebsiella_pneumoniae_INF032_reference.fasta.gz
mv Klebsiella_pneumoniae_INF032_reference.fasta reference.fasta
cd ../

Training and basecalling

Convolution-based bonito network

bash run_train_basecall_conv.sh

Markonv-based bonito network

bash run_train_basecall_markonv.sh

The number of parameters for each network (in Table 2) will be printed on the screen.

Evaluating basecalled reads

Convolution-based bonito network

cd test
bash run_analysis_conv.sh

Markonv-based bonito network

bash run_analysis_markonv.sh

The accuracy of each read is saved in result/bonito/*_reads.tsv, and the consensus accuracy is saved in result/bonito/*_assembly.tsv. The median read accuracy and median consensus accuracy for each network (Table 2) are printed to the screen.

Testing for multiple random seeds

cd ../
sbatch sbatch_multiple_seed.sh
cd ../../

The results are saved in result/bonito/bonito_multiple_seeds.tsv. You can use result/bonito/stat_multiple_seeds.py to calculate the mean and standard deviation reported in Table 2.
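The kind of per-network summary involved can be sketched as follows. This is not the content of stat_multiple_seeds.py: the helper `summarize`, the column names `network`, `seed`, and `value`, and the toy numbers are all assumptions, and the real layout of bonito_multiple_seeds.tsv may differ.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(tsv_text, group_col, value_col):
    """Return {group: (mean, sample standard deviation)} for a TSV table."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row[group_col]].append(float(row[value_col]))
    # statistics.stdev is the sample (n - 1) standard deviation and
    # needs at least two values per group.
    return {name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in groups.items()}


# Placeholder table with made-up values, one row per (network, seed).
toy_tsv = (
    "network\tseed\tvalue\n"
    "net_a\t1\t1.0\n"
    "net_a\t2\t2.0\n"
    "net_a\t3\t3.0\n"
    "net_b\t1\t2.0\n"
    "net_b\t2\t4.0\n"
    "net_b\t3\t6.0\n"
)
summary = summarize(toy_tsv, "network", "value")
print(summary)
```

Whether the sample or the population standard deviation is appropriate depends on how the paper reports it; `statistics.pstdev` gives the population version.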

Reproduce results related to loss curve (Figure 5 and Appendix H)

cd scripts/SpeedCompare
python torch_trainMarkonv.py
python DrawTheHistory.py

The figure is saved in result/SpeedCompare/figure/.

Reproduce results related to additional figure 1

Generate simulation datasets for training and testing

cd external/JasparSimu
python3 PWMmotifSimu.py

The training and testing datasets are saved in external/JasparSimu/HDF5/1/.

Training and evaluation

cd ../../scripts/JasparPWMtest
python3 torch_trainMarkonv.py

Plot AUROC (Reproduce additional figure 1.B)

python3 Compare.py

The figure generated is saved in result/JasparSimu/picture/1auc.png.

Recover motifs (Reproduce additional figure 1.B and Appendix G)

python3 geneRatemotifs.py
python3 kernel2motif.py
cd ../../

Additional figure 1.C is saved in result/JasperSimu/Motifs/1/MarkonvV, and additional figure 1.B is saved in result/JasperSimu/PWMMotifs2.

Reproduce results related to additional figure 2 and additional figure 4

Generate simulation datasets for training and testing

cd external/simulation2merShuffle
python3 generateData.py

Training and evaluation

cd ../../scripts/simulation2merShuffle
python3 torch_trainMarkonv.py

Plot AUROC of all networks (Reproduce additional figure 4)

python3 Compare.py

The figure generated is saved in ~/result/simulation2merShuffle/picture/.

Plot AUROC between Markonv-based network and convolution-based network (Reproduce additional figure 2)

Rscript generate_fig_2.R

The figure generated is saved in ~/result/simulation/simulation.auroc.png.

Recover motifs (additional figure 3.A and additional figure 3.C)

python3 geneRatemotifs.py
python3 kernel2motif.py
cd ../../

The Markov transition matrix (additional figure 3.A) is saved in result/simulation2merShuffle/Motifs/291/MarkonvV/*.png. The Markov transition matrix (additional figure 3.C) is saved in result/simulation2merShuffle/Motifs/291*.png.

Reproduce results of additional figure 5 and additional figure 6

Download the training and test datasets from https://github.com/NWPU-903PR/HOCNNLB/blob/master/lncRBPdata.zip and put them into ./external/HOCNNLB.

cd external/HOCNNLB 
unzip lncRBPdata.zip
mv RBPdata1201 fasta
cd ../code/
python generateHDF5.py

cd ../../../scripts/HOCNNLB
python torch_trainMarkonv.py

python Compare.py
python robust.py

cd ../../

Reproduce results of additional figure 7 and the 64 loss curves in all-64-figures.zip

cd scripts/SpeedCompare
python torch_trainMarkonv.py
python DrawTheHistory.py

The figure is saved in result/SpeedCompare/figure/.
