Amplicon region and Primer deduction

Requirements

python3
Python Modules:

import subprocess
import sys
import os
import glob
import pandas as pd

Download the SRA Toolkit from the link (https://www.metagenomics.wiki/tools/short-read/ncbi-sra-file-format/sra-tools-install)
Download BBTools and a local BLAST installation
SILVA database: silva.fasta

STEPS

Step 1. Download raw sequences (SRR) from NCBI and run fasterq-dump to extract FASTQ data from the SRA accessions

Open the bioproject_sample_ids.txt file and edit it to list the BioProject and sample IDs you want to download.
Run the "download_raw_reads.py" Python script from the terminal. The script reads all the IDs from the bioproject_sample_ids.txt file and downloads all the samples in FASTQ format.

$ python3 download_raw_reads.py

*Note: the downloaded samples are placed in a folder named after the BioProject ID inside the data_sets directory.
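The download step can be sketched in Python as below. This is only an illustration, not the contents of the repository script: the file format (one whitespace-separated BioProject ID and run accession per line) and the function names `parse_ids` and `build_commands` are assumptions. The `prefetch` and `fasterq-dump` options used here (`--output-directory`, `--outdir`, `--split-files`) are standard SRA Toolkit options.

```python
from pathlib import Path
from typing import List, Tuple


def parse_ids(text: str) -> List[Tuple[str, str]]:
    """Parse (bioproject, run accession) pairs from the ID file contents.

    Assumed format (hypothetical): one whitespace-separated pair per line.
    Blank lines and lines starting with '#' are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected 'bioproject accession', got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def build_commands(project: str, run: str, root: str = "data_sets") -> List[List[str]]:
    """Build the prefetch and fasterq-dump argument lists for one run."""
    project_dir = Path(root) / project
    raw_dir = project_dir / "raw_reads"
    # prefetch writes <project_dir>/<run>/<run>.sra
    prefetch = ["prefetch", run, "--output-directory", str(project_dir)]
    # fasterq-dump reads the prefetched accession directory and writes
    # <run>_1.fastq and <run>_2.fastq into raw_reads for paired-end data
    dump = ["fasterq-dump", str(project_dir / run),
            "--split-files", "--outdir", str(raw_dir)]
    return [prefetch, dump]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes it easy to print and inspect them before starting a long download.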

Step 2. Merge sequences and run blastn against the SILVA database

Edit line 7 of the mergeblast.sh script to match your bbmerge.sh path (to find the path, use the command below)

$ locate bbmerge.sh
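If `locate` is unavailable or its database is out of date, the path can also be looked up from Python. A minimal sketch, assuming the BBTools directory is on your PATH; the helper name `find_tool` is hypothetical.

```python
import shutil
from typing import Optional


def find_tool(name: str = "bbmerge.sh") -> Optional[str]:
    """Return the path of an executable found on PATH, or None if it is not found."""
    return shutil.which(name)
```

A `None` result means the script is not on PATH, in which case `locate` or a manual search of the BBTools install directory is still needed.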

Run the mergeblast.py Python script

$ python3 mergeblast.py
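The merge and BLAST stage corresponds roughly to the commands sketched below. This is an assumption-laden illustration rather than the actual contents of mergeblast.py or mergeblast.sh: the function names are hypothetical, and the tabular output format (`-outfmt 6`) is an assumed choice. The `in1=`, `in2=` and `out=` parameters of bbmerge.sh and the `makeblastdb` and `blastn` options shown are standard.

```python
from typing import List


def merge_command(bbmerge: str, fwd: str, rev: str, merged: str) -> List[str]:
    """Merge paired-end reads; BBTools picks the output format from the file extension."""
    return [bbmerge, f"in1={fwd}", f"in2={rev}", f"out={merged}"]


def makeblastdb_command(fasta: str, db_prefix: str) -> List[str]:
    """Build a nucleotide BLAST database from the SILVA FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_prefix]


def blastn_command(query: str, db_prefix: str, out: str) -> List[str]:
    """Search the query sequences against the database.

    '-outfmt 6' (tab-separated hits) is an assumption, not taken from the scripts.
    """
    return ["blastn", "-query", query, "-db", db_prefix,
            "-out", out, "-outfmt", "6"]
```

Each list can be run with `subprocess.run(cmd, check=True)`; the database only needs to be built once and can then be reused for every sample.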

Outputs will be stored under the data_sets directory

01. data_sets => the main folder; all downloaded and output files are stored under this directory
        - PNRJ1 PNRJ2 PNRJ3 .... PNRJn => project ID directories (output of prefetch)
                - ERR1612265 => stores the ERR1612265.sra file (output of prefetch)
                - raw_reads => stores the raw reads _1.fastq, _2.fastq (output of fastq-dump)
                - output => stores the outputs of each step: merging = .fasta, seqtk = top20_seq.fa, blastn = out.txt
02. silvadb_out => stores the makedb outputs
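To check which projects have finished, the layout above can be walked with a few lines of Python. A minimal sketch, assuming each project directory holds its results in an `output` subfolder as listed; the function name `summarize_outputs` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def summarize_outputs(root: str = "data_sets") -> Dict[str, List[str]]:
    """Map each project directory name to the sorted file names in its output folder."""
    root_dir = Path(root)
    summary: Dict[str, List[str]] = {}
    if not root_dir.is_dir():
        return summary  # nothing downloaded yet
    for project_dir in sorted(root_dir.iterdir()):
        if not project_dir.is_dir():
            continue
        out_dir = project_dir / "output"
        if out_dir.is_dir():
            files = sorted(p.name for p in out_dir.iterdir() if p.is_file())
        else:
            files = []  # project downloaded but not yet processed
        summary[project_dir.name] = files
    return summary
```

A project that maps to an empty list has been downloaded but has no merge or BLAST results yet.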
