
Calculating the coverage depth for each coding gene and the percentage of each gene covered at ≥ 10X depth.


Rcoppee/Scan_gene_coverage


Scan_gene_coverage

An algorithm for calculating the coverage depth for each coding gene and the percentage of each gene covered at ≥ 10X depth.
The output file lists each gene identifier with its mean coverage depth and the proportion of the gene covered at ≥ 10X depth.
We assume that you have already produced a sorted BAM file (sorted.bam), for example with Samtools.
The algorithm was written for Plasmodium falciparum but can also be used for other Plasmodium species and other organisms, provided that the corresponding FASTA and GFF files are supplied.
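The two per-gene metrics described above can be illustrated with a minimal Python sketch. It assumes that each gene is summarised over every position of its annotated span and that the second metric is reported as a percentage; the function name `gene_coverage_stats` is hypothetical and is not taken from Scan_gene_coverage.py.

```python
def gene_coverage_stats(depths, threshold=10):
    """Return (mean depth, percent of positions with depth >= threshold).

    `depths` holds the per-base depth of every position in one gene
    (assumption: the whole annotated gene span is used).
    """
    if not depths:
        raise ValueError("gene has no positions")
    mean_depth = sum(depths) / len(depths)
    # Count positions reaching the depth threshold (10X by default).
    covered = sum(1 for d in depths if d >= threshold)
    percent_covered = 100.0 * covered / len(depths)
    return mean_depth, percent_covered


# Toy 5-bp gene: 3 of its 5 positions reach 10X.
mean_depth, pct = gene_coverage_stats([0, 8, 10, 12, 20])
print(mean_depth, pct)  # 10.0 60.0
```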


1. Preparing a per-base coverage file for Scan_gene_coverage analysis

Before computing the depth of coverage at each position of the genome, convert your sorted.bam file into a sorted.bed file with bedtools bamtobed:

bedtools bamtobed -i file_sorted.bam > file_sorted.bed
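bedtools bamtobed writes one BED record per alignment, with 0-based, half-open coordinates. As an illustration only, the sketch below shows how per-base depth (conceptually what bedtools genomecov -d reports in the next step) follows from such intervals, using a difference array. It is a simplified stand-in, not how bedtools is implemented, and `per_base_depth` is a hypothetical helper.

```python
def per_base_depth(bed_lines, chrom_lengths):
    """Count, for every 1-based position, how many BED intervals cover it.

    BED intervals are 0-based and half-open: start=0, end=3 covers
    1-based positions 1, 2 and 3.
    """
    # Difference array per chromosome: +1 where an interval starts,
    # -1 just past where it ends.
    diff = {chrom: [0] * (length + 1) for chrom, length in chrom_lengths.items()}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        diff[chrom][start] += 1
        diff[chrom][end] -= 1
    depth = {}
    for chrom, d in diff.items():
        running, values = 0, []
        for i in range(chrom_lengths[chrom]):
            running += d[i]
            values.append(running)
        depth[chrom] = values  # values[k] is the depth at 1-based position k + 1
    return depth


# Two overlapping alignments on a 6-bp toy chromosome.
bed = ["chr1\t0\t3\tread1\t60\t+", "chr1\t2\t5\tread2\t60\t-"]
depth = per_base_depth(bed, {"chr1": 6})
print(depth["chr1"])  # [1, 1, 2, 1, 1, 0]
```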

To compute the depth of coverage at each position of the genome with bedtools genomecov, you must supply, together with the sorted.bed file, a text file listing each chromosome and its length. A list_chromosomes.txt file for Plasmodium falciparum (PlasmoDB release 39) is provided in the data directory.

bedtools genomecov -d -i file_sorted.bed -g list_chromosomes.txt > file_coverage.txt
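For organisms other than P. falciparum you will need your own chromosome list. bedtools expects one `name<TAB>length` line per sequence. The following sketch, with hypothetical helper names, derives these lengths from the reference FASTA file, taking the first word of each header as the sequence name (these names must match the chromosome names used in the BAM file).

```python
def fasta_chrom_lengths(fasta_lines):
    """Return {sequence name: length} from the lines of a FASTA file.

    The name is the first whitespace-delimited word after '>'.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # add this sequence line's bases
    return lengths


def write_genome_file(lengths, path):
    """Write one 'name<TAB>length' line per sequence, as bedtools -g expects."""
    with open(path, "w") as out:
        for name, length in lengths.items():
            out.write(f"{name}\t{length}\n")
```

Example: `with open("reference_genome.fasta") as fh: write_genome_file(fasta_chrom_lengths(fh), "list_chromosomes.txt")`. If Samtools is available, `samtools faidx reference_genome.fasta` followed by `cut -f1,2 reference_genome.fasta.fai > list_chromosomes.txt` yields the same two columns.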


2. Calculating mean coverage of each coding gene and percentage of coding gene covered at ≥ 10X depth

To calculate the mean coverage of each coding gene and the percentage of each coding gene covered at ≥ 10X depth, you must provide a species.gff file containing the coordinates of each coding gene, a reference genome in FASTA format (the same version as the GFF file), and the file_coverage.txt file previously obtained with bedtools genomecov.
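bedtools genomecov -d writes one line per genome position with three columns: chromosome, 1-based position and depth. A minimal sketch of loading this file and extracting the depths of one gene interval is shown below. The helper names are hypothetical, and storing depths in `array` objects is simply one way to keep memory use modest for a whole genome.

```python
from array import array


def read_per_base_coverage(lines):
    """Parse `bedtools genomecov -d` output into {chrom: depth array}.

    Index k of each array holds the depth at 1-based position k + 1.
    """
    coverage = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[2])
        depths = coverage.setdefault(chrom, array("l"))
        if pos > len(depths):
            depths.extend([0] * (pos - len(depths)))  # pad any gap with zeros
        depths[pos - 1] = depth
    return coverage


def depths_in_interval(coverage, chrom, start, end):
    """Depths for the 1-based, inclusive interval [start, end].

    Positions absent from the coverage file are counted as depth 0.
    """
    chrom_depths = coverage.get(chrom, ())
    return [chrom_depths[p - 1] if p <= len(chrom_depths) else 0
            for p in range(start, end + 1)]
```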

An example reference genome in FASTA format and the corresponding GFF file are provided in the data directory.
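GFF3 files have nine tab-separated columns, with 1-based, inclusive start and end coordinates in columns 4 and 5 and `key=value` attributes in column 9. The sketch below extracts gene coordinates. Which feature types count as coding genes is an assumption here (`gene` and `protein_coding_gene` by default, since the type name can differ between annotation sources), and `read_gene_coordinates` is a hypothetical helper.

```python
from urllib.parse import unquote


def read_gene_coordinates(lines, feature_types=("gene", "protein_coding_gene")):
    """Yield (gene_id, seqid, start, end) for gene features in a GFF3 file.

    Coordinates are returned exactly as in the file: 1-based and inclusive.
    """
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(
            item.strip().split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = unquote(attributes.get("ID", ""))  # GFF3 values may be percent-encoded
        yield gene_id, fields[0], int(fields[3]), int(fields[4])
```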

python3 Scan_gene_coverage.py -p file_coverage.txt -f reference_genome.fasta -g reference_coordinates.gff -o output.txt


3. Citation

If you use this program for your own work, please cite:

Coppée et al. 5WBF: A low-cost and straightforward whole blood filtration method suitable for whole-genome sequencing of Plasmodium falciparum clinical isolates. (2022) Malaria Journal. DOI: 10.1186/s12936-022-04073-1

https://malariajournal.biomedcentral.com/articles/10.1186/s12936-022-04073-1
