Commit

fix: added missing dependencies and improved docs (#450)
* Added missing dependencies and improved docs

* Reformat

* Small tweaks

* Improved description
fgvieira committed Feb 7, 2022
1 parent f3874ac commit e99f2a1
Showing 3 changed files with 32 additions and 4 deletions.
2 changes: 2 additions & 0 deletions bio/mapdamage2/environment.yaml
@@ -4,3 +4,5 @@ channels:
- defaults
dependencies:
- mapdamage2 =2.2
- python >=3.9
- pysam >=0.17
28 changes: 27 additions & 1 deletion bio/mapdamage2/meta.yaml
@@ -1,4 +1,30 @@
name: mapDamage2
description: tracking and quantifying damage patterns in ancient DNA sequences. For more information about MapDamage2 see `MapDamage2 documentation <https://ginolhac.github.io/mapDamage/>`_.
description: mapDamage2 is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
authors:
- Filipe G. Vieira
input:
- reference genome
- SAM/BAM/CRAM alignment
output:
- Runtime_log.txt, a log file with a summary of the command lines used and timestamps.
# If plotting
- Fragmisincorporation_plot.pdf, a pdf file that displays both fragmentation and misincorporation patterns.
- Length_plot.pdf, a pdf file that displays, per strand, the length distribution of singleton reads and the cumulative frequencies of C->T at the 5'-end and G->A at the 3'-end.
- misincorporation.txt, contains a table with the occurrences of each mutation type at each position relative to the read ends.
- 5pCtoT_freq.txt, contains frequencies of Cytosine to Thymine mutations per position from the 5'-ends.
- 3pGtoA_freq.txt, contains frequencies of Guanine to Adenine mutations per position from the 3'-ends.
- dnacomp.txt, contains a table of the reference genome base composition per position, inside reads and adjacent regions.
- lgdistribution.txt, contains a table with read length distributions per strand.
# If stats output
- Stats_out_MCMC_hist.pdf, MCMC histogram for the damage parameters and log likelihood.
- Stats_out_MCMC_iter.csv, values for the damage parameters and log likelihood in each MCMC iteration.
- Stats_out_MCMC_trace.pdf, a MCMC trace plot for the damage parameters and log likelihood.
- Stats_out_MCMC_iter_summ_stat.csv, summary statistics for the damage parameters estimated posterior distributions.
- Stats_out_post_pred.pdf, empirical misincorporation frequency and posterior predictive intervals from the fitted model.
- Stats_out_MCMC_correct_prob.csv, position-specific probability that a C->T or G->A misincorporation is due to damage.
- dnacomp_genome.txt, contains the global reference genome base composition (computed by seqtk).
# If rescaled BAM output
- Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores.
notes: |
* The `extra` param allows for additional program arguments.
* For more information, see https://ginolhac.github.io/mapDamage/
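The output listing in meta.yaml groups files by the mode that produces them. As a minimal sketch, assuming the grouping is exactly as written there, the helper below (`expected_outputs`, a hypothetical name that is not part of the wrapper) returns the file paths to expect in a results directory. The rescaled BAM is left out because the listing gives it no fixed file name.

```python
from pathlib import Path

# File names copied from the meta.yaml listing; the grouping follows its
# "# If plotting" and "# If stats output" comments (an assumption of this sketch).
PLOT_FILES = [
    "Fragmisincorporation_plot.pdf",
    "Length_plot.pdf",
    "misincorporation.txt",
    "5pCtoT_freq.txt",
    "3pGtoA_freq.txt",
    "dnacomp.txt",
    "lgdistribution.txt",
]
STATS_FILES = [
    "Stats_out_MCMC_hist.pdf",
    "Stats_out_MCMC_iter.csv",
    "Stats_out_MCMC_trace.pdf",
    "Stats_out_MCMC_iter_summ_stat.csv",
    "Stats_out_post_pred.pdf",
    "Stats_out_MCMC_correct_prob.csv",
    "dnacomp_genome.txt",
]


def expected_outputs(out_dir, plot=True, stats=True):
    """Hypothetical helper: list the files expected in out_dir for the chosen modes."""
    names = ["Runtime_log.txt"]  # listed outside both groups
    if plot:
        names += PLOT_FILES
    if stats:
        names += STATS_FILES
    return [Path(out_dir) / name for name in names]
```

For a rule run with `extra="--no-stats"`, as in the test Snakefile, calling `expected_outputs("results/A", stats=False)` would give the log file plus the plotting group only.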
6 changes: 3 additions & 3 deletions bio/mapdamage2/test/Snakefile
@@ -11,11 +11,11 @@ rule mapdamage2:
len="results/{sample}/Length_plot.pdf",
lg_dist="results/{sample}/lgdistribution.txt",
misincorp="results/{sample}/misincorporation.txt",
# rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file
# rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file
params:
extra="--no-stats" # optional parameters for mapdamage2 (except -i, -r, -d, --rescale)
extra="--no-stats", # optional parameters for mapdamage2 (except -i, -r, -d, --rescale)
log:
"logs/{sample}/mapdamage2.log"
"logs/{sample}/mapdamage2.log",
threads: 1 # MapDamage2 is not threaded
wrapper:
"master/bio/mapdamage2"

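The comment on `extra` in the rule above says the flags `-i`, `-r`, `-d` and `--rescale` are reserved. One way to picture why is the sketch below, which assembles a mapDamage command line and rejects reserved flags passed through `extra`. It is an illustration under assumptions, not the wrapper's actual code: `build_mapdamage_cmd` and its parameters are hypothetical names, and only the short forms named in the comment are checked.

```python
import shlex

# Flags that the rule sets itself, per the comment in the test Snakefile.
RESERVED_FLAGS = {"-i", "-r", "-d", "--rescale"}


def build_mapdamage_cmd(bam, ref, out_dir, extra="", rescale=False):
    """Hypothetical helper: return an argv list for a mapDamage run."""
    extra_args = shlex.split(extra)
    for arg in extra_args:
        flag = arg.split("=", 1)[0]  # also catches the --flag=value spelling
        if flag in RESERVED_FLAGS:
            raise ValueError(f"{flag} is set by the rule and must not appear in 'extra'")
    cmd = ["mapDamage", "-i", bam, "-r", ref, "-d", out_dir]
    if rescale:
        cmd.append("--rescale")
    return cmd + extra_args
```

Returning a list rather than a single string keeps each argument separate, so paths containing spaces need no extra quoting when the list is passed to `subprocess.run`.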