
13. Pipeline Change History

d-j-e edited this page Sep 7, 2016 · 23 revisions

Current version:

V1beta.10.3 (Calico Cat)
fix for SE and IT reads (no strand bias calling at the moment - needs validation before adding)
update to many programs used (local Helix installation)
User Manual update

V1beta.10.2 minor corrections for local install and R scripts
V1beta.10.1 fix for minor error in q30VarFilter
V1beta.10
added strand bias filter
manual update

Previous versions:

V0.1
converted to vcf output via mpileup instead of deprecated pileup
...
V0.2
tested version V0.1.1
V0.2.1
added statistic reporting and fixed "success" handling
V0.3
tested version of V0.2.1
V0.3.1
added alternative output paths to aid clean up
new name: pipe_VariantDiscovery.py
added q20 and q30 mpileups and associated stats collection
various cleanup of code/naming conventions used in pipeline
V0.3.2
added alternative path for IT or PE data analysis
V0.3.2.1
added alternative path for SE data analysis
added minimum read depth option for variant filtering
V0.3.2.2
update in various tools (bwa, bamtools and tmap)
V0.3.3
tested version of V0.3.2.2
V0.3.4
added options for qc of IT reads
V0.3.4.1
added size option for qc of IT reads
updated statistics reporting - minimum reads
version numbers change (1.x to 0.x)
V0.3.4.2
added pass/fail to stats reporting
added outgroup/ingroup to stats reporting
V0.3.5
tested version of V0.3.4.2
V0.3.5.1
update to various tools (BWA and tmap)
output folders now created within the pipeline
added separate folder for success files within the temp folder
added two new output folders (bam and vcf)
V0.3.5.2
tested version of V0.3.4.1
V0.3.5.3
changed to pipe_vda including:
allow for merging of new sets of reads into a prior run
inclusion of analysis pipelets
pipe_VCFAnalysis and pipe_AllGeneCover
clean up of temp directory and/or output directory (if merging)
changed "type" to "readType"
slight change to pipeline order
V0.4
tested version of V0.3.5.3
V0.4.0.1
merging of bams from different read sets of same strain
(either during "new run" or "merge run")
fixed bug in gene cover and depth matrices script
V0.4.0.2
tested version of V0.4.0.1
V0.4.0.3
removal of QC from within pipeline (and testing)
V0.4.0.4
replace filter.awk with python-based filtering of all hets from Q30 vcfs
includes counting removed het SNPs and reporting same
in stat.tab (and testing)
V0.4.0.5
inclusion of parseSNPtable script
and tree generation
V0.4.0.6
corrections to many scripts, including allele matrix calling
and downstream effects to pipeline
allele matrix calling now uses consensus sequences
addition of differences of SNPs as distance matrix in NEXUS format
gene cover and depth matrices no longer contain "fails"
addition of parseGeneContent script
removal of q20 vcfs (and reporting) - not required
V0.4.0.7
removed duplicated stages from config file
V0.4.0.7.1
fix to allow sequences from different folders in the same run
V0.4.0.7.2
fix to deriveStats that let some failed reads pass on depth
V0.4.1
Map to reference with multiple "chromosomes": pangenome mapping
simplest case: new run (no merging of runs or samples)
up to stats collection ('collateRepStats', no post-stats analyses)
add final '/' to output path(s) if missing
V0.4.2
Map to reference with multiple "chromosomes": phylogenetic mapping
simplest case: new run (no merging of runs or samples)
up to stats collection ('collateRepStats', no post-stats analyses)
added start-up message
changed reference entry from GenBank and Fasta formats to GenBank or Fasta format
fasta reference generated from user GenBank reference
added pre-run checks including
pairs of reads exist before starting PE analysis
check for 'sequence' option - bad pattern entry
valid run and read types are entered
zeroing of SAM files when no longer needed
V0.4.3
added 'post-stats' analyses - pangenome and phylogeny - no GenBank
added pre-run reporting and run start confirmation
V0.4.4
added 'post-stats' analyses - pangenome and phylogeny - with GenBank
V0.4.4.1
various small fixes
V0.4.4.2
more various small fixes
V0.4.4.3
increased speed of deriveAllRepGeneCover and getCoverByRep
V0.4.4.4
conversion of pipeline to use SLURMed Rubra
V0.4.4.5
fix for bug in BWA sampe/samse v0.7.5
V0.4.5
renamed pipeline
add merging of runs for pangenome and phylogenetic mapping
remove single replicon run
added bowtie2 to mapping options (all read types)
removal of tmap
removal and replacement of bamtools (pileup for coverage instead)
cleanup of pipeline scripting (amalgamation of repeated stages)
converted emboss call to a biopython script
add 'check_reads_mapped' variable for multiple replicon runs
V0.4.5.1
fix for replicon statistics generation for pangenome runs
V0.4.5.1.1
fix for all statistics generation when no reads map
V0.4.5.2
check that replicons all have unique names
check that output and out_merge_target folders are different
check that output folder is not empty string
splitting of getRepAlleleMatrix to improve performance
includes sequence list generation (start of post-run report)
V0.4.6
update to newer version of parseSNPtable.py
generation of variable and conserved SNP tables
includes additional option for setting conservation level
further early checks that include:
name of reference/replicons/isolates won't confuse post-NEXUS analysis (i.e. no '+')
fix for when a replicon consensus fasta is missing
includes new 'warning' file
change behaviour of outgroups - reported (also in outgroups.txt file) but not removed
editing and reordering of options in config file
V0.4.7
changed 'sequence_list.txt' generation to function
added stage counts and check before firing last stage deleteDir
added check for isolates/reads with identical names
fixes for errors in mergeRepStats and parseSNPtable
latter includes fixes in script to improve performance
V0.4.8
include post-run report file function
test for 'output' folder prior to run
default conservation changed to 0.95
consolidated chrom_info functions into pipe_utils
further replicon name checking
fix for pipe-generated gene 'tags' when missing
write cns warning files to outMerge on merge run
V0.4.9
split location of intermediate files to improve stability for large runs
changed -X switch in bowtie2 PE mapping from 500 (default) to 1500
removal of some redundant scripts and stages in config file
V0.5
added check for deletion of previous run success file on merge run
added checkpoints for better pipeline running - will halt on errors as expected
includes changes to complex stages - flagFiles behaviour
V0.5.1
added -X option for bowtie2 mapping
fixed parseGeneContent output
updated to fixed and extended parseSNPtable
added scripts for tutorial (filterCoords.py, get_cover.py, getRecomb.R, plotTree.R)
changed getDifferenceMatrix to optional output in pipe, changed script to take options
added option for VCF output of filtered hets
implemented changes to run report from user feedback
V0.5.2
run report provides settings for merge runs (continuity checks)
includes more robust 'check_reads_mapped'
update to use SAMtools v1+
includes (limited) addition of multiallelic SNP calling option - bcftools
added checkpoint_getSamStats to capture failure during initial BAM construction
changed bams from glob call to list call
various small fixes to some default values
fixed getRecomb.R
V1.0b(eta)
Fixes from final testing
Includes handling of "." in replicon name
Add licensing information to all scripts
One more post analysis script added (for Gubbins recombination analysis)
Update parseGeneContent.py (P/A matrix based on cover and depth)
V1beta.1
fix for parseSNPtable - reported position of SNP in non-coding feature
V1beta.2
fix for parseSNPtable - improved speed of reading and parsing SNP table
V1beta.3
changed FastTree to RAxML
added option to stop tree generation, or force tree if > 200 isolates
V1beta.4
added simple check for correct BAM generation
added checkpoint for consensus calling
V1beta.5
added further filter of SNPs in finalFilter
V1beta.6
changed back to FastTree - precision error in RAxML -m ASC_GTRCAT
changed maximum isolates for tree to 500
changed checkBam to pass BAMs from simulated reads
V1beta.7
fixed bug in quality filtering of variant calls
V1beta.8
tutorial update
V1beta.9
local system update
manual update
