Releases · fulcrumgenomics/fgbio

03 Jan 23:24

nh13

2.2.1

7a2ed42

Release 2.2.1 Latest

What's Changed

Allow the PairedAssigner to use UMI pairs where one is absent by @clintval in #954

Full Changelog: 2.2.0...2.2.1

Contributors

clintval

08 Dec 22:55

tfenne

2.2.0

c668e81

Release 2.2.0

What's Changed

New Features

Duplicate marking in GroupReadsByUmi by @tfenne in #940 - GroupReadsByUmi can now optionally perform duplicate marking, setting the duplicate flag on reads as it groups them. When duplicate marking is enabled, secondary, supplementary and mapq=0 reads are passed through to the output BAM by default.
Addition of threading in GroupReadsByUmi and some other performance optimizations by @tfenne in #950 - threading is designed to help in the specific case where there are very large numbers of UMIs present on reads with the same coordinates (e.g. multiplex PCR with UMIs)
Add optional validation of kept read ratio to CorrectUmis by @mjhipp in #917
feat: allow TrimFastq to specify a length per input FASTQ by @nh13 in #928
feat: add an option to store sample base qualities in the QT for FastqToBam by @nh13 in #933
adds a barcode option to FastqToBam by @bwlang in #936
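
To make the duplicate-marking feature above concrete, here is a minimal sketch of marking duplicates within UMI families. It is not taken from the fgbio source: the dict-based read records, the `family` key, and the choice of the read with the highest summed base quality as the representative are all assumptions made for illustration. Only the flag value 0x400 (the SAM duplicate bit) is standard.

```python
from collections import defaultdict

DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def mark_duplicates(reads):
    """Set the duplicate bit on all but one read per family.

    `reads` is a list of dicts with hypothetical keys:
    'name', 'flag', 'family' (the assigned UMI group) and 'quals'.
    The representative is chosen by summed base quality (an assumption).
    """
    by_family = defaultdict(list)
    for read in reads:
        by_family[read["family"]].append(read)

    marked = []
    for family in by_family.values():
        best = max(family, key=lambda r: sum(r["quals"]))
        for read in family:
            flag = read["flag"] & ~DUPLICATE_FLAG  # clear any stale bit
            if read is not best:
                flag |= DUPLICATE_FLAG
            marked.append({**read, "flag": flag})
    return marked


reads = [
    {"name": "a", "flag": 0, "family": "1", "quals": [30, 30]},
    {"name": "b", "flag": 0, "family": "1", "quals": [40, 40]},
    {"name": "c", "flag": 0x400, "family": "2", "quals": [10]},
]
flags = {r["name"]: r["flag"] for r in mark_duplicates(reads)}
```

In this toy input, read `b` represents family 1, read `a` is flagged as its duplicate, and the stale duplicate bit on the singleton read `c` is cleared.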

Bug Fixes

Ensure FilterSomaticVcf handles PASS variants correctly by @clintval in #909
Fix complement of W and S iupac codes. by @tfenne in #912
Fix pass-QC in output FASTQ read names by @nh13 in #923
bugfix: ZipperBams should consume any remaining mapped reads by @nh13 in #929
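
For context on the W and S fix above: W stands for A or T and S stands for C or G, so each of these two IUPAC codes is its own complement. The sketch below is a standalone illustration of a full IUPAC complement table, not the fgbio implementation; the function names are hypothetical.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case).
# S (C/G) and W (A/T) map to themselves; K (G/T) pairs with M (A/C),
# R (A/G) with Y (C/T), B with V, and D with H.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def complement(seq):
    """Complement each base, leaving the order unchanged."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement each base and reverse the sequence."""
    return complement(seq)[::-1]
```

A quick check is that applying `complement` twice returns the original sequence for every code in the table.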

Other Changes

Update ZipperBams to state the sort is checked in the SAM header by @nh13 in #894
Improve docs for consensus reads being unaligned by @nh13 in #897
Fix typos in alignment by @nh13 in #914
Fix Alignment test by @jacarey in #913
Update intel gkl to 0.8.10 by @nh13 in #918
Update broad snapshot artifactory url in build.sbt by @mjhipp in #925
Fix link in DemuxFastqs.scala by @PeteHaitch in #938
Suggest fqtk in DemuxFastqs by @nh13 in #939
Fix reference to transient MI tag in DuplexConsensusCaller by @clintval in #946

New Contributors

@bwlang made their first contribution in #936
@PeteHaitch made their first contribution in #938

Full Changelog: 2.1.0...2.2.0

Contributors

bwlang, nh13, and 5 other contributors

05 Jan 22:51

tfenne

2.1.0

00c9e2c

Release 2.1.0

Minor release with mostly bug-fixes and one new tool.

New Tools

DownsampleAndNormalizeBam - performs semi-random downsampling to reduce coverage towards a specified target coverage while retaining reads in low-coverage areas. See #893
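
One way to picture "semi-random downsampling towards a target coverage" is sketched below. This is an illustrative toy under stated assumptions, not the algorithm used by DownsampleAndNormalizeBam: reads are modelled as half-open `(start, end)` intervals on a single contig, visited in a seeded random order, and a read is kept only if it still covers at least one position below the target.

```python
import random


def downsample_to_target(reads, target, contig_length, seed=42):
    """Return the sorted indices of the reads to keep.

    Reads in high-coverage regions are dropped once every position they
    cover has reached `target`; reads in low-coverage regions are kept.
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # the "random" part: visit reads in random order

    coverage = [0] * contig_length
    kept = []
    for idx in order:
        start, end = reads[idx]
        # Keep the read only if it adds coverage somewhere that needs it.
        if any(coverage[pos] < target for pos in range(start, end)):
            for pos in range(start, end):
                coverage[pos] += 1
            kept.append(idx)
    return sorted(kept)


# Ten identical reads pile up on 0..10; a single read covers 20..30.
reads = [(0, 10)] * 10 + [(20, 30)]
kept = downsample_to_target(reads, target=3, contig_length=30)
```

With a target of 3, exactly three of the ten stacked reads survive, while the lone read in the low-coverage region is always retained.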

Bug Fixes

Fix overly aggressive overlap-clipping regressions in ClipBam by @jrm5100 in #850
ReviewConsensusVariants should not require grouped raw reads to by @nh13 in #860
Fix a bug where consensus reads are produced with zero depth by @nh13 in #859
Fix for nasty corner case described in issue #858 in CallDuplexConsensusReads by @nh13 in #864
Tweak the size of caches for parallel consensus calling down to reduce memory usage by @tfenne in #881

Minor Changes

Added ncRNA biotype to NCBI RefSeq GFF parser. by @tfenne in #854

Other

Adding a code of conduct by @nh13 in #848

Full Changelog: 2.0.2...2.1.0

Contributors

nh13, tfenne, and jrm5100

24 May 23:05

nh13

2.0.2

0f0a918

Release 2.0.2

Minor release with bug fixes and minor changes. If you use the 2.x versions of ClipBam, CallMolecularConsensusReads or CallDuplexConsensusReads, please upgrade to this version.

Bug fixes

SamRecordClipper.clipOverlappingReads now accounts for soft-clipped bases starting before the ends (#842) by @jrm5100. This affects ClipBam and the consensus calling tools (CallMolecularConsensusReads and CallDuplexConsensusReads). This bug was introduced in #761 and shipped in the 2.0 release.

Minor Changes

Add a missing param to constructor of StreamingPileupBuilder via apply() (#845) by @clintval .
Update scala-xml to a much more recent version and drop the collections-compat requirement we no longer need (#838) by @tfenne.
Ensure SamWriter always logs how many it wrote before close() (#829) by @clintval .

Contributors

tfenne, jrm5100, and clintval

18 Apr 21:42

nh13

2.0.1

43365e2

Release 2.0.1

Minor release with bug fixes.

Please upgrade in particular if you use either CallMolecularConsensusReads or CallDuplexConsensusReads.

Minor Changes

Bug fixes in the OverlappingBasesConsensusCaller (introduced in 2.0.0), which apply to the tools CallMolecularConsensusReads, CallDuplexConsensusReads, and CallOverlappingConsensusBases. Fixes:

A case when the alignments for a read and its mate overlap but share no mapped bases by @nh13 in #824.
Logging the number of bases examined and corrected for overlapping bases in the overlapping consensus caller by @nh13 in #825.
See issue #821 for more discussion on the above.

Thank you to @blackbeerd for providing the initial report and test cases to debug!

Full Changelog: 2.0.0...2.0.1

Contributors

nh13 and blackbeerd

04 Apr 22:09

nh13

2.0.0

9854cee

Release 2.0.0

Overview

This is the second major release of fgbio. A lot has changed in this release, including a significant number of backward incompatible changes to tools.

A major theme of this release is performance of the UMI-related tools. The consensus callers now have options to parallelize using --threads options as well as some internal optimizations. Sorting of data has been eliminated in many places (more on this below). And a new tool (ZipperBams) has been added as a much lighter weight and therefore faster alternative to picard MergeBamAlignment.

A best practices document has been drafted to show the recommended way to go from FASTQ files through to sorted and filtered consensus BAMs.

Major Changes

Major performance improvements in CallMolecularConsensusReads and CallDuplexConsensusReads by i) adding an optimized path for creating a "consensus" from a single read and ii) enabling efficient parallelization in #776 and #790
New tool ZipperBams, which is a replacement for picard's MergeBamAlignment by @tfenne in #778. ZipperBams handles any query-grouped BAM files and does not require sorting of the input or output.
Make GroupReadsByUmi more permissive in the alignments it accepts by @tfenne in #768. Starting with this release GroupReadsByUmi will accept inter-chromosomal read-pairs by default, the --min-map-q parameter has had its default changed from 30 to 1, and read-pairs with one mapped and one unmapped reads are also accepted.
GroupReadsByUmi can be run with no internal sorting if the input is already in TemplateCoordinate order by @nh13 in #794. This can be achieved using either fgbio SortBam or a template-coordinate sort in a forthcoming release of samtools.
New tool CallOverlappingConsensusBases to consensus call overlapping bases in paired-end reads. Also adds direct support in the consensus calling tools (CallMolecularConsensusReads and CallDuplexConsensusReads), by @nh13 in #805
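
As a rough illustration of what consensus calling an overlapping base can mean, the sketch below combines the two observations of one position (one from each read of a pair). The specific rule is an assumption chosen for clarity and is not necessarily what CallOverlappingConsensusBases does: agreeing bases have their qualities summed (capped), disagreeing bases resolve to the higher-quality base with the quality difference, and ties are masked.

```python
def call_overlap(base1, qual1, base2, qual2, max_qual=93):
    """Combine two observations of the same position from a read pair.

    Returns a (base, quality) tuple. The rule here is illustrative only.
    """
    if base1 == base2:
        # Agreement: two independent observations support the same base.
        return base1, min(qual1 + qual2, max_qual)
    if qual1 == qual2:
        # Disagreement with no way to pick a winner: mask the base.
        return "N", 2
    if qual1 > qual2:
        return base1, qual1 - qual2
    return base2, qual2 - qual1
```

For example, two Q30 observations of `A` yield `A` at Q60, while `A` at Q30 against `C` at Q10 yields `A` at Q20.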

Backward Incompatibilities

Change default sort orders of consensus callers by @nh13 in #781. Now, by default, consensus callers will emit reads in the same order they are read in and perform no sorting. Sorting of the output is available, but is opt-in.
Specify an output sort order in FilterConsensusReads by @nh13 in #782. Previously FilterConsensusReads would always sort its output into coordinate order. The new behaviour is to emit reads in the same order as the input, with sorting being opt-in via the --sort-order option.
Require template sort orders in ClipBam and FilterConsensusReads by @nh13 in #807. Previously ClipBam and FilterConsensusReads would sort their input if it was neither queryname-sorted nor query-grouped. This behaviour was surprising to many users and led to extended runtimes. The tools now require the input BAM to be either queryname-sorted or query-grouped and will fail fast if it is not. Output sorting is still available, but the default is to emit reads in the same order as the input.
Both ClipBam and FilterConsensusReads now require the reference to be fully loaded into memory, versus previously iterating contig-by-contig, by @nh13 in #807. This is required as both tools modify the bases and alignments and so need to update the NM/UQ/MD SAM tags. ClipBam also needs to update mate information (SAM flag) depending on whether reads are fully clipped. Therefore the JVM heap size may need to be increased to fit the full reference in memory (e.g. -Xmx8g for a human genome).
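
Because ClipBam and FilterConsensusReads now fail fast on unsuitable input, it can be useful to check a BAM's declared ordering before launching a long job. The sketch below inspects the `@HD` header line for `SO:queryname` or `GO:query`, which are the standard SAM header values for queryname-sorted and query-grouped data. It is a pre-flight check under that assumption, not the validation fgbio itself performs, and the function name is hypothetical.

```python
def template_order_ok(header_text):
    """Return True if the SAM header declares SO:queryname or GO:query.

    `header_text` is the plain-text SAM header, e.g. as printed by a
    header-dumping tool. A missing @HD line is treated as not acceptable.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields look like "VN:1.6", "SO:queryname", "GO:query".
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            return fields.get("SO") == "queryname" or fields.get("GO") == "query"
    return False
```

A header that only declares `SO:coordinate` fails this check, which matches the case where the tools would previously have re-sorted the input.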

Minor Changes

Add a tool to copy the UMI from the read name by @nh13 in #775
Add the --annotate-all option to AssignPrimers by @nh13 in #669
Added ability for FastqToBam to also extract UMIs from read names. by @tfenne in #800
Bugfix for "ConsensusCallingIterator could fail when no consensus reads are called" by @tfenne in #780
Change default validation stringency to SILENT and make common option… by @tfenne in #793
Do not return zero-length alignments by @nh13 in #552
More ergonomic methods for converting between HTSJDK and fgbio SequenceDictionary objects by @tfenne in #767
Reduce memory usage by GroupReadsByUmi in a corner case by @tfenne in #774
Support for clipping reads that extend past their mate by @nh13 in #761
Updates version of snappy to support Apple Silicon by @tfenne in #772
Fixes a bug where VcfWriter was not writing VCF index files by @clintval in #816
Improved documentation of LogProbability methods by @wmchad in #817
Make SamWriter stop checking sort order when emitting pre-sorted records by @tfenne in #820

Full Changelog: 1.5.1...2.0.0

Contributors

nh13, tfenne, and 2 other contributors

11 Mar 21:29

tfenne

2.0.0-beta1

ac26e7b

v2.0.0-beta1 Pre-release

Overview

This is the first beta for the fgbio 2.0.0 release. A lot has changed in this release, including a significant number of backward incompatible changes to tools. This release is not being pushed to maven central (for use as a library) or to bioconda, and is only available as a download here, or by building from source.

A best practices document has been drafted to show the recommended way to go from FASTQ files through to sorted and filtered consensus BAMs.

Major Changes

Major performance improvements in CallMolecularConsensusReads and CallDuplexConsensusReads by i) adding an optimized path for creating a "consensus" from a single read and ii) enabling efficient parallelization in #776 and #790
New tool ZipperBams, which is a replacement for picard's MergeBamAlignment by @tfenne in #778. ZipperBams handles any query-grouped BAM files and does not require sorting of the input or output.
Make GroupReadsByUmi more permissive in the alignments it accepts by @tfenne in #768. Starting with this release GroupReadsByUmi will accept inter-chromosomal read-pairs by default, the --min-map-q parameter has had its default changed from 30 to 1, and read-pairs with one mapped and one unmapped reads are also accepted.
GroupReadsByUmi can be run with no internal sorting if the input is already in TemplateCoordinate order by @nh13 in #794. This can be achieved using either fgbio SortBam or a template-coordinate sort in a forthcoming release of samtools.

Backward Incompatibilities

Change default sort orders of consensus callers by @nh13 in #781. Now, by default, consensus callers will emit reads in the same order they are read in and perform no sorting. Sorting of the output is available, but is opt-in.
Specify an output sort order in FilterConsensusReads by @nh13 in #782. Previously FilterConsensusReads would always sort its output into coordinate order. The new behaviour is to emit reads in the same order as the input, with sorting being opt-in via the --sort-order option.
Require template sort orders in ClipBam and FilterConsensusReads by @nh13 in #807. Previously ClipBam and FilterConsensusReads would sort their input if it was neither queryname-sorted nor query-grouped. This behaviour was surprising to many users and led to extended runtimes. The tools now require the input BAM to be either queryname-sorted or query-grouped and will fail fast if it is not. Output sorting is still available, but the default is to emit reads in the same order as the input.
Both ClipBam and FilterConsensusReads now require the reference to be fully loaded into memory, versus previously iterating contig-by-contig, by @nh13 in #807. This is required as both tools modify the bases and alignments and so need to update the NM/UQ/MD SAM tags. ClipBam also needs to update mate information (SAM flag) depending on whether reads are fully clipped. Therefore the JVM heap size may need to be increased to fit the full reference in memory (e.g. -Xmx8g for a human genome).

Minor Changes

Add a tool to copy the UMI from the read name by @nh13 in #775
Add the --annotate-all option to AssignPrimers by @nh13 in #669
Added ability for FastqToBam to also extract UMIs from read names. by @tfenne in #800
Bugfix for "ConsensusCallingIterator could fail when no consensus reads are called" by @tfenne in #780
Change default validation stringency to SILENT and make common option… by @tfenne in #793
Do not return zero-length alignments by @nh13 in #552
More ergonomic methods for converting between HTSJDK and fgbio SequenceDictionary objects. by @tfenne in #767
Reduce memory usage by GroupReadsByUmi in a corner case by @tfenne in #774
Support for clipping reads that extend past their mate by @nh13 in #761
Updates version of snappy to support Apple Silicon by @tfenne in #772

Full Changelog: 1.5.1...2.0.0-beta1

Contributors

nh13 and tfenne

15 Feb 18:19

nh13

1.5.1

4003822

Release 1.5.1

Minor release.

New tools in this release:

Added a SortSequenceDictionary tool to re-sort a sequence dictionary #769. This is useful for tools that perform contig renaming.

Updates to tools in this release:

Speed up FilterSomaticVcf by using a fast coordinate streaming pileup builder #763

Updates to the docs:

Improve the description of the number of values in command line args #766

11 Jan 07:25

nh13

1.5.0

32f71c5

Release 1.5.0

Major security bug:

Forcing log4j transitive dependency (through GKL) to version that doesn't have zero day exploit (#747 and #751)
See CVE-2021-44228.

Updates to tools in this release:

AnnotateBamWithUmis
- Should ignore extra FASTQ records with --sorted (#735)
- Optionally annotate UMI base qualities (#733)
- Fix a bug where molecular barcodes could be truncated. This only occurs with read structures that have either no molecular barcodes or two or more molecular barcodes (#742).
- Add support for multiple input FASTQs (#657)
PickIlluminaIndices to choose from an existing set of candidates (#641)
FastqToBam can output UMI qualities (#740)
FilterSomaticVcf adds the end repair artifact filter (#677)
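
The AnnotateBamWithUmis fix above depends on how many molecular barcode segments a read structure contains. As a reminder of the notation, a read structure is a series of `<length><type>` segments, with `+` allowed as the length of the final segment. The sketch below parses that notation and extracts the molecular barcode (`M`) segments from a read. It assumes only the segment types T, B, M and S, and it is an independent illustration rather than the fgbio parser.

```python
import re

# One segment: a length (digits, or "+" for "the rest") and a type letter.
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def parse_read_structure(structure):
    """Parse e.g. '8M12S+T' into [(8, 'M'), (12, 'S'), (None, 'T')]."""
    segments = SEGMENT.findall(structure)
    if "".join(n + k for n, k in segments) != structure:
        raise ValueError(f"Invalid read structure: {structure}")
    if any(n == "+" for n, _ in segments[:-1]):
        raise ValueError("'+' is only allowed on the last segment")
    return [(None if n == "+" else int(n), k) for n, k in segments]


def molecular_barcodes(bases, structure):
    """Return the bases of every molecular barcode (M) segment, in order."""
    umis, offset = [], 0
    for length, kind in parse_read_structure(structure):
        end = len(bases) if length is None else offset + length
        if kind == "M":
            umis.append(bases[offset:end])
        offset = end
    return umis
```

A structure such as `+T` yields no molecular barcodes and `3M2S3M+T` yields two, which are the two situations named in the bug fix.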

Updates to APIs in this release:

Add better error messages for malformed input to Metric classes (#755)
Log the last record when sorting and writing SAM/BAM (#650)
Removed the IterableThreadLocal class and use the one in commons (#730)
Add queryname sorted SamRecord and Template iterators (#516)
Allow VcfWriter to write to file links, devices, and named pipes (#753)
Update Intel GKL to 0.8.8 to pull in bug fixes (#676)
Speed up property access on Cigar case class (#754)
Skip empty lines at end of sample sheet when parsing sample data (#737)
Updates the commons dependency to 1.3.0, to include a bug fix (fulcrumgenomics/commons#74)

Thank you to existing and new contributors:

Fulcrum Genomics:
- Tim Fennell (@tfenne)
- Nils Homer (@nh13)
- Kari Stromhaug (@kstromhaug)
Twinstrand Biosciences:
- Clint Valentine (@clintval)
- Michael Hipp (@mjhipp)
- Thomas Smith (@ThomasHSmith)
Outside contributors:
- Jordi (@Poshi)

And thank you to the users!

Contributors

nh13, tfenne, and 5 other contributors

19 Oct 20:32

nh13

1.4.0

c2a8bd3

Release 1.4.0

Important: Scala 2.12 cross-build support has been removed. (#614)

New tools in this release:

FixVcfPhaseSet: Add a tool to fix the VcfPhaseSet (#612)

Updates to tools in this release:

SplitBam: Add an option to reduce memory usage if the input has many read groups (#622)
EstimatePoolingFractions: exclude sites at min coverage (#638)
EstimatePoolingFractions: use GT.AF for per-sample allele frequencies (#637)
MakeMixtureVcf: make more tolerant of fractions that don't add up to 1 because floating point math is hard with lots of samples. (#640)
GroupReadsByUmi: optionally allows inter-contig pairs (#648)
AnnotateBamWithUmis: Support a read structure for the FASTQ (#670)
TrimPrimers: can trim only R1s (#681)
CollectDuplexSeqMetrics: typo in the usage (#691)
CollectDuplexSeqMetrics: add a plot for duplex yield (#692)
DemuxFastqs: remove the erroneous mention of --sample-sheet (#658)
CorrectUmis: add a cache (#702)
DemuxFastqs: Add an option to insert sample barcodes in the FASTQ header (#711)
- Added --omit-fastq-read-numbers to skip appending the trailing /1 and /2 to the output FASTQs.
- Added --include-sample-barcodes-in-fastq to replace the last field in the first comment in the FASTQ header.
- Added --illumina-file-names to name output FASTQs according to Illumina filename conventions
- Deprecated --illumina-standards option in favor of the three options above
- Added --platform option to specify the sequencing platform in the BAM read group header. Input FASTQ header must conform to Illumina standards when adding the sample barcode above
DemuxFastqs: Add an option to filter reads on the header filter flag (#713)
- Added the option --omit-failing-reads to only output reads marked as passing in the FASTQ header comments.
DemuxFastqs: Adding option to filter on the internal control flag, and accompanying tests (#714)
- Added --omit-control-reads to omit any reads marked as control in the FASTQ read header comment.
DemuxFastqs: Add an option to mask bases below a specified quality threshold (#716)
- Added --quality-threshold to specify a threshold to use for masking bases. Bases with a quality score below the threshold are replaced with N's.
ErrorRateByReadPosition: Improve error message when no reference fasta .dict is provided (#728)
DemuxFastqs: Add metrics on base quality to the sample barcode metrics output (#720)
AnnotateBamWithUmis: Option to indicate sorted FASTQ to add UMIs more quickly (#729)
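
Several of the DemuxFastqs options above act on the comment portion of an Illumina-style FASTQ header, which has the form `<read number>:<is filtered>:<control number>:<sample barcode>`. The sketch below shows, under that assumption, what filtering on the filter flag, filtering on the control flag, replacing the barcode field, and masking low-quality bases each amount to. All function names are hypothetical and none of this is the DemuxFastqs implementation.

```python
def parse_comment(header):
    """Parse the first comment of a header like '@name 1:N:0:ATCACG'."""
    comment = header.split(" ")[1]
    read_number, filtered, control, barcode = comment.split(":")[:4]
    return {
        "read_number": int(read_number),
        "failed_filter": filtered == "Y",  # Y means the read was filtered out
        "is_control": int(control) != 0,   # non-zero marks a control read
        "barcode": barcode,
    }


def replace_barcode(header, barcode):
    """Replace the last field of the first comment with `barcode`."""
    parts = header.split(" ")
    fields = parts[1].split(":")
    fields[-1] = barcode
    parts[1] = ":".join(fields)
    return " ".join(parts)


def mask_low_quality(bases, quals, threshold, offset=33):
    """Replace bases whose Phred quality is below `threshold` with N."""
    return "".join(
        "N" if ord(q) - offset < threshold else b
        for b, q in zip(bases, quals)
    )
```

A read would be dropped by a failing-read filter when `failed_filter` is true and by a control-read filter when `is_control` is true. `mask_low_quality("ACGT", "I#II", 20)` masks only the second base, whose quality character `#` encodes Q2.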

Updates to APIs in this release:

Updates to make the VCF API code considerably faster when reading VCFs with many samples (#609)
Have Metric classes correctly serialize EnumEntry fields to string (#601)
Add a brief description to AssignPrimersMetric (#616)
Support assembling JAR files with Java 11 (#645)
SampleSheet checks ID unique between samples with/without Lane (#684)
Log the last progress in Bams.queryGroupedIterator (#700)
Validate that a Variant and its Genotypes have the same alleles (#703)
Add "biotype" to Gene and update NcbiRefSeqParser to support more gene biotypes (#706)
Updates how NCBI RefSeq GFFs are parsed to enable parsing of genes that do not have canonical transcript entries below them (#706).
Add methods to make a Variant locatable (#699)
GenomicRange to support contig names with colons (#708)
Add helpers for mateCigar and matesOverlap on SamRecord (#717)
Resolve bug where empty string fields in Metric files would yield ':none:' values in the case class (#724).
Unify and add caching to the way Metric class names are accessed (#724)
Adding one more gene biotype for SRP_RNA. (#726)
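
The GenomicRange change above matters for contigs whose names themselves contain colons, such as HLA contig names. A minimal sketch of the idea, assuming a `contig:start-end` string form, is to split on the last colon rather than the first. The function name is hypothetical and this is not the fgbio code.

```python
def parse_range(text):
    """Parse 'contig:start-end', allowing colons inside the contig name."""
    contig, _, span = text.rpartition(":")  # split on the LAST colon
    if not contig:
        raise ValueError(f"No contig found in range: {text}")
    start, _, end = span.partition("-")
    return contig, int(start), int(end)
```

Splitting on the first colon would turn `HLA-A*01:01:01:01:1-100` into the contig `HLA-A*01` followed by an unparseable remainder, whereas splitting on the last colon recovers the full contig name.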
