Releases: nextstrain/augur
24.4.0
These release notes are automatically extracted from the full changelog.
Features
- All commands: Allow repeating an option that takes multiple values. Previously, if multiple option flags were specified (e.g. --exclude-where 'region=A' --exclude-where 'region=B'), only the last one was used. Now, all values are used. #1445 (@victorlin)
- ancestral, translate: Output node data files are now validated. The argument --validation-mode is added, which controls this behaviour (default: error). This argument also controls validation of the input node-data file (ancestral only). #1440 (@jameshadfield)
- export: Updated default latitudes and longitudes for geography traits. This only applies if you are not using --lat-longs to override the built-in mappings. #1449 (@trvrb)
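The standard library's argparse makes the difference concrete. The sketch below only illustrates the behaviour and is not Augur's actual argument-parsing code. With the default "store" action, each repeated flag overwrites the values from earlier occurrences. With action="extend" (Python 3.8+), the values from every occurrence accumulate. The helper name parse_exclude_where is hypothetical.

```python
import argparse

def parse_exclude_where(action):
    """Parse a repeated --exclude-where flag with the given argparse action (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    # nargs="+" lets a single occurrence of the flag take several values.
    parser.add_argument("--exclude-where", nargs="+", action=action)
    argv = ["--exclude-where", "region=A", "--exclude-where", "region=B"]
    return parser.parse_args(argv).exclude_where

print(parse_exclude_where("store"))   # ['region=B']: the last flag wins
print(parse_exclude_where("extend"))  # ['region=A', 'region=B']: all values are kept
```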
Bug Fixes
- validation: we no longer exit with a non-zero exit code when the requested validation mode is "warn" #1440 (@jameshadfield)
- validation: we no longer perform any validation when the requested validation mode is "skip" #1440 (@jameshadfield)
- filter: Send all log messages to stderr. This allows output to be written to stdout (e.g. --output-strains /dev/stdout). #1459 (@victorlin)
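The three validation modes can be summarised as a small dispatcher. This is a hypothetical sketch. The names run_validation, ValidationError and require_nodes are illustrative and are not Augur's API, and the toy validator stands in for the real schema check. The sketch follows the behaviour described above. "error" propagates the failure so the command can exit non-zero. "warn" reports the failure on stderr, keeping stdout free for output, and then continues. "skip" never calls the validator.

```python
import sys

class ValidationError(Exception):
    """Hypothetical stand-in for a schema validation failure."""

def run_validation(validate, data, mode="error"):
    """Apply a validation mode ("error", "warn" or "skip") to a validator callable."""
    if mode not in {"error", "warn", "skip"}:
        raise ValueError(f"unknown validation mode: {mode!r}")
    if mode == "skip":
        return "skipped"  # the validator is never called
    try:
        validate(data)
    except ValidationError as err:
        if mode == "error":
            raise  # the caller turns this into a non-zero exit code
        print(f"WARNING: {err}", file=sys.stderr)
        return "warned"  # execution continues; exit code stays 0
    return "passed"

def require_nodes(data):
    """Toy validator: node-data JSON is expected to have a top-level "nodes" key."""
    if "nodes" not in data:
        raise ValidationError("missing top-level 'nodes' key")

print(run_validation(require_nodes, {}, mode="warn"))  # warned (the warning itself goes to stderr)
```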
24.3.0
Features
- filter: Added a new option --max-length to filter out sequences that are longer than a given number of base pairs. #1429 (@victorlin)
- parse: Added support for environments that use pandas 2.x. #1436 (@emollier, @victorlin)
Bug Fixes
- filter: Updated docs with an example of tiered subsampling. #1425 (@victorlin)
- export: Fixes bug #1433, introduced in v23.1.0, that caused validation to fail when gene names start with 'nuc', e.g. 'nucleocapsid'. #1434 (@corneliusroemer)
- import: Fixes a bug introduced in v24.2.0 that prevented import beast from running. #1439 (@tomkinsc)
- translate, ancestral: Compound CDS are now exported as segmented CDS and are now viewable in Auspice. #1438 (@jameshadfield)
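The changelog does not describe the root cause of #1433. As a hypothetical illustration of this class of bug, consider a name check written as a prefix match, for example the regular expression ^nuc. It also accepts gene names such as nucleocapsid. A full match on the reserved name accepts only 'nuc' itself:

```python
import re

genes = ["nuc", "nucleocapsid", "ORF1a"]

# Prefix match: "^nuc" accepts any name that merely starts with "nuc".
prefix_matches = [g for g in genes if re.match(r"^nuc", g)]
# Full match: only the reserved whole-genome name "nuc" itself is accepted.
exact_matches = [g for g in genes if re.fullmatch(r"nuc", g)]

print(prefix_matches)  # ['nuc', 'nucleocapsid']
print(exact_matches)   # ['nuc']
```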
24.2.3
Bug Fixes
- filter: Updated the help and report text of --min-length to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but had never been explicitly documented. #1422 (@joverlee521)
- frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented --regions from working when providing regions other than the default "global" region. #1424
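The documented counting rule can be sketched as follows. The sketch assumes a sequence passes when its count of A, C, G and T is at least the threshold. The helper names are hypothetical and this is not Augur's implementation. Gaps, N and other IUPAC ambiguity codes do not count toward the length:

```python
def count_standard_nucleotides(sequence):
    """Count A, C, G and T case-insensitively; gaps, N and other IUPAC codes are ignored."""
    return sum(1 for base in sequence.upper() if base in "ACGT")

def passes_min_length(sequence, min_length):
    # Assumption: a sequence exactly at the threshold is kept.
    return count_standard_nucleotides(sequence) >= min_length

seq = "ACGTnnnn--acgtRY"
print(len(seq), count_standard_nucleotides(seq))              # 16 8
print(passes_min_length(seq, 8), passes_min_length(seq, 10))  # True False
```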
24.2.2
Bug Fixes
- filter: In versions 24.2.0 and 24.2.1, --query stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
- filter: Handle backtick quoting in internal optimizations of --query. #1417 (@victorlin)
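Here is one way such an optimization could work, shown as a hypothetical sketch rather than Augur's code. The sketch extracts the column names referenced by a pandas-style query and treats each backtick-quoted name, which may contain spaces, as a single column. If anything unrecognised remains, it returns None to signal the non-optimized fallback of loading every column. The keyword list and regular expressions are simplifying assumptions.

```python
import re

# Words that can appear in a pandas query without being column names (not exhaustive).
KEYWORDS = {"and", "or", "not", "in", "is", "True", "False", "None"}

def columns_from_query(query, available_columns):
    """Return the set of columns a query uses, or None to fall back to loading all columns."""
    # Drop quoted string literals so values like 'Europe' are not read as columns.
    text = re.sub(r"'[^']*'|\"[^\"]*\"", " ", query)
    # Backtick-quoted names are taken verbatim, spaces included.
    quoted = set(re.findall(r"`([^`]+)`", text))
    text = re.sub(r"`[^`]+`", " ", text)
    bare = {tok for tok in re.findall(r"[A-Za-z_]\w*", text) if tok not in KEYWORDS}
    columns = quoted | bare
    # Anything unrecognised (e.g. the .str accessor) means parsing was not reliable.
    if not columns <= set(available_columns):
        return None
    return columns

metadata_columns = ["strain", "region", "date submitted"]
print(sorted(columns_from_query("region == 'Europe' & `date submitted` > '2020'", metadata_columns)))
# ['date submitted', 'region']
print(columns_from_query("region.str.startswith('Eu')", metadata_columns))
# None
```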
24.2.1
Bug Fixes
- frequencies: Fixed a bug introduced in 24.2.0 that prevented --method diffusion from working alongside --tree. #1412 (@victorlin)
24.2.0
Features
- filter: Added a new option --query-columns that allows specifying what columns are used in --query along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
- augur.io.read_metadata: A new optional columns argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
- augur parse: A new optional --output-id-field argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
  - When no --output-id-field is given and the data has both 'name' and 'strain' fields, continue to preferentially use 'name' over 'strain' as the sequence ID field, but emit a deprecation warning that the order will be switched to prefer 'strain' over 'name' in the future, to be consistent with the rest of Augur.
  - Added entry to DEPRECATED.md.
- Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)
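The ID-field rule for augur parse can be sketched as follows. choose_id_field is a hypothetical helper, not Augur's code. Its behaviour when only one of 'name' or 'strain' exists, or when neither exists, is an assumption:

```python
import warnings

def choose_id_field(fields, output_id_field=None):
    """Pick the field used as the FASTA sequence ID (hypothetical helper)."""
    if output_id_field is not None:
        if output_id_field not in fields:
            raise ValueError(f"{output_id_field!r} is not one of the parsed fields")
        return output_id_field
    if "name" in fields and "strain" in fields:
        warnings.warn(
            "Both 'name' and 'strain' are present; using 'name' for now, "
            "but a future version will prefer 'strain'.",
            DeprecationWarning,
        )
        return "name"
    # Assumption: with only one of the two fields present, use whichever exists.
    for candidate in ("name", "strain"):
        if candidate in fields:
            return candidate
    raise ValueError("no ID field found; pass --output-id-field")

print(choose_id_field(["accession", "name", "strain"], "accession"))  # accession
```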
Bug Fixes
- filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as None. #1410 (@victorlin)
- filter: The order of rows in --output-metadata and --output-strains now reflects the order in the original --metadata. #1294 (@victorlin)
- filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
  - For filter, this comes with increased writing times for --output-metadata and --output-strains. However, net I/O time still decreased during testing of this change.
- filter: Updated the help text of --include and --include-where to explicitly state that these options can add strains that are missing an entry from --sequences. #1389 (@victorlin)
- filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from --sequences. #1389 (@victorlin)
- filter: Updated wording of summary messages. #1389 (@victorlin)
- Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)
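Transparent compression and strict UTF-8 handling can be sketched with the standard library alone. The sketch assumes that compression is detected from the file extension and supports only gzip, bz2 and xz. open_text and read_text are hypothetical helpers, and Augur's own I/O layer may detect and support formats differently.

```python
import bz2
import gzip
import lzma

# Assumption: compression is inferred from the file extension.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_text(path, mode="rt"):
    """Open a plain or compressed file in text mode ("rt" or "wt") with UTF-8 encoding."""
    for suffix, opener in OPENERS.items():
        if str(path).endswith(suffix):
            return opener(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")

def read_text(path):
    """Read a whole file, turning a decoding failure into a clearer error message."""
    try:
        with open_text(path) as handle:
            return handle.read()
    except UnicodeDecodeError as err:
        raise ValueError(f"{path} is not valid UTF-8 ({err.reason}); re-save it as UTF-8") from err
```

For example, open_text("metadata.tsv.gz", "wt") writes gzip-compressed UTF-8 text. Calling read_text on a Latin-1 encoded file raises the ValueError above rather than a raw UnicodeDecodeError.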
24.1.0
Features
- augur.io.read_metadata: A new optional dtype argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. #1252 (@victorlin)
- augur.io.read_vcf has been removed and its usage replaced with TreeTime's function of the same name, which has improved validation of the VCF file. #1366 (@jameshadfield)
Bug Fixes
- filter, frequencies, refine: Speed up reading of the metadata file. #1252 (@victorlin)
- traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. #1252 (@victorlin)
- Support Biopython ≥1.82 by requiring bcbio-gff ≥0.7.1. #1400 (@victorlin)
24.0.0
Major Changes
- ancestral, translate: For VCF inputs please ensure you are using TreeTime 0.11.2 or later. A large number of bugfixes and improvements have been added in both Augur and TreeTime. #1355 and TreeTime #263 (@jameshadfield)
- ancestral, translate: GenBank files now require the (GFF mandatory) source feature to be present. #1351 (@jameshadfield)
- ancestral, translate: For GFF files, we extract the genome/sequence coordinates by inspecting the sequence-region pragma, region type and/or source type. This information is now required. #1351 (@jameshadfield)
Features
- ancestral, translate: Improvements to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
  - Output VCF will better match the input VCF, including CHROM name and ploidy encoding.
  - VCF inputs now require --vcf-reference-output.
  - AA sequences are now exported for the tree root.
  - VCF writing is now 3 orders of magnitude faster (dataset dependent).
- ancestral, translate: A range of improvements to how we parse GFF and GenBank reference files. #1351 (@jameshadfield)
  - translate will now always export a 'nuc' annotation in the output JSON, allowing it to pass validation.
  - Gene/CDS names of 'nuc' are now forbidden.
  - If a Gene/CDS in the GFF/GenBank file is unparsed, we now print a warning.
- ancestral: For VCF alignments, a VCF output file is now only created when requested via --output-vcf. #1344 (@jameshadfield)
- ancestral: Improvements to command line arguments. #1344 (@jameshadfield)
  - Incompatible arguments are now checked, especially related to VCF vs FASTA inputs.
  - --vcf-reference and --root-sequence are now mutually exclusive.
- translate: Tree nodes are checked against the node-data JSON input to ensure sequences are present. #1348 (@jameshadfield)
- utils::load_features: This function may now raise AugurError. #1351 (@jameshadfield)
- export v2: Automatically minify large outputs. Use --no-minify-json to disable this default behavior. #1352 (@victorlin)
- Added a new file DEPRECATED.md to document timelines and progress of deprecated features in the Augur CLI and Python API. #1371 (@victorlin)
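The sketch below shows size-based automatic minification with an explicit opt-out. The changelog does not say how Augur decides that an output is "large", so the 5 MiB threshold is an assumption. The name dump_auspice_json and its minify_json parameter are also hypothetical.

```python
import json

# Hypothetical cut-off for "large" outputs.
MINIFY_THRESHOLD_BYTES = 5 * 1024 * 1024

def dump_auspice_json(data, minify_json=None):
    """Serialize data, minifying automatically when the indented output is large.

    minify_json=None means "decide automatically"; False mirrors --no-minify-json.
    """
    indented = json.dumps(data, indent=2)
    if minify_json is None:
        minify_json = len(indented.encode("utf-8")) > MINIFY_THRESHOLD_BYTES
    if minify_json:
        # Compact separators drop the spaces after "," and ":".
        return json.dumps(data, separators=(",", ":"))
    return indented

print(dump_auspice_json({"a": [1, 2]}, minify_json=True))  # {"a":[1,2]}
```

Measuring the indented form keeps small outputs human-readable by default. Large trees, where indentation whitespace can be a substantial share of the file size, are compacted unless the caller opts out.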
Bug Fixes
- ancestral, translate: Various fixes to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
  - Fix incorrect (but passing) tests.
  - Fix case-sensitive sequence comparisons between the root and reference sequences.
  - Fix a bug where ambiguous alleles are not inferred (see #1380 for full details).
  - Fix a bug where positions with no sequence information were assigned a base because the mask was not being computed (see #1382 for full details).
  - More than one ALT allele is now correctly parsed.
  - Mutations followed by an insertion are now parsed.
  - Unchanged ref genotypes are now encoded as '0' rather than '.'.
  - ALT alleles "*" are now valid (introduced in VCF spec 4.2, but observed in VCF 4.1 files).
  - Positions with no variation are no longer exported.
- ancestral, translate: Fixes for JSON (non-VCF) inputs. #1355 (@jameshadfield)
  - The "reference" translations are now from the provided reference sequence, not from the root of the tree. #1355 (@jameshadfield)
  - Fix a bug where positions with no sequence information were assigned a base because the mask was not applied (see #1382 for full details).
- ancestral, translate: Avoid incompatibilities with Biopython >=1.82. #1374, #1387 (@victorlin)
- ancestral, translate: Address Biopython deprecation warnings. #1379 (@victorlin)
- ancestral: Previously, the help text for --genes falsely claimed that it could accept a file. Now, it can truly claim that. #1353 (@victorlin)
- translate: The 'source' ID for GFF files is now ignored as a potential gene feature (it is still used for overall nuc coords). #1348 (@jameshadfield)
- translate: Improvements to command line arguments. #1348 (@jameshadfield)
  - --tree and --ancestral-sequences are now required arguments.
  - VCF-only arguments are now separated into their own group.
- translate: Fixes a bug in the parsing behaviour of GFF files whereby the presence of the --genes command line argument would change how we read individual GFF lines. Issue #1349, PR #1351 (@jameshadfield)
- If TreeTimeError is encountered, Augur now exits with code 2 rather than 0. (This restores the original behaviour.) #1367 (@jameshadfield)
- Deprecate read_strains from augur.utils and add it to the public API under augur.io. #1353 (@victorlin)
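In VCF, each sample's GT value is an index into the alleles. 0 is REF, 1 is the first ALT, 2 is the second ALT, and so on, with "." reserved for a missing call. The hypothetical helper below encodes a haploid call under those rules. It shows why an unchanged reference base should be written as 0 rather than ".", and how multiple ALT alleles, including "*", map to indices. It is not Augur's or TreeTime's code, and it assumes haploid calls.

```python
def haploid_gt(ref, alts, allele):
    """Return the VCF GT string for a haploid call (hypothetical helper).

    allele is the base observed in the sample, or None when the call is missing.
    Raises ValueError if allele is neither REF nor one of the ALT alleles.
    """
    if allele is None:
        return "."  # missing data, not "same as reference"
    if allele == ref:
        return "0"  # unchanged from REF
    return str(alts.index(allele) + 1)  # 1-based index into the ALT list

alts = ["G", "*"]  # "*" marks an allele missing due to an upstream deletion (VCF 4.2+)
print([haploid_gt("A", alts, a) for a in ["A", "G", "*", None]])  # ['0', '1', '2', '.']
```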
23.1.1
Bug Fixes
- Fix Python 3.11 installation for Conda environments. #1334 (@victorlin)
- Bump pyfastx dependency to major versions 1 and 2. #1335 (@victorlin)
23.1.0
Features
- Support treetime 0.11.* #1310 (@corneliusroemer)
- export: Allow minimal export using only a (newick) tree in augur export v2. #1299 (@jameshadfield)
- A number of schema updates and improvements. #1299 (@jameshadfield)
  - We now require all nodes to have node_attrs on them, with one of div or num_date present.
  - Some never-used properties are removed from the schemas, including a pattern for defining nucleotide INDELs, which was never used by augur or auspice.
  - Tip label defaults are now settable within the auspice-config JSON.
  - Empty colorings definitions are allowed (the tree will be grey in Auspice).