Releases: nextstrain/augur

24.4.0

15 May 23:20

These release notes are automatically extracted from the full changelog.

Features

  • All commands: Allow repeating an option that takes multiple values. Previously, if multiple option flags were specified (e.g. --exclude-where 'region=A' --exclude-where 'region=B'), only the last one was used. Now, all values are used. #1445 (@victorlin)
  • ancestral, translate: output node data files are now validated. The argument --validation-mode is added which controls this behaviour (default: error). This argument also controls validation of the input node-data file (ancestral only). #1440 (@jameshadfield)
  • export: Updated default latitudes and longitudes for geography traits. This only applies if you are not using --lat-longs to override the built-in mappings. #1449 (@trvrb)
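
The repeated-option behavior described for all commands can be illustrated with a small argparse sketch. This is not Augur's actual argument setup: the parser and the build_parser helper are hypothetical, and the sketch only assumes that Python's built-in "store" and "extend" actions mirror the old and new behavior.

```python
import argparse


def build_parser(action):
    """Build a hypothetical parser with one multi-value option."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--exclude-where",
        nargs="+",        # the option takes one or more values
        action=action,    # "store" overwrites, "extend" accumulates
        default=[],
    )
    return parser


argv = ["--exclude-where", "region=A", "--exclude-where", "region=B"]

# "store": each repeated flag replaces the previous values.
old = build_parser("store").parse_args(argv).exclude_where

# "extend" (Python 3.8+): values from every repeated flag are kept.
new = build_parser("extend").parse_args(argv).exclude_where

print(old)  # ['region=B']
print(new)  # ['region=A', 'region=B']
```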

Bug Fixes

  • validation: we no longer exit with a non-zero exit code when the requested validation mode is "warn" #1440 (@jameshadfield)
  • validation: we no longer perform any validation when the requested validation mode is "skip" #1440 (@jameshadfield)
  • filter: Send all log messages to stderr. This allows output to be written to stdout (e.g. --output-strains /dev/stdout). #1459 (@victorlin)
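
The three validation modes mentioned in this release (error, warn, skip) can be sketched as a small dispatcher. Everything below is illustrative: check_node_data is a hypothetical, heavily simplified stand-in for real schema validation, and the exit code 1 is an arbitrary non-zero value chosen for the example.

```python
import sys
from enum import Enum


class ValidationMode(Enum):
    ERROR = "error"
    WARN = "warn"
    SKIP = "skip"


class ValidationError(Exception):
    pass


def check_node_data(node_data):
    """Hypothetical check: require a top-level 'nodes' object."""
    if not isinstance(node_data.get("nodes"), dict):
        raise ValidationError("node data must contain a 'nodes' object")


def validate(node_data, mode):
    """Return the exit code the program would finish with."""
    if mode is ValidationMode.SKIP:
        return 0  # "skip": no validation is performed at all
    try:
        check_node_data(node_data)
    except ValidationError as err:
        if mode is ValidationMode.WARN:
            # "warn": report the problem but still exit with zero
            print(f"WARNING: {err}", file=sys.stderr)
            return 0
        print(f"ERROR: {err}", file=sys.stderr)
        return 1  # "error": arbitrary non-zero code for this sketch
    return 0
```

With invalid input such as an empty dict, only ValidationMode.ERROR yields a non-zero code in this sketch; warn prints a message and skip does nothing.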

24.3.0

18 Mar 17:20

Features

  • filter: Added a new option --max-length to filter out sequences that are longer than a given number of base pairs. #1429 (@victorlin)
  • parse: Added support for environments that use pandas 2.x. #1436 (@emollier, @victorlin)

Bug Fixes

  • filter: Updated docs with an example of tiered subsampling. #1425 (@victorlin)
  • export: Fixed bug #1433, introduced in v23.1.0, that caused validation to fail when gene names start with nuc (e.g. nucleocapsid). #1434 (@corneliusroemer)
  • import: Fixed a bug introduced in v24.2.0 that prevented import beast from running. #1439 (@tomkinsc)
  • translate, ancestral: Compound CDS are now exported as segmented CDS and are now viewable in Auspice. #1438 (@jameshadfield)

24.2.3

23 Feb 22:08

Bug Fixes

  • filter: Updated the help and report text of --min-length to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but has never been explicitly documented. #1422 (@joverlee521)
  • frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented --regions from working when providing regions other than the default "global" region. #1424
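
The counting rule documented for --min-length can be sketched as follows. This is not Augur's code: both function names are hypothetical, and treating the threshold as inclusive is an assumption made for the example.

```python
def count_standard_bases(sequence):
    """Count only A, C, G, and T, ignoring case."""
    return sum(1 for base in sequence.upper() if base in "ACGT")


def passes_min_length(sequence, min_length):
    """Assumption: a sequence passes when its count is >= min_length."""
    return count_standard_bases(sequence) >= min_length


# N, gaps, and other ambiguity codes do not contribute to the length.
print(count_standard_bases("acgtNN--ACGT"))  # 8
print(passes_min_length("ACGTNNNN", 5))      # False
```

Under this rule a sequence padded with N or gap characters can be far shorter, for filtering purposes, than its raw string length suggests.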

24.2.2

16 Feb 22:58

Bug Fixes

  • filter: In versions 24.2.0 and 24.2.1, --query stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
  • filter: Handle backtick quoting in internal optimizations of --query. #1417 (@victorlin)

24.2.1

14 Feb 00:36

Bug Fixes

  • frequencies: Fixed a bug introduced in 24.2.0 that prevented --method diffusion from working alongside --tree. #1412 (@victorlin)

24.2.0

12 Feb 21:07

Features

  • filter: Added a new option --query-columns that allows specifying what columns are used in --query along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
  • augur.io.read_metadata: A new optional columns argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
  • augur parse: A new optional --output-id-field argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
    • When no --output-id-field is given and the data has both name and strain fields, continue to preferentially use name over strain as the sequence ID field; but, throw a deprecation warning that the order will be switched to prefer strain over name in the future to be consistent with the rest of Augur.
    • Added entry to DEPRECATED.md.
  • Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)
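
One way transparent compression support can work is to inspect a file's leading bytes before opening it. The sketch below uses only the Python standard library and handles gzip and xz. The open_text helper is hypothetical, and the release notes do not state which formats Augur supports or how it detects them.

```python
import gzip
import lzma
import os
import tempfile

# Leading "magic" bytes that identify each compressed format.
GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"


def open_text(path):
    """Open a plain, gzip, or xz file for reading as UTF-8 text."""
    with open(path, "rb") as raw:
        head = raw.read(6)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt", encoding="utf-8")
    if head.startswith(XZ_MAGIC):
        return lzma.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


# Write the same content three ways, then read each copy back.
content = "strain\tdate\nA\t2024-01-01\n"
results = {}
with tempfile.TemporaryDirectory() as tmp:
    writers = {
        "metadata.tsv": open,
        "metadata.tsv.gz": gzip.open,
        "metadata.tsv.xz": lzma.open,
    }
    for name, opener in writers.items():
        path = os.path.join(tmp, name)
        with opener(path, "wt", encoding="utf-8") as handle:
            handle.write(content)
        with open_text(path) as handle:
            results[name] = handle.read()
```

Detecting the format from the file contents rather than the extension means a misnamed file is still read correctly.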

Bug Fixes

  • filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as None. #1410 (@victorlin)
  • filter: The order of rows in --output-metadata and --output-strains now reflects the order in the original --metadata. #1294 (@victorlin)
  • filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
    • For filter, this comes with increased writing times for --output-metadata and --output-strains. However, net I/O time still decreased during testing of this change.
  • filter: Updated the help text of --include and --include-where to explicitly state that this can add strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Updated wording of summary messages. #1389 (@victorlin)
  • Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)
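
Enforcing UTF-8 and reporting a clearer error for other encodings can be sketched as below. The read_utf8 helper and its message wording are hypothetical and do not reproduce Augur's implementation or its actual error text.

```python
import os
import tempfile


def read_utf8(path):
    """Read a text file strictly as UTF-8, with a friendlier error."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except UnicodeDecodeError as err:
        raise ValueError(
            f"{path!r} is not valid UTF-8; re-encode the file and retry"
        ) from err


text = "strain\tcountry\nA\tC\u00f4te d'Ivoire\n"
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.tsv")
    bad_path = os.path.join(tmp, "bad.tsv")
    with open(good_path, "wb") as handle:
        handle.write(text.encode("utf-8"))
    with open(bad_path, "wb") as handle:
        handle.write(text.encode("latin-1"))  # not valid UTF-8

    good_text = read_utf8(good_path)
    try:
        read_utf8(bad_path)
        bad_message = None
    except ValueError as err:
        bad_message = str(err)
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default encoding, which differs between systems.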

24.1.0

30 Jan 20:56

Features

  • augur.io.read_metadata: A new optional dtype argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. #1252 (@victorlin)
  • augur.io.read_vcf has been removed and usage replaced with TreeTime's function of the same name which has improved validation of the VCF file. #1366 (@jameshadfield)

Bug Fixes

  • filter, frequencies, refine: Speed up reading of the metadata file. #1252 (@victorlin)
  • traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. #1252 (@victorlin)
  • Support Biopython ≥1.82 by requiring bcbio-gff ≥0.7.1. #1400 (@victorlin)

24.0.0

22 Jan 23:25

Major Changes

  • ancestral, translate: For VCF inputs please ensure you are using TreeTime 0.11.2 or later. A large number of bugfixes and improvements have been added in both Augur and TreeTime. #1355 and TreeTime #263 (@jameshadfield)
  • ancestral, translate: GenBank files now require the (GFF mandatory) source feature to be present. #1351 (@jameshadfield)
  • ancestral, translate: For GFF files, we extract the genome/sequence coordinates by inspecting the sequence-region pragma, region type and/or source type. This information is now required. #1351 (@jameshadfield)

Features

  • ancestral, translate: Improvements to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
    • Output VCF will better match the input VCF, including CHROM name and ploidy encoding.
    • VCF inputs now require --vcf-reference-output
    • AA sequences are now exported for the tree root
    • VCF writing is now 3 orders of magnitude faster (dataset dependent)
  • ancestral, translate: A range of improvements to how we parse GFF and GenBank reference files. #1351 (@jameshadfield)
    • translate will now always export a 'nuc' annotation in the output JSON, allowing it to pass validation
    • Gene/CDS names of 'nuc' are now forbidden.
    • If a Gene/CDS in the GFF/GenBank file is unparsed we now print a warning.
  • ancestral: For VCF alignments, a VCF output file is now only created when requested via --output-vcf. #1344 (@jameshadfield)
  • ancestral: Improvements to command line arguments. #1344 (@jameshadfield)
    • Incompatible arguments are now checked, especially related to VCF vs FASTA inputs.
    • --vcf-reference and --root-sequence are now mutually exclusive.
  • translate: Tree nodes are checked against the node-data JSON input to ensure sequences are present. #1348 (@jameshadfield)
  • utils::load_features: This function may now raise AugurError. #1351 (@jameshadfield)
  • export v2: Automatically minify large outputs. Use --no-minify-json to disable this default behavior. #1352 (@victorlin)
  • Added a new file DEPRECATED.md to document timelines and progress of deprecated features in the Augur CLI and Python API. #1371 (@victorlin)
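
The automatic minification added to export v2 can be sketched with the standard json module. The size threshold here is made up (and kept tiny so the demo triggers it), and the serialize function is hypothetical. The release notes only state that large outputs are minified unless --no-minify-json is given.

```python
import json

# Made-up threshold for this sketch; the real cutoff is not stated above.
ASSUMED_THRESHOLD_BYTES = 500


def serialize(data, no_minify=False, threshold=ASSUMED_THRESHOLD_BYTES):
    """Pretty-print small outputs, minify large ones unless disabled."""
    pretty = json.dumps(data, indent=2)
    if no_minify or len(pretty.encode("utf-8")) <= threshold:
        return pretty
    # Minified form: no indentation and no spaces after separators.
    return json.dumps(data, separators=(",", ":"))


small = {"version": "v2"}
large = {"nodes": [{"name": f"node_{i}"} for i in range(100)]}

small_out = serialize(small)                 # stays pretty-printed
large_out = serialize(large)                 # minified automatically
forced_pretty = serialize(large, no_minify=True)
```

Both forms parse to the same data, so minification only affects file size and human readability.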

Bug Fixes

  • ancestral, translate: Various fixes to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
    • Fix incorrect (but passing) tests
    • Fix case-sensitive sequence comparisons between the root and reference sequences.
    • Fix a bug where ambiguous alleles are not inferred (see #1380 for full details).
    • Fix a bug where positions with no sequence information were assigned a base because the mask was not being computed (see #1382 for full details).
    • More than one ALT allele is now correctly parsed
    • Mutations followed by an insertion are now parsed
    • Unchanged ref genotypes are now encoded as '0' rather than '.'
    • ALT alleles "*" are now valid (introduced in VCF spec 4.2, but observed in VCF 4.1 files)
    • Positions with no variation are no longer exported
  • ancestral, translate: Fixes for JSON (non-VCF) inputs. #1355 (@jameshadfield)
    • The "reference" translations are now from the provided reference sequence, not from the root of the tree. #1355 (@jameshadfield)
    • Fix a bug where positions with no sequence information were assigned a base because the mask was not applied (see #1382 for full details)
  • ancestral, translate: Avoid incompatibilities with Biopython >=1.82. #1374, #1387 (@victorlin)
  • ancestral, translate: Address Biopython deprecation warnings. #1379 (@victorlin)
  • ancestral: Previously, the help text for --genes falsely claimed that it could accept a file. Now, it can truly claim that. #1353 (@victorlin)
  • translate: The 'source' ID for GFF files is now ignored as a potential gene feature (it is still used for overall nuc coords). #1348 (@jameshadfield)
  • translate: Improvements to command line arguments. #1348 (@jameshadfield)
    • --tree and --ancestral-sequences are now required arguments.
    • VCF-only arguments are now separated into their own group.
  • translate: Fixes a bug in the parsing behaviour of GFF files whereby the presence of the --genes command line argument would change how we read individual GFF lines. Issue #1349, PR #1351 (@jameshadfield)
  • If TreeTimeError is encountered Augur now exits with code 2 rather than 0. (This restores the original behaviour.) #1367 (@jameshadfield)
  • Deprecate read_strains from augur.utils and add it to the public API under augur.io. #1353 (@victorlin)
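
The restored exit-code behavior for TreeTime failures can be sketched as a wrapper that maps an exception to a process exit code. The TreeTimeError class below is a local stand-in so the example runs without TreeTime installed, and run_command is a hypothetical helper, not Augur's entry point.

```python
import sys


class TreeTimeError(Exception):
    """Local stand-in for TreeTime's exception, for illustration only."""


def run_command(command):
    """Run a callable and return the exit code to report."""
    try:
        command()
    except TreeTimeError as err:
        print(f"ERROR from TreeTime: {err}", file=sys.stderr)
        return 2  # the code named in the bug fix above
    return 0


def failing_step():
    raise TreeTimeError("simulated failure")


def passing_step():
    pass


print(run_command(failing_step))  # 2
print(run_command(passing_step))  # 0
```

A script would typically pass the returned value to sys.exit so that calling workflows can detect the failure.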

23.1.1

07 Nov 21:42

Bug Fixes

23.1.0

22 Sep 16:44

Features

  • Support treetime 0.11.* #1310 (@corneliusroemer)
  • export: Allow minimal export using only a (newick) tree in augur export v2. #1299 (@jameshadfield)
  • A number of schema updates and improvements #1299 (@jameshadfield)
    • We now require all nodes to have node_attrs on them with one of div or num_date present
    • Some never-used properties are removed from the schemas, including a pattern for defining nucleotide INDELs which was never used by augur or auspice.
    • Tip label defaults are now settable within the auspice-config JSON
    • Empty colorings definitions are allowed (the tree will be grey in Auspice)

Bug Fixes

  • ancestral: Export amino acid sequences inferred for the root node of the tree in the node data JSON output for compatibility with augur translate output. #1317 (@huddlej)