This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Releases: chanzuckerberg/shasta

0.10.0

09 May 20:46

Additions and improvements since release 0.9.0

  • Phased diploid assembly improvements result in more sequence assembled diploid, with larger diploid N50:

    • Algorithmic improvements.
    • Bug fixes.
    • New assembly configurations, listed below.
  • New Bayesian model guppy-5.0.7-b improves repeat count calls on homopolymer repeats for reads generated by the Guppy 5 base caller with "super" accuracy. Used in all new assembly configurations listed below.

  • New assembly configurations:

    • For standard nanopore reads at standard coverage (40x to 80x):
      • Haploid assembly: Nanopore-May2022
      • Phased diploid assembly: Nanopore-Phased-May2022
    • For Ultra-Long (UL) nanopore reads at standard coverage (40x to 80x):
      • Haploid assembly: Nanopore-UL-May2022
      • Phased diploid assembly: Nanopore-UL-Phased-May2022
    • Specialized for human assemblies with one flowcell per genome (low coverage, around 30x):
      • Haploid assembly: Nanopore-Human-SingleFlowcell-May2022
      • Phased diploid assembly: Nanopore-Human-SingleFlowcell-Phased-May2022
  • As announced with release 0.9.0, several items of obsolete functionality were removed.
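
The choice among the six new assembly configurations listed above can be summarized as a small lookup. The Python sketch below is only an illustration: the function `pick_config` and its arguments are hypothetical and are not part of Shasta. Only the configuration names come from these release notes.

```python
def pick_config(ultra_long=False, phased=False, human_single_flowcell=False):
    """Return the name of a 0.10.0 built-in configuration.

    Hypothetical helper for illustration; not part of Shasta.
    """
    if human_single_flowcell:
        # Human assemblies with one flowcell per genome (low coverage, around 30x).
        name = "Nanopore-Human-SingleFlowcell"
    elif ultra_long:
        # Ultra-Long (UL) nanopore reads at standard coverage (40x to 80x).
        name = "Nanopore-UL"
    else:
        # Standard nanopore reads at standard coverage (40x to 80x).
        name = "Nanopore"
    if phased:
        name += "-Phased"
    return name + "-May2022"
```

The returned name is what would be passed to Shasta with the --config option.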

Deprecated platform (macOS) that will be removed

macOS support (all versions) is deprecated and will be removed soon. This is likely the last Shasta release that includes macOS support. If you would like this platform to continue to be supported, please file an issue on this repository, with motivation.

Platforms

Linux

  • The shasta-Linux-0.10.0 executable will run on most current 64-bit Linux systems that use kernel version 3.2.0 or later. This includes all Ubuntu versions starting at 12.04 plus CentOS 7 and 8.

  • The release includes tar file shasta-Ubuntu-20.04-0.10.0.tar which is a complete Shasta build on Ubuntu 20.04. It will not be needed by most users.

macOS

Two macOS executables are included in this release:

  • shasta-macOS-11-Intel-0.10.0, for macOS 11.0 (Big Sur) on Apple systems that use Intel x86-64 processors.

  • shasta-macOS-11-ARM-0.10.0, for macOS 11.0 (Big Sur) on Apple systems that use Apple ARM processors, including Apple M1 processors.

Windows

As in previous releases, the Linux executable shasta-Linux-0.10.0 can be used on Windows under Windows Subsystem for Linux (WSL).

Linux ARM

The ARM executable, shasta-Linux-ARM-0.10.0, can be used on 64-bit ARM version 8 platforms. It is known to work at least in the following environments:

  • Graviton2 processors on AWS EC2 instances.
  • Raspberry Pi Model 4 running 64-bit Ubuntu 20.04.

It will not work on macOS systems with ARM processors, including Apple M1 processors (use shasta-macOS-11-ARM-0.10.0 instead).
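
The platform notes for release 0.10.0 can be condensed into a lookup from operating system and processor type to executable name. The following Python sketch is an illustration under stated assumptions: the function `executable_name` is hypothetical, and the machine strings it checks ("x86_64", "amd64", "aarch64", "arm64") are the values Python's `platform.machine()` commonly reports, not anything defined by Shasta.

```python
import platform

VERSION = "0.10.0"

def executable_name(system=None, machine=None):
    """Return the 0.10.0 executable name for a platform (hypothetical helper)."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    is_arm = machine in ("aarch64", "arm64")
    is_intel = machine in ("x86_64", "amd64")
    if system == "Linux":
        # Windows Subsystem for Linux also reports "Linux" here.
        if is_intel:
            return f"shasta-Linux-{VERSION}"
        if is_arm:
            return f"shasta-Linux-ARM-{VERSION}"
    elif system == "Darwin":
        # macOS 11.0 (Big Sur) executables.
        if is_intel:
            return f"shasta-macOS-11-Intel-{VERSION}"
        if is_arm:
            return f"shasta-macOS-11-ARM-{VERSION}"
    raise ValueError(f"no {VERSION} executable listed for {system}/{machine}")
```

Calling `executable_name()` with no arguments inspects the current machine. Passing explicit values makes it easy to check the mapping for another platform.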

Compatibility

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.10.0 for postprocessing of an assembly done using a previous release.

0.9.0

03 Feb 19:41

Additions and improvements since release 0.8.0

  • New implementation of phased assembly (Mode 2) improves the quality of phased assemblies with fewer artifacts (see the documentation for details).

  • Ability to run assemblies without using Run-Length Encoding (RLE) for the reads, especially useful for phased assemblies. RLE is effective for noisy reads, less so for higher-accuracy reads, especially for phased assemblies.

  • New assembly configurations. Use shasta --command listConfiguration --config ??? (replacing ??? with a configuration name) to list applicability and details of each.

    • Nanopore-UL-Jan2022 (Ultra-Long nanopore reads, haploid assembly).
    • Nanopore-Phased-Jan2022 (nanopore reads, phased diploid assembly).
    • Nanopore-UL-Phased-Jan2022 (Ultra-Long nanopore reads, phased diploid assembly).
  • Many messages useful for performance evaluation previously written to the assembly log output (stdout) are now written to a new file performance.log in the assembly directory, improving the readability of the assembly log.

  • Assembly log output (stdout) is now duplicated to stdout.log in the assembly directory. Use command line option --suppressStdoutLog to suppress this behavior.

  • Several usability improvements and bug fixes.
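
For readers unfamiliar with the Run-Length Encoding (RLE) mentioned above, the idea is to store each run of identical bases once, together with its repeat count. The Python sketch below is a generic illustration of that idea, not Shasta's internal representation. The function name is hypothetical.

```python
from itertools import groupby

def run_length_encode(sequence):
    """Split a read into its run-length bases and the repeat count of each run."""
    bases = []
    counts = []
    for base, run in groupby(sequence):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts
```

For example, `run_length_encode("GATTTACCA")` gives the bases `"GATACA"` with repeat counts `[1, 1, 3, 1, 2, 1]`. With noisy reads, errors in homopolymer length end up in the counts rather than in the base sequence.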

Deprecated functionality that will be removed

The following functionality is deprecated and will be removed soon. If you would like one of these items to continue to be supported, please file an issue on this repository with motivation.

  • Alignment method 0 (--Align.alignMethod 0).

  • Marker graph refinement (--MarkerGraph.refineThreshold).

  • Reverse transitive reduction of the marker graph (--MarkerGraph.reverseTransitiveReduction).

  • Detangle method 1 (--Assembly.detangleMethod 1).

  • Support for macOS 10 (10.15 Catalina and 10.14 Mojave).

  • Ability to build a Shasta AppImage.

Platforms

Linux

  • The shasta-Linux-0.9.0 executable will run on most current 64-bit Linux systems that use kernel version 3.2.0 or later, including all Ubuntu versions starting at 12.04 plus CentOS 7 and 8. It will not run on Linux systems with older kernels, including CentOS 6, which reached the end of support on November 30, 2020.

  • The release includes tar file shasta-Ubuntu-20.04-0.9.0.tar, a complete Shasta build on Ubuntu 20.04. Most users will not need it.

macOS

This release includes three macOS executables:

  • shasta-macOS-11-Intel-0.9.0, for macOS 11.0 (Big Sur) on Apple systems that use Intel x86-64 processors.

  • shasta-macOS-11-ARM-0.9.0, for macOS 11.0 (Big Sur) on Apple systems that use Apple ARM processors, including Apple M1 processors.

  • shasta-macOS-10-0.9.0, for macOS 10.15 (Catalina). It also runs on macOS 10.14 (Mojave).

Windows

As in previous releases, the Linux executable shasta-Linux-0.9.0 can be used on Windows under Windows Subsystem for Linux (WSL).

Linux ARM

The ARM executable, shasta-Linux-ARM-0.9.0, can be used on 64-bit ARM version 8 platforms. It is known to work at least in the following environments:

  • Graviton2 processors running 64-bit Ubuntu 20.04 on AWS instance types r6g, m6g, c6g, and x2gd.
  • Raspberry Pi Model 4 running 64-bit Ubuntu 20.04.

It will not work on macOS systems with ARM processors, including Apple M1 processors (use shasta-macOS-11-ARM-0.9.0 instead).

Compatibility

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.9.0 for postprocessing of an assembly done using a previous release.

0.8.0

07 Oct 20:45

Additions and improvements since release 0.7.0

  • Phased diploid assembly via --Assembly.mode 2. Two assembly configurations to facilitate phased diploid assembly for current Oxford Nanopore reads generated by the Guppy 5 base caller are also provided, and can be invoked via --config Nanopore-Phased-Aug2021 (standard reads) or --config Nanopore-UL-Phased-Oct20211 (Ultra-Long reads). These configurations are tentative and are subject to improvements. Please file an issue on the Shasta GitHub repository to discuss unsuccessful assemblies. The documentation page on computational methods includes a description of mode 2 assembly.

  • The --config option is now mandatory. It can specify a Shasta configuration file, as in previous releases, or one of several built-in configurations available within the Shasta executable. Shasta command shasta --command listConfigurations writes a list of available built-in configuration names. Use shasta --command listConfiguration --config name to see details of a specific built-in configuration. A new documentation page describing functionality related to built-in configurations was added.

  • Alignment method 4 (experimental). Selectable via --Align.alignMethod 4. It can be useful for assembly of centromeres.

  • More flexibility in reading fastq files: characters following the plus sign on the third line for each read are now accepted.

  • Additional configuration for assembly of plant genomes. Invoke via --config Nanopore-Plants-Apr2021. Use shasta --command listConfiguration --config Nanopore-Plants-Apr2021 for details.

  • Bayesian consensus callers for Bonito 0.3.1 and Guppy 5.0.7 base callers.

  • Several usability improvements and bug fixes.
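
The fastq change above concerns the third line of each read, which starts with a plus sign and may now carry additional characters. The Python sketch below illustrates a four-lines-per-read parser that tolerates such characters. It is a simplified stand-in, not Shasta's reader, and the function name is hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples; each read must span exactly 4 lines."""
    lines = [line.rstrip("\r\n") for line in lines]
    # Ignore empty lines at the end of the input.
    while lines and not lines[-1]:
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("each read must be on exactly 4 lines")
    for i in range(0, len(lines), 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1}: expected '@' at start of header")
        # Anything following the plus sign is accepted and ignored.
        if not plus.startswith("+"):
            raise ValueError(f"line {i + 3}: expected '+'")
        if len(quality) != len(sequence):
            raise ValueError(f"line {i + 4}: quality and sequence lengths differ")
        yield header[1:], sequence, quality
```

A third line such as `+read1 extra text` is handled the same way as a bare `+`.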

Platforms

Linux

  • The shasta-Linux-0.8.0 executable will run on most current 64-bit Linux systems that use kernel version 3.2.0 or later. This includes all Ubuntu versions starting at 12.04 plus CentOS 7 and 8. It will not run on Linux systems with older kernels, including CentOS 6, which reached end of support on November 30, 2020.

  • The release includes tar file shasta-Ubuntu-20.04-0.8.0.tar which is a complete Shasta build on Ubuntu 20.04. It will not be needed by most users.

macOS

Two macOS executables are included in this release:

  • shasta-macOS-11-0.8.0, for macOS 11.0 (Big Sur). This will only work on Apple systems that use Intel x86-64 processors. Systems with ARM processors, including Apple M1 processors, are not supported.

  • shasta-macOS-10.15-0.8.0, for macOS 10.15 (Catalina). It also runs on macOS 10.14 (Mojave).

Windows

As in previous releases, the Linux executable shasta-Linux-0.8.0 can be used on Windows under Windows Subsystem for Linux (WSL).

ARM

The ARM executable, shasta-Linux-ARM-0.8.0, can be used on 64-bit ARM version 8 platforms. It is known to work at least in the following environments:

  • Graviton2 processors running 64-bit Ubuntu 20.04 on AWS instance types r6g and m6g.
  • Raspberry Pi Model 4 running 64-bit Ubuntu 20.04.

It will not work on macOS systems with ARM processors, including Apple M1 processors.

Compatibility

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.8.0 for postprocessing of an assembly done using a previous release.

0.7.0

01 Dec 22:58
87f5e40

Additions and improvements since release 0.6.0

  • New command line options:

    • --Assembly.pruneLength can be used to request a final pruning step on the assembly graph. This is useful with iterative assembly.
  • Various bug fixes, including the following which caused assembly failure or incorrect assembly results:

    • #209: Incorrect CIGAR strings in single-stranded GFA output.
    • #212: Bug in superbubble removal.
    • #213: Assertion during detangling.
  • Usability improvements in the http server.

  • Code cleanup and removal of obsolete code.

  • Platform changes (see below).

Platforms

Linux

  • The shasta-Linux-0.7.0 executable will run on most current 64-bit Linux systems that use kernel version 3.2.0 or later. This includes all Ubuntu versions starting at 12.04 plus CentOS 7 and 8. It will not run on Linux systems with older kernels, including CentOS 6, which reached end of support on November 30, 2020.

macOS

The macOS executable, shasta-macOS-0.7.0, can be used both on macOS 10.14 (Mojave) and macOS 10.15 (Catalina). It will not run on macOS 11.0 (Big Sur).

Windows

As in previous releases, the Linux executable shasta-Linux-0.7.0 can be used on Windows under Windows Subsystem for Linux (WSL).

ARM

The ARM executable, shasta-Linux-ARM-0.7.0, can be used on 64-bit ARM version 8 platforms. It is known to work at least in the following environments:

  • Graviton2 processors running 64-bit Ubuntu 20.04 on AWS instance types r6g and m6g.
  • Raspberry Pi Model 4 running 64-bit Ubuntu 20.04.

Compatibility

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.7.0 for post-processing of an assembly done using a previous release.

0.6.0

06 Oct 15:42

Additions and improvements since release 0.5.0

  • Option --ReadGraph.creationMethod 2 activates a more robust way to create the read graph. It uses the statistical distribution of various alignment metrics to select alignment criteria - see the documentation for more details. Use with one of the new sample configuration files (see below). It provides the following benefits:
    • Less sensitive to choice of alignment metric thresholds, as long as those thresholds are chosen in a very permissive way.
    • Improves assembly contiguity and accuracy.
    • Less sensitive to the amount of available coverage. Works well down to 20X coverage using the sample configuration files provided, although assembly contiguity decreases with coverage.
  • Experimental iterative assembly functionality for partial haplotype separation (phased diploid assembly) and improved resolution of long repeats such as segmental duplications. It currently requires Ultra-Long (UL) reads and high coverage, 80X. Use with configuration file Nanopore-UL-iterative-Sep2020.conf.
  • Option --MarkerGraph.minCoverage can now be set to 0 for automatic selection of a reasonable value.
  • Option --MarkerGraph.minCoveragePerStrand can be used to specify a minimum required per-strand coverage (number of supporting reads) for a marker graph vertex to be generated. This can reduce assembly errors due to strand-dependent systematic errors.
  • Option --ReadGraph.desiredCoverage can be used to automatically increase the read length cutoff to reduce coverage to a desired value.
  • Option --Assembly.detangleMethod 2 can be used to select a less conservative detangling method, which is also configurable with various new command line options.
  • Memory optimization results in significant reductions in memory requirements. Peak virtual memory usage is now reported at the end of an assembly and in AssemblySummary.html.
  • Support for the ARM platform (see below under Platforms for more information).
  • New script GenerateConfig.py aids in creating a custom configuration file.
  • New script GenerateFeedback.py can be used to assess a completed assembly. When filing a Shasta issue for an unsatisfactory assembly, please include the output of this script plus AssemblySummary.html.
  • Documentation and benchmarks to permit running on machines with less than the ideal amount of memory.
  • New sample configuration files, all of which include --ReadGraph.creationMethod 2. Use with Shasta option --config.
    • Nanopore-Sep2020.conf: best currently known parameter set for standard nanopore reads generated by the Guppy base caller version 3.6.0 or later.
    • Nanopore-UL-Sep2020.conf: best currently known parameter set for Ultra-Long (UL) nanopore reads generated by the Guppy base caller version 3.6.0 or later.
    • Nanopore-OldGuppy-Sep2020.conf: best currently known parameter set for standard nanopore reads generated by the Guppy base caller versions 3.0 through 3.5.
    • Nanopore-UL-iterative-Sep2020.conf: experimental configuration file for iterative assembly using high coverage (80X) with Ultra-Long (UL) nanopore reads generated by the Guppy base caller version 3.6.0 or later. Provides partial haplotype separation (phased diploid assembly) and improved resolution of segmental duplications.
  • Usability improvements.
  • Improvements and additions in the HTTP server.
  • Documentation improvements and additions, including significant additions to the page on Shasta computational methods.
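
The --ReadGraph.desiredCoverage option described above raises the read length cutoff until coverage drops to a desired value. One way such a cutoff could be chosen is sketched below in Python. This is an illustration only and not necessarily how Shasta implements it: it assumes coverage is measured as a total number of bases and that the longest reads are kept first. The function and parameter names are hypothetical.

```python
def read_length_cutoff(read_lengths, desired_bases, min_read_length=0):
    """Return the smallest length cutoff that still keeps at least desired_bases.

    The cutoff is never lowered below min_read_length.
    """
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if length < min_read_length:
            # Remaining reads are already excluded by the existing cutoff.
            break
        total += length
        if total >= desired_bases:
            # Keeping all reads at least this long reaches the target.
            return length
    # Not enough bases to reach the target: keep the existing cutoff.
    return min_read_length
```

With read lengths 10, 20, 30 and 40 and a target of 60 bases, the two longest reads already provide 70 bases, so the cutoff becomes 30.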

Platforms

Linux

  • The shasta-Linux-0.6.0 executable will run on most current 64-bit Linux systems that use kernel version 3.2.0 or later. This includes all Ubuntu versions starting at 12.04 plus CentOS 7 and 8.

  • The shasta-OldLinux-0.6.0 executable will run on most current 64-bit Linux systems that use kernel version 2.6.32 or later. This includes CentOS 6. CentOS 6 reaches end of support on November 30, 2020, and kernel versions older than 3.2.0 are aging and no longer widely used or supported. Therefore, the shasta-OldLinux executable will not be included in future Shasta releases. Future Shasta releases will only run on systems that use Linux kernel 3.2.0 or later. They will not run on older systems, including CentOS 6.

macOS

In contrast with previous Shasta releases, in this release a single macOS executable is provided, shasta-macOS-0.6.0. This executable can be used both on macOS 10.14 (Mojave) and macOS 10.15 (Catalina).

Windows

As in previous releases, the Linux executable shasta-Linux-0.6.0 can be used on Windows under Windows Subsystem for Linux (WSL).

ARM

This Shasta release includes an ARM executable, shasta-Linux-ARM-0.6.0, which can be used on 64-bit ARM version 8 platforms. It is known to work at least in the following environments:

  • Graviton2 processors running 64-bit Ubuntu 20.04 on AWS instance types r6g and m6g.
  • Raspberry Pi Model 4 running 64-bit Ubuntu 20.04.

Compatibility

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.6.0 for postprocessing of an assembly done using a previous release.

0.5.1

23 Jun 15:09

This is a bug fix release which addresses issue 157. This issue only affects the shasta-OldLinux-0.5.0 executable and the Ubuntu 16.04 build. If you are using the shasta-Linux-0.5.0 executable or one of the macOS executables you are not affected by this issue and you don't need to upgrade to this release.

This release is compatible with Shasta release 0.5.0. However, it is not compatible with previous Shasta releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.5.0 or 0.5.1 for post-processing of an assembly done using a previous release.

0.5.0

15 Jun 17:46
3a4d81d

Changes from release 0.4.0:

  • Performance improvements:
    • A new method to compute marker alignments via SeqAn banded alignments is now the default and has improved performance and accuracy. To revert to the old marker alignment algorithm used in previous Shasta releases, use --Align.alignMethod 0.
    • Alignments are now stored, in a highly compressed format, so they don't have to be recomputed when creating marker graph vertices using the disjoint set computation.
    • Option --Reads.noCache can be used to bypass the Linux cache when loading reads. This can improve performance in some situations.
    • Several other performance improvements, including upgrading to a new, faster release of the Spoa library. As a combined result of these and the above, a human genome assembly at coverage 60x now takes about 3 hours on an x1.32xlarge AWS instance.
  • New functionality:
    • Option --Assembly.detangle performs basic detangling in the assembly graph and can improve assembly contiguity.
    • Option --Assembly.writeReadsByAssembledSegment can be used to write a csv file containing the reads and orientations that were used to assemble each segment.
    • New options to generate the k-mers to be used as markers.
    • Usability improvements in the Shasta http server, including improved display of a read and its markers.
  • New configuration files and Bayesian model for Oxford Nanopore reads created by the Guppy 3.6.0 base caller.
  • Fixed a long-standing bug in the computation of CIGAR strings in GFA output.
  • Platform changes:
    • Shasta can now be built on Ubuntu 20.04, in addition to 16.04 and 18.04. As for previous releases, the static executable built on Ubuntu continues to run on most current 64-bit Linux platforms, has no dependencies, and requires no installation.
    • Support for macOS 10.15 Catalina (both build and run).
    • GPU support was removed. Because of the above improvements in performance, the GPU code was no longer providing performance benefits.
  • Many documentation improvements which make it easier to locate the desired information.
  • Code reorganization and cleanup.

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.5.0 for postprocessing of an assembly done using a previous release. If you plan to build the code yourself and you have done so for previous Shasta releases, make sure to rerun InstallPrerequisites-Ubuntu.sh or InstallPrerequisites-macOS.sh in order to get updated prerequisites.

0.4.0

10 Jan 17:18

Changes from release 0.3.0:

  • Shasta can now read uncompressed Fastq files directly without the need to convert to Fasta first. Each read in a Fastq file must be on exactly 4 lines.

  • Restrictions on reading Fasta files have been removed. Input reads can now be on multiple lines. Reads containing no-calls are discarded.

  • GPU acceleration.

  • Sample configuration files for various kinds of Oxford Nanopore and Pacific Biosciences reads at coverage around 60x. These reflect best known parameter choices as of December 2019. Default values of command line options remain mostly compatible with previous releases and are no longer recommended for any specific application.

  • New built-in Bayesian model for nanopore reads generated by the Guppy 3.0.5 base caller. Invoked via --Assembly.consensusCaller Bayesian:guppy-3.0.5-a. The default remains --Assembly.consensusCaller Bayesian:guppy-2.3.5-a.

  • New command line option --MinHash.minBucketSize provides more control of the MinHash/LowHash step. New output file LowHashBucketHistogram.csv can be used to select optimal values of --MinHash.minBucketSize and --MinHash.maxBucketSize.

  • Command line option --MinHash.minHashIterationCount can now be set to zero to adaptively select the number of MinHash iterations. Iteration stops when a specified number of alignment candidates per read (controlled by a new option --MinHash.alignmentCandidatesPerRead) is reached.

  • New command line option --Align.sameChannelReadAlignment.suppressDeltaThreshold can be used to suppress alignments between reads originating close in time from the same nanopore channel. This helps eliminate some assembly artifacts caused by some types of pathological reads.

  • New command line option --Align.maxDrift provides more control over the selection of marker alignments.

  • More options to select the k-mers to be used as markers (--Kmers.suppressHighFrequencyMarkers, --Kmers.enrichmentThreshold, --Kmers.file).

  • More flexible build system allows building Shasta as static or dynamic executable, static library (callable from C++), shared library (callable from C++ and Python), or AppImage. The static executable and the AppImage require no installation and run on a variety of Linux platforms.

  • Many usability and documentation improvements.
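
The relaxed Fasta rules above (reads may span multiple lines, and reads containing no-calls are discarded) can be illustrated with a short Python sketch. It is not Shasta's reader. It assumes that a no-call is any character other than A, C, G or T, and that lower-case bases are equivalent to upper-case ones. The function name is hypothetical.

```python
VALID_BASES = set("ACGT")

def parse_fasta(lines):
    """Return (name, sequence) pairs, joining multi-line reads and dropping reads with no-calls."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new read; its sequence may follow on several lines.
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
        else:
            raise ValueError("sequence data found before the first header")
    reads = []
    for name, chunks in records:
        sequence = "".join(chunks).upper()
        # Assumption: anything outside A, C, G, T counts as a no-call.
        if sequence and set(sequence) <= VALID_BASES:
            reads.append((name, sequence))
    return reads
```

A read written as `ACGT` followed by `ACG` on the next line is returned as `ACGTACG`, while a read containing `N` is dropped.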

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.4.0 for postprocessing of an assembly done using a previous release.

Shasta Release 0.3.0

12 Sep 21:44

Changes from release 0.2.0:

  • Bash completion feature to simplify typing of Shasta commands - see the documentation under About/Command line options/Saving some typing.

  • Improvements to documentation and some error messages.

  • A new output file ReadSummary.csv summarizing various assembly metrics for each read used in the assembly.

  • A new option to suppress the usage of high frequency k-mers as markers. The effect of this option on assembly results was not tested extensively. This option is not turned on by default.

  • Other minor usability improvements.

  • Some bugs were fixed, including the following. None of the bug fixes affected assembly results.

    • Issue #52: the Linux executable of release 0.2.0, shasta-Linux-0.2.0, had a performance problem that was particularly severe on machines with many virtual processors and was causing assemblies to slow down by as much as a factor of 2.

    • Issues #49, #50: the shasta-Linux-0.2.0 executable did not work on some older Linux kernels. Release 0.3.0 includes a new executable shasta-OldLinux-0.3.0 which works on Linux kernels as old as 2.6, such as those used by CentOS 6.

    • Issue #49: some binary files were always written on disk regardless of --memoryMode and --memoryBacking settings.

This release is not compatible with previous releases. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.3.0 for postprocessing of an assembly done using a previous release.

Shasta Release 0.2.0

20 Aug 17:08

Note added 09/12/2019: The Linux version of Shasta Release 0.2.0 has a performance bug that could slow down assemblies by as much as a factor of 2. Please discontinue usage of Release 0.2.0 for large assemblies.

Changes from release 0.1.0:

  • The only algorithmic change is the addition of the Bayesian model for repeat counts, which results in a significant decrease of erroneous indels. This is the new default. To recover 0.1.0 behavior, use command line option --Assembly.consensusCaller Modal. More information is available in the documentation under About / Computational methods, in the section entitled Assembling repeat counts.

  • Http server functionality is now available directly via the Shasta executable. It allows interactive exploration of assembly results and data structures after an assembly is complete. To use it, run the assembly with --memoryMode filesystem, then run the Shasta assembler with --command explore. More information is available in the documentation by navigating to How to / Explore assembly results.

  • Command line option --threads was added to allow specifying the number of threads to be used. If omitted, Shasta uses a number of threads equal to the number of virtual processors, the same behavior as in release 0.1.0.

  • More informative output files in the assembly directory. See the documentation by navigating to How to / Run an assembly in the section entitled Output files.

  • Functionality to run Shasta under Docker was added. More information is available in the documentation by navigating to How to / Run an assembly in Docker.

  • Many error messages were improved and are now more informative and easier to interpret.

  • Many additions and improvements in the documentation.

  • Some obsolete code was removed. Some code refactoring/restructuring for readability/maintainability.
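
To make the contrast with the Bayesian model concrete, a modal repeat count call can be sketched in a few lines of Python. This assumes that "Modal" means taking the most frequent repeat count observed among the reads covering a position. The sketch is an illustration of that idea, not Shasta's implementation, and the function name is hypothetical.

```python
from collections import Counter

def modal_repeat_count(observed_counts):
    """Return the most frequently observed repeat count at one position."""
    if not observed_counts:
        raise ValueError("at least one observation is required")
    count, _ = Counter(observed_counts).most_common(1)[0]
    return count
```

If five reads report repeat counts 3, 3, 4, 3 and 2 for the same homopolymer run, the modal call is 3. A Bayesian model can instead weigh each observation by how likely the base caller is to produce it, which is where the reduction in erroneous indels comes from.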

This release is not compatible with release 0.1.0. There were incompatible changes in some command line option names, the binary formats used, and the Python API. You cannot use release 0.2.0 for postprocessing of an assembly done using release 0.1.0.