
Releases: Golob-Minot/geneshot

Enhance tools for corncob analysis

09 Jun 17:31

New functionality for statistical analysis using the corncob algorithm includes:

  • Support for testing multiple formulae in parallel, e.g. --formula label1 + label2,label3
  • Utility to run corncob on an existing geneshot results file object (run_corncob.nf)
  • Support for Nextflow version 20.04.1
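The comma-separated form of --formula shown above can be pictured as a simple split into independent formulae. The sketch below is only an illustration of that reading: the helper name split_formulae is hypothetical, and the actual parsing inside geneshot may differ.

```python
from __future__ import annotations


def split_formulae(formula_arg: str) -> list[str]:
    """Split a comma-separated --formula value into individual formulae.

    Hypothetical helper: assumes that commas separate formulae and that
    whitespace around each formula is not significant.
    """
    return [part.strip() for part in formula_arg.split(",") if part.strip()]


# The example value from the release note yields two formulae,
# each of which would be tested separately.
formulae = split_formulae("label1 + label2,label3")
print(formulae)
```

Under this reading, "label1 + label2" and "label3" would be tested as two separate models.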

Add utility to update manifest in results HDF5

29 May 03:42

The script update_manifest.nf can be used to change the /manifest table in an HDF5 output file. See update_manifest.nf --help for usage instructions.

This release also includes several bugfixes, including one for the addEggnogResults step that is needed for very large datasets.

In addition, /annot/gene/eggnog now has fewer columns, to save space and memory.

Refactor download_sra.nf

22 May 22:52

The biggest change in this release is a refactor of the download_sra.nf utility. In its new form, the process of fetching metadata from the NCBI API is parallelized to speed up execution for very large datasets.

The release also includes some bugfixes for corncob, specifically the code to set the buffer size for very large amounts of memory.

Script to run corncob in isolation, and bugfixes

20 May 17:55

This minor release includes the following:

  • Added a script to run just corncob on a geneshot results file - run_corncob.nf
  • When a specimen is missing metadata specified in the formula, just mask that sample
  • Dynamically set the buffer size used to read in CAG abundances for corncob
  • Fix the bug in download_sra.nf which was previously causing errors when a single BioProject is associated with multiple numeric IDs in the NCBI database
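The masking behavior described in the list above can be sketched as a filter over the manifest. This is an illustration only: the function names, the dictionary layout of the manifest, and the assumption that formula terms are separated by "+" are all hypothetical and are not taken from the geneshot code.

```python
from __future__ import annotations


def formula_terms(formula: str) -> list[str]:
    """Return the metadata columns named in a simple additive formula.

    Assumes terms are joined only by '+', e.g. 'label1 + label2'.
    """
    return [term.strip() for term in formula.split("+") if term.strip()]


def unmasked_specimens(manifest: dict[str, dict], formula: str) -> list[str]:
    """Keep only specimens with a non-empty value for every formula term."""
    terms = formula_terms(formula)
    return [
        specimen
        for specimen, metadata in manifest.items()
        # A missing key, None, or an empty string all count as missing here.
        if all(metadata.get(term) not in (None, "") for term in terms)
    ]


# Hypothetical manifest: s2 and s3 lack a usable value for label2.
manifest = {
    "s1": {"label1": "a", "label2": "b"},
    "s2": {"label1": "a", "label2": None},
    "s3": {"label1": "c"},
}
print(unmasked_specimens(manifest, "label1 + label2"))
```

Only the affected specimens are dropped from the analysis; the remaining specimens are still tested.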

Added pairwise distances and entropy

05 May 16:08
af67f3d

Since v0.4.3:

  • The /annot/gene/all table is now indexed by the column CAG
  • Distance matrices have been added under /distances/<METRIC> for euclidean, aitchison, braycurtis, and jaccard
  • The /annot/cag/all table now includes an entropy column
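For readers unfamiliar with the quantities named above, here is a minimal sketch of the textbook definitions of Shannon entropy and Bray-Curtis dissimilarity. The release notes do not state how geneshot computes the entropy column (log base, normalization, or which abundances it is taken over), so the natural-log Shannon form below is an assumption, and both function names are hypothetical.

```python
import math


def shannon_entropy(abundances):
    """Shannon entropy (natural log) of a vector of non-negative abundances.

    Assumption: abundances are normalized to proportions first, and zero
    entries contribute nothing to the sum.
    """
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# An evenly spread vector has the maximum entropy for its length.
print(shannon_entropy([1, 1, 1, 1]))
# Two vectors with no shared non-zero entries are maximally dissimilar.
print(bray_curtis([1, 0], [0, 1]))
```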

Fix errors joining eggNOG results

01 May 00:06

The only change in this minor release is a fix for a bug that caused an error when processing datasets with >1M genes. Functionality and output format are unchanged.

Add metaPhlAn2 results to HDF5

29 Apr 17:34
b34a086

This minor release includes two changes from v0.4.1:

  • The metaPhlAn2 results created with --composition are now included in the results HDF5 output file
  • DIAMOND is now run twice in the diamond task, once each for R1 and R2, independently

The motivation for the change to DIAMOND is that extremely large datasets were running up against a memory limit that can be hard to overcome with the default resource allocation. This change should help with extremely large datasets, at a small cost in runtime for smaller datasets.

Disable metaPhlAn2 table join

27 Apr 22:31

The only update for v0.4.0 -> v0.4.1 is that the step to create a single merged table with metaPhlAn2 results has been disabled. The current behavior with --composition is that all of the single-sample metaPhlAn2 results will be published to the output directory, but they will not be included in the output HDF5 table.

This minor release fixes a bug which users may have encountered while using the --composition flag, but which did not arise during automated testing.

Added --composition and q-values

25 Apr 00:09

This release incorporates a few major bugfixes, as well as some slightly altered functionality.

The one change that affects backwards compatibility is minor: the /stats/cag/corncob table is now in wide format instead of long. We had previously duplicated the wide-format results as /stats/cag/corncob_wide, but in this release there is only a single table, at /stats/cag/corncob.
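To illustrate the difference between the two layouts, here is a small sketch of a long-to-wide reshape. The column names (CAG, parameter, type, value) and the example numbers are hypothetical and chosen only for illustration; the actual columns in /stats/cag/corncob may differ.

```python
from collections import defaultdict

# Long format: one row per (CAG, parameter, type) with a single value.
long_rows = [
    {"CAG": 0, "parameter": "label1", "type": "estimate", "value": 0.8},
    {"CAG": 0, "parameter": "label1", "type": "p_value", "value": 0.01},
    {"CAG": 1, "parameter": "label1", "type": "estimate", "value": -0.2},
    {"CAG": 1, "parameter": "label1", "type": "p_value", "value": 0.40},
]


def to_wide(rows):
    """Pivot long rows so each (CAG, parameter) pair becomes a single row."""
    wide = defaultdict(dict)
    for row in rows:
        # Each distinct 'type' becomes its own column in the wide row.
        wide[(row["CAG"], row["parameter"])][row["type"]] = row["value"]
    return [
        {"CAG": cag, "parameter": parameter, **values}
        for (cag, parameter), values in wide.items()
    ]


for wide_row in to_wide(long_rows):
    print(wide_row)
```

Code written against the older long layout would need to be adjusted to read one row per CAG and parameter instead.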

The major new feature is the option to run metaPhlAn2 for compositional analysis, enabled by adding the --composition flag.

Other changes include:

  • Gene catalog is now also published in FASTA format
  • The download_sra.nf workflow is now compatible with BioProjects in which the ID does not match the accession
  • Added q-values to the corncob output table
  • Pinned pandas to v1.0.3
  • Removed the R1/R2/I1/I2 columns from the /manifest table
  • Fixed bug with /ref/taxonomy table
  • Removed comment lines from eggNOG output
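As background on the q-values mentioned in the list above, the sketch below shows one common way to derive q-values from p-values, the Benjamini-Hochberg procedure. The release notes do not say which correction geneshot applies, so the choice of method here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```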

Improved processing and output format

10 Apr 22:44

Since the previous release, we've made a number of improvements to functionality and fixed several bugs. The format of the output has also changed: the HDF5 output is now split into "results.hdf5" and "details.hdf5". The motivation for this change is that the "results.hdf5" file contains the bulk of what a user may need, while "details.hdf5" contains only the voluminous details generated by de novo assembly, as well as the detailed alignment outputs.

No further major updates are intended prior to publication, but there will likely be bugfixes. We plan to release v1.0.0 to match the published version, and its functionality should be largely similar to that of this release.