Releases: zellerlab/GECCO

v0.9.10

27 Feb 16:11

Fixed

  • Progress display when reading from compressed files.
  • Change labeling routine to use broad overlaps when annotating genes with cluster tables (#15).
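
The difference between a broad overlap and strict containment can be sketched as follows. This is only one plausible reading of the entry above, not code taken from GECCO: the `Interval` type and both helper functions are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical 1-based, inclusive genomic interval."""
    start: int
    end: int


def overlaps(gene: Interval, cluster: Interval) -> bool:
    """Broad rule: the gene shares at least one position with the cluster."""
    return gene.start <= cluster.end and cluster.start <= gene.end


def contained(gene: Interval, cluster: Interval) -> bool:
    """Strict rule: the gene lies entirely inside the cluster."""
    return cluster.start <= gene.start and gene.end <= cluster.end


cluster = Interval(1000, 5000)
genes = [
    Interval(200, 900),    # entirely upstream of the cluster
    Interval(800, 1200),   # straddles the left boundary
    Interval(2000, 3000),  # fully inside
    Interval(4900, 5600),  # straddles the right boundary
]

broad_labels = [overlaps(g, cluster) for g in genes]
strict_labels = [contained(g, cluster) for g in genes]
```

Under these assumptions the two boundary genes are labeled positive by the broad rule and negative by the strict one.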

Changed

  • Bump supported polars dependency to v0.20.
  • Bump supported statsmodels dependency to v0.14.
  • Report identifier of sequences with uni-valued labels when training.

v0.9.9

23 Nov 09:50

Added

  • Support for gzip, bzip2, lz4 and xz-compressed input files.
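
As an illustration of how the four formats can be told apart, the sketch below inspects the leading magic bytes of a file. It is not GECCO's implementation: `detect_compression` and `_MAGIC` are hypothetical names, and only detection is shown because decompressing lz4 needs a third-party package.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading "magic" bytes of each container format.
_MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\x04\x22\x4d\x18": "lz4",   # LZ4 frame format
    b"\xfd7zXZ\x00": "xz",
}


def detect_compression(header: bytes) -> Optional[str]:
    """Return the compression format matching the header, or None."""
    for magic, name in _MAGIC.items():
        if header.startswith(magic):
            return name
    return None


# Small demonstration with the standard-library compressors.
record = b">seq1\nACGTACGT\n"
detected = [
    detect_compression(gzip.compress(record)),
    detect_compression(bz2.compress(record)),
    detect_compression(lzma.compress(record)),  # default container is xz
    detect_compression(record),                 # uncompressed FASTA
]
```

In practice the header would come from reading the first few bytes of the input file before choosing a decompressor.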

Fixed

  • Outdated use of pandas API in gecco cv command.

Changed

  • Bump pyhmmer dependency to v0.10.0.
  • Bump pyrodigal dependency to v3.0.0.
  • Make gecco cv output a gene table with a ground truth column.

v0.9.8

09 Jun 11:17

Fixed

  • ClusterTable.from_clusters extracting cluster IDs in the wrong column.
  • Deprecation warnings in polars.read_csv and polars.write_csv with recent polars versions.
  • Deprecation warnings in importlib_resources with recent Python versions.

v0.9.7

26 May 12:02

Added

  • Command line option to annotate proteins using bitscore cutoffs from HMMs.
  • Command line option to disentangle overlapping domains after HMM annotation.

Changed

  • Bump pyhmmer dependency to v0.8.0.
  • Bump pyrodigal dependency to v2.1.0.
  • Rewrite gecco.model to use polars for managing tabular data.
  • Replace pandas dependencies with polars.
  • Update gecco run to skip type classification for tasks without an assigned cluster type.

Fixed

  • Cluster.to_seq_record crashing when called on a cluster with types attribute unset.
  • Progress bar resetting when performing domain annotation with multiple HMMs.

Removed

  • Support for Python 3.7.

v0.9.6

11 Jan 19:36

Added

  • Gene Ontology annotations to gecco.interpro local metadata.
  • Reference to Gene Ontology terms and derived functions to gecco.model.Domain objects.
  • Gene color based on predicted function in gecco.model.Gene.to_seq_feature.

Fixed

  • Missing gzip import in the CLI preventing usage of gzip-compressed inputs.
  • Invalid coordinates of domains found in reverse-strand genes.
  • Detection of entry points with importlib.metadata on older Python versions.
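
To show why reverse-strand genes need separate handling when placing domains on the genome, here is a minimal sketch of the coordinate conversion. It is not GECCO's code: `domain_to_genomic` is a hypothetical helper, and it assumes 1-based inclusive coordinates, domain positions given in amino acids, and three nucleotides per residue.

```python
from typing import Tuple


def domain_to_genomic(
    gene_start: int,
    gene_end: int,
    strand: int,
    dom_start_aa: int,
    dom_end_aa: int,
) -> Tuple[int, int]:
    """Map a protein-relative domain to genomic (start, end) coordinates."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Nucleotide offsets of the domain from the first base of the gene's CDS.
    offset_first = (dom_start_aa - 1) * 3
    offset_last = dom_end_aa * 3 - 1
    if strand == 1:
        # Forward strand: translation starts at gene_start.
        return gene_start + offset_first, gene_start + offset_last
    # Reverse strand: translation starts at gene_end and runs leftwards,
    # so offsets are subtracted and the two ends swap roles.
    return gene_end - offset_last, gene_end - offset_first


forward = domain_to_genomic(101, 400, 1, 1, 10)
reverse = domain_to_genomic(101, 400, -1, 1, 10)
```

With these inputs the first ten residues map to 101..130 on the forward strand and to 371..400 on the reverse strand.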

Changed

  • bgc_id columns of cluster tables are renamed cluster_id.
  • gecco.model.ProductType is renamed to gecco.model.ClusterType.
  • Bumped pyrodigal dependency to v2.0.
  • Bumped pyhmmer dependency to v0.7.

v0.9.5

10 Aug 12:29

Added

  • gecco predict command to predict BGCs from an annotated genome.
  • Protein.with_seq function to assign a new sequence to a protein object.

Fixed

  • Issue with antiSMASH sideload JSON file generation in gecco run and gecco predict.
  • Make gecco.orf handle STOP codons consistently (#9).

v0.9.4

31 May 10:44

Added

  • classes_ property to TypeClassifier to access the classes_ attribute of the TypeBinarizer.
  • Alternative ORF finder CDSFinder which simply extracts CDS features from input sequences (#8).
  • Support for annotating domains with "exclusive" HMMs to annotate genes with at most one HMM from the library.

Changed

  • ProductType is not restricted to MIBiG types anymore and can support any string as a base type identifier.
  • PyrodigalFinder now uses multiprocessing.pool.ThreadPool instead of custom thread code thanks to OrfFinder.find_genes reentrancy introduced in Pyrodigal v1.0.
  • PyrodigalFinder can now be used in single / non-meta mode from the API.
  • Bumped minimum rich version to 12.3 to use None total in progress bars when the size of an HMM library is unknown.

Fixed

  • Broken MyPy type annotations in the gecco.model and gecco.cli modules.

v0.9.3

13 May 14:29

Changed

  • The value of the --format flag of the gecco annotate and gecco run CLI commands is now lowercased before being passed to Bio.SeqIO.

Fixed

  • Genes with duplicate IDs being silently ignored in HMMER.run.
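
A guard against duplicate gene IDs could look like the sketch below. The entry above does not say how the fix treats duplicates, so raising an error is an assumption, and `find_duplicate_ids` and `check_unique_ids` are hypothetical helpers rather than part of GECCO's API.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_ids(gene_ids: Iterable[str]) -> List[str]:
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(gene_ids)
    return sorted(gene_id for gene_id, n in counts.items() if n > 1)


def check_unique_ids(gene_ids: Iterable[str]) -> None:
    """Fail loudly instead of silently dropping genes (assumed behaviour)."""
    duplicates = find_duplicate_ids(gene_ids)
    if duplicates:
        raise ValueError(f"duplicate gene IDs: {', '.join(duplicates)}")


ids = ["contig1_1", "contig1_2", "contig1_1", "contig2_1"]
duplicates = find_duplicate_ids(ids)
```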

v0.9.2

11 Apr 17:11

Added

  • Padding of short sequences with empty genes when predicting probabilities in ClusterCRF.
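
The padding idea can be sketched as follows, assuming each gene is represented by a dictionary of features and an empty gene by an empty dictionary. `pad_features` and `window_size` are hypothetical names; the actual representation used by ClusterCRF is not described in these notes.

```python
from typing import Dict, List

Features = Dict[str, bool]


def pad_features(sequence: List[Features], window_size: int) -> List[Features]:
    """Append empty genes until the sequence fills one sliding window."""
    missing = max(0, window_size - len(sequence))
    return sequence + [{} for _ in range(missing)]


short_contig = [{"PF00001": True}, {"PF00002": True}]
padded = pad_features(short_contig, window_size=5)
```

Probabilities predicted for the padding positions would then be discarded, so only the original genes are reported.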

v0.9.1

05 Apr 16:03

Changed

  • Make the genes.tsv and features.tsv tables contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
  • Replaced the --force-clusters-tsv flag with a --force-tsv flag to force writing TSV tables even when no genes or clusters were found in gecco run or gecco annotate.