Releases: EI-CoreBioinformatics/mikado

Version v2.2.0

15 Mar 15:27

Version 2.2.0

Removed Cython from the requirements.txt file. This allows the tests to run correctly in a Conda environment (Conda disallows installing Cython as part of a distributed package).
As a result of this change, the preferred installation procedure from source has to be slightly amended:

  • either install using pip wheel -w dist . && pip install dist/Mikado*whl
  • or install with python setup.py bdist_wheel after having forcibly installed Cython, with pip install Cython or the like.

Other changes:

  • Fix #381: Mikado can now correctly guess the input file format,
    instead of relying on the file name extension or the user's settings. Sniffing is disabled, however, for files
    provided as a stream.
  • Fix #382: now Mikado can accept generic BED12 files
    as input junctions, not just Portcullis junctions. This allows e.g. a user to provide a set of gene models
    in BED12 format as sources of valid junctions.
  • Fix #384: now Mikado convert deals properly with
    unsorted GTFs/GFFs.
  • Fix #386: the stats utility now deals better with
    unsorted GFFs/GTFs.
  • Fix #387: Mikado now always uses a static seed,
    rather than generating a new one per call, unless specifically instructed otherwise. The old behaviour can still be
    replicated either by setting the seed parameter to null (i.e. None) in the configuration file, or by
    specifying --random-seed during the command invocation.
  • General increase in code unit-test coverage; in particular:
    • Slightly increased the unit-test coverage for the locus classes, e.g. properly covering the as_dict and load_dict
      methods. Minor bugfixes related to the introduction of these unit-tests.
  • Mikado.parsers.to_gff has been renamed to Mikado.parsers.parser_factory.
  • The code related to the transcript padding has been moved to the submodule Mikado.transcripts.pad, rather than
    being part of the Mikado.loci.locus submodule.
  • Mikado will error informatively if the scoring configuration file is malformed.
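
As an illustration of the content-based format detection described in Fix #381, here is a minimal sketch. It is not Mikado's implementation: the function name sniff_format and the heuristics (column counts and attribute syntax) are assumptions made for this example only.

```python
import re
from typing import Iterable, Optional

# GTF attributes look like: gene_id "g1"; transcript_id "t1";
GTF_ATTR = re.compile(r'^\s*\S+ "[^"]*";')
# GFF3 attributes look like: ID=t1;Parent=g1
GFF3_ATTR = re.compile(r'^\s*[^=;\s]+=[^;]*')


def sniff_format(lines: Iterable[str]) -> Optional[str]:
    """Guess 'gff3', 'gtf' or 'bed12' from the first data line; None if unknown."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##gff-version"):
            return "gff3"
        # Skip blank lines, comments and BED/UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # BED12: twelve columns, with numeric start and end in columns 2 and 3.
        if len(fields) == 12 and fields[1].isdigit() and fields[2].isdigit():
            return "bed12"
        # GTF and GFF3 share nine columns; only the attribute syntax differs.
        if len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit():
            if GTF_ATTR.match(fields[8]):
                return "gtf"
            if GFF3_ATTR.match(fields[8]):
                return "gff3"
        return None  # the first data line matched no known layout
    return None
```

Because the decision is taken on the first data line only, a sketch like this needs to rewind the input afterwards, which is one plausible reason why sniffing would be awkward for streams.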

Patch release

23 Feb 23:06

Hotfix release:

  • IMPORTANT Mikado now correctly uses the scores associated with a given source.
  • IMPORTANT Mikado was not forwarding the original source to transcripts derived by chimera splitting. This compounded the issue above.
  • Corrected the root cause of the two issues above, i.e. transcripts were not dumping and reloading all relevant fields. This is now implemented properly and tested with specific new routines.
  • Corrected an issue that caused Mikado to erroneously calculate the metrics and scores of loci twice, therefore reporting some wrong values in the output files.
    • affected metrics were e.g. selected_cds_intron_fraction and combined_cds_intron_fraction.
  • Removed quicksect from the requirements.
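
The dump-and-reload problem fixed above can be guarded against with a round-trip check. The following is a minimal sketch under stated assumptions: ToyTranscript and round_trips are hypothetical names invented for this example, and the class is a stand-in, not Mikado's transcript class or its test routines.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ToyTranscript:
    """Hypothetical stand-in for illustration; not Mikado's Transcript class."""
    tid: str
    source: str
    exons: list = field(default_factory=list)
    score: float = 0.0

    def as_dict(self) -> dict:
        # Dump every declared field, so that nothing (e.g. the source) is silently dropped.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    @classmethod
    def load_dict(cls, state: dict) -> "ToyTranscript":
        # Refuse to reload a state that lacks any declared field.
        missing = {f.name for f in fields(cls)} - set(state)
        if missing:
            raise KeyError(f"Missing fields in dumped state: {sorted(missing)}")
        return cls(**state)


def round_trips(obj: ToyTranscript) -> bool:
    """True if dumping and reloading reproduces an equal object."""
    return ToyTranscript.load_dict(obj.as_dict()) == obj
```

Deriving the dumped keys from the declared fields, rather than from a hand-maintained list, is the design choice that keeps a newly added attribute from being forgotten.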

v2.1.0: Issue 375 (#379)

22 Feb 23:22
8b887ab

Bugfix and speed improvement release.

  • Fix a bug that prevented Mikado from reporting the correct metrics/scores in the output of loci files. This bug only affected reporting, not the results themselves. See issue 376
  • Fix a bug in printing out the statistics for an annotation file with mikado util stats (issue 378)
  • When serialising, Mikado now drops and reloads everything by default. The previous default behaviour resulted in hard-to-parse errors and was not what is usually desired anyway.
  • Improved the performance of pick in multiple ways (issue 375):
    • now only external metrics that are requested in the scoring file will be printed out in the final metrics files. This reduces runtime in e.g. Minos. The new CLI switch --report-all-external-metrics (both in configure and pick) can be used to revert to the old behaviour.
    • the external table in the Mikado database now is indexed properly, increasing speed.
    • batch and compress the results before sending them through a queue (@ljyanesm)
    • @brentp enhanced the bcbio intervaltree.pyx into quicksect. Copied this new version of interval tree and adapted it to Mikado.
    • Using sqlalchemy bakeries for the SQLite queries, as well as LRU caches in various parts of Mikado.
    • Removed excessive copying in multiple parts of the program, especially regarding the configuration objects and during padding.
    • Using operator.attrgetter instead of a custom (and slower) recursive getattr function.
  • Removed unsafe calls to tempfile.mktemp and the like, for increased security according to CodeQL.
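
To illustrate the switch to operator.attrgetter mentioned above, the sketch below compares it with a hand-written recursive lookup. The configuration object and the dotted path are hypothetical and serve only as an example; recursive_getattr is a generic stand-in, not the function that was removed from Mikado.

```python
from functools import reduce
from operator import attrgetter
from types import SimpleNamespace


def recursive_getattr(obj, dotted_name: str):
    """Hand-written lookup that walks a dotted attribute path one step at a time."""
    return reduce(getattr, dotted_name.split("."), obj)


# Hypothetical nested configuration object, for illustration only.
config = SimpleNamespace(
    pick=SimpleNamespace(chimera_split=SimpleNamespace(blast_check=True))
)

# attrgetter resolves dotted paths natively; build the getter once and reuse it.
get_blast_check = attrgetter("pick.chimera_split.blast_check")
```

Both approaches return the same value; the gain from attrgetter comes from building the getter once and calling it many times, instead of re-splitting the path on every lookup.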

2.0.2

13 Feb 18:09
4f5571c

Bugfix release.

  • Fix infinite recursion bug when trying to recover lost transcripts
  • Fix performance regression by passing the configuration to Excluded locus objects.

Marshmallow mate

09 Feb 16:27
6c7ba51
  • Fixed a bug that caused Mikado configure (but not daijin configure, or "mikado configure --daijin") to print out invalid configuration files.
  • Restored the functionality of "--full" - now Mikado can print out either partial (but still valid) or fully-fledged configuration files.
  • Also ported the scoring configuration to MarshMallow dataclasses. As a direct result, jsonschema has been removed from the dependencies.
  • Configured bumpversion
  • Corrected a small bug in parsing EnsEMBL GFF3
  • Cured some deprecation warning messages from marshmallow and numpy
  • Small bug fix in the CLIs of mikado/daijin configure.
  • Default value of the seed is now 0 (ie: undefined, a random one will be selected). Only integers are allowed values.
  • Small bugfixes/extensions in the test suite.
  • Minor code reorganisation, without changes to the API.
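
The seed convention described in this release (0 means undefined, only integers are allowed) could be sketched as follows. The function name resolve_seed and the range used for the random draw are assumptions made for this example; Mikado's own handling may differ.

```python
import random


def resolve_seed(seed: int = 0) -> int:
    """Return the seed to use; 0 means 'undefined', so a random one is drawn."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError(f"The seed must be an integer, got {type(seed).__name__}")
    if seed == 0:
        # Assumption: draw from the 32-bit range, excluding 0 itself.
        return random.randint(1, 2**32 - 1)
    return seed
```

Logging the value returned for a seed of 0 would make such a run reproducible after the fact.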

Mikado version 2

28 Jan 17:57

Official second release of Mikado. All users are advised to update as soon as possible.

See https://github.com/EI-CoreBioinformatics/mikado/milestone/22?closed=1 for a non-comprehensive list of all the issues closed in
relation to this release.

Mikado 2, public release candidate 2

13 Apr 09:27

Minor amendments to 2.0rc1 - in order to get Mikado to install properly in BioConda.

Mikado 2, public release candidate 1

09 Apr 08:18

This version of Mikado is finally ready to go into Conda, DockerHub, PyPI and Singularity Hub.
Many thanks to @ljyanesm, thanks to whom Mikado has become much more performant.

Most notable changes:

  • Mikado serialise will now accept tabular BLAST files (with the extra columns ppos and btop). Both XML and TSV loading have parts written in Cython. Thank you to @srividya22 for first asking about improvements in this sense. #280
  • Mikado prepare now will remove redundancies based on intron chains, not perfect to-the-base identity. This should massively reduce the input data. The redundancy filter can be controlled per-source: ie, Mikado is able to keep all transcripts from certain input files (reference annotations, ab initio predictions, transcript assemblies, etc) while removing any redundant transcript from others (long-read alignments). Thanks to @lijing28101. #270
  • Mikado prepare now will try to split transcripts with very long introns, rather than outright discard them.
  • Mikado pick will now operate in stringent mode by default (ie: only split transcripts when there is strong evidence of them being chimeras, as per the BLAST data).
  • Mikado now uses TOML as default configuration language, as it is much more human-readable than either YAML or JSON (#239).
  • Various bugfixes.
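
The idea of removing redundancy by intron chain, rather than by exact to-the-base identity, can be sketched as below. This is an illustration under stated assumptions and not the algorithm used by Mikado prepare: the input layout, the function names, the tie-breaking rule (keep the widest model) and the treatment of monoexonic transcripts are all choices made for this example, and the per-source control is not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive (start, end)


def intron_chain(exons: List[Exon]) -> Tuple[Exon, ...]:
    """Return the introns lying between consecutive exons."""
    ordered = sorted(exons)
    return tuple((left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:]))


def span(exons: List[Exon]) -> int:
    """Genomic span covered by the transcript."""
    return max(end for _, end in exons) - min(start for start, _ in exons) + 1


def remove_redundant(transcripts: Dict[str, dict]) -> List[str]:
    """Keep one transcript per (chrom, strand, intron chain); keep all monoexonic ones."""
    groups = defaultdict(list)
    kept = []
    for tid, tr in transcripts.items():
        chain = intron_chain(tr["exons"])
        if not chain:
            kept.append(tid)  # assumption: monoexonic models are not collapsed here
            continue
        groups[(tr["chrom"], tr["strand"], chain)].append(tid)
    for tids in groups.values():
        # Assumption: among identical chains, keep the model with the widest span.
        kept.append(max(tids, key=lambda t: span(transcripts[t]["exons"])))
    return sorted(kept)
```

Two models that differ only in their terminal exon boundaries share an intron chain and collapse into one, which is why this criterion reduces the input far more than exact identity does.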

Version 2.0, release candidate 6

15 Oct 10:34
48b6c0c
Pre-release
  • #216: now mikado prepare will explicitly tell users to use the mikado_prepared.fasta for the serialise step. Moreover, mikado serialise will informatively crash if users try to do something different (a common mistake seems to be to use a FASTA file derived directly from the input assemblies).
  • #220: Fixed a bug in mikado serialise
  • #222: now daijin will make prodigal or TransDecoder use alternative genetic codes, upon request. IMPORTANT: TransDecoder does not support all of the known genetic codes listed by NCBI.
  • #223: fixed the start-adjustment method in the ORF module.
  • #226: mikado compare, mikado util stats and mikado util grep are now compatible with non-standard NCBI GFF3 files (having e.g. pseudogene features without any associated transcript but associated exons, or rRNA transcript features without any parent gene)
  • #227: now mikado compare will always consider valid transcripts, even if they are multiexonic yet missing a defined strand orientation.
  • #229:
    • mikado pick will now:
      • report the padding as INFO, not as WARNING
      • report on finishing the analysis of a chromosome, not the parsing
      • report the temporary analysis directory
      • provide --max-intron-length as a command line option
    • fixed a small bug in mikado serialise
    • fixed a bug in the ORF module that caused a crash when the sequence was not completely uppercase
  • #230: fixed some bugs related to the daijin conda environments and to updates to the snakemake code upstream.
  • Fix a small bug in reference_gene.py and transcript.py, related to sys.intern
  • #232: typo in the help for mikado serialise.

Version 2.0, release candidate 5

26 Sep 15:30
ab172f1
Pre-release
  • Switched from ujson to rapidjson (actively maintained and as performant)
  • Fix #209: daijin has been debugged and is now properly tested. Also, when using daijin mikado, the number of XMLs will be equal to or greater than the number of requested threads.
  • #177: mikado serialise is now completely parallelised. This allows for very significant speed-ups, especially when loading a large number of ORFs.
  • Speedups for mikado pick: the GTF is now parsed much more quickly, by avoiding the creation of a full GTFline object for each line during parsing (which was extra-slow).
  • daijin can now optionally use conda environments, using the conda directive of snakemake.
  • Speedup in mikado pick: now everything is written to databases (#218). This allows for cleaner temporary directories and parsing of the partial outputs.
  • mikado pick now will not, by default, print out the subloci file.
  • Speed up in mikado pick: now using a lightweight graph also for the splicing.
  • Amend #134 - now the minimum CDS overlap is 50%, not 75%.
  • Fixed a bug for mikado compare in multiprocessing mode
  • Fixed a bug in mikado configure - the scoring file will not be embedded within the printed file (otherwise it will be impossible to change it dynamically).
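
The GTF parsing speed-up listed above relies on not building a full object for every line. A minimal sketch of that general idea follows; the function name light_gtf_records and the choice of returned fields are assumptions for this example and do not reproduce Mikado's parser.

```python
import re
from typing import Iterable, Iterator, Tuple

# Pull out only the attribute that is needed, instead of parsing all of them.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def light_gtf_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int, str]]:
    """Yield (chrom, transcript_id, start, end, feature) as plain tuples."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # assumption: malformed lines are silently skipped in this sketch
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # e.g. gene lines, which carry no transcript_id
        yield fields[0], match.group(1), int(fields[3]), int(fields[4]), fields[2]
```

Full line objects can then be created only for the records that are actually needed downstream.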