Releases: EI-CoreBioinformatics/mikado

Version 2.0, release candidate 4

18 Aug 23:28
Pre-release

Users are advised to update as soon as possible. This release fixes a bug that had removed chimera splitting capabilities from Mikado since version 1.2.

In this RC:

  • Solved an extremely serious bug that prevented Mikado from performing chimera splitting during pick. The behaviour is now properly tested to avoid regressions.
  • Removed serious bottlenecks in the creation of splicing graphs - now the algorithm is less than quadratic. This should make Mikado more amenable to denser inputs.
  • #206: mikado serialise will now crash informatively when trying to add ORFs for transcripts that are not present in the mikado_prepared.fasta file. This should prevent a common user error.
  • Solved #207: improved performance of Mikado
  • Mikado prepare will correctly keep the CDS of transcripts
  • Mikado pick will not overwrite the ORF (or coding/non-coding status) of a transcript if it is marked as reference
  • Redundant class codes (=, _ and n) are now valid splice codes for the alternative splicing stage in pick. This allows mikado pick to include e.g. the transcripts from which an ab initio prediction was derived.

General improvements

  • The superlocus class has been revamped:
    • the definition of transcript graphs in superloci is now an O(n log n) rather than an O(n**2) algorithm.
    • removed the third method to reduce complex loci
    • rewrote the first method to reduce complex loci, for speed
    • both reduction methods will now consider whether a transcript is reference
    • the definition of alternative splicing events has also been moved to an O(n log n) algorithm.
  • Mikado pick was not making correct use of multiple processors. The main process was taking on the job of checking transcripts and creating loci - expensive operations that acted as bottlenecks. Now the main process will only collate transcripts as GTF rows, perform a minimal check that they do not have introns longer than the maximum size, and then and only then dispatch them.
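
To illustrate how an overlap graph can be built without comparing every pair of transcripts, here is a minimal sketch of a sort-and-sweep approach. It is not Mikado's actual implementation: the function name `overlap_graph` and the input layout (transcript ID mapped to a closed `(start, end)` pair) are assumptions made for this example. Sorting costs O(n log n); the sweep then only touches pairs that really overlap, so the total cost also grows with the number of overlaps reported.

```python
from typing import Dict, List, Set, Tuple


def overlap_graph(intervals: Dict[str, Tuple[int, int]]) -> Dict[str, Set[str]]:
    """Return adjacency sets linking every pair of overlapping intervals.

    Hypothetical helper: intervals are closed (start, end) pairs keyed by ID.
    """
    graph: Dict[str, Set[str]] = {tid: set() for tid in intervals}
    # Sort once by (start, end): O(n log n).
    ordered = sorted(intervals.items(), key=lambda item: item[1])
    # (end, id) of earlier intervals that may still overlap later ones.
    active: List[Tuple[int, str]] = []
    for tid, (start, end) in ordered:
        # Drop intervals that finish before the current one starts.
        # Each interval is dropped at most once over the whole sweep.
        active = [(a_end, a_tid) for a_end, a_tid in active if a_end >= start]
        # Everything left started earlier and ends at or after `start`,
        # so it overlaps the current interval.
        for _, other in active:
            graph[tid].add(other)
            graph[other].add(tid)
        active.append((end, tid))
    return graph
```

For example, `overlap_graph({"t1": (1, 100), "t2": (50, 150), "t3": (200, 300)})` links `t1` and `t2` and leaves `t3` isolated.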

Version 2.0, release candidate 3

05 Aug 11:19
68d3c60
Pre-release

Fixed #203 and #205.

Version 2.0, release candidate 2

25 Jul 21:44
2043db9
Pre-release

Solved #196, #197 and #198. Inching towards 2.0.

Version 2.0, release candidate 1

10 Jul 13:56
Pre-release

Same as the previous pre-release, with the following differences:

  • decided to switch to v2 rather than 1.5, due to too many incompatibilities with version 1 from over a year ago
  • Fixed #194

=====

Please see the CHANGELOG file for details.

Major notes:

  • this release fixes a bug (#139) whereby cDNAs completely or partially in letters other than ATGCNn (e.g. lowercase, i.e. soft-masked, nucleotides) would not have been reverse-complemented correctly. Therefore, any run on soft-masked genomes with prior releases would be invalid.
  • this release changes the format of the mikado database. As such, old mikado databases have to be regenerated with Mikado serialise in order for the run not to fail.
  • this release has completely overhauled the scoring files. We now provide only two ("plant.yaml" and "mammalian.yaml"). "Plant.yaml" should function also for insect or fungal species, but we have not tested it extensively. Old scoring files can be found under "HISTORIC".
  • this release completes the "padding" functionality. Briefly, if instructed to do so, Mikado will now be able to make the ends of transcripts within a single locus uniform (similar to what was done for the last Arabidopsis thaliana annotation release). The behaviour is controlled by the "pad" boolean switch, and by the "ts_max_splices" and "ts_distance" parameters under "pick". Please note that "ts_distance" now refers to the transcriptomic distance, i.e. long introns are not considered for this purpose. Moreover, padding is now enabled by default.
  • general improvements in speed and multiprocessing, as well as flexibility, for the Mikado compare utility.
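
The reverse-complement issue in the first note above is easy to reproduce with any complement table that only covers upper-case bases. As a hedged illustration (not Mikado's own code; the helper name `reverse_complement` is an assumption for this sketch), a case-preserving version can be written with the standard library alone:

```python
# Complement table covering both upper-case and lower-case (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide string, preserving soft-masking."""
    # Complement every base first, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGc"))  # prints gCAT
```

Characters missing from the table are left untouched by `str.translate`, which is precisely how lower-case bases can slip through uncomplemented when a table lists only `ACGTN`.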

With this release, we are also officially dropping support for Python 3.4. Python 3.5 will not be automatically tested for, as many Conda dependencies are not up-to-date, complicating the TRAVIS setup.

Version 1.5, release candidate

09 Jul 16:18
Pre-release

Please see the CHANGELOG file for details.

Major notes:

  • this release fixes a bug (#139) whereby cDNAs completely or partially in letters other than ATGCNn (e.g. lowercase, i.e. soft-masked, nucleotides) would not have been reverse-complemented correctly. Therefore, any run on soft-masked genomes with prior releases would be invalid.
  • this release changes the format of the mikado database. As such, old mikado databases have to be regenerated with Mikado serialise in order for the run not to fail.
  • this release has completely overhauled the scoring files. We now provide only two ("plant.yaml" and "mammalian.yaml"). "Plant.yaml" should function also for insect or fungal species, but we have not tested it extensively. Old scoring files can be found under "HISTORIC".
  • this release completes the "padding" functionality. Briefly, if instructed to do so, Mikado will now be able to make the ends of transcripts within a single locus uniform (similar to what was done for the last Arabidopsis thaliana annotation release). The behaviour is controlled by the "pad" boolean switch, and by the "ts_max_splices" and "ts_distance" parameters under "pick". Please note that "ts_distance" now refers to the transcriptomic distance, i.e. long introns are not considered for this purpose. Moreover, padding is now enabled by default.
  • general improvements in speed and multiprocessing, as well as flexibility, for the Mikado compare utility.

With this release, we are also officially dropping support for Python 3.4. Python 3.5 will not be automatically tested for, as many Conda dependencies are not up-to-date, complicating the TRAVIS setup.

1.3beta

15 Oct 13:48
Pre-release

[Beta, the finalised version will be released soon]

One of the major highlights of this release is the completion of the "padding" functionality.
Briefly, if instructed to do so, Mikado will now be able to make the ends of transcripts within a single locus uniform (similar to what was done for the last Arabidopsis thaliana annotation release).
The behaviour is controlled by the "pad" boolean switch, and by the "ts_max_splices" and "ts_distance" parameters under "pick".

Bugfixes and improvements:

  • Fixed a bug which caused some loci to crash at the last part of the picking stage
  • Now coding and non-coding transcripts will be in different loci.
  • Mikado prepare now can accept models that lack any exon features but still have valid CDS/UTR features
  • Fixed #34: it is now possible to specify a valid codon table among those provided by NCBI through BioPython. The default is "0", i.e. the Standard table but with only the canonical "ATG" being accepted as a valid start codon.
  • Fixed #123: now add_transcript_to_feature.gtf automatically splits chimeric transcripts and corrects mistakes related to the intron size.
  • Fixed #126: now reversing the strand of a model will cause its CDS to be stripped.
  • Fixed #127: previously, Mikado prepare only considered cDNA coordinates when determining the redundancy of two models. In some edge cases, two models could be identical but have a different ORF called. Now Mikado will also consider the CDS before deciding whether to discard a model as redundant.
  • #129: Mikado is now capable of correctly padding the transcripts so as to make their ends uniform within a single locus. This will also have the effect of trying to enlarge the ORF of a transcript if it is truncated to begin with.
  • #130: it is now possible to specify a different metric inside the "filter" section of scoring.
  • #131: in rare instances, Mikado could have missed loci if they were lost between the sublocus and monosublocus stages. Now Mikado implements a basic backtracking recursive algorithm that should ensure no locus is missed.
  • #132: Mikado will now evaluate the CDS of transcripts during Mikado prepare.
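
The idea behind the fix for #127 (treating two models as redundant only when both their exons and their CDS agree) can be sketched as follows. This is an illustration under stated assumptions, not Mikado's code: the dictionary layout of each model and the name `drop_redundant` are hypothetical.

```python
from typing import Iterable, List


def drop_redundant(models: Iterable[dict]) -> List[dict]:
    """Keep the first model seen for each (location, exons, CDS) combination.

    Hypothetical layout: each model is a dict with "chrom", "strand",
    "exons" and an optional "cds" entry, the last two being lists of
    (start, end) tuples.
    """
    seen = set()
    kept = []
    for model in models:
        key = (
            model["chrom"],
            model["strand"],
            tuple(sorted(model["exons"])),
            # Including the CDS keeps models with identical exons
            # but a different ORF from being collapsed into one.
            tuple(sorted(model.get("cds", ()))),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(model)
    return kept
```

With a key built from the exons alone, the second of two models sharing a structure but carrying different ORFs would be silently discarded.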

BED12 galore

08 Aug 09:28

Enhancement release. Following version 1.2.3, now Mikado can accept BED12 files as input for convert, compare and stats (see #122). This is becoming necessary as many long-reads alignment tools are preferentially outputting (or can be easily converted to) this format.
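
For readers unfamiliar with the format, the sketch below shows how the exon structure of a transcript is encoded in a BED12 line. It follows the standard BED field order (0-based, half-open coordinates) and is not taken from Mikado; the `Bed12` class and `parse_bed12` function are names invented for this example.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    start: int          # 0-based start of the whole feature
    end: int            # end of the feature (exclusive)
    name: str
    strand: str
    thick_start: int    # usually the CDS start
    thick_end: int      # usually the CDS end
    exons: List[Tuple[int, int]]  # genomic (start, end) per block


def parse_bed12(line: str) -> Bed12:
    """Parse one tab-separated BED12 line into genomic exon coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, found {len(fields)}")
    start = int(fields[1])
    count = int(fields[9])
    # blockSizes and blockStarts may carry a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(offsets) == count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # Block starts are offsets relative to the feature start.
    exons = [(start + off, start + off + size) for off, size in zip(offsets, sizes)]
    return Bed12(fields[0], start, int(fields[2]), fields[3], fields[5],
                 int(fields[6]), int(fields[7]), exons)
```

A two-block record starting at position 100 with sizes `100,200` and offsets `0,200` yields the exons `(100, 200)` and `(300, 500)`.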

BugFix and BED12

12 Jul 11:12

This is mainly a bug fix release. It has a key advancement though, as Mikado can now accept BED12 files as input assemblies. This makes it compatible with the Minimap2 PAF > BED12 system.

BugFix for 1.2

10 May 14:27

Minor bugfixes:

  • Daijin should now correctly handle the lack of DRMAA
  • Daijin should now correctly treat single-end short reads

1.2.1

04 May 10:54

Highlights for this version:

  • The version of the algorithm for retained introns introduced in 1.1 was too stringent compared to previous versions. The code has been updated so that the new version of Mikado will produce results comparable to those of versions 1 and earlier. ALL MIKADO USERS ARE ADVISED TO UPDATE THE SOFTWARE.
  • Daijin now supports Scallop.
  • Now Mikado will print out also the alias in the final picking tables, to simplify lookup of final Mikado models with their original assembly (previously, the table for the .loci only contained the Mikado ID).
  • Various changes on the BED12 internal representation. Now Mikado can also convert a genomic BED12 into a transcriptomic BED12.
  • Updated the documentation, including a tutorial on how to create scoring files, and how to adapt Daijin to different use cases.
  • Now finalised transcripts will always contain a dictionary containing the phases of the various CDS exons.
  • Mikado prepare now will always reverse the strand for mixed-splicing events.
  • Added unit-tests to keep in check the regression in calling retained introns, and for the new BED12 features.
  • Minor bugfixes.