aaiezza/FLiCK

FLiCK

 [File] Format Leveraging Compression framework

  A Bioinformatics thesis project by Alessandro Aiezza II
    Defended on July 20, 2016 @ the Rochester Institute of Technology

 Committee
    Dr. Gary Skuse, Dr. Greg Babbitt, Dr. Larry Buckley

 Citation
Aiezza, A.,II. (2016). The FLiCK framework; enabling rapid development and performance benchmarking of compression applications for genetic data files (Order No. 10144070). Available from ProQuest Dissertations & Theses Global. (1825611935). Retrieved from http://search.proquest.com/docview/1825611935?accountid=13567

A Java framework that makes it easier to develop file compressors/decompressors by leveraging ab initio knowledge about a specific file format. FLiCK also runs independently as a file compressor and, by default, will ZIP any file it is given.

A developer can create a module in FLiCK for any file format. A module associates a file format with one or more file extensions. (For example, the FASTA module works on files with the extensions .fa, .fasta, and .fna.) When the classes or jar of a FLiCK module are found on the CLASSPATH at runtime, FLiCK checks each input file's extension against the registered modules and uses the matching module's compression algorithm as opposed to the default ZIP algorithm.
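The extension-based dispatch described above can be sketched as a small lookup table with a ZIP fallback. This is only an illustration of the idea, not FLiCK's actual implementation: the class name ModuleRegistry, its methods, the module names as plain strings, and the case-insensitive matching are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry (not part of FLiCK): maps file extensions to a module name. */
final class ModuleRegistry {
    /** Used when no module claims the extension, mirroring the default ZIP behavior. */
    static final String DEFAULT_MODULE = "zip";

    private final Map<String, String> byExtension = new HashMap<>();

    /** Associates one module with one or more extensions (given without the leading dot). */
    void register(String moduleName, String... extensions) {
        for (String ext : extensions) {
            // Assumption: extensions are matched case-insensitively.
            byExtension.put(ext.toLowerCase(Locale.ROOT), moduleName);
        }
    }

    /** Returns the module registered for the file's extension, or the ZIP default. */
    String moduleFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_MODULE; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.getOrDefault(ext, DEFAULT_MODULE);
    }

    public static void main(String[] args) {
        ModuleRegistry registry = new ModuleRegistry();
        registry.register("fasta", "fa", "fasta", "fna");
        System.out.println(registry.moduleFor("genome.fna")); // prints "fasta"
        System.out.println(registry.moduleFor("notes.txt"));  // prints "zip"
    }
}
```

A map keyed by extension keeps the lookup constant-time and lets several extensions point at the same module, which matches the one-module-to-many-extensions relationship described above.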

FLiCK comes preloaded with FASTA and FASTQ file format compression modules.


Usage - users

  1. Download the latest release from the FLiCK Releases page.
  2. Untarball/unzip the contents into a directory on your PATH:
  • flick.jar
  • flick (executable)
  • unflick (executable)
  3. You should be ready to go! See the FLiCK User tutorial and the unFLiCK User tutorial.

Usage - Developers (Module Creation)

  1. Download flick.jar from the releases page and add it to your CLASSPATH:
$ export CLASSPATH=path/to/other/jars:flick.jar
  2. Implement the five classes that make up a module:
  • FileDeflator: implementation of the file format compression algorithm.
  • FileInflator: implementation of the file format decompression algorithm.
  • DeflationOptionSet: options/flags that alter the behavior of the compression algorithm.
  • InflationOptionSet: options/flags that alter the behavior of the decompression algorithm.
  • FileArchiver: (1) holds aspects that are important to both the deflator and inflator; (2) connects the other four classes together; (3) declares the file extensions the module is appropriate for.
  3. Annotate the FileArchiver class with the RegisterFileDeflatorInflator annotation, which identifies the other four component classes and lists the file extensions the module should be used for.
      (It is recommended to jar your implementing classes for ease of use and portability of your module.)

  4. Place your classes (or jar) on the CLASSPATH so that they are visible to FLiCK at runtime.


FASTA and FASTQ File Format Modules come preloaded in FLiCK

The entirety of both these modules exists in the edu.rit.flick.genetics package. The FLiCK [platform] is fully functional and executable without this package, as the package serves as an outside module.

FASTA & FASTQ file format specification

Architecture of FLiCK

FLiCK UML Diagram

Example Module Registration for the FLiCK FASTA compression module

@RegisterFileDeflatorInflator (
    deflatedExtension = FastaFileArchiver.DEFAULT_DEFLATED_FASTA_EXTENSION,
    inflatedExtensions = { "fna", "fa", "fasta" },
    fileDeflator = FastaFileDeflator.class,
    fileInflator = FastaFileInflator.class,
    fileDeflatorOptionSet = FastaDeflationOptionSet.class,
    fileInflatorOptionSet = FastaInflationOptionSet.class )
public interface FastaFileArchiver extends FastFileArchiver
{
    ...
    public static final String DEFAULT_DEFLATED_FASTA_EXTENSION = ".flickfa";
    ...
}
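
For readers unfamiliar with annotation-driven registration, the following sketch shows how a framework could read such an annotation through reflection at runtime. It is a sketch under stated assumptions, not FLiCK's code: RegisterModule is a simplified, hypothetical stand-in for RegisterFileDeflatorInflator, and DemoFastaArchiver and inflatedExtensionsOf are names invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AnnotationLookupSketch {

    /** Hypothetical, simplified registration annotation; not FLiCK's actual definition. */
    @Retention(RetentionPolicy.RUNTIME) // RUNTIME retention is required for reflective access
    @Target(ElementType.TYPE)
    @interface RegisterModule {
        String deflatedExtension();
        String[] inflatedExtensions();
    }

    /** Hypothetical archiver type carrying the registration metadata. */
    @RegisterModule(deflatedExtension = ".flickfa", inflatedExtensions = { "fna", "fa", "fasta" })
    interface DemoFastaArchiver { }

    /** Returns the inflated extensions declared on the type, or an empty list if unannotated. */
    static List<String> inflatedExtensionsOf(Class<?> type) {
        RegisterModule registration = type.getAnnotation(RegisterModule.class);
        if (registration == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(registration.inflatedExtensions());
    }

    public static void main(String[] args) {
        System.out.println(inflatedExtensionsOf(DemoFastaArchiver.class)); // prints "[fna, fa, fasta]"
        System.out.println(inflatedExtensionsOf(String.class));            // prints "[]"
    }
}
```

Keeping the metadata on the archiver type means a module needs no separate configuration file: any class visible on the CLASSPATH can be inspected for the annotation.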

More details behind sample FASTA/FASTQ module implementations

The modules use a 2-bit compression algorithm for the nucleotides:

Nucleotide Mapped bits
A 00
C 01
G 10
T 11

Example: ACTGATTACA → 00011110001111000100 (binary) → 123844 (decimal)
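
The mapping above can be sketched in a few lines of Java. This is a minimal illustration of 2-bit packing only, not the code used in the FLiCK modules: the class TwoBitCodec and its methods are hypothetical, it packs at most 32 bases into a single long, and it assumes the input contains only uppercase A, C, G, and T (real FASTA/FASTQ data also needs handling for headers, line breaks, N bases, and quality scores).

```java
/** Hypothetical 2-bit nucleotide codec for illustration; not FLiCK's implementation. */
final class TwoBitCodec {
    // The index of each base in this string is its 2-bit code: A=00, C=01, G=10, T=11.
    private static final String ALPHABET = "ACGT";

    /** Packs up to 32 nucleotides into a long, first base in the most significant bits. */
    static long encode(String sequence) {
        if (sequence.length() > 32) {
            throw new IllegalArgumentException("at most 32 bases fit in 64 bits");
        }
        long packed = 0L;
        for (int i = 0; i < sequence.length(); i++) {
            int code = ALPHABET.indexOf(sequence.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not A, C, G, or T: " + sequence.charAt(i));
            }
            packed = (packed << 2) | code; // shift in two bits per base
        }
        return packed;
    }

    /** Unpacks the given number of nucleotides; the length must be stored separately. */
    static String decode(long packed, int length) {
        char[] bases = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            bases[i] = ALPHABET.charAt((int) (packed & 0b11)); // lowest two bits = last base
            packed >>>= 2;
        }
        return new String(bases);
    }

    public static void main(String[] args) {
        long packed = encode("ACTGATTACA");
        System.out.println(packed);             // prints 123844
        System.out.println(decode(packed, 10)); // prints ACTGATTACA
    }
}
```

The sequence length has to travel with the packed value because leading A bases encode as zero bits and would otherwise be lost on decoding.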

FLiCK FASTQ 2-bit compression module performance analysis

Program                 Average Compression Ratio   Average Compression Runtime   Average Decompression Runtime
Path Encoding           90.9%                       -                             -
LW-FQZip                80.5%                       44:39                         02:52
FLiCK (2-bit module)    77.3%                       31:55                         20:46
gzip                    75.6%                       19:03                         10:24
bzip2                   78.3%                       32:18                         16:33
Quip                    77.3%                       11:52                         01:57
LEON                    91.5%                       32:10                         07:52

FLiCK 2-bit module performance