
DeltaCodec

A Time Series Compression Library in C# : Directly Encode/Decode Generic Lists

Summary

The simplest way to use a codec implemented in this framework looks something like this:

// e.g., a list of DateTime values sampled once per second
var list = Enumerable.Range(0, 3600)
    .Select(i => new DateTime(2015, 1, 1).AddSeconds(i)).ToList();
var codec = DeflateCodec.Instance;
var bytes = codec.Encode(list);
var listOut = codec.Decode<DateTime>(bytes);

As you can see, lists of intrinsic data types are first-class citizens. And it is very simple to create derivations that handle more complex data structures (v1.2 adds basic multi-field functionality and demonstrations).

A DeltaCodec is a combination of a transform and a finisher. In the example above we are using a trivial codec that uses a NullTransform to simply pass the data through to a DeflateFinisher. The latter uses built-in DeflateStream compression. The codec itself is only responsible for handling the serialization of results with header information required for decoding.

The framework not only simplifies usage, but also lets us mix and match transforms and finishers to create custom codecs in endless variety.
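
To make the pairing concrete, here is a conceptual sketch of a transform/finisher composition. The interfaces and class below are purely illustrative, not the framework's actual types:

using System;

public interface ITransform
{
    long[] Forward(long[] data);   // e.g., differencing
    long[] Reverse(long[] data);   // e.g., running sum
}

public interface IFinisher
{
    byte[] Compress(byte[] data);   // e.g., DeflateStream
    byte[] Decompress(byte[] data);
}

// Encoding runs the transform and then the finisher;
// decoding applies the same two stages in reverse order.
public sealed class ComposedCodec
{
    private readonly ITransform _transform;
    private readonly IFinisher _finisher;

    public ComposedCodec(ITransform transform, IFinisher finisher)
    {
        _transform = transform;
        _finisher = finisher;
    }

    public byte[] Encode(long[] data)
    {
        var transformed = _transform.Forward(data);
        var raw = new byte[transformed.Length * sizeof(long)];
        Buffer.BlockCopy(transformed, 0, raw, 0, raw.Length);
        return _finisher.Compress(raw);
    }

    public long[] Decode(byte[] encoded)
    {
        var raw = _finisher.Decompress(encoded);
        var transformed = new long[raw.Length / sizeof(long)];
        Buffer.BlockCopy(raw, 0, transformed, 0, raw.Length);
        return _transform.Reverse(transformed);
    }
}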

Performance

To give you an idea of what is available out-of-the-box, we'll show some output from a few of the many included performance tests that compare different codecs, data types, and parameter settings. We'll start with the codec described above. And then we'll make a few simple adjustments to see how we can improve results.

My company, Stability Systems LLC, develops highly optimized commercial codecs such as RandomWalkCodec (RWC). We use that as a benchmark since it shows just how high the bar can be raised in pure C# implementations.

[Figure: DateTimeBySeconds_SerialOptimal performance results]

Now we're going to replace DeflateCodec with DeflateDeltaCodec. This replaces NullTransform with DeltaTransform. The data will now be differenced before being passed to the finisher.
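
For intuition, here is a minimal sketch of what differencing looks like on DateTime ticks. This is illustrative only, not the library's DeltaTransform implementation:

using System;

static class DeltaSketch
{
    // Store the first value's ticks, then successive differences.
    public static long[] Encode(DateTime[] values)
    {
        var deltas = new long[values.Length];
        if (values.Length == 0) return deltas;
        deltas[0] = values[0].Ticks;
        for (int i = 1; i < values.Length; i++)
            deltas[i] = values[i].Ticks - values[i - 1].Ticks;
        return deltas;
    }

    // Reverse with a running sum.
    public static DateTime[] Decode(long[] deltas)
    {
        var values = new DateTime[deltas.Length];
        long ticks = 0;
        for (int i = 0; i < deltas.Length; i++)
        {
            ticks += deltas[i];
            values[i] = new DateTime(ticks);
        }
        return values;
    }
}

For a regular one-second series, every delta after the first collapses to the same constant (TimeSpan.TicksPerSecond), which is why the finisher can compress the transformed data so much better.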

[Figure: DateTimeBySeconds_SerialDeltaNoFactorOptimal performance results]

The Ratios/Multiples are all greatly improved, simply because we differenced the data. Unfortunately, the encoding and decoding speeds still leave much to be desired. Now let's throw a little parallelism into the mix and see what happens:

[Figure: DateTimeBySeconds_ParallelDeltaGranularOptimal performance results]

Notice the field called Parts (partitions): its value has increased from 1 to 4, indicating the number of blocks being encoded in parallel. Encoding and decoding speeds are now roughly 3X faster. Not bad!
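
To sketch the idea (this is an assumption about what partitioning does internally; the real framework also handles headers, block ordering, and reassembly), here is block-parallel compression with DeflateStream:

using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class PartitionSketch
{
    public static byte[][] Compress(byte[] data, int parts)
    {
        var blocks = new byte[parts][];
        int blockSize = (data.Length + parts - 1) / parts;
        Parallel.For(0, parts, p =>
        {
            int offset = p * blockSize;
            int count = Math.Max(0, Math.Min(blockSize, data.Length - offset));
            using (var ms = new MemoryStream())
            {
                using (var ds = new DeflateStream(ms, CompressionLevel.Optimal, leaveOpen: true))
                {
                    ds.Write(data, offset, count);
                }
                blocks[p] = ms.ToArray();  // one independently decodable block
            }
        });
        return blocks;
    }
}

Because each block is independent, decoding can run in parallel as well; the tradeoff is a small loss of compression ratio at block boundaries.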

There are, of course, a tremendous variety of other possible optimizations, as evidenced by the performance of the benchmark, RWC. But this is just a framework and we leave it to your imagination to come up with clever transforms.

Complex Data Types

I'll show one more example here dealing with new functionality available in v1.2: multi-field encoding.

[Figure: Struple_13 multi-field encoding results]

This shows a list of generic Struples (structure tuples) that has fields for all intrinsic data types (except Boolean, which I may add later). There is nothing much to say about this except that it works and is handy as a template for creating strongly-typed custom codec methods for your own complex data types.
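
If you want a quick sense of the underlying idea, multi-field encoding is essentially column-wise: split the struct list into one list per field and encode each with the familiar pattern. The struct and field names below are hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;

public struct PricePoint   // a hypothetical two-field "struple"
{
    public DateTime Time;
    public double Price;
}

public static class MultiFieldSketch
{
    public static void RoundTrip(List<PricePoint> points)
    {
        var codec = DeflateCodec.Instance;

        // Encode each column separately.
        var timeBytes = codec.Encode(points.Select(p => p.Time).ToList());
        var priceBytes = codec.Encode(points.Select(p => p.Price).ToList());

        // Decoding reverses the split, one field at a time.
        var times = codec.Decode<DateTime>(timeBytes);
        var prices = codec.Decode<double>(priceBytes);
    }
}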

Finally

Be sure to visit the [Documentation] wiki pages, and look over the docs included in the solution folder. The docs include discussion of [Architecture] and [Usage], as well as extensive [Performance] results to sift through. And, of course, you can play around with the actual tests in Visual Studio.

NOTE: Performance test output is best viewed in the NUnit or ReSharper test runners; the default MSTest display is difficult to read.
