Comparison

#Comparison of StochHMM with other Compilers, Libraries, Toolboxes, Implementation Applications

I've been asked, how does StochHMM compare to existing HMM solutions and where does it fit in exactly?

StochHMM is written to make HMMs more accessible and easier to implement for biologists and bioinformaticians. This comparison focuses on tools that have been published in bioinformatics journals.

StochHMM isn't the holy grail of HMMs; it focuses on the traditional HMM. I wish I had the funding and time to make it the Swiss Army knife of HMMs (pair, generalized, inhomogeneous), but I don't. When I started on StochHMM, I was a wet-lab biologist frustrated with the existing implementation tools. My goals for StochHMM have been to make model files easy to write and understand, to keep it flexible so that models can be adapted to many different situations, to make it accessible to people with minimal bioinformatics skills, and lastly to make it reasonably memory-efficient and fast.

To help answer the comparison question, I've compiled a list of HMM utilities, their features, and what StochHMM has or doesn't have.

ComparisonTable

Comparison of features available in existing HMM compilers, libraries, toolkits, and applications. StochHMM is included in both Libraries and Applications because it is provided as both. (Lunter, 2007; Lam and Meyer, 2009b; Schliep et al., 2003; Sand et al., 2010; Schütz and Delorenzi, 2008)

One of the most common questions that I get from bioinformaticians is: How does it compare to HMMoc?

First, StochHMM requires far less programming knowledge to implement a simple model such as the common Dishonest Casino model (see Dishonest Casino Compared)[dishonest_comp]. Researchers can create, run, and quickly adapt a model without ever writing a line of code, something that isn't possible with HMMoc. This makes it much more accessible to non-bioinformaticians. I have introduced StochHMM to non-bioinformaticians, and they are now building HMMs, some very simple and some very complex, to analyze data in ways that were inaccessible to them before.

Second, StochHMM doesn't support pair or generalized HMMs, but it does support many features that allow it to be adapted. Many of these can be used without writing a line of C++. With just a few lines of code in the main function, it can call utilities like HMMer from within the Viterbi algorithm, work as an inhomogeneous HMM for classifying the parental origin of haplotypes, or use user-defined functions as transitions or emissions.

Third, almost everyone is interested in speed and memory efficiency compared to HMMoc. To help with the comparison, I ran some simple benchmarks on a 13" MacBook Pro (2.5 GHz Intel Core i5 with 16 GB DDR3 memory). All programs using Mamot, HMMoc, and StochHMM were compiled in Xcode 5.0.1.

Memory usage for Mamot, HMMoc-derived programs, and StochHMM was measured using the memory allocation profiling tool in Instruments. R was run using the (RHMM script)[rscript] from the command line, and memory usage was evaluated by summing the memory reported by gc().

Time was measured using the Unix time command, and all values reported are user time.
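
To make the distinction concrete, user time counts CPU seconds spent in the program's own code, not wall-clock time. The sketch below shows one way to read the same quantity from Python on a Unix-like system; the helper name `user_time` is hypothetical and was not part of the benchmark setup.

```python
import os
import subprocess
import sys


def user_time(cmd):
    """Run cmd and return the user CPU seconds consumed by the child process.

    Assumes a Unix-like system; on Windows, children_user is always 0.
    """
    before = os.times().children_user
    # Discard the child's output so printing does not clutter the caller.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = os.times().children_user
    return after - before


if __name__ == "__main__":
    # Stand-in workload; replace with the program being benchmarked.
    seconds = user_time([sys.executable, "-c", "sum(range(10**6))"])
    print(f"user time: {seconds:.3f} s")
```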

##Dishonest Casino Benchmark

##Viterbi Algorithm

###Time (seconds): Dishonest Casino HMM on Dice Sequence

| Sequence Length (bp) | Mamot | StochHMM | R-HMM | HMMoc |
|---:|---:|---:|---:|---:|
| 300 | 0.005 | 0.005 | 0.232 | 0.004 |
| 300000 | 0.247 | 0.218 | 20.449 | 0.124 |
| 3000000 | 19.724 | 2.056 | 213.206 | 1.229 |
| 30000000 | 211.187 | 19.453 | 2534.004 | 11.331 |

###Memory (Megabytes): Dishonest Casino HMM on Dice Sequence

| Sequence Length (bp) | Mamot | StochHMM | R-HMM | HMMoc |
|---:|---:|---:|---:|---:|
| 300 | 4.60 | 0.08 | 21.00 | 0.07 |
| 300000 | 4.60 | 12.76 | 79.50 | 32.45 |
| 3000000 | 4.60 | 145.45 | 444.20 | 300.64 |
| 30000000 | 4.60 | 1360.00 | 4517.00 | 2300 |
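
For readers unfamiliar with what is being timed in this section, here is a minimal Python sketch of Viterbi decoding for a Dishonest Casino model. It is not StochHMM's implementation. The transition and emission values are the commonly used textbook ones, and the 0.5/0.5 start distribution is an assumption; the model files used for the benchmark may differ.

```python
import math

# Assumed textbook Dishonest Casino parameters (F = fair die, L = loaded die).
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}  # assumption: uniform start distribution
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in "123456"},
    "L": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def viterbi(rolls):
    """Return the most probable state path for a string of die rolls."""
    if not rolls:
        return ""
    # score[s] = log-probability of the best path that ends in state s.
    score = {s: math.log(START[s] * EMIT[s][rolls[0]]) for s in STATES}
    backptrs = []  # one dict of back-pointers per position after the first
    for r in rolls[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = best_prev
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][r])
            )
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("31245366664616"))
```

Storing one back-pointer per state per position is what makes the memory of a straightforward implementation like this grow with sequence length.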

##Posterior Algorithm

###Time (seconds): Dishonest Casino HMM on Dice Sequence

| Sequence Length (bp) | Mamot | StochHMM | R-HMM | HMMoc |
|---:|---:|---:|---:|---:|
| 300 | 0.045 | 0.005 | 0.262 | 0.006 |
| 300000 | 0.120 | 0.498 | 34.896 | 1.452 |
| 3000000 | 0.809 | 5.255 | 369.684 | 15.241 |
| 30000000 | 7.271 | 52.797 | --- | 148.121 |

###Memory (Megabytes): Dishonest Casino HMM on Dice Sequence

| Sequence Length (bp) | Mamot | StochHMM | R-HMM | HMMoc |
|---:|---:|---:|---:|---:|
| 300 | 48.07 | 0.01 | 21.6 | 0.01 |
| 300000 | 48.07 | 14.55 | 71.4 | 9.45 |
| 3000000 | 48.07 | 144.23 | 589.1 | 94.42 |
| 30000000 | 48.07 | 1400.00 | --- | 976.15 |
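
As a rough illustration of what the posterior benchmark computes, the sketch below runs a scaled forward-backward pass and returns the probability of each state at each position. It is a plain Python sketch, not the code of any tool benchmarked here, and it assumes the textbook Dishonest Casino parameters with a uniform start distribution.

```python
# Assumed textbook Dishonest Casino parameters (F = fair die, L = loaded die).
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}  # assumption: uniform start distribution
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in "123456"},
    "L": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def posterior(rolls):
    """Return a list of {state: P(state at position i | all rolls)} dicts."""
    n = len(rolls)
    if n == 0:
        return []
    # Forward pass; each row is rescaled to sum to 1 to avoid underflow.
    fwd, scales, prev = [], [], None
    for i, r in enumerate(rolls):
        if i == 0:
            row = {s: START[s] * EMIT[s][r] for s in STATES}
        else:
            row = {
                s: EMIT[s][r] * sum(prev[p] * TRANS[p][s] for p in STATES)
                for s in STATES
            }
        c = sum(row.values())
        prev = {s: v / c for s, v in row.items()}
        fwd.append(prev)
        scales.append(c)
    # Backward pass, divided by the same scale factors as the forward pass.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        r = rolls[i + 1]
        bwd[i] = {
            s: sum(TRANS[s][q] * EMIT[q][r] * bwd[i + 1][q] for q in STATES)
            / scales[i + 1]
            for s in STATES
        }
    # With matching scale factors, the product is already the posterior.
    return [{s: fwd[i][s] * bwd[i][s] for s in STATES} for i in range(n)]


if __name__ == "__main__":
    for roll, probs in zip("3166661", posterior("3166661")):
        print(roll, round(probs["L"], 3))
```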

Note: Mamot is limited to sequences of 300,000 bp, and its input must be alphabetic. If longer sequences are supplied, they are split into 300K chunks and run separately. The dice sequences were supplied as letters (1=a, 2=b, 3=c, 4=d, 5=e, 6=f).
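
The input preparation described in this note can be sketched as follows. The function names are hypothetical, and the sketch only mirrors the 1=a through 6=f mapping and the 300K chunk size stated above; it makes no claim about how the splitting was actually carried out.

```python
def encode_rolls(rolls):
    """Map die faces '1'..'6' to the letters 'a'..'f'."""
    unexpected = set(rolls) - set("123456")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return rolls.translate(str.maketrans("123456", "abcdef"))


def split_chunks(seq, size=300_000):
    """Split seq into consecutive pieces of at most `size` symbols."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


if __name__ == "__main__":
    encoded = encode_rolls("162534" * 100_000)  # 600,000 rolls
    print([len(chunk) for chunk in split_chunks(encoded)])
```

Chunks are decoded independently in this scheme, so no state information is carried across a chunk boundary.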

Note: R-HMM was run from the command line, and memory was evaluated using the gc() function (see RHMM script).
