Commit

Update on Readme.md file
andresmasegosa committed Jul 7, 2016
1 parent 50e3f46 commit 0a8de79
Showing 1 changed file with 9 additions and 10 deletions.
19 changes: 9 additions & 10 deletions README.md
@@ -22,8 +22,10 @@ AMIDST provides tailored parallel (powered by Java 8 Streams) and distributed (p

#Features<a name="features"></a>

-* **Probabilistic Graphical Models**: Specify your model using probabilistic graphical models with latent variables
-and temporal dependencies.
+* **Probabilistic Graphical Models**: Specify your model using probabilistic graphical models with [latent variables](http://amidst.github.io/toolbox/examples/bnetworks.html)
+and [temporal dependencies](http://amidst.github.io/toolbox/DynamicCodeExamples.html). AMIDST contains a large list of predefined latent variable models:
+
+![](http://amidst.github.io/toolbox/docs/web/figs/amidstModels-crop.png)

* **Scalable inference**: Perform inference on your probabilistic models with powerful approximate and
scalable algorithms.
@@ -32,13 +34,13 @@ scalable algorithms.
appropriate for learning from (massive) data streams.

* **Large-scale Data**: Use your defined models to process massive data sets in a distributed
-computer cluster using Apache Flink or (soon) **Apache Spark**.
+computer cluster using **Apache Flink** or (soon) **Apache Spark**.

* **Extensible**: Code your models or algorithms within AMiDST and expand the toolbox functionalities.
Flexible toolbox for researchers performing their experimentation in machine learning.

* **Interoperability**: Leverage existing functionalities and algorithms by interfacing
-to other software tools such as Hugin, MOA, Weka, R, etc.
+to other software tools such as [Hugin](http://amidst.github.io/toolbox/examples/bnetworks.html#sec:bns:huginlink), [MOA](http://amidst.github.io/toolbox/examples/bnetworks.html#sec:bns:moalink), Weka, R, etc.


#Simple Code Example<a name="example"></a>
@@ -148,15 +150,12 @@ architecture and 32 cores. The size of the processed data set was measured accor
</p>


-## Distributed Scalablity using [Apache Flink](http://flink.com)
+## Distributed Scalablity using Apache Flink

If your data is really big and can not be stored in a single laptop, you can also learn
your probabilistic model on it by using the AMIDST distributed learning engine based on
-a novel and state-of-the-art [distributed message passing scheme](http://amidst.github.io/toolbox/docs/dVMP.pdf).
-As detailed in this [paper](http://amidst.github.io/toolbox/docs/dVMP.pdf), we learn a probabilistic
-model similar to the one detailed in the above section from 42 million samples in an Amazon's cluster with
-2, 4, 8 and 16 nodes, each node containing 8 processing units. The following figure shows the scalability of
-our approach under these settings.
+a novel and state-of-the-art [distributed message passing scheme](http://amidst.github.io/toolbox/docs/dVMP.pdf) implemented on top
+of [Apache Flink](http://flink.com). As detailed in this [paper](http://amidst.github.io/toolbox/docs/dVMP.pdf), we were able to perform inference in a billion node (i.e. 10^9) probabilistic model in an Amazon's cluster with 2, 4, 8 and 16 nodes, each node containing 8 processing units. The following figure shows the scalability of our approach under these settings.

<p align="center">
<img src="https://amidst.github.io/toolbox/docs/web/figs/flink-scalability.png" width="800">