Khaled Sharif edited this page Aug 18, 2017 · 2 revisions

Athena

You've reached the GitHub Wiki for the Athena project.

Athena is a high-level framework for equation building and curve fitting, written in Python and built on top of TensorFlow. This means you can build large equations and perform curve fitting on your CPU, GPU, or cluster, without the constraints of traditional curve-fitting toolboxes or any degradation in performance. Athena was developed with academics and researchers in mind: it is abstract and simple to use (quickly fit an equation of your choice to tabular data), while remaining powerful and highly customizable (automatically search through millions of different mathematical equation forms and find the most accurate one).

The Science Behind Athena

Athena uses a novel method for generating interpretable regression models: it forms a generalized additive model from symbolic constituents. You can read more about generalized additive models, symbolic regression, and their applications to equation building in the papers listed below.

  • Thomas W Yee and Neil D Mitchell. Generalized additive models in plant ecology. Journal of Vegetation Science, 1991.
  • Rich Caruana et al. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. ACM, 2015.
  • Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 2009.

The model proposed takes the form of a linear additive combination of constituents. Each constituent in the model takes the following form, where a is an input parameter, f is a differentiable function, and x are free-form variables.

This form was chosen for its simplicity (the function f in the previous equation can be a sine, cosine, exponential, etc.) and is intended as a "building block" from which highly complex models can be built. Provided the function in the constituent is easily differentiable, the coefficients that best fit the model to the output can be found using gradient descent (GD) or any similar optimization method. It is also possible to capture relationships between parameters in the model M by extending the constituent form to two dimensions, as shown below.

We can then extend this to multiple dimensions n, and therefore capture relationships between any number of parameters in one constituent C.

Our final model M is therefore the summation of a defined number m of constituents, each having the ability to incorporate multiple parameters as their input.
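Written out using only the symbols defined in the text, the summation described above can be sketched as follows. The index j and the argument list are notational assumptions; the internal form of each constituent is not reproduced here, and each constituent may use any subset of the n input parameters.

```latex
M(a_1, \dots, a_n) = \sum_{j=1}^{m} C_j(a_1, \dots, a_n)
```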

Finally, we define our cost function that will allow us to iteratively maximize the correlation between the model M and the desired output O, as M evolves and grows in complexity.
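As an illustration of a correlation-based cost, here is a minimal NumPy sketch. It assumes the Pearson correlation coefficient and defines the cost as one minus that coefficient, so that minimizing the cost maximizes the correlation. The function name correlation_cost is hypothetical, and this is not necessarily the exact cost function Athena uses.

```python
import numpy as np

def correlation_cost(model_output, desired_output):
    """Return 1 - Pearson r (assumed cost; lower means better correlated)."""
    m = np.asarray(model_output, dtype=float)
    o = np.asarray(desired_output, dtype=float)
    m_centered = m - m.mean()
    o_centered = o - o.mean()
    denom = np.sqrt((m_centered ** 2).sum() * (o_centered ** 2).sum())
    if denom == 0.0:
        return 1.0  # a constant signal is treated as uncorrelated
    return 1.0 - (m_centered * o_centered).sum() / denom

o = np.array([1.0, 2.0, 3.0, 4.0])
print(correlation_cost(2.0 * o + 1.0, o))  # perfectly correlated output
print(correlation_cost(-o, o))             # perfectly anti-correlated output
```

Correlation is unchanged by scaling or shifting the model output, so a cost of this kind rewards getting the shape of M right rather than its scale or offset.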

Installation

The easiest way to install Athena and all its dependencies is through pip:

pip install git+https://github.com/arabiaweather/athena.git

Development

You can clone the Athena GitHub repository with the following command, which clones the entire repository into a folder named athena.

git clone https://github.com/arabiaweather/athena.git

You can then modify the library as you see fit, then install the library locally through the following pip command.

pip install -e ./athena

When modifying the library, please keep in mind the LGPL licence restrictions.

Building your first equation

Working with Athena can be as simple or as advanced as you need it to be. To demonstrate Athena's equation building capabilities, we'll fit a straight line to noisy data.

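import numpy
import pandas
# 100 evenly spaced points on the line y = x, with uniform noise added
x = numpy.linspace(0.0, 1.0, 100)
y = x + numpy.random.uniform(-0.1, 0.1, size=x.shape)
df = pandas.DataFrame(data={"x": x, "y": y})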

Everything in Athena starts and ends with a Framework. Optimization hyper-parameters are defined inside it, and your data-set and model are attached to it.

fw = Framework()
A, B = split_dataframe(df, 0.9)
fw.add_dataset(Dataset(A, B))

Here comes the fun part: Athena ships with hundreds of built-in equation types that you can add, multiply, and compose. We'll add the Bias and FlexiblePower functions to our model to form a straight-line equation.

model = AdditiveModel(fw)
model.add(Bias)
model.add(FlexiblePower, "x")
fw.initialize(model, A["y"].values)

All that remains is to train your model; training can be sped up dramatically by using a CUDA-enabled GPU or by running Athena on a cluster. The result is very close to a straight-line equation!

fw.train()
print(fw.produce_equation())
> y = 0.990 * x**0.981 - 0.005
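To show in principle what training does for an equation of this shape, here is a minimal sketch in plain NumPy that fits y = w * x**p + b by gradient descent on the mean squared error. It is not Athena's implementation: the function fit_power_line, the starting guess, the learning rate, and the step count are all assumptions chosen for illustration, and x starts at 0.01 rather than 0.0 so that log(x) stays finite in the gradient for p.

```python
import numpy as np

def fit_power_line(x, y, lr=0.3, steps=20000):
    """Fit y ~ w * x**p + b by gradient descent on the mean squared error."""
    w, p, b = 0.5, 1.5, 0.2  # arbitrary starting guess (assumption)
    log_x = np.log(x)        # requires x > 0
    for _ in range(steps):
        x_pow = x ** p
        err = w * x_pow + b - y
        # Partial derivatives of mean(err**2) with respect to w, p, and b
        grad_w = 2.0 * np.mean(err * x_pow)
        grad_p = 2.0 * np.mean(err * w * x_pow * log_x)
        grad_b = 2.0 * np.mean(err)
        w -= lr * grad_w
        p -= lr * grad_p
        b -= lr * grad_b
    return w, p, b

rng = np.random.default_rng(0)
x = np.linspace(0.01, 1.0, 100)
y = x + rng.uniform(-0.1, 0.1, size=x.shape)
w, p, b = fit_power_line(x, y)
print(f"y = {w:.3f} * x**{p:.3f} {b:+.3f}")
```

The fitted values should land near w = 1, p = 1, and b = 0, though the exact numbers depend on the noise.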

The resulting equation can be pretty-printed in a Python notebook or, better yet, converted to LaTeX for easy use in an academic paper.

Diving into Athena

What makes any open-source project great is the contributions of its community. Below are many great tutorials (in the form of Python notebooks) that show real-world examples of powerful equation building and modelling techniques. You can contribute to this list too by submitting a pull request.