Autonomio

 
 


Autonomio provides a very high-level abstraction layer for rapidly testing research ideas and instantly creating neural network based decision-making models. Autonomio is built on top of Keras, using TensorFlow as a backend and spaCy for word vectorization. Autonomio makes deep learning and state-of-the-art linguistic processing accessible to anyone with basic computer skills.

This document focuses on a high-level overview of Autonomio's capabilities. If you're looking for the User Manual, you may want to read the docs instead.

Key Features

  • intuitive single-command neural network training
  • training command accepts as little as 'x' and 'y' as inputs
  • 'x' can be text, continuous or categorical data
  • even with both 'x' and 'y' as text, training yields a successful result
  • 15 optional configurations from a single command
  • seamlessly integrates word2vec with Keras deep learning
  • interactive plots specifically designed for deep learning model evaluation

For most use cases, successfully running a state-of-the-art neural network works out of the box with zero configuration, yielding a model that can be used to predict outcomes later.

Deep learning in two simple commands

For first time use:

python -m spacy download en

Open a Jupyter notebook (or Python console) and type:

from autonomio.commands import *
%matplotlib inline

train(x,y,data,labels)

Even if 'x' is unstructured/text, this command will yield a functional neural network trained to predict the 'y' variable. 'y' can be continuous, categorical or binary. The model can be saved and then used to make predictions on other data without retraining:

test(x,data)

This will yield a pandas dataframe with the predicted values and whichever label you associated each value with.
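For example, a minimal sketch (the 'tweets' dataframe and its 'text' column are hypothetical):

preds = test('text', tweets)    # 'tweets' is a hypothetical dataframe with a 'text' column
preds.head()                    # predicted values alongside the associated labels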

Slightly more involved use

Training a neural network and then using it to make predictions on a different dataset is almost as easy as the first example. This time, let's also introduce some of the parameters available for the 'train' function.

train('text','quality_score',
      tweets.head(3000),
      epoch=10,
      dropout=.5,
      flatten=.3,
      save_model=True,
      verbose=0)

Instead of the default 5 epochs, we set epoch to 10 and increase the dropout rate between layers to 50%. Also, instead of using the default flattening (transforming the y feature to 0 and 1), we take only the bottom 30% of the interquartile range.

Standard training output

Test result output

System configuration

Autonomio has been tested successfully on various Ubuntu and Mac systems using the provided setup scripts.

Minimum setup

You need a machine with at least 4GB of memory if you want to do text processing; otherwise 2GB is totally fine and 1GB might be OK. Even a very low-spec AWS instance runs Autonomio just fine.

Recommended setup

For research and production environments we recommend one server with at least 4GB of memory as a 'workstation' and a separate instance with a high-end CUDA-supported GPU. The GPU instance costs roughly $1 per hour and can be shut down when not in use.

As setting up the GPU instance from the ground up can be a bit of a headache, we recommend using the AWS Machine Learning AMI to get set up quickly.

Dependencies

You probably want to use the setup_ubuntu.sh script to automate the process of setting up a new system.

Here is the list of dependencies it takes care of:

Data Manipulation

Numpy

Pandas

Word Processing

spaCy

Deep Learning

Keras

Tensorflow

Visualization

Matplotlib

Seaborn

mpld3
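If you prefer not to use the setup script, the same dependencies can also be installed with pip (a sketch only; no versions are pinned here):

pip install numpy pandas spacy keras tensorflow matplotlib seaborn mpld3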

Background

Until today, most linguistic technologies, not to mention deep learning, have not been accessible in a way that allows a seamless workflow supporting the needs of less computer-savvy researchers. Yet the modern researcher can benefit significantly from unlocking the value in unstructured data; by some estimates there is 9 times more unstructured data than structured. Autonomio combines two cutting-edge AI technologies, word vectorizing and deep learning, into one intuitive tool that researchers from a wide range of backgrounds can benefit from.

Performance

Because of the excellent out-of-the-box neural network performance provided by Keras, Autonomio users get state-of-the-art prediction capability with minimal configuration. Autonomio has been tested extensively and consistently provides, in a single line of code, the same result you would get from Keras with 10 lines of code. Using real data involving billions of dollars in advertising spend, we've proven Autonomio to be capable of yielding outstanding results on previously unsolved problems, such as a better-than-human classifier result for niche website category classification.

User experience

Artificial intelligence and the signals intelligence method should be accessible to all researchers. In most cases Autonomio allows complete non-programmers to easily create advanced neural networks without any data preparation, through an easy-to-memorize single-command interface. Autonomio has two commands:

train(x,y,data)

and

test(x,data)

An example of Autonomio's usability is how 'x' can be unstructured data, as is the case in an increasing number of research challenges in the digital age.

Language processing

Autonomio uses a novel way of processing unstructured data (a minimal sketch of these steps follows the list below):

  1. pre-process text
  2. use spaCy to vectorize the text
  3. create 300 individual features from the vector
  4. use the features as a signal in a Keras model
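The sketch below illustrates steps 2 and 3 with plain spaCy and pandas. It is not Autonomio's internal code; the example texts are assumptions, and the result is 300-dimensional with spaCy models that ship 300-dimensional word vectors (such as the 'en' model downloaded earlier):

import spacy
import pandas as pd

nlp = spacy.load('en')                                 # English model downloaded earlier
docs = ['first example text', 'second example text']  # hypothetical documents
vectors = [nlp(doc).vector for doc in docs]            # one document vector each
features = pd.DataFrame(vectors)                       # 300 individual feature columns
print(features.shape)                                  # (2, 300) with a 300-dim model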

Language support

Autonomio's vectorizing engine, spaCy, currently supports 13 languages:

  • English
  • German
  • Chinese
  • Spanish
  • Italian
  • French
  • Portuguese
  • Dutch
  • Swedish
  • Finnish
  • Hungarian
  • Bengali
  • Hebrew

NOTE: the spaCy language libraries each have to be downloaded separately.
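For example, to add German support (the language code follows the same download convention shown earlier for English):

python -m spacy download de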

Read spaCy's language page

Adding new languages

spaCy makes it relatively streamlined to create support for any language, and the challenge can (and should) be approached iteratively.

Tested Systems

Autonomio has been tested in several Mac OS X and Ubuntu environments (both server and desktop).
