A library for distributed execution of workflows submitted through the Columbus workflow engine. Its methods are intended for use in the code written for Components and Combiners of the Columbus platform.
Refer to the API documentation for the methods that can be used in the code.
- Linux based OS
- Python 2.7
This section shows how to install the Columbus worker on a Google Compute Engine instance running Debian 8 Jessie.
- Installing prerequisites
$ sudo apt-get update
$ curl -O https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
$ sudo pip install virtualenv
$ sudo apt-get install -y build-essential libmysqlclient-dev python-dev
$ sudo apt-get install -y libssl-dev libffi-dev libxml2-dev libxslt-dev
- Installing virtual environment
$ virtualenv --no-site-packages -p /usr/bin/python2.7 /home/$USER/venv
$ cd /home/$USER/venv
$ source bin/activate
- Installing the worker
(venv)$ wget https://github.com/jkachika/columbus-worker/archive/master.zip
(venv)$ unzip master.zip
(venv)$ mv columbus-worker-master columbus-worker
(venv)$ cd columbus-worker
(venv)$ python setup.py sdist
(venv)$ pip install --upgrade dist/columbusworker-0.1.0.tar.gz
- Running the worker (master ip address and port number are required)
(venv)$ nohup python -m colorker <master-ip> <port-number> ./colorker.log &
- Deactivating virtual environment
(venv)$ deactivate
Columbus supports the following output types: CSV List, Feature, Feature Collection, Multi Collection, and Blob. All output data is transferred as is to subsequent elements in the workflow.
For CSV List, data is represented as a list of Python dictionaries.
[{'car_speed': 10.54, 'ch4': 3.56, 'locality':'Fort Collins'},
{'car_speed': 11.10, 'ch4': 6.5, 'locality': 'Denver'} ...]
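A CSV List can be assembled with the standard library alone. The following is a minimal sketch, assuming the raw data arrives as CSV text; the helper name `to_csv_list`, its `float_columns` parameter, and the sample text are illustrative and not part of the Columbus API.

```python
import csv


def to_csv_list(csv_text, float_columns=()):
    """Parse CSV text into a list of dictionaries, one per row.

    Columns named in float_columns are converted to float; all other
    values stay as strings. Hypothetical helper, not part of colorker.
    """
    rows = []
    # splitlines() is enough for simple data without embedded newlines.
    for row in csv.DictReader(csv_text.splitlines()):
        record = dict(row)
        for name in float_columns:
            record[name] = float(record[name])
        rows.append(record)
    return rows


sample = "car_speed,ch4,locality\n10.54,3.56,Fort Collins\n11.10,6.5,Denver\n"
csv_list = to_csv_list(sample, float_columns=("car_speed", "ch4"))
```

The resulting `csv_list` has the same shape as the example above: one dictionary per row, keyed by column name.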
Data is represented as GeoJSON for Feature and must be an instance of geojson.Feature. "properties" must be a simple dictionary with strings as keys and values of any primitive type. However, a Feature can include any picklable value as part of its dictionary, in addition to "geometry", "properties" and "type".
>>> from geojson import Feature, Point
>>> my_point = Point((-3.68, 40.41))
>>> Feature(geometry=my_point) # doctest: +ELLIPSIS
{"geometry": {"coordinates": [-3.68..., 40.4...], "type": "Point"},
"properties": {}, "type": "Feature"}
Data is represented as GeoJSON for FeatureCollection and must be an instance of geojson.FeatureCollection. Its dictionary must contain a "columns" dictionary whose keys are the property names of the features in the FeatureCollection and whose values are the data types of those properties. A FeatureCollection can include any picklable value as part of its dictionary, in addition to "features", "columns" and "type".
>>> from geojson import Feature, Point, FeatureCollection
>>> my_feature = Feature(geometry=Point((1.6432, -19.123)))
>>> my_feature["properties"]["temperature"] = 32.5
>>> my_other_feature = Feature(geometry=Point((-80.234, -22.532)))
>>> my_other_feature["properties"]["temperature"] = 20.7
>>> myftc = FeatureCollection([my_feature, my_other_feature]) # doctest: +ELLIPSIS
>>> myftc["columns"] = {"temperature" : "FLOAT"}
>>> print myftc
{"features": [{"geometry": {"coordinates": [1.643..., -19.12...], "type": "Point"},
"properties": {"temperature": 32.5},
"type": "Feature"
},
{"geometry": {"coordinates": [-80.23..., -22.53...], "type": "Point"},
"properties": {"temperature": 20.7},
"type": "Feature"
}],
"columns": {"temperature" : "FLOAT"},
"type": "FeatureCollection"
}
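Because "columns" has to name the properties of the features, it can be useful to check a collection before assigning it as output. The sketch below is one way to do that, assuming a plain dictionary with the same layout as a geojson.FeatureCollection (which is itself a dict subclass); `missing_columns` is a hypothetical helper, not a Columbus function, and the "humidity" property is made up for the example.

```python
def missing_columns(collection):
    """Return property names used by features but absent from "columns".

    Hypothetical helper for illustration; works on any dict-like
    FeatureCollection.
    """
    declared = set(collection.get("columns", {}))
    used = set()
    for feature in collection.get("features", []):
        # Updating a set with a dict adds the dict's keys.
        used.update(feature.get("properties") or {})
    return sorted(used - declared)


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [1.6432, -19.123]},
         "properties": {"temperature": 32.5}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-80.234, -22.532]},
         "properties": {"temperature": 20.7, "humidity": 0.4}},
    ],
    "columns": {"temperature": "FLOAT"},
}
gaps = missing_columns(collection)
```

Here `gaps` lists "humidity", the one property that "columns" does not declare.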
For Multi Collection, data is represented as a Python list of geojson.FeatureCollection instances.
For Blob, data is represented as any picklable Python object.
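Whether an object is picklable can be tested by attempting to serialize it with the standard `pickle` module. This is a small sketch under that assumption; `is_picklable` is an illustrative helper and the sample values are made up.

```python
import pickle


def is_picklable(value):
    """Return True if pickle can serialize value, False otherwise."""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        # pickle raises different exception types depending on the object.
        return False


blob = {"model": [0.1, 0.2], "labels": ("low", "high")}
ok = is_picklable(blob)

# Generator objects cannot be pickled, so this one is rejected.
not_ok = is_picklable(n for n in range(3))
```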
Scripts should use the internal variable __input__ to get the input data and assign the output to __output__ to make the data available to their dependents.
Reading data for a root component
csv_list = __input__
Reading data for a non-root component having component-1 and combiner-1 as its parents. component-1 and combiner-1 are the id values of the parent component and combiner respectively.
parent1 = __input__["component-1"]
parent2 = __input__["combiner-1"]
Reading data for a combiner
a_list = __input__["workflow"]
To write data, build a structure of the chosen output type and assign it to __output__
__output__ = csv_list
Getting the fusion table key in components whose parents are visualizers
>>> component_ftkey = __input__["ftkey"]["component-1"]
>>> combiner_ftkey = __input__["ftkey"]["combiner-1"]
>>> print component_ftkey
10tSob7imDONyigihnAamYK7kmidDz2l6H5b1qVSf
>>> print combiner_ftkey
1oMf16v9Iw4lmoOLKmjRB5hnZIXVVcWfK_rGHrtC7,1QsFzkZJtLkBeF0NkN_piGjdXl_JEnxnCk__LAgSK
For MultiCollection output, a single string holds all fusion table keys separated by commas. In the above example, combiner-1 is a visualizer that produces MultiCollection output and component-1 is also a visualizer that produces a FeatureCollection as its output.
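Since a MultiCollection key string packs several keys together, a script usually has to split it before use. A minimal sketch, assuming the comma-separated format shown above; `split_ftkeys` is an illustrative helper, and the 'ft:' prefix is the one used when loading a fusion table into Earth Engine.

```python
def split_ftkeys(ftkey_value):
    """Split a comma-separated fusion table key string into a list of keys."""
    return [key.strip() for key in ftkey_value.split(",") if key.strip()]


combiner_ftkey = ("1oMf16v9Iw4lmoOLKmjRB5hnZIXVVcWfK_rGHrtC7,"
                  "1QsFzkZJtLkBeF0NkN_piGjdXl_JEnxnCk__LAgSK")
keys = split_ftkeys(combiner_ftkey)

# Table identifiers in the form expected by ee.FeatureCollection.
table_ids = ["ft:" + key for key in keys]
```

A FeatureCollection key contains no comma, so the same helper returns a one-element list for it.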
Reading fusion tables of a workflow in a combiner
a_list = __input__["ftkey"] # list of fusion table identifiers
Columbus uses Google Earth Engine for all GIS computations. Below are a few examples showing how to obtain the Earth Engine object from Columbus, how to get Columbus-compatible GeoJSON for an Earth Engine FeatureCollection, and how to use fusion tables with Earth Engine.
Obtaining Google Earth Engine to do GIS computations.
from colorker.security import CredentialManager
ee = CredentialManager.get_earth_engine()
Obtaining GeoJSON from Earth Engine FeatureCollection.
from colorker.service.gee import get_geojson
ftc = ee.FeatureCollection('ft:1oMf16v9Iw4lmoOLKmjRB5hnZIXVVcWfK_rGHrtC7')
ftc_geojson = get_geojson(ftc)
Using Fusion Table with Google Earth Engine.
ftkey = __input__['ftkey']['component-1']
# Loading data from fusion table
ftc = ee.FeatureCollection('ft:' + str(ftkey))
Columbus allows you to send an email from within your script, so you can decide when to send it and take complete control of how its content looks, whether plain text or rich HTML.
from colorker.service.email import send_mail
send_mail(['abc@xyz.com', 'def@pqr.com'], 'Subject of the Message',
'Hi There! This is a plain text message body',
'<b>Hi There!</b><br/><p>This is a HTML message body</p>')
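When the message reports on a CSV List, both bodies can be generated from the same rows. The sketch below only builds the two strings, which could then be passed as the plain text and HTML arguments in the call above; `build_bodies` is a hypothetical helper, not part of colorker, and the rows are made up.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape  # Python 2.7


def build_bodies(rows, columns):
    """Return (plain_text, html) bodies listing the given columns of rows."""
    plain_lines = [", ".join(columns)]
    header = "".join("<th>%s</th>" % escape(name) for name in columns)
    html_rows = ["<tr>%s</tr>" % header]
    for row in rows:
        cells = [str(row[name]) for name in columns]
        plain_lines.append(", ".join(cells))
        # Escape values so that characters such as < and & render literally.
        html_cells = "".join("<td>%s</td>" % escape(cell) for cell in cells)
        html_rows.append("<tr>%s</tr>" % html_cells)
    return "\n".join(plain_lines), "<table>%s</table>" % "".join(html_rows)


rows = [{"locality": "Fort Collins", "ch4": 3.56},
        {"locality": "Denver", "ch4": 6.5}]
plain_body, html_body = build_bodies(rows, ["locality", "ch4"])
```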