ohsu-comp-bio/tesseract

License: MIT

tesseract

tesseract is a library that enables the remote execution of Python code on systems implementing the GA4GH Task Execution API (TES).
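At the protocol level, a TES server accepts work as a JSON task document posted to its tasks endpoint. The sketch below uses only the standard library to build (but not send) such a request, to show what "remote execution" means on the wire. It assumes a TES v1.0 server at http://localhost:8000 exposing `POST /v1/tasks` (TES 1.1 servers use `/ga4gh/tes/v1/tasks`). The task name, image, and command are placeholders, and this is not necessarily the exact message tesseract generates.

```python
import json
import urllib.request

# Hypothetical task: field names follow the GA4GH TES v1 task schema,
# but the values are placeholders, not what tesseract actually submits.
task = {
    "name": "say-hello",
    "resources": {"cpu_cores": 1, "ram_gb": 4.0},
    "executors": [
        {
            "image": "python:2.7",
            "command": ["python", "-c", "print('hello world!')"],
        }
    ],
}


def build_create_task_request(server, task, path="/v1/tasks"):
    """Build (but do not send) a TES CreateTask request.

    `path` defaults to the TES v1.0 route; TES 1.1 servers expose
    /ga4gh/tes/v1/tasks instead.
    """
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        server.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_task_request("http://localhost:8000", task)
print(req.get_method(), req.full_url)  # POST http://localhost:8000/v1/tasks
# Sending it would be urllib.request.urlopen(req); a TES server replies
# with a JSON body containing the new task's "id".
```

The resource field names (`cpu_cores`, `ram_gb`, `disk_gb`) come from the TES schema and match the keyword arguments that `with_resources` takes in the Quick Start.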

Install

Available on PyPI.

pip install py-tesseract

Quick Start

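from __future__ import print_function

from tesseract import Tesseract, FileStore


def identity(n):
    return n


def say_hello(a, b):
    # say_hello depends on identity, so identity must also be
    # available when the function runs on the remote worker.
    return "hello " + identity(a) + b


# Storage shared with the remote task; a local directory here, but an
# s3://, gs://, or swift:// bucket URL also works (see "Object store support").
fs = FileStore("./test_store/")

# Client for the TES server listening at http://localhost:8000.
r = Tesseract(fs, "http://localhost:8000")
r.with_resources(
    cpu_cores=1, ram_gb=4, disk_gb=None,
    docker="python:2.7", libraries=["cloudpickle"]
)

# run() takes the function followed by its positional and keyword
# arguments and returns a future; result() retrieves the return value.
future = r.run(say_hello, "world", b="!")
result = future.result()
print(result)  # hello world!

# clone() presumably copies r's settings, so r2 can request more CPUs
# without changing r. Use a new name for the result instead of reusing r2.
r2 = r.clone().with_resources(cpu_cores=4)
f2 = r2.run(say_hello, "more", b="cpus!")
result2 = f2.result()
print(result2)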
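The `run`/`result` pair appears to follow the familiar future pattern: `run` accepts a function plus its arguments and hands back a future, and `result()` yields the function's return value. Assuming tesseract's futures behave like the standard library's `concurrent.futures.Future` (an assumption, not something this README confirms), you can check a function and its arguments locally with the same call shape before submitting a remote task:

```python
from concurrent.futures import ThreadPoolExecutor


def identity(n):
    return n


def say_hello(a, b):
    return "hello " + identity(a) + b


# executor.submit(fn, *args, **kwargs) has the same shape as
# r.run(fn, *args, **kwargs): both return a future whose result()
# blocks until the call finishes and then returns its value.
with ThreadPoolExecutor(max_workers=2) as executor:
    local_future = executor.submit(say_hello, "world", b="!")
    local_result = local_future.result()

print(local_result)  # hello world!
```

This runs entirely in the local process, so it catches argument and logic errors but not missing libraries in the remote Docker image.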

Object store support

If you provide a Swift, S3, or Google Storage (gs) bucket URL to your FileStore, tesseract will attempt to detect your credentials for that system automatically.

To set up your environment for this, run the following commands:

  • Google Cloud Storage - gcloud auth application-default login
  • Amazon S3 - aws configure
  • Swift - source openrc.sh
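Detection presumably depends on the scheme of the bucket URL. The sketch below is a hypothetical preflight helper (`credential_setup_hint` is not part of tesseract) that maps a FileStore URL to the matching setup command from the list above. The `swift://` spelling for Swift containers is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the credential setup step above.
# Assumes Swift URLs are written as swift://container/...
SETUP_COMMANDS = {
    "gs": "gcloud auth application-default login",
    "s3": "aws configure",
    "swift": "source openrc.sh",
}


def credential_setup_hint(store_url):
    """Return the setup command for an object-store URL, or None for local paths."""
    scheme = urlparse(store_url).scheme.lower()
    if scheme in ("", "file"):
        # A local directory such as "./test_store/" needs no credentials.
        return None
    if scheme not in SETUP_COMMANDS:
        raise ValueError("unsupported object store scheme: %r" % scheme)
    return SETUP_COMMANDS[scheme]


print(credential_setup_hint("s3://your-bucket/tesseract/"))  # aws configure
print(credential_setup_hint("./test_store/"))  # None
```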

Input files

If your function expects an input file to be available at a given path, add:

r.with_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")

The first argument specifies where the file is currently stored; the second specifies the path at which your function expects to find it.
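In TES terms, a source/destination pair like this corresponds to one entry in the task's `inputs` list: `url` is where the file lives and `path` is where it appears inside the container. A minimal sketch, assuming tesseract forwards the two `with_input` arguments unchanged (the helper `tes_input` is hypothetical):

```python
import posixpath


def tes_input(source_url, container_path):
    """Build a TES v1 'inputs' entry for a single file (hypothetical helper)."""
    # TES expects the path inside the container to be absolute.
    if not posixpath.isabs(container_path):
        raise ValueError("container path must be absolute: %r" % container_path)
    return {"url": source_url, "path": container_path, "type": "FILE"}


entry = tes_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")
print(entry["path"])  # /home/ubuntu/yourfile.txt
```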

Output files

If your function generates files during its run, specify them as shown below and tesseract will upload them to the designated bucket.

r.with_output("./relative/path/to/outputfile.txt")
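Because the output path is relative, it has to be resolved against the task's working directory inside the container and paired with an upload destination. The sketch below builds a corresponding TES `outputs` entry. The working directory `/tmp/tesseract` and the destination layout (relative path appended to the store URL) are assumptions for illustration, not documented tesseract behavior, and `tes_output` is a hypothetical helper.

```python
import posixpath


def tes_output(store_url, relative_path, workdir="/tmp/tesseract"):
    """Build a TES v1 'outputs' entry for a file written relative to workdir.

    Both the default workdir and the upload layout are hypothetical.
    """
    # Normalize "./relative/path/..." and refuse paths that escape workdir.
    rel = posixpath.normpath(relative_path)
    if rel == ".." or rel.startswith("../") or posixpath.isabs(rel):
        raise ValueError("output path must stay inside the working directory: %r" % relative_path)
    return {
        "url": store_url.rstrip("/") + "/" + rel,
        "path": posixpath.join(workdir, rel),
        "type": "FILE",
    }


out = tes_output("s3://your-bucket/tesseract/", "./relative/path/to/outputfile.txt")
print(out["url"])  # s3://your-bucket/tesseract/relative/path/to/outputfile.txt
```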

Resources