
pyqae

Python/PySpark image query analysis engine

Pyqae is a Python-based tool for processing and analyzing image stacks, both locally and on top of PySpark

install

The core pyqae package defines the basic data structures and read/write operations for image stacks

Conda

Create a new environment using the environment file in the binder folder

conda env create -f binder/environment.yml

Activate the new environment, then install the remaining packages and tools using pip from the root directory of the package

pip install .
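
To check that the installation succeeded, importing the package should work. A minimal smoke test:

# minimal smoke test after installation
import pyqae
print(pyqae.__file__)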

Binder/Docker

You can use repo2docker to make a self-contained docker image directly from this repository

pip install jupyter-repo2docker

Dry-run

You can see what will be built by performing a dry run in the local directory

repo2docker --debug --no-build .

or build and run the image using

repo2docker .

other notes

Pyqae is built on numpy, scipy, scikit-learn, and scikit-image, and is compatible with Python 2.7+ and 3.4+.

The official installation procedure is to first run

pip install -r requirements.txt

and then run

python setup.py install

related packages

There are a number of related tools that pyqae uses for analysis

  • keras: deep learning wrapper tools
  • elephas: distributed code for ML and Keras on Spark
  • tensorflow: core deep learning code
  • thunder: image and sequence analysis

You can install the ones you want with pip, for example

pip install thunder-python
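
Once installed, thunder can also be used on its own in local mode without any Spark setup. A minimal sketch (the random test data and its shape below are just for illustration):

import thunder as td

# generate a small random image stack in local mode (no Spark required)
data = td.images.fromrandom(shape=(10, 50, 50))

print(data.shape)         # (10, 50, 50): a stack of 10 images, each 50x50 pixels
print(data.mean().shape)  # the mean image across the stack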

using with spark

Thunder doesn't require Spark and can run locally without it, but Spark and Thunder work great together! To install and configure a Spark cluster, consult the official Spark documentation. Thunder supports Spark version 1.5+, and uses the Python API PySpark. If you have Spark installed, you can install Thunder just by calling pip install thunder-python on both the master node and all worker nodes of your cluster. Alternatively, you can clone this GitHub repository, and make sure it is on the PYTHONPATH of both the master and worker nodes.

Once you have a running cluster with a valid SparkContext — this is created automatically as the variable sc if you call the pyspark executable — you can pass it as the engine to any of Thunder's loading methods, and this will load your data in distributed 'spark' mode. In this mode, all operations will be parallelized, and chained operations will be lazily executed.
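
For example, with a live SparkContext you can pass it as the engine to one of the loaders. A short sketch, assuming sc exists and using a placeholder path for your own data:

import thunder as td

# `sc` is the SparkContext created by the pyspark executable
# the path below is a placeholder, not a real dataset
data = td.images.fromtif('/path/to/tif/stack', engine=sc)

# in 'spark' mode operations are parallelized and chained lazily
result = data.median_filter(size=2).mean()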

using notebooks with pyspark

PYSPARK_PYTHON=/Users/mader/anaconda/bin/python \
PYSPARK_DRIVER_PYTHON=jupyter \
PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip 0.0.0.0" \
/Applications/spark-2.1.1-bin-hadoop2.7/bin/pyspark \
  --driver-memory 8g --master local[8]

using an environment

PYSPARK_PYTHON=/Users/mader/anaconda/envs/py27/bin/python \
PYSPARK_DRIVER_PYTHON=jupyter \
PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip 0.0.0.0" \
/Applications/spark-2.1.1-bin-hadoop2.7/bin/pyspark \
  --driver-memory 8g --master local[8]

or the old version

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook /Volumes/ExDisk/spark-2.0.0-bin-hadoop2.7/bin/pyspark
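
In a notebook started this way, the SparkContext is already available as the variable sc. A quick sanity check:

# run inside the launched notebook; `sc` is created by pyspark
print(sc.version)

# a trivial parallel job to confirm the local workers respond
rdd = sc.parallelize(range(100))
print(rdd.sum())  # 4950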
