Python-Nats Pipeline Exploration

Overview

[STATUS: DEV / ALPHA]

Project to explore the use of python + nats to implement a scalable, low latency, distributed, computational pipeline using Python Asyncio and NATS.io.

The aim is to develop a generic implementation to:

Explore the effect of synchronous code in pipeline efficiciency.
Explore the effect of allocating finite resources across pipeline stages.
Provide source data to enable a structured analysis approach to be built.
Provide insight into pipeline optimisation strategy.

The example pipeline has been kept deliberately generic. Namely a pipeline Pn is composed of 4 sequential stages, S1 .. S4. Each stage taking a defined amount of time.

Five separate versions of the pipeline have been implemented:

P0 - No messaging, in-process, sequential implementation to provide a benchmark to assess subsequent implementations.

P1 - Implemented as a single process with synchronous / blocking calls within each stage.

P2 - Implemented with all blocking calls replaced with asynchronous non-blocking equivalients.

P3 - Implemented with each stage having its own dedicated process.

P4 - Implemented with each stage having 1 or more dedicated processes.

Future

Implement with a dockerised version of RabbitMQ for comparison
Implement with a dockerised version of Kafka for comparison
Implement using Docker Compose / Kubernetes to explore pipeline containerisation.
Build a DSL for Pipeline and Stage construction.
Build production-ready version to include enhanced monitoring / logging etc.
Use the project as part of broader research / teaching projects.
For completeness implement multiple process pipeline with sync calls (1 per stage, 1+ per stage)?
???

Summary findings

NATS.io & Python provide an easy route to implement a Staged message driven pipeline.
The stages can be easily implemented in process or using multiple processes over 1 or more nodes.
To improve pipeline efficiency / performance:

Implement logging to support pipeline analysis.
Baseline sequential in-process performance.
Baseline initial / current pipeline version.
Identify and remediate blocking parts of each stage.
Separate processes for each stage.
Experiment with different scaling options across each pipeline stage. [More to follow.]

See Log Analysis workbook for more detailed analysis.

Notes

NATS.io provides limited delivery guarantees (at most once), explore NATS-Streaming (at least once) [https://nats.io/documentation/streaming/nats-streaming-intro/].
Durability requirements may necessitate evaluation of alternatives depending on the use-case.
There is an open issue re the resolution of asyncio wrt time [https://bugs.python.org/issue31539] this manifests as some of the analysis showing stages taking less than their defined minimum durations.

Setup

Requires a running NATS server either in the cloud, local or docker.
Docker image can be started using:

docker run -p 4222:4222 -p 8222:8222 -p 6222:6222 --name gnatsd -ti nats:latest

References

[1] (http://usingcsp.com/cspbook.pdf) "Communicating Sequential Processes"
[2] "Enterprise Integration Patterns - Hohpe and Woolf"

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
Version 1 - single process with blocking		Version 1 - single process with blocking
Version 2 - single process all async		Version 2 - single process all async
Version 3 - multiple processes all async		Version 3 - multiple processes all async
Version 4 - multiple scaled processes all async		Version 4 - multiple scaled processes all async
images		images
logs		logs
utils		utils
.gitattributes		.gitattributes
.gitignore		.gitignore
Log Analysis.ipynb		Log Analysis.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version 1 - single process with blocking

Version 1 - single process with blocking

Version 2 - single process all async

Version 2 - single process all async

Version 3 - multiple processes all async

Version 3 - multiple processes all async

Version 4 - multiple scaled processes all async

Version 4 - multiple scaled processes all async

images

images

logs

logs

utils

utils

.gitattributes

.gitattributes

.gitignore

.gitignore

Log Analysis.ipynb

Log Analysis.ipynb

README.md

README.md

Repository files navigation

Python-Nats Pipeline Exploration

Overview

Future

Summary findings

Notes

Setup

References

About

Releases

Packages

Languages

saboyle/python-nats-pipeline-exploration

Folders and files

Latest commit

History

Repository files navigation

Python-Nats Pipeline Exploration

Overview

Future

Summary findings

Notes

Setup

References

About

Topics

Resources

Stars

Watchers

Forks

Languages