Pandas pipeline in graphviz

Python package to build an explanatory diagram of a pandas data processing pipeline, rendered with Graphviz.

It is heavily inspired by dask's .visualize method, but adds two useful features:

visualize column names in data nodes
highlight the columns created at each task
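The two features above can be illustrated with a small sketch. This is not the package's code: it assumes that a "created" column is simply a column name present in a task's output but absent from its input, and it writes the Graphviz DOT text by hand so it runs without the graphviz Python package. The function names, the node layout and the `palegreen` highlight colour are all hypothetical choices.

```python
from html import escape


def created_columns(input_columns, output_columns):
    """Columns of the output that were not in the input, in output order."""
    existing = set(input_columns)
    return [col for col in output_columns if col not in existing]


def data_node(name, columns, highlighted=()):
    """DOT node with an HTML-like table label: one row per column, new ones coloured."""
    rows = [f"<TR><TD><B>{escape(name)}</B></TD></TR>"]
    for col in columns:
        color = ' BGCOLOR="palegreen"' if col in highlighted else ""
        rows.append(f"<TR><TD{color}>{escape(str(col))}</TD></TR>")
    body = "".join(rows)
    return f'  "{name}" [shape=plaintext, label=<<TABLE BORDER="0" CELLBORDER="1">{body}</TABLE>>];'


def pipeline_dot(input_name, input_columns, task, output_name, output_columns):
    """DOT source for a single task: input data node -> task box -> output data node."""
    new = created_columns(input_columns, output_columns)
    return "\n".join([
        "digraph pipeline {",
        data_node(input_name, input_columns),
        f'  "{task}" [shape=box];',
        data_node(output_name, output_columns, highlighted=new),
        f'  "{input_name}" -> "{task}" -> "{output_name}";',
        "}",
    ])


# Column lists stand in for `df.columns`; any iterable of names works.
dot_source = pipeline_dot("raw", ["price", "qty"], "add_total",
                          "enriched", ["price", "qty", "total"])
print(dot_source)
```

Saved to a file, the printed text can be rendered with the Graphviz command-line tool, for example `dot -Tpng pipeline.dot -o pipeline.png`.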

Here is an example from the examples folder:

Installation

Pip

Install with pip:

$ pip install pandas-pipeline-graphviz

Manual installation

Install manually:

clone the repository with git clone
install it with python setup.py install

Usage

Disclaimer

⚠️ WARNING — it's a hack!

There is no reliable way in Python to get variable names, whether for inputs or for outputs. The methods used in this package are quite hacky, as discussed in this Stack Overflow thread.

To build the graph, this package makes use of:

globals() to get the names of the input dataframes, by comparing each input dataframe with every object in the global namespace.
inspect.stack() to get the name of the output dataframe, by retrieving the line of code that calls the function and parsing it to find the variable the result is assigned to. Currently only single-output transformations are supported.
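A minimal sketch of the first technique, the globals() lookup, is shown below. It is an illustration, not the package's code, and it assumes the comparison is done by identity (`is`), since `==` between DataFrames is element-wise and does not return a single boolean. The namespace is passed in explicitly because `globals()` called inside another module would return that module's namespace; `names_bound_to` is a hypothetical name. The example also shows why the approach is fragile: one object can be bound to several names.

```python
def names_bound_to(obj, namespace):
    """Return every name in `namespace` bound to this exact object (identity, not equality)."""
    return [name for name, value in namespace.items() if value is obj]


# A list stands in for a DataFrame so the sketch runs without pandas;
# an identity check behaves the same way for any object.
sales = [10.0, 20.0]
alias = sales                # a second name bound to the same object
copy_of_sales = list(sales)  # equal contents, but a different object

print(names_bound_to(sales, globals()))  # ['sales', 'alias']
```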

Both methods should be considered experimental, and the decorator is likely to break if it is not used as shown in the examples.

Conditions for use

do not apply several decorators to your function: use only this decorator, otherwise the detection of the output dataframe name through inspect.stack() will break
use only single-output transformation functions, i.e. functions that return exactly one dataframe.
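The two conditions above follow from how an inspect.stack() lookup works. The sketch below is a hypothetical stand-in, not the package's decorator: `track` and `parse_assigned_name` are illustrative names, and the parsing uses the standard `ast` module. It reads the caller's source line one frame up, so an extra decorator (an extra frame) shifts the index, and only a simple `name = call(...)` assignment, i.e. a single output, is recognised.

```python
import ast
import functools
import inspect


def parse_assigned_name(line):
    """Return `name` if `line` is a simple `name = ...` assignment, else None."""
    try:
        tree = ast.parse(line.strip())
    except SyntaxError:
        return None  # e.g. the first line of a call spread over several lines
    if len(tree.body) != 1:
        return None
    stmt = tree.body[0]
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
    ):
        return stmt.targets[0].id
    return None  # tuple unpacking (several outputs), attribute targets, etc.


def track(func):
    """Hypothetical decorator recording the variable its result is assigned to."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # inspect.stack()[0] is this wrapper and [1] is whoever called it. If another
        # decorator wraps `track`, [1] is that decorator's wrapper instead of the
        # user's code, so the detected name is wrong or None.
        caller = inspect.stack()[1]
        line = caller.code_context[0] if caller.code_context else ""
        wrapper.output_name = parse_assigned_name(line)
        return func(*args, **kwargs)

    wrapper.output_name = None
    return wrapper


@track
def double(x):
    return x * 2


result = double(21)
print(double.output_name)  # prints: result (when run from a saved .py file)
```

When the source line is not available, for instance in code executed from a string, `code_context` is None and the sketch falls back to None rather than guessing.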

Examples

See the examples folder in the repository.

Repository: qchenevier/pandas-pipeline-graphviz