# Watchme Sklearn

This is an example of using watchme, specifically the psutils decorator, to monitor resource usage for various functions run within Python. Since we build the dependencies into a Singularity container, and since Singularity has access to our home directory, the watcher and its data are saved on the host with no extra work needed.

Note: I created the watcher repository with watchme first, and then added the extra files for the README.md and container. If you use a decorator, you don't technically need to do this - the Python files being decorated can live separately from the watchme base where results are stored. I wanted to keep them together, so I chose to add these files afterward.
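The decorator pattern here is worth seeing concretely: a wrapper samples resource usage on a timer while the decorated function runs, and each sample becomes one timepoint in the results. The sketch below is illustrative only - it is *not* watchme's implementation (see the watchme documentation for the real psutils decorator and its import path) - and uses just the standard library:

```python
"""Illustrative only: a minimal resource-monitoring decorator.

NOT watchme's implementation -- just a stdlib sketch of the pattern the
psutils decorator follows: sample resource usage at a fixed interval in a
background thread while the wrapped function runs, collecting one record
(timepoint) per sample.
"""
import functools
import resource  # POSIX-only stdlib module
import threading
import time


def monitor(interval=0.25):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            samples = []
            stop = threading.Event()

            def sample():
                # Record one timepoint, then sleep until the next interval
                while not stop.is_set():
                    usage = resource.getrusage(resource.RUSAGE_SELF)
                    samples.append({"time": time.time(),
                                    "maxrss_kb": usage.ru_maxrss})
                    stop.wait(interval)

            thread = threading.Thread(target=sample)
            thread.start()
            try:
                result = func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
            wrapper.samples = samples  # timepoints collected during the run
            return result
        return wrapper
    return decorator


@monitor(interval=0.05)
def busy():
    # Burn a little CPU so at least one sample is recorded
    return sum(i * i for i in range(200_000))


busy()
print(len(busy.samples) >= 1)  # → True
```

watchme's real decorator additionally writes the collected timepoints into the watcher repository (the `decorator-psutils-<name>` folders shown below) rather than stashing them on the function.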

## 1. Build the Container

First, build the Singularity container with Python dependencies installed:

```bash
sudo singularity build watchme-sklearn.sif Singularity
```

## 2. Run

Next, running the container will create a watcher called "watchme-sklearn", which by default goes into your `$HOME/.watchme` folder. You'll see the watcher generated, followed by the function runs.

```bash
$ singularity run watchme-sklearn.sif

Adding watcher /home/vanessa/.watchme/watchme-sklearn...
Generating watcher config /home/vanessa/.watchme/watchme-sklearn/watchme.cfg

=============================================================================
Manifold learning on handwritten digits: Locally Linear Embedding, Isomap...
=============================================================================

An illustration of various embeddings on the digits dataset.

The RandomTreesEmbedding, from the :mod:`sklearn.ensemble` module, is not
technically a manifold embedding method, as it learn a high-dimensional
representation on which we apply a dimensionality reduction method.
However, it is often useful to cast a dataset into a representation in
which the classes are linearly-separable.

t-SNE will be initialized with the embedding that is generated by PCA in
this example, which is not the default setting. It ensures global stability
of the embedding, i.e., the embedding does not depend on random
initialization.

Linear Discriminant Analysis, from the :mod:`sklearn.discriminant_analysis`
module, and Neighborhood Components Analysis, from the :mod:`sklearn.neighbors`
module, are supervised dimensionality reduction method, i.e. they make use of
the provided labels, contrary to other methods.

Computing random projection
Computing PCA projection
Computing Linear Discriminant Analysis projection
Computing Isomap projection
Done.
Computing LLE embedding
Done. Reconstruction error: 1.63546e-06
Computing modified LLE embedding
Done. Reconstruction error: 0.360659
Computing Hessian LLE embedding
Done. Reconstruction error: 0.212804
Computing LTSA embedding
Done. Reconstruction error: 0.212804
Computing MDS embedding
Done. Stress: 157308701.864713
Computing Spectral embedding
Computing t-SNE embedding
```

The functions run fairly quickly, so we measure every quarter of a second. Watchme creates the git repository and commits data to it: each time a decorated function runs, a `decorator-psutils-<name>` folder is created with a `result.json`. Every commit coincides with the list of timepoints recorded for a single function run. Here is what the repository looks like after the run (before adding these extra files):

```bash
$ tree
.
├── decorator-psutils-hessian_lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-isomap_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-lda_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-ltsa_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-mds_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-modified_lle_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-pca_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-plot_digits
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-plot_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-random_2d_projection
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-spectral_embedding
│   ├── result.json
│   └── TIMESTAMP
├── decorator-psutils-tsne_embedding
│   ├── result.json
│   └── TIMESTAMP
└── watchme.cfg

13 directories, 27 files
```
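Each `result.json` holds the list of timepoints recorded for one function, so the layout above is easy to walk programmatically. The sketch below assumes only that structure - the timepoint keys (`time`, `cpu_percent`) are placeholders, not necessarily the fields watchme actually records - and builds a synthetic watcher folder so it is self-contained:

```python
"""Sketch: walk a watcher folder and count timepoints per function.

Assumes only the layout shown above: each decorator-psutils-<name> folder
holds a result.json containing a list of timepoint records. The timepoint
keys used here are placeholders -- inspect a real result.json for the
actual fields watchme writes.
"""
import json
import tempfile
from pathlib import Path

# Build a tiny synthetic watcher folder so the sketch runs anywhere
base = Path(tempfile.mkdtemp()) / "watchme-sklearn"
folder = base / "decorator-psutils-pca_projection"
folder.mkdir(parents=True)
(folder / "result.json").write_text(json.dumps(
    [{"time": 0.00, "cpu_percent": 12.5},
     {"time": 0.25, "cpu_percent": 80.1}]
))

# Count recorded timepoints for each monitored function
counts = {}
for result in base.glob("decorator-psutils-*/result.json"):
    name = result.parent.name.replace("decorator-psutils-", "")
    counts[name] = len(json.loads(result.read_text()))

print(counts)  # → {'pca_projection': 2}
```

Pointing `base` at the real `$HOME/.watchme/watchme-sklearn` folder would summarize an actual run the same way.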

And you would next be able to push directly to a new GitHub repository:

```bash
cd $HOME/.watchme/watchme-sklearn
git remote add origin https://github.com/vsoch/watchme-sklearn.git
git push -u origin master
```

(Add a README for better documentation of what you've done.)
You can also export the full data for any particular decorator to analyze:

```bash
watchme export watchme-sklearn decorator-psutils-plot_digits result.json --json
```

Here is a programmatic way to export all results to a "data" folder in the repository:

```bash
mkdir -p data
for folder in $(find . -maxdepth 1 -type d -name 'decorator*' -print); do
    folder="${folder#./}"  # strip the leading ./ that find prepends
    watchme export watchme-sklearn $folder --out data/$folder.json result.json --json --force
done
```

## Advanced

If you already have a watchme repository, and it's located somewhere non-traditional, you can have watchme generate results in the folder where you happen to be by exporting the WATCHME_BASE_DIR variable first:

```bash
export WATCHME_BASE_DIR=$(dirname $PWD)
```

And for a run from within a Singularity container, you would need to prefix this export with SINGULARITYENV_ so that it is passed into the container environment:

```bash
export SINGULARITYENV_WATCHME_BASE_DIR=$(dirname $PWD)
```
