Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Progress Logger #8105

Closed
wants to merge 2 commits into from
Closed

[WIP] Progress Logger #8105

wants to merge 2 commits into from

Conversation

Erotemic
Copy link
Contributor

Reference Issue

This is start of work to address the issues in #78 #7574

What does this implement/fix? Explain your changes.

This adds an initial draft of the ProgIter object. This object
wraps around long-running loops and reports progress in a simple but customization way.
The purpose is meant to simplify writing progress messages and create a construct that can simply be dropped into existing code.

I've added in usage of this object into kmeans++ and MiniBatchKmeans to address my original use case. The case with MiniBatchKmeans requires a bit more customization because other useful messages in the loop. However, the case with kmeans++ where the change is extremely minimal and really demonstrates how this can just be dropped into existing code.

Any other comments?

While this feature does work end-to-end it is not complete.
Because this is a work in progress I have left some debugging code, references to my utility library (utool), and documentation constructs that I use in my coding workflow to quickly test things.

I don't see myself able to work much on this feature in the near future, but I did want to push what I had so far because (a) just to put it out there, (b) so I don't lose it, and (c) in case it is useful to someone else.

  • Basic Usage
    Basic usage of the object looks like this:
    >>> from sklearn.externals.progiter import ProgIter
    >>> def is_prime(n):
    ...     return n >= 2 and not any(n % i == 0 for i in range(2, n))
    >>> for n in ProgIter(range(10000)):
    >>>     # do some work
    >>>     is_prime(n)
    10000/10000... rate=13294.94 Hz, eta=0:00:00, total=0:00:00, wall=13:34 EST 

By default the output is continually updated by clearing the previous line, so at the end only the last progress message shows.

  • Demo Function
    The progiter module itself contains a small demo function to demonstrate how it works. Running the demo results in this output:
-----
Demo #0: progress can be disabled and incur essentially 0 overhead
However, the overhead of enabled progress is minimal and typically insignificant
this is verbosity mode verbose=0

tic(u'demo0')
...toc(u'demo0')=0.6284s

-----
Demo #1: progress is shown by default in the same line
this is verbosity mode verbose=1

tic(u'demo1')
demo1 1000/1000... rate=1584.12 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
...toc(u'demo1')=0.6314s

-----
Demo #2: clearline=False prints multiple lines.
Progress is only printed as needed
Notice the adjustment behavior of the print frequency
this is verbosity mode verbose=2

tic(u'demo2')
demo2    0/1000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:20 EST 
demo2    1/1000... rate=2702.52 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo2  257/1000... rate=1522.38 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo2  642/1000... rate=1560.16 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo2 1000/1000... rate=1573.36 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
...toc(u'demo2')=0.6357s

-----
Demo #3: Adjustments can be turned off to give constant feedback
this is verbosity mode verbose=3

tic(u'demo3')
demo3    0/1000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:20 EST 
demo3  100/1000... rate=1476.12 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  200/1000... rate=1427.07 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  300/1000... rate=1479.27 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  400/1000... rate=1505.12 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  500/1000... rate=1491.30 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  600/1000... rate=1515.48 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  700/1000... rate=1512.45 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  800/1000... rate=1514.54 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3  900/1000... rate=1515.45 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
demo3 1000/1000... rate=1516.72 Hz, eta=0:00:00, total=0:00:00, wall=13:20 EST 
...toc(u'demo3')=0.6594s
  • Usage in MiniBatchKMeans
    An example showing how this works in the context of MiniBatchKMeans is here:
        >>> from sklearn.cluster import MiniBatchKMeans
        >>> from sklearn.datasets.samples_generator import make_blobs
        >>> import numpy as np
        >>> n_clusters = 4000
        >>> X, true_labels = make_blobs(n_samples=int(1E5), centers=n_clusters,
        ...                             cluster_std=1., random_state=42)
        >>> mbkm = MiniBatchKMeans(n_clusters=n_clusters,
        ...                        init_size=3 * n_clusters, n_init=2,
        ...                        random_state=0, verbose=2).fit(X)
        >>> print('mbkm.labels_ = %r' % (mbkm.labels_,))

First with verbose=2, which means the ProgIter object will not clear progress lines and always just append the next line to stdout.

Init cluster centers with k-means++
kmeans++    0/4000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:24 EST 
kmeans++    1/4000... rate=192.34 Hz, eta=0:00:20, total=0:00:00, wall=13:24 EST 
kmeans++  192/4000... rate=931.83 Hz, eta=0:00:04, total=0:00:00, wall=13:24 EST 
kmeans++  480/4000... rate=1034.86 Hz, eta=0:00:03, total=0:00:00, wall=13:24 EST 
kmeans++ 1117/4000... rate=1084.50 Hz, eta=0:00:02, total=0:00:01, wall=13:24 EST 
kmeans++ 1125/4000... rate=1084.81 Hz, eta=0:00:02, total=0:00:01, wall=13:24 EST 
kmeans++ 2250/4000... rate=1107.09 Hz, eta=0:00:01, total=0:00:02, wall=13:24 EST 
kmeans++ 2260/4000... rate=1107.16 Hz, eta=0:00:01, total=0:00:02, wall=13:24 EST 
kmeans++ 3390/4000... rate=1112.93 Hz, eta=0:00:00, total=0:00:03, wall=13:24 EST 
kmeans++ 4000/4000... rate=1114.58 Hz, eta=0:00:00, total=0:00:03, wall=13:24 EST 
Inertia for init 1/2: 169.317605
Init cluster centers with k-means++
kmeans++    0/4000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:24 EST 
kmeans++    1/4000... rate=3412.78 Hz, eta=0:00:01, total=0:00:00, wall=13:24 EST 
kmeans++  257/4000... rate=975.97 Hz, eta=0:00:03, total=0:00:00, wall=13:24 EST 
kmeans++  642/4000... rate=1059.49 Hz, eta=0:00:03, total=0:00:00, wall=13:24 EST 
kmeans++ 1123/4000... rate=1044.57 Hz, eta=0:00:02, total=0:00:01, wall=13:24 EST 
kmeans++ 2050/4000... rate=1060.59 Hz, eta=0:00:01, total=0:00:01, wall=13:24 EST 
kmeans++ 2160/4000... rate=1063.67 Hz, eta=0:00:01, total=0:00:02, wall=13:24 EST 
kmeans++ 3240/4000... rate=1074.09 Hz, eta=0:00:00, total=0:00:03, wall=13:24 EST 
kmeans++ 3285/4000... rate=1074.54 Hz, eta=0:00:00, total=0:00:03, wall=13:24 EST 
kmeans++ 4000/4000... rate=1082.62 Hz, eta=0:00:00, total=0:00:03, wall=13:24 EST 
Inertia for init 2/2: 169.461913
Begining mini-batch iterations
minibatch      0/100000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:24 EST 
minibatch      1/100000... rate=24.79 Hz, eta=1:07:13, total=0:00:00, wall=13:24 EST inertias: batch=0.024289, ewa=0.024289 
[MiniBatchKMeans] Reassigning 50 cluster centers.
minibatch     12/100000... rate=37.27 Hz, eta=0:00:00, total=0:00:00, wall=13:24 EST inertias: batch=0.022055, ewa=0.024292 
Converged (lack of improvement in inertia) at iteration 12/100000
Computing label assignment and total inertia
labels inertia    0/1000... rate=0.00 Hz, eta=?, total=0:00:00, wall=13:24 EST 
labels inertia    1/1000... rate=348.57 Hz, eta=0:00:02, total=0:00:00, wall=13:24 EST 
labels inertia  257/1000... rate=382.73 Hz, eta=0:00:01, total=0:00:00, wall=13:24 EST 
labels inertia  382/1000... rate=382.87 Hz, eta=0:00:01, total=0:00:00, wall=13:24 EST 
labels inertia  764/1000... rate=389.35 Hz, eta=0:00:00, total=0:00:01, wall=13:24 EST 
labels inertia  792/1000... rate=389.13 Hz, eta=0:00:00, total=0:00:02, wall=13:24 EST 
labels inertia 1000/1000... rate=390.99 Hz, eta=0:00:00, total=0:00:02, wall=13:24 EST 
mbkm.labels_ = array([ 972, 1783,  797, ..., 3203, 3363, 2020], dtype=int32)

Verbosity 2 can be a bit much, so when verbosity=1 a clearline sequence will tell the terminal to delete the previous verbosity line before it prints the next one. This has a nice where you see a single line in the terminal update every once in awhile. Obviously I can't show updates in static text, but at the end of the script it looks like this:

Init cluster centers with k-means++
kmeans++ 4000/4000... rate=1260.10 Hz, eta=0:00:00, total=0:00:03, wall=13:28 EST 
Inertia for init 1/2: 169.317605
Init cluster centers with k-means++
kmeans++ 4000/4000... rate=1243.40 Hz, eta=0:00:00, total=0:00:03, wall=13:28 EST 
Inertia for init 2/2: 169.461913
Begining mini-batch iterations
minibatch      1/100000... rate=24.85 Hz, eta=1:07:03, total=0:00:00, wall=13:28 EST inertias: batch=0.024289, ewa=0.024289 
[MiniBatchKMeans] Reassigning 50 cluster centers.
minibatch     12/100000... rate=36.89 Hz, eta=0:00:00, total=0:00:00, wall=13:28 EST inertias: batch=0.022055, ewa=0.024292 
Converged (lack of improvement in inertia) at iteration 12/100000
Computing label assignment and total inertia
labels inertia 1000/1000... rate=451.42 Hz, eta=0:00:00, total=0:00:02, wall=13:28 EST 
mbkm.labels_ = array([ 972, 1783,  797, ..., 3203, 3363, 2020], dtype=int32)

@Erotemic Erotemic changed the title [WIP] Progress Logger Iterator Created [WIP] Progress Logger Dec 22, 2016
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be achieved by a custom log handler and structured (e.g. JSON) log messages using the logging module.

iterable : sequence
A python iterable

label : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*length

@glemaitre
Copy link
Member

I'm closing this PR since we will need to rethink the API. At the time of speaking, I can think of two ways that could be discuss: via the callback or reusing the verbose parameter. But we will need to restart from scratch.

@glemaitre glemaitre closed this Mar 12, 2024
@Erotemic
Copy link
Contributor Author

This makes sense, I'll note that progiter has been put into a separate standalone repo / package: https://github.com/Erotemic/progiter

If this feature is desired (and I think it is) it likely makes sense to have some callback that any modern progress manager like tqdm, rich, or progiter can use.

@glemaitre
Copy link
Member

If this feature is desired (and I think it is) it likely makes sense to have some callback that any modern progress manager like tqdm, rich, or progiter can use.

We indeed leaning towards getting a progress bar monitoring using rich.

@Erotemic thank you for the info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants