Online quantization algorithm for gudhi #536

tlacombe · 2021-10-15T10:15:52Z

Provide a quantization algorithm to "summarize" a collection of persistence diagrams.

At least one point is open for discussion:

I put the code in the python/gudhi/wasserstein/ directory because it has a "Wasserstein metric" flavor (we minimize an objective expressed in terms of the Wasserstein distance between persistence diagrams). However, unlike the other functions in this directory, it does not rely on POT; we never actually need to compute a Wasserstein distance or matching explicitly. Perhaps it belongs directly in the gudhi/ directory?

Also TODO:

Check quantization.py: is the copyright correct?


tlacombe · 2021-12-01T10:56:06Z

Corrected.
Any thoughts on whether this code belongs in the /wasserstein module? From a theoretical perspective, the goal is to solve a minimization problem with respect to the Wasserstein distance between persistence diagrams (so it makes sense to put it there), but from a practical (code) perspective, it does not rely on POT, contrary to the other functions in the /wasserstein module (because we can solve our optimization problem without explicitly computing such distances/matchings).
My feeling is that it can stay there: gudhi.wasserstein.quantization makes it clear that we are quantizing something with respect to the Wasserstein distance. But I am open to discussion, of course.

mglisse

I guess keeping it in wasserstein/ is ok.


mglisse · 2022-01-15T12:13:53Z

src/python/doc/wasserstein_distance_user.rst

+different tori with some additional noise.
+Starting from an initial codebook ``c0``, centroids are iteratively updated as new diagrams are provided.
+As we use the standard metrics between persistence diagrams (denoted here by :math:`\mathrm{OT}_2`), points in the
+diagrams that are close to the diagonal do not interfere in the codebook update process.


So it is the same as having an implicit point on the diagonal in the codebook?

More precisely, having a point in the codebook that represents "all the points on the diagonal" (or, formally, looking at the quotient space where you identify the points on the diagonal).
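To make the "implicit diagonal point" reading concrete, here is a minimal sketch, not the PR's code. It assumes a Euclidean ground metric, and `assign_with_diagonal` is a hypothetical helper name: the diagonal is appended as one extra column of the distance matrix, so points closer to the diagonal than to any centroid are assigned to it and would not move any centroid.

```python
import numpy as np

def assign_with_diagonal(points, codebook):
    """Return, for each diagram point, the index of its nearest centroid,
    or len(codebook) when the diagonal is closer than every centroid."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    codebook = np.asarray(codebook, dtype=float).reshape(-1, 2)
    # Euclidean distances between points and centroids, shape (n, k).
    d_centroids = np.linalg.norm(points[:, None, :] - codebook[None, :, :], axis=2)
    # Euclidean distance from (b, d) to its orthogonal projection on the diagonal.
    d_diag = np.abs(points[:, 1] - points[:, 0]) / np.sqrt(2)
    # Append the diagonal as an extra, implicit "centroid" column.
    M = np.hstack([d_centroids, d_diag[:, None]])
    return np.argmin(M, axis=1)

codebook = np.array([[0.0, 4.0], [1.0, 6.0]])
points = np.array([[0.1, 4.2], [2.0, 2.1], [1.0, 5.5]])
labels = assign_with_diagonal(points, codebook)
# The second point sits next to the diagonal, so it gets the extra label 2.
```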


tlacombe · 2022-05-18T09:18:20Z

I just realized that I never got around to making the last requested modifications (my local build was broken for some reason at that time).
I have finally done so.
As I'm working on a new machine, I hope I handled the fork/branching/etc. correctly.

PS: and one day later I realize that I forgot to post this comment... 😴

mglisse

The algorithm is presented as an online algorithm, so it should be normal to give it some data, look at the codebook at that point, pass it more data, look at the updated codebook, etc. The init parameter could be used towards that goal, but the number of diagrams (or batches) already processed is forgotten: t (which sets the learning rate) is reset to 0 at every call.
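One possible shape for such an interface is sketched below. It is only an illustration: `OnlineQuantizer` and `partial_fit` are hypothetical names, the diagonal is ignored for brevity, and the gradient is assumed to be the centroid minus the mean of the points assigned to it. The point is that the step counter `t` lives on the object and survives between calls.

```python
import numpy as np

class OnlineQuantizer:
    """Hypothetical wrapper that remembers the step counter between calls."""

    def __init__(self, init_codebook):
        self.codebook = np.array(init_codebook, dtype=float).reshape(-1, 2)
        self.t = 0  # number of batches processed so far; never reset

    def partial_fit(self, batch):
        """Update the codebook with one batch of diagram points (n x 2 array)."""
        batch = np.asarray(batch, dtype=float).reshape(-1, 2)
        if len(batch) and len(self.codebook):
            d = np.linalg.norm(batch[:, None, :] - self.codebook[None, :, :], axis=2)
            nearest = np.argmin(d, axis=1)
            for j in range(len(self.codebook)):
                assigned = batch[nearest == j]
                if len(assigned):
                    # Assumed gradient: centroid minus mean of its assigned points.
                    grad = self.codebook[j] - assigned.mean(axis=0)
                    # Decreasing step size 1 / (t + 1), as in the PR excerpt.
                    self.codebook[j] -= grad / (self.t + 1)
        self.t += 1
        return self.codebook

q = OnlineQuantizer([[0.0, 1.0]])
q.partial_fit([[0.0, 3.0]])  # t = 0, step 1: centroid moves to (0, 3)
q.partial_fit([[0.0, 5.0]])  # t = 1, step 1/2: centroid moves to (0, 4)
```

With a stateless function, the second call would restart at `t = 0` and jump straight to (0, 5), discarding what was learned from the first batch.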

mglisse · 2022-06-20T19:17:10Z

src/python/doc/wasserstein_distance_user.rst

+(the two loops generating the tori).
+
+.. figure::
+     ./img/quantiz.gif


On the one hand, the GIF is cool. On the other hand, I have trouble reading the doc with that thing moving on my screen...

mglisse · 2022-06-20T19:30:09Z

src/python/gudhi/wasserstein/quantization.py

+    if withdiag:
+        a = np.argmin(M[:-1, :], axis=1)
+    else:
+        a = np.argmin(M[:-1, :-1], axis=1)


It feels a bit strange to call _build_dist_matrix, whose main difference with cdist is that it adds the diagonal, just to drop the diagonal immediately... But I don't think it really matters.

mglisse · 2022-06-20T19:32:12Z

src/python/gudhi/wasserstein/quantization.py

+        X_batch = np.concatenate(list_of_non_empty_diags)
+        return X_batch
+    else:
+        return np.array([])


It is sometimes useful to force the shape of empty arrays, to (0,2) for instance. I don't know if that's the case here.
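A small sketch of what forcing the shape could look like, assuming diagrams are given as lists or arrays of (birth, death) pairs; `concat_diagrams` is a hypothetical name, not the PR's function. `np.array([])` has shape `(0,)`, so column indexing such as `X[:, 0]` fails on it, whereas an array of shape `(0, 2)` behaves like any other batch.

```python
import numpy as np

def concat_diagrams(diags):
    """Stack the points of several diagrams into one (n, 2) array."""
    non_empty = [np.asarray(d, dtype=float).reshape(-1, 2) for d in diags if len(d) > 0]
    if non_empty:
        return np.concatenate(non_empty)
    # Keep the second dimension so that indexing such as X[:, 0] still works.
    return np.empty((0, 2))

empty_batch = concat_diagrams([[], np.empty((0, 2))])
batch = concat_diagrams([[[0.0, 1.0]], [], [[2.0, 3.0], [4.0, 5.0]]])
```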

mglisse · 2022-06-20T19:34:40Z

src/python/gudhi/wasserstein/quantization.py

+    :param internal_p: Ground metric to assess centroid affectation. Default is ``2.``.
+    :type internal_p: ``float``
+
+    :returns: The final codebook obtained after going through the all pdiagset.


:rtype: kx2 numpy array?

mglisse · 2022-06-20T19:36:12Z

src/python/gudhi/wasserstein/quantization.py

+
+def _init_c(pdiagset, k, internal_p=2):
+    """
+    A naive heuristic to initialize a codebook: we take the k points with largest distances to the diagonal


What if the first diagram has fewer than k points?
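One way to handle that case, sketched under assumptions (this is not what the PR does): keep reading diagrams until at least k points are available, and fail with an explicit message otherwise. `init_codebook` is a hypothetical name, and points are ranked by persistence `d - b`, which gives the same ordering as the distance to the diagonal.

```python
import numpy as np

def init_codebook(pdiagset, k):
    """Pick the k most persistent points, reading as many diagrams as needed."""
    collected = np.empty((0, 2))
    for diag in pdiagset:
        diag = np.asarray(diag, dtype=float).reshape(-1, 2)
        collected = np.concatenate([collected, diag])
        if len(collected) >= k:
            break
    if len(collected) < k:
        raise ValueError(
            "only %d points available, cannot build %d centroids" % (len(collected), k)
        )
    persistence = collected[:, 1] - collected[:, 0]
    # Indices of the k largest persistence values, most persistent first.
    top = np.argsort(persistence)[::-1][:k]
    return collected[top]

diags = [[[0.0, 1.0]], [[0.0, 5.0], [1.0, 3.0]]]
c0 = init_codebook(diags, 2)  # the first diagram alone has too few points
```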

mglisse · 2022-06-20T19:40:25Z

src/python/gudhi/wasserstein/quantization.py

+    :param batch_size: Size of batches used during the online exploration of the ``pdiagset``.
+                        Default is ``1``.


As a user, should I stick to the default value of 1? If I already have all the diagrams, I may think that I don't need an online algorithm, which is for when data appears progressively, and consider using one huge batch under the impression that it disables the "online" stuff and gets the best result.
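To illustrate the concern, assume (as the `t`-based step size in the excerpt suggests, though this is not confirmed here) that the codebook is updated once per batch. The sketch below only counts updates; `iter_batches` is a hypothetical helper and the strings stand in for diagrams.

```python
def iter_batches(pdiagset, batch_size):
    """Yield consecutive lists of at most batch_size diagrams."""
    for start in range(0, len(pdiagset), batch_size):
        yield pdiagset[start:start + batch_size]

diagrams = ["dgm%d" % i for i in range(10)]  # placeholders standing in for diagrams
n_updates_default = sum(1 for _ in iter_batches(diagrams, 1))
n_updates_one_batch = sum(1 for _ in iter_batches(diagrams, len(diagrams)))
```

Under that assumption, `batch_size=1` gives ten codebook updates while one huge batch gives a single update from the initial codebook, so a larger batch does not necessarily mean a better result. The docstring could say so explicitly.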

mglisse · 2022-06-20T19:48:18Z

src/python/gudhi/wasserstein/quantization.py

+                    # stochastic-gradient-descent like approach (decreasing learning rate).
+                    c_current[j] = c_current[j] - grad / (t + 1)
+        else:
+            raise NotImplemented('Order = %s is not available yet. Only order=2. is valid' %order)


I think you could error out earlier (or not provide this option at all and just say that it is W2).
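A minimal sketch of an early check, assuming the option is kept; `check_order` is a hypothetical helper that would be called at the top of the public function, before any data is processed. It also uses `NotImplementedError`: `NotImplemented` is a constant, not an exception class, and calling it as in the excerpt raises a `TypeError` instead of the intended message.

```python
def check_order(order):
    """Reject unsupported orders before any computation starts."""
    if order != 2:
        # NotImplementedError is the exception class; NotImplemented is a
        # constant intended for binary operators and is not callable.
        raise NotImplementedError(
            "order=%s is not available yet; only order=2 is valid" % order
        )
    return float(order)

order = check_order(2.0)
```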

tlacombe added 2 commits October 15, 2021 11:59

online quantization algorithm for gudhi

1b7b8bc

added header to quantization.py with Université Gustave Eiffel copyright

98a5b84

mglisse reviewed Nov 3, 2021


VincentRouvreau reviewed Nov 8, 2021


tlacombe and others added 7 commits November 10, 2021 16:26

Update biblio/bibliography.bib

f865e8c

Co-authored-by: Vincent Rouvreau <10407034+VincentRouvreau@users.noreply.github.com>

Update src/python/gudhi/wasserstein/quantization.py

59dee61

Co-authored-by: Vincent Rouvreau <10407034+VincentRouvreau@users.noreply.github.com>

Adding "Since GUDHI 3.5" in the file.

60844d4

Co-authored-by: Vincent Rouvreau <10407034+VincentRouvreau@users.noreply.github.com>

Merge remote-tracking branch 'upstream/master' into quantization_v2

5c36e72

updated wasserstein/__init__.py

797b7cf

Merge branch 'quantization_v2' of https://github.com/tlacombe/gudhi-d…

3f41b5e

…evel into quantization_v2

changed tutorials : one tutorial per section, at the end of the section

892b351


updated date in copyright

a7f0f11

mglisse reviewed Jan 15, 2022


tlacombe and others added 8 commits January 16, 2022 22:48

Update src/python/doc/wasserstein_distance_user.rst

6fcba58

Co-authored-by: Marc Glisse <marc.glisse@inria.fr>

Update src/python/doc/wasserstein_distance_user.rst

f21abce

Co-authored-by: Marc Glisse <marc.glisse@inria.fr>

added _utils.py file in gudhi/wasserstein with some global functions.

cc8d8c1

changed standard distances to Wasserstein distances

231a3ca

refactor following the creation of _utils.py

94e0df4

update quantization including final remarks...

57c5de6

changed how funcs from _utils are imported.

9e3eaa1

changed how funcs from _utils are imported bis.

48d7909

mglisse reviewed Jun 20, 2022


mglisse marked this pull request as draft July 8, 2023 19:08
