Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masked stochastic watershed #118

Open
anntzer opened this issue Sep 27, 2022 · 8 comments
Open

Masked stochastic watershed #118

anntzer opened this issue Sep 27, 2022 · 8 comments
Labels
component:DIPlib About the DIPlib library (C++ code) enhancement

Comments

@anntzer
Copy link
Contributor

anntzer commented Sep 27, 2022

Component
DIPlib 3.3.0. Actually I use PyDIP but this is a feature request for the core library.

Describe the bug
Would it be possible to add masking functionality to StochasticWatershed (similarly to the mask parameter in Watershed)? Currently I simply run the iterations myself by drawing seeds and using SeededWatershed. It would be slightly more practical if I didn't have to do that, but much more importantly, it would be great if the exact probabilities could also be computed in the masked case.

Thanks again for the great library :)

To Reproduce
N/A

System information:
N/A

@crisluengo
Copy link
Member

The watershed algorithms treat pixels where the mask is 0 exactly the same way as pixels where the input is infinity. So for the non-exact stochastic watershed you could simply set input pixels to infinity instead of using a mask image.

I'm not sure what would happen in the algorithm that computes exact probabilities, where the input is infinity. I guess it might do the right thing, but I have never tried it. Adding a mask image to this algorithm would be quite a challenge, because instead of a single connected graph we would end up with multiple disjoint graphs, and this would need to be taken into account in the MST-based algorithm, which now would be a minimum-spanning forest algorithm... I will have to ask the author of this algorithm about feasibility.

@crisluengo crisluengo added enhancement component:DIPlib About the DIPlib library (C++ code) labels Sep 27, 2022
@anntzer
Copy link
Contributor Author

anntzer commented Sep 28, 2022

The watershed algorithms treat pixels where the mask is 0 exactly the same way as pixels where the input is infinity. So for the non-exact stochastic watershed you could simply set input pixels to infinity instead of using a mask image.

Ah, thanks. It was not obvious for me, looking at the docs, that e.g. StochasticWatershed would know not to put seeds into the infinity region.

I'm not sure what would happen in the algorithm that computes exact probabilities, where the input is infinity. I guess it might do the right thing, but I have never tried it. Adding a mask image to this algorithm would be quite a challenge, because instead of a single connected graph we would end up with multiple disjoint graphs, and this would need to be taken into account in the MST-based algorithm, which now would be a minimum-spanning forest algorithm... I will have to ask the author of this algorithm about feasibility.

I was actually only considering the case of a single connected mask, which is the case I care about, so limiting support to that case would be enough for me, but your point about the general case is well taken.

@crisluengo
Copy link
Member

It was not obvious for me, looking at the docs, that e.g. StochasticWatershed would know not to put seeds into the infinity region.

Well, it will. But those seeds will not do anything. You might need to adjust your nSeeds input parameter to account for the area that is masked out. If you want N seeds within the mask area, then set nSeeds to N * image_area / mask_area. Other than that, things work as expected when setting pixels to infinity.

I just tried the 'exact' method, it looks like it does not stop at infinity pixels. We'd have to update the algorithm to take a mask into account when building the graph. I don't think I can justify doing that amount of work at the moment, though. If you're willing to take a stab at it, I can help with some pointers and so on. But I didn't write that code, and am not at all familiar with it.

@anntzer
Copy link
Contributor Author

anntzer commented Sep 28, 2022

I can try to look into it, but would it be OK to only support the case of connected masks? I don't think I feel up to generalizing to a minimum spanning forest.

@crisluengo
Copy link
Member

Yes, absolutely, just error out if the graph is not a single connected component.

@anntzer
Copy link
Contributor Author

anntzer commented Oct 23, 2022

I looked a bit into this. These are mostly notes for myself...

I think the annoying part of the work is to adjust the Graph class to have a secondary mapping of vertex indices to unmasked pixels (because currently Graph assumes that vertex indices map directly to flat pixel indices). Once this is done, handling masking in CreateGraphLineFilter::Filter and ExactSWLineFilter::ComputePixel, and adding a mask parameter to all relevant functions should be "easy" (though a fair bit of work).

I doubt I'll be able to complete this any time soon, so feel free to close it (or leave it as a long-term todo, as you prefer).

@crisluengo
Copy link
Member

Yeah, I expected this to be significant work. Thank you for looking into it.

I’ll leave this open in case you or anyone else feels like working on it.

@anntzer
Copy link
Contributor Author

anntzer commented Oct 23, 2022

Actually, I went back to your 2014 article; the algorithm on the MST is very simple to implement in Python (<40 lines, including mask handling). I am providing it below for reference (intentionally as a source dump; I don't want to maintain it as a separate package right now), together with an example.

If I understand the paper correctly (from a quick skim), there isn't actually anything special to do to handle the non-connected case; the arguments also hold in that case (implementation-wise we just rely on minimum_spanning_tree actually returning the minimum spanning forest in that case).

How to convert the probabilities on the MST to the probabilities on the pixels is less clear to me, but actually (as mentioned in the paper) the MST is easier to work with, so I'm happy to have that.

import numpy as np
from scipy import sparse
from scipy.cluster.hierarchy import DisjointSet


def esw(img, n, *, mask=None):
    """Exact stochastic watershed; Malmberg & Luengo (2014)."""
    index_to_coords = [
        coords for coords in np.ndindex(*img.shape)
        if mask is None or mask[coords]]
    coords_to_index = {
        coords: idx for idx, coords in enumerate(index_to_coords)}
    sz = len(index_to_coords)
    adj = sparse.dok_array((sz, sz), img.dtype)
    for a in coords_to_index:
        for d in range(img.ndim):
            b = a[:d] + (a[d] + 1,) + a[d+1:]
            if b not in coords_to_index:
                continue
            adj[coords_to_index[a], coords_to_index[b]] = img[a] + img[b]
    mst = sparse.csgraph.minimum_spanning_tree(adj, overwrite=True)
    edges_by_weight = sorted(zip(*sparse.find(mst)), key=lambda abw: abw[-1])
    uf = DisjointSet(range(sz))
    edges_by_proba = []
    for a, b, _ in edges_by_weight:
        ta = len(uf.subset(a)) / sz
        tb = len(uf.subset(b)) / sz
        p = 1 - (1-ta)**n - (1-tb)**n + (1-ta-tb)**n
        edges_by_proba.append((a, b, p))
        uf.merge(a, b)
    edges_by_proba.sort(key=lambda abp: abp[-1])
    # TODO: Expand to full saliency map?
    adj = sparse.dok_array((sz, sz))
    for a, b, _ in edges_by_proba[:-(n - 1)]:
        adj[a, b] = 1
    _, cc = sparse.csgraph.connected_components(adj)
    labels = np.zeros(img.shape, int)
    for a, l in enumerate(cc):
        labels[index_to_coords[a]] = l + 1
    return labels


if __name__ == "__main__":
    from matplotlib import pyplot as plt

    from diplib import StochasticWatershed

    ii, jj = np.mgrid[:20, :20]
    im = ((1 - (.1 * np.random.random(size=ii.shape)
                + np.exp(-((ii-5)**2+(jj-5)**2)/30)
                + np.exp(-((ii-5)**2+(jj-15)**2)/30)
                + np.exp(-((ii-15)**2+(jj-5)**2)/30)
                + np.exp(-((ii-15)**2+(jj-15)**2)/30)))
        * 255).astype("i2")
    mask = np.ones(im.shape, dtype=bool)
    mask[:8, -8:] = mask[-8:, :8] = 0

    sw_nomask = esw(im, 2)
    sw_mask = esw(im, 2, mask=mask)

    axs = plt.figure(layout="compressed").subplots(2, 2).flat
    for ax in axs:
        ax.set(xticks=[], yticks=[])
    axs[0].imshow(im)
    axs[1].imshow(StochasticWatershed(im, 2, 0), clim=(0, 0.25))
    axs[2].imshow(sw_nomask, vmin=0)
    axs[3].imshow(sw_mask, vmin=0)
    plt.show()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:DIPlib About the DIPlib library (C++ code) enhancement
Projects
None yet
Development

No branches or pull requests

2 participants