998 improving abc methods for trial based data using statistical distances #1104

theogruner · 2024-03-22T14:10:59Z

What does this implement/fix? Explain your changes

Adds an approximation of the squared 2-Wasserstein distance, computed with Sinkhorn iterations, as an additional statistical distance alongside the existing metrics. Also extends MCABC and SMCABC to allow conditioning on multiple observations via statistical distances.
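
For readers unfamiliar with the approach, the following is a minimal sketch of how Sinkhorn iterations can approximate the squared 2-Wasserstein distance between two samples. It is not the code from this PR: the function name `sinkhorn_w2_squared`, the parameters `epsilon` and `n_iter`, and the use of plain Python lists instead of torch tensors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sinkhorn_w2_squared(
    x: Sequence[Point],
    y: Sequence[Point],
    epsilon: float = 0.1,
    n_iter: int = 200,
) -> float:
    """Approximate the squared 2-Wasserstein distance between two samples.

    Illustrative only: plain Sinkhorn without log-domain stabilisation, so a
    very small `epsilon` relative to the costs can underflow.
    """
    n, m = len(x), len(y)
    # Pairwise squared Euclidean costs C[i][j] = ||x_i - y_j||^2.
    cost: List[List[float]] = [
        [sum((a - b) ** 2 for a, b in zip(xi, yj)) for yj in y] for xi in x
    ]
    # Gibbs kernel of the entropically regularised transport problem.
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    # Uniform mass on every sample point.
    row_mass, col_mass = 1.0 / n, 1.0 / m
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the estimate is <P, C>.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )
```

With this sketch, two one-dimensional samples that differ by a shift of 2, such as `[[0.0], [1.0]]` and `[[2.0], [3.0]]`, should give a value close to 4. A larger `epsilon` converges faster but biases the estimate upwards.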

Does this close any currently open issues?

Fixes #998

Checklist

I have read and understood the contribution guidelines
I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
I have commented my code, particularly in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works
I have reported how long the new tests run and potentially marked them with pytest.mark.slow.
New and existing unit tests pass locally with my changes
I performed linting and formatting as described in the contribution guidelines
I rebased on main (or there are no conflicts with main)

… based on regularized optimal transport

…hods-for-trial-based-data-using-statistical-distances

janfb

Thanks a lot, this is great!
Added a couple of comments.

sbi/inference/abc/abc_base.py

janfb · 2024-03-22T15:44:11Z

@@ -98,7 +127,47 @@ def l2_distance(xo, x):
        def l1_distance(xo, x):
            return torch.mean(abs(xo - x), dim=-1)

-        distance_functions = {"mse": mse_distance, "l2": l2_distance, "l1": l1_distance}
+        def mmd_squared(xo, x):


Given that we have so many distance functions now, I think it is time to refactor this and move them out of this function, either to the top level or to a separate file. Would you agree?

Please add type hints and docstrings as well.

I agree. My suggestion would be to add a Distance or Metric class in the sbi/utils/metrics.py which builds one of the chosen distances or a custom one. We can further set the allow_iid flag within the new class.

I have now moved the distance functions to a separate distance class within the abc folder. I did not want to put it into sbi/utils/metrics.py because its implementation is specific to ABC and should not be used outside of it.
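
To make the refactoring idea concrete, here is a rough sketch of a class that builds either a named distance or a custom callable and carries the iid flag with it. It is only an illustration under assumptions: the name `DistanceSketch`, the constructor arguments, and the list-based distance functions are hypothetical and do not reproduce the class added in this PR.

```python
import math
from typing import Callable, Dict, Sequence, Union

Vector = Sequence[float]
DistanceFn = Callable[[Vector, Vector], float]


def mse_distance(xo: Vector, x: Vector) -> float:
    """Mean squared difference between observation and simulation."""
    return sum((a - b) ** 2 for a, b in zip(xo, x)) / len(xo)


def l2_distance(xo: Vector, x: Vector) -> float:
    """Euclidean norm of the difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xo, x)))


def l1_distance(xo: Vector, x: Vector) -> float:
    """Mean absolute difference, mirroring the l1_distance in the diff above."""
    return sum(abs(a - b) for a, b in zip(xo, x)) / len(xo)


# Module-level registry instead of a dict rebuilt inside a method.
BUILTIN_DISTANCES: Dict[str, DistanceFn] = {
    "mse": mse_distance,
    "l2": l2_distance,
    "l1": l1_distance,
}


class DistanceSketch:
    """Hypothetical wrapper around a named or user-supplied distance."""

    def __init__(
        self, distance: Union[str, DistanceFn] = "l2", allow_iid: bool = False
    ) -> None:
        if callable(distance):
            # Custom distance supplied by the user.
            self.distance_fn = distance
        elif distance in BUILTIN_DISTANCES:
            self.distance_fn = BUILTIN_DISTANCES[distance]
        else:
            raise ValueError(
                f"Unknown distance {distance!r}; "
                f"choose from {sorted(BUILTIN_DISTANCES)} or pass a callable."
            )
        # Stored as a plain bool so callers never have to handle None.
        self.allow_iid = allow_iid

    def __call__(self, xo: Vector, x: Vector) -> float:
        return self.distance_fn(xo, x)
```

Keeping the registry at module level means each distance function can get its own type hints and docstring, and the class stays small enough to live next to the ABC code.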

sbi/inference/abc/abc_base.py

janfb · 2024-03-22T15:47:18Z

        if isinstance(distance_type, Callable):
-            return distance_type
+            if allow_iid is None:


What happens with allow_iid otherwise? I think it would be better not to leave it as None, but to set it to either True or False at the beginning of this function. Otherwise, pyright will likely complain.

Isn't the type already specified to be a bool or None? Therefore, if it is not None, pyright will treat it as a bool.

janfb · 2024-03-22T15:49:15Z

sbi/inference/abc/abc_base.py


            return distance(observed_data, simulated_data)

-        return distance_fun
+        is_statistical_distance = distance_type in implemented_statistical_distances
+        if allow_iid is not None:


allow_iid should be True or False
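
One way to follow this suggestion is to resolve the optional flag into a definite bool once, before any branching. The sketch below is an assumption about how that could look, not the PR's code: `resolve_allow_iid` is a hypothetical helper, the contents of `implemented_statistical_distances` are placeholders, and defaulting custom callables to False is an arbitrary choice made for the example.

```python
from typing import Callable, Optional, Union

# Placeholder contents; the real set of names is defined in the PR.
implemented_statistical_distances = frozenset({"mmd", "wasserstein"})


def resolve_allow_iid(
    distance_type: Union[str, Callable], allow_iid: Optional[bool]
) -> bool:
    """Turn an optional flag into a definite bool (hypothetical helper)."""
    if allow_iid is not None:
        # An explicit user choice always wins.
        return allow_iid
    if callable(distance_type):
        # Assumption: custom callables are treated as non-iid by default.
        return False
    # Named distances allow iid data only if they are statistical distances.
    return distance_type in implemented_statistical_distances
```

After a call such as `allow_iid = resolve_allow_iid(distance_type, allow_iid)` at the top of the function, the rest of the body works with a plain bool, so no later branch has to check for None and a type checker has nothing left to narrow.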

sbi/utils/metrics.py

tests/abc_test.py

janfb · 2024-03-22T16:05:58Z

tests/metrics_test.py

@@ -128,3 +130,35 @@ def test_c2st_scores(dist_sigma, c2st_lowerbound, c2st_upperbound):
    assert obs2_c2st.mean() <= c2st_upperbound

    assert np.allclose(obs2_c2st, obs_c2st, atol=0.05)
+
+
+@pytest.mark.slow


could you add tests for the other distances as well? that'd be great!

sure, I'll add them :)

Added tests for the unbiased and biased MMD based on hypothesis tests.
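
As background for the two estimators mentioned here, the following sketch shows how a biased and an unbiased estimate of the squared MMD differ. It is illustrative only and makes several assumptions: the Gaussian kernel, the `bandwidth` parameter, and the function names are chosen for the example and are not taken from the PR or its tests.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two points."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth**2))


def mmd_squared(
    x: Sequence[Point],
    y: Sequence[Point],
    biased: bool = True,
    bandwidth: float = 1.0,
) -> float:
    """Estimate the squared MMD between two samples.

    biased=True averages over all pairs, including each point with itself.
    biased=False drops those diagonal terms in the within-sample averages.
    """
    if not biased and (len(x) < 2 or len(y) < 2):
        raise ValueError("The unbiased estimate needs at least two points per sample.")

    def within(sample: Sequence[Point]) -> float:
        size = len(sample)
        total = sum(
            gaussian_kernel(sample[i], sample[j], bandwidth)
            for i in range(size)
            for j in range(size)
            if biased or i != j
        )
        n_pairs = size * size if biased else size * (size - 1)
        return total / n_pairs

    # The cross term always averages over every (x_i, y_j) pair.
    cross = sum(gaussian_kernel(xi, yj, bandwidth) for xi in x for yj in y)
    cross /= len(x) * len(y)
    return within(x) + within(y) - 2.0 * cross
```

The biased estimate is never negative and is zero when both samples are identical, whereas the unbiased estimate can dip below zero for similar samples. That difference is one reason the two variants are worth testing separately.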

…hods-for-trial-based-data-using-statistical-distances

janfb

Great, thanks for the edits!

I added some last comments. Once those are addressed, the PR can be merged 🎉

janfb · 2024-04-04T08:25:30Z

sbi/inference/abc/abc_base.py

@@ -54,7 +66,9 @@ def __init__(
        self.x_shape = None

        # Select distance function.
-        self.distance = self.get_distance_function(distance)
+        self.distance = Distance(


janfb · 2024-04-04T08:26:11Z