add: prototype implementation of tarp in sbi #1106

Draft · psteinb wants to merge 11 commits into main
Conversation

@psteinb (Contributor) commented Mar 22, 2024

What does this implement/fix? Explain your changes

TARP is a diagnostics method that can help identify over-/underdispersion and bias in trained neural posteriors. The corresponding paper is here:
https://arxiv.org/abs/2302.03026

The reference NumPy implementation is here:
https://github.com/Ciela-Institute/tarp/
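For context, a minimal sketch of the core TARP computation as I read it from the paper; the function name, the shape conventions, and the uniform choice of reference points are illustrative, not this PR's API:

import torch
from torch import Tensor


def tarp_coverage_values(samples: Tensor, theta: Tensor) -> Tensor:
    """Fraction of posterior samples closer to a random reference than theta.

    samples: (num_samples, num_sims, dim) posterior draws per simulation.
    theta:   (num_sims, dim) ground-truth parameters.
    For a well-calibrated posterior, these fractions are uniform on [0, 1].
    """
    num_sims, dim = theta.shape
    # one uniform reference point per simulation, drawn within the range of theta
    lo, hi = theta.min(dim=0).values, theta.max(dim=0).values
    references = lo + (hi - lo) * torch.rand(num_sims, dim)

    dist_samples = torch.linalg.norm(samples - references, dim=-1)  # (num_samples, num_sims)
    dist_theta = torch.linalg.norm(theta - references, dim=-1)  # (num_sims,)
    return (dist_samples < dist_theta).float().mean(dim=0)  # (num_sims,)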

Does this close any currently open issues?

No; this was part of the March 2024 SBI hackathon in Tübingen.

Any relevant code examples, logs, error output, etc?

Not yet; I am trying to reproduce the examples given in the paper. At a later point in time, I'd like to bring the tests as well as a tutorial in line with what is available for SBC.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read and understood the contribution
    guidelines
  • I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • New and existing unit tests pass locally with my changes
  • I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)

@psteinb self-assigned this Mar 22, 2024

@psteinb (Contributor, Author) commented Mar 28, 2024

Note: I am stopping work on this PR for the time being. I ran into issues reproducing the TARP paper: Ciela-Institute/tarp#8
Once those are resolved, I'll continue working on this.

@psteinb closed this Mar 28, 2024
@psteinb reopened this Mar 28, 2024

@psteinb (Contributor, Author) commented Apr 12, 2024

Dear @janfb and @JuliaLinhart,
an alpha version of TARP (arXiv) as an SBI diagnostic is now ready from my point of view. I'd love for one of you to take a look. I added two files, sbi/diagnostics/tarp.py and tests/tarp_tests.py. The last unit test also documents how TARP would be used with SBI posterior predictions.

I have some questions, though:

  • At this point, the TARP coverage estimates are returned as raw numbers, i.e. I don't perform any hypothesis testing on them. Should I add one (e.g. a KS test; see the sketch after this list)?

  • The TARP diagnostic class currently implements a run and a check function (to be aligned with the SBC code). The run function practically doesn't do anything TARP-related but rather draws samples from the posterior; check actually performs TARP without any hypothesis test. I'm unclear whether run should instead compute the coverage stats and check perform the hypothesis test, for example. What do you think?

  • The TARP paper also offers a bootstrapped version of the diagnostic; would we want to have that in SBI too?

  • I think that if TARP is included in SBI, there should be a tutorial about it. I'd rather not make that part of this PR, though. Is that OK?
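Regarding the first question, a minimal sketch of what such a test could look like; coverage_values is a placeholder for the per-simulation TARP coverage fractions, while scipy.stats.kstest is the real SciPy API:

import torch
from scipy import stats

# Under a well-calibrated posterior, the TARP coverage fractions are
# uniform on [0, 1], so a one-sample KS test against the uniform CDF
# yields a p-value for miscalibration.
coverage_values = torch.rand(200)  # placeholder for real coverage values
ks_stat, p_value = stats.kstest(coverage_values.numpy(), "uniform")
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")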

@janfb (Contributor) left a comment

thanks a lot for contributing this @psteinb ! 🙏

I did a first pass (not the tests) and added a couple of comments.

Two high-level comments:

  1. Do we really need a TARP class? Couldn't we do it in a function that gets the samples, thetas, and a bunch of kwargs and basically does what check currently does? (See the sketch below.)
  2. It seems that TARP is very similar to SBC, except that it uses a different reduce_fn (l2 or l1) for the ranking and that it needs the references, no? Thus, maybe there is a way to incorporate it into the current implementation of sbc.py?
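To make point 1 concrete, a hypothetical sketch of such a function; the name check_tarp, the signature, and the shape conventions are made up for illustration and are not the code in this PR:

from typing import Optional, Tuple

import torch
from torch import Tensor


def check_tarp(
    samples: Tensor,  # (num_samples, num_sims, dim) posterior draws
    theta: Tensor,  # (num_sims, dim) ground-truth parameters
    references: Optional[Tensor] = None,
    num_bins: Optional[int] = None,
) -> Tuple[Tensor, Tensor]:
    """Return credibility levels alpha and the expected coverage at each level."""
    num_sims, dim = theta.shape
    num_bins = num_bins if num_bins is not None else num_sims // 10

    if references is None:
        # default: one uniform reference point per simulation
        lo, hi = theta.min(dim=0).values, theta.max(dim=0).values
        references = lo + (hi - lo) * torch.rand(num_sims, dim)

    # fraction of posterior samples closer to the reference than the true theta
    dist_samples = torch.linalg.norm(samples - references, dim=-1)
    dist_theta = torch.linalg.norm(theta - references, dim=-1)
    coverage = (dist_samples < dist_theta).float().mean(dim=0)

    # empirical CDF of the coverage fractions; a calibrated posterior
    # gives ecp close to alpha, i.e. the diagonal
    alpha = torch.linspace(0.0, 1.0, num_bins + 1)
    ecp = (coverage[None, :] <= alpha[:, None]).float().mean(dim=-1)
    return alpha, ecp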

Comment on lines +15 to +42
def l2(x: Tensor, y: Tensor, axis=-1) -> Tensor:
    """Calculates the L2 distance between two tensors.

    Args:
        x (Tensor): The first tensor.
        y (Tensor): The second tensor.
        axis (int, optional): The axis along which to calculate the L2 distance.
            Defaults to -1.

    Returns:
        Tensor: A tensor containing the L2 distance between x and y along the
            specified axis.
    """
    return torch.sqrt(torch.sum((x - y) ** 2, axis=axis))


def l1(x: Tensor, y: Tensor, axis=-1) -> Tensor:
    """Calculates the L1 distance between two tensors.

    Args:
        x (Tensor): The first tensor.
        y (Tensor): The second tensor.
        axis (int, optional): The axis along which to calculate the L1 distance.
            Defaults to -1.

    Returns:
        Tensor: A tensor containing the L1 distance between x and y along the
            specified axis.
    """
    return torch.sum(torch.abs(x - y), axis=axis)
I suggest moving these into sbi/utils/metrics.

Comment on lines +76 to +77
    posterior.set_default_x(xo)
    posterior.train()

Suggested change:
-    posterior.set_default_x(xo)
-    posterior.train()
+    posterior.train()


def __init__(
    self,
    references: Tensor = None,

Suggested change:
-    references: Tensor = None,
+    references: Optional[Tensor] = None,

    self,
    references: Tensor = None,
    metric: str = "euclidean",
    num_alpha_bins: Union[int, None] = None,

this will be set to n_bins in __init__, so I suggest just renaming it and staying consistent with num_

Suggested change:
-    num_alpha_bins: Union[int, None] = None,
+    num_bins: Optional[int] = None,

Comment on lines +120 to +121
    num_alpha_bins: number of bins to use for the credibility values.
        If ``None``, then ``n_sims // 10`` bins are used.

Suggested change:
-    num_alpha_bins: number of bins to use for the credibility values.
-        If ``None``, then ``n_sims // 10`` bins are used.
+    num_bins: number of bins to use for the credibility values.
+        If ``None``, then ``num_sims // 10`` bins are used.

Comment on lines +251 to +254
    if theta.shape[-2] != num_sims:
        raise ValueError("theta must have the same number of rows as samples")
    if theta.shape[-1] != num_dims:
        raise ValueError("theta must have the same number of columns as samples")

Suggested change:
-    if theta.shape[-2] != num_sims:
-        raise ValueError("theta must have the same number of rows as samples")
-    if theta.shape[-1] != num_dims:
-        raise ValueError("theta must have the same number of columns as samples")
+    assert theta.shape == samples.shape[1:], (
+        "number and dimensions of ground truth thetas must match the posterior samples."
+    )

"""
# TARP assumes that the predicted thetas are sampled from the "true"
# PDF num_samples times
theta = theta.detach() if len(theta.shape) != 2 else theta.detach().unsqueeze(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we assert already here that theta.shape == samples.shape[1:]? Why do we need the unsqueeze(0)?

    samples = (samples - lo) / (hi - lo + 1e-10)
    theta = (theta - lo) / (hi - lo + 1e-10)

    assert len(theta.shape) == len(samples.shape)

I am confused by this assert. I assumed that samples always has one dimension more than theta because it contains samples for many different x?

Comment on lines +264 to +297
    if not isinstance(self.references, Tensor):
        # obtain min/max per dimension of theta
        lo = (
            torch.min(theta, dim=-2).values.min(axis=0).values
        )  # should be 0 if normalized
        hi = (
            torch.max(theta, dim=-2).values.max(axis=0).values
        )  # should be 1 if normalized

        refpdf = torch.distributions.Uniform(low=lo, high=hi)
        self.references = refpdf.sample((1, num_sims))
    else:
        if len(self.references.shape) == 2:
            # add singleton dimension in front
            self.references = self.references.unsqueeze(0)

        if len(self.references.shape) == 3 and self.references.shape[0] != 1:
            raise ValueError(
                f"references must be a 2D array with a singular first "
                f"dimension, received {self.references.shape}"
            )

        if self.references.shape[-2] != num_sims:
            raise ValueError(
                f"references must have the same number of samples as samples, "
                f"received {self.references.shape[-2]} != {num_sims}"
            )

        if self.references.shape[-1] != num_dims:
            raise ValueError(
                "references must have the same number of dimensions as "
                f"samples or theta, received {self.references.shape[-1]} "
                f"!= {num_dims}"
            )
it seems that these lines act only on self.references. I suggest moving them to a separate function, e.g., def _check_references(...), and calling this function only once during init. Or am I missing something?
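A sketch of what such a helper could look like, mirroring the checks above (the name _check_references follows the suggestion; the exact signature is illustrative):

from torch import Tensor


def _check_references(references: Tensor, num_sims: int, num_dims: int) -> Tensor:
    """Validate user-provided references once, e.g. during __init__."""
    if len(references.shape) == 2:
        # add singleton dimension in front
        references = references.unsqueeze(0)
    if len(references.shape) != 3 or references.shape[0] != 1:
        raise ValueError(
            f"references must be a 2D array with a singular first "
            f"dimension, received {references.shape}"
        )
    if references.shape[-2] != num_sims:
        raise ValueError(
            f"references must have the same number of samples as samples, "
            f"received {references.shape[-2]} != {num_sims}"
        )
    if references.shape[-1] != num_dims:
        raise ValueError(
            f"references must have the same number of dimensions as samples "
            f"or theta, received {references.shape[-1]} != {num_dims}"
        )
    return references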

Comment on lines +303 to +311
    if self.metric_name.lower() in ["l2", "euclidean"]:
        distance = l2
    elif self.metric_name.lower() in ["l1", "manhattan"]:
        distance = l1
    else:
        raise ValueError(
            "metric must be either 'euclidean' or 'manhattan', "
            f"received {self.metric_name}"
        )
this could be done during __init__ as well, and then just set self.distance.
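For illustration, a minimal sketch of that suggestion, resolving the metric once at construction time; the dict and the class skeleton are hypothetical, and l2/l1 are the helpers from the diff above:

# l2 and l1 as defined in sbi/diagnostics/tarp.py (see the excerpt above)
_METRICS = {"l2": l2, "euclidean": l2, "l1": l1, "manhattan": l1}


class TARP:
    def __init__(self, metric: str = "euclidean") -> None:
        try:
            # resolve the distance function once instead of on every check() call
            self.distance = _METRICS[metric.lower()]
        except KeyError:
            raise ValueError(
                f"metric must be one of {sorted(_METRICS)}, received {metric!r}"
            ) from None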
