
Understanding the determinant Jacobian of SoftmaxCentered #1777

Open
hylkedonker opened this issue Dec 15, 2023 · 0 comments

hylkedonker commented Dec 15, 2023

Hi,

I have been tinkering with the SoftmaxCentered bijector, but I would like some help understanding how the determinant of its Jacobian is calculated. I've already found the function SoftmaxCentered._inverse_log_det_jacobian, which is very nicely documented. From what I understand from the comments in that function, the SoftmaxCentered transformation $\pmb{x} = B(\pmb{z})$ is composed of two transformations, $B(\pmb{z}) = G[F(\pmb{z})]$.
I think I understand how the first transformation, whose inverse is $\pmb{z} = F^{-1}(\pmb{y}) = [ \ln y_1 - \ln y_0, \dots, \ln y_n - \ln y_0]$ with $y_0 = 1 - \sum_{i=1}^n y_i$, gives rise to

$$ |\det \frac{\partial F(\pmb{z})}{\partial \pmb{z}} | = \prod_{i=0}^n y_i $$

(using the Sherman–Morrison formula and the matrix determinant lemma). I have trouble understanding what the second transformation $\pmb{x} = G(\pmb{y})$ is good for, and why its contribution to the Jacobian is $\sqrt{\det[(DG)^\top (DG)]}$, so as to give rise to an additional constant factor $\sqrt{n+1}$ in the determinant of $B(\pmb{z})$.
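To make sure I am reading the decomposition correctly, here is a small numerical check. F and G below are my own reconstruction of the two steps (with the dependent coordinate placed last), not the actual TFP internals:

import jax
import jax.numpy as jnp

n = 3
z = jax.random.normal(jax.random.PRNGKey(0), (n,))

def F(z):
    # My reading of the first step: a plain softmax over [z, 0],
    # keeping only the n free simplex coordinates.
    return jax.nn.softmax(jnp.concatenate([z, jnp.zeros(1)]))[:-1]

def G(y):
    # My reading of the second step: embed into R^{n+1} by appending
    # the dependent coordinate 1 - sum(y).
    return jnp.concatenate([y, 1.0 - jnp.sum(y, keepdims=True)])

y = F(z)
x = G(y)

# |det dF/dz| should equal the product of all n + 1 simplex coordinates.
jac_F = jax.jacobian(F)(z)
print(jnp.abs(jnp.linalg.det(jac_F)), jnp.prod(x))

# sqrt(det[(DG)^T (DG)]) should equal sqrt(n + 1), independent of y.
jac_G = jax.jacobian(G)(y)
print(jnp.sqrt(jnp.linalg.det(jac_G.T @ jac_G)), jnp.sqrt(n + 1.0))

Numerically, both factors come out as expected; my question is really about where the $\sqrt{n+1}$ from $G$ belongs conceptually.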

As an example, let's assume I use SoftmaxCentered to transform uniform samples:
$$z_1 \sim \mathrm{Uniform}(0, 1), \quad [x_0, x_1] = B([z_1]). $$
The SoftmaxCentered.forward_log_det_jacobian should keep my distribution $p(\pmb{x}) = p(z_1) \left|\det \frac{\partial \pmb{z}}{\partial \pmb{x}}\right|$ normalised.
However, when I numerically compute
$$1 = \int \mathrm{d}\pmb{x} p(\pmb{x}) = \int_0^1 \mathrm{d}x_0 \int_0^1 \mathrm{d}x_1 p(\pmb{x}) \delta(x_1 + x_0 -1) = \int_0^1 \mathrm{d}x_1 p([1 - x_1, x_1]) $$
I am off by exactly a factor of $1/\sqrt{2}$, which I think is related to the function $G(\pmb{y})$.

import numpy as np
from tensorflow_probability.substrates import jax as tfp
tfb = tfp.bijectors

softmax = tfb.SoftmaxCentered()
ln_p_z = 0.0  # Uniform(0, 1) has a flat log-density of zero.
z = np.linspace(0, 1, 1000).reshape(-1, 1)
x = softmax.forward(z)
ln_p_x = ln_p_z - softmax.forward_log_det_jacobian(z, event_ndims=1)
p_x = np.exp(ln_p_x)

# Approximate the integral over x_0 with a Riemann sum.
dx = np.diff(x[:, 0], axis=0)
normalisation = np.sum(p_x[1:] * dx)
# normalisation is 1/sqrt(2) instead of one.
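
For reference, rescaling the Riemann sum by the suspected $\sqrt{2}$ (my own quick check, appended to the snippet above) does bring it back to one:

print(normalisation * np.sqrt(2))  # ~ 1.0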

I am probably overlooking something trivial. Can you help me fill in the gaps?

Thanks in advance,

Hylke

@hylkedonker hylkedonker changed the title Understanding the log determinant Jacobian of SoftmaxCentered Understanding the determinant Jacobian of SoftmaxCentered Dec 15, 2023