Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimal transport R matrix rows don't sum to 1 #72

Open
jaysonjeg opened this issue Oct 8, 2023 · 10 comments
Open

Optimal transport R matrix rows don't sum to 1 #72

jaysonjeg opened this issue Oct 8, 2023 · 10 comments

Comments

@jaysonjeg
Copy link

jaysonjeg commented Oct 8, 2023

alignment_methods.OptimalTransportAlignment() results in transformation matrix R whose rows and columns sum to 1/nfeatures. This does not seem ideal. On the other hand, alignment_methods.POTAlignment()'s R matrix has rows and columns that sum to 1, which seems better.

If the columns of R sum to 1/nfeatures, then if you transform a functional brain map, the transformation is "shrinking" each parcel's mean value depending on the size of each parcel.

Minimal reproducing example:

from fmralign import alignment_methods as am
import numpy as np
x = np.random.random((20,50)) #20 nsamples, 50 nfeatures
y = np.random.random((20,50))
ot = am.OptimalTransportAlignment()
pot = am.POTAlignment()
ot.fit(x,y)
pot.fit(x,y)
print(ot.R.sum(axis=0)[0:3])
print(pot.R.sum(axis=0)[0:3])

Solution:

At the end of alignment_methods.OptimalTransportAlignment().fit(), add a line: self.R = self.R * self.R.shape[0]

@emdupre
Copy link
Collaborator

emdupre commented Oct 11, 2023

Hi @jaysonjeg,

Thanks for reporting ! Could you confirm which version of fmralign you're using ? Reproducing your example from the current main branch, I get:

$ print(ot.R.sum(axis=0)[0:3])
[0.9999997 0.9999997 1.       ]

$ print(pot.R.sum(axis=0)[0:3])
[1. 1. 1.]

Since the .fit() call includes the multiplication you suggested, I think this should be correct. But please let me know if you're still seeing this with the latest code.

We haven't had a release in a long time, but that will happen soon ! I'd just like to merge #70 first. Hopefully that will help to clarify these concerns !

@jaysonjeg
Copy link
Author

Hi Elizabeth, you are right. I was using a previous version of fmralign. You have fixed this issue already.

@emdupre
Copy link
Collaborator

emdupre commented Oct 12, 2023

Thanks for confirming ! I'll close this, but please open a new issue if you encounter any problems with the newest code.

@emdupre emdupre closed this as completed Oct 12, 2023
@alexisthual
Copy link
Collaborator

Hi all!
I just came across this, and it feels to me that it is normal that each row / column of an OT plan should sum to 1 / n_samples. Indeed, in our case, a transport plan is a soft mapping between point clouds ; each point is associated with a mass (1 / n_samples in the case of a uniform distribution) and each row / column describes how this mass should be spread across points of the other cloud.
However, in the general case, the total mass mass of the plan (sum of all coefficients) should be close to 1.

@emdupre emdupre reopened this Oct 24, 2023
@jaysonjeg
Copy link
Author

jaysonjeg commented Oct 24, 2023 via email

@bthirion
Copy link
Contributor

In larger parcels, the coefficients are indeed smaller, but this should be compensated by the fact that there are more coefficients. Is this an effect you have observed (cancellations effects may indeed be present, but more so in the noise case where the signal sign varies a lot) ?

@jaysonjeg
Copy link
Author

So if I continue the reproducing example in the first post in this thread.
z = np.random.random((50))
z_transformed = z @ ot.R
print(z.sum())
print(z2.sum())

The sum (or mean) of z_transformed is 1/50th the sum (or mean) of z. This is with the older version of fmralign. With the latest version, the rows/cols of R sum to 1, so this no longer happens.

@alexisthual
Copy link
Collaborator

If I understand correctly what you are looking for, one indeed needs to normalize the z_transformed tensor defined in your example (i.e. divide by the plan's sum of rows / columns). You can find an example here: https://github.com/alexisthual/fugw/blob/58cdfee03f39e6ae6f66b701489071af24664391/src/fugw/mappings/dense.py#L335C1-L337C55

@jaysonjeg
Copy link
Author

In the above thread, I think you meant to say (1/nfeatures), not (1/nsamples)? I don't fully understand the rationale for rows/columns summing to (1/nfeatures) rather than to 1. But I agree that if the normalization is desired, it can be done retrospectively.

@alexisthual
Copy link
Collaborator

Sorry if it was not clear: indeed, the transport plan is of size n_source_vertices x n_target_vertices (I was referring to any of these twi values as n_samples).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants