inconsistencies in the computation of cost matrices #28

Open
giovp opened this issue Jan 11, 2023 · 0 comments
giovp commented Jan 11, 2023

Hi,

I noticed several inconsistencies in how the costs for the linear and quadratic terms of FGW (fused Gromov-Wasserstein) are computed.

The metric for the linear cost differs from both the signature and the metric pre-defined for the quadratic term

The inconsistency lies here:

M = ot.dist(A_X,B_X)

While the signature reports "euclidean", the default metric of ot.dist is "sqeuclidean": https://pythonot.github.io/all.html#ot.dist
For the quadratic term, by contrast, the metric is enforced to be "euclidean". This is problematic: even when both feature spaces (used to compute the linear and quadratic costs) have equal variance, the two cost matrices have different magnitudes, since afaik there is no scaling. Besides resolving the inconsistencies, exposing the metric choice in the signature would be helpful.
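
To illustrate the magnitude gap, here is a minimal NumPy sketch. It does not call ot.dist or PASTE; the arrays `A_X` and `B_X` are hypothetical standardized tables, and `pairwise_sqeuclidean` is a helper written only for this example, standing in for the "sqeuclidean" and "euclidean" metrics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for A_X and B_X: two tables with unit-variance features.
A_X = rng.standard_normal((50, 20))
B_X = rng.standard_normal((60, 20))


def pairwise_sqeuclidean(X, Y):
    """Squared Euclidean distance between every row of X and every row of Y."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 against round-off.
    d2 = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)


M_sq = pairwise_sqeuclidean(A_X, B_X)  # cost under a "sqeuclidean" metric
M_eu = np.sqrt(M_sq)                   # cost under a "euclidean" metric

# Same data, same variance, yet the two costs live on different scales.
ratio = M_sq.mean() / M_eu.mean()
print(f"mean sqeuclidean={M_sq.mean():.2f}, mean euclidean={M_eu.mean():.2f}, ratio={ratio:.2f}")
```

Because the two terms are combined with a single weight in FGW, a gap like this effectively shifts the balance between them unless the costs are rescaled or computed with the same metric.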

KL divergence assumes positivity of the feature space, but the code neither asserts it nor transforms the input

The problem lies here:

paste/src/paste/PASTE.py

Lines 115 to 117 in 6b896b0

s_A = A_X + 0.01
s_B = B_X + 0.01
M = kl_divergence_backend(s_A, s_B)

When a "standardized" or "scaled" gene table is passed, the input still contains negative values after the 0.01 offset, so the resulting transport matrix is invalid. To be fair, ot complains as well, but in a cryptic way:

UserWarning: Problem unbounded
  result_code_string = check_result(result_code)

"scaled" gene tables are not uncommon and are the default outputs of normalization pipelines that use:

  • sc.pp.scale
  • sctransform
  • Pearson residuals
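
A possible guard could look like the sketch below. It is not PASTE's implementation: `kl_cost` is a generic row-normalized pairwise KL written for this example, and `checked_kl_cost` with its `pseudocount` parameter is a hypothetical wrapper. The point is only that a `+ 0.01` offset does not make scaled data positive, and that an explicit check gives a clearer error than "Problem unbounded".

```python
import numpy as np


def kl_cost(X, Y):
    """Pairwise KL(x_i || y_j) after row normalization (generic sketch, not PASTE's code)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        P = X / X.sum(axis=1, keepdims=True)
        Q = Y / Y.sum(axis=1, keepdims=True)
        log_P, log_Q = np.log(P), np.log(Q)
        return (P * log_P).sum(axis=1)[:, None] - P @ log_Q.T


def checked_kl_cost(X, Y, pseudocount=0.01):
    """Hypothetical guard: refuse inputs for which the KL cost is undefined."""
    if X.min() < 0 or Y.min() < 0:
        raise ValueError(
            "KL cost requires non-negative features (e.g. counts); "
            "got negative values, is the table scaled?"
        )
    return kl_cost(X + pseudocount, Y + pseudocount)


rng = np.random.default_rng(0)
counts_A = rng.poisson(2.0, size=(5, 8)).astype(float)
counts_B = rng.poisson(2.0, size=(6, 8)).astype(float)
# Column-wise standardization, similar in spirit to a "scaled" gene table.
scaled_A = (counts_A - counts_A.mean(axis=0)) / (counts_A.std(axis=0) + 1e-8)

M_ok = checked_kl_cost(counts_A, counts_B)          # finite cost matrix
M_bad = kl_cost(scaled_A + 0.01, counts_B + 0.01)   # unguarded "+ 0.01" path: NaNs appear
print(np.isfinite(M_ok).all(), np.isnan(M_bad).any())
```

Whether to raise, warn, or shift the data to be non-negative is a design choice for the maintainers; raising is just the simplest option to show here.
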
@mrland99 self-assigned this Jan 18, 2023