
[v2 BUG]: MCC loss can give Nan #846

Open
KnathanM opened this issue Apr 29, 2024 · 2 comments
Labels
bug Something isn't working
Milestone
v2.0.1

Comments

@KnathanM (Contributor)

Describe the bug
BinaryMCCLoss can give a loss of nan.

Example(s)
The confusion matrix elements (TP, FP, TN, FN) are calculated from products of targets and predictions. At the start of training, model predictions can happen to be negative, in which case some of the matrix elements can be negative. This can cause ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).sqrt() to take the square root of a negative number and return nan.
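To illustrate, here is a minimal pure-Python sketch of the soft confusion-matrix counts (the helper name and the example values are made up, not Chemprop code). With one negative raw prediction, FP goes negative and the product under the square root can go negative too:

```python
def soft_confusion(targets, preds):
    """Soft confusion-matrix counts: each prediction contributes fractionally."""
    TP = sum(t * p for t, p in zip(targets, preds))
    FP = sum((1 - t) * p for t, p in zip(targets, preds))
    FN = sum(t * (1 - p) for t, p in zip(targets, preds))
    TN = sum((1 - t) * (1 - p) for t, p in zip(targets, preds))
    return TP, FP, TN, FN

targets = [1, 0, 1]
raw_preds = [0.2, -1.5, 0.1]  # raw (unnormalized) outputs; one is negative
TP, FP, TN, FN = soft_confusion(targets, raw_preds)
denom = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
print(denom)  # negative, so Tensor.sqrt() on it would return nan
```

Normalized probabilities in [0, 1] keep all four counts non-negative, which is why the denominator can only go negative when raw logits slip through.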

Expected behavior
I'm not sure what the expected behavior should be. Perhaps raw predictions should be forced to be positive (via a sigmoid?), or the loss should be set to zero whenever it comes out as nan?

Additional context
This was also a problem in v1 (#380). Our current implementation seems to be taken directly from v1.

@KnathanM KnathanM added the bug Something isn't working label Apr 29, 2024
@KnathanM KnathanM added this to the v2.0.1 milestone Apr 29, 2024
@shihchengli (Contributor)

Is it possible for the prediction to be negative? Should the prediction be the output of the sigmoid function? In v1, PR #309 made the multiclass MCC loss return 0 in cases where the loss was infinite; however, the binary MCC was not fixed at that time. For very imbalanced datasets, a batch may contain only positive or only negative examples, in which case the MCC is not meaningful. Using class balance might be a workaround in this case.

@davidegraff (Contributor) commented Apr 30, 2024

I believe the error is here:

def __call__(self, preds: Tensor, targets: Tensor, mask: Tensor, weights: Tensor, *args):
    if not (0 <= preds.min() and preds.max() <= 1):  # assume logits
        preds = preds.softmax(2)

This assumes the preds input to *MCCLoss will be a tensor of shape $(b, t, k)$, where $b$ is the batch size, $t$ is the number of tasks, and $k$ is the number of classes to predict, containing either logits or probabilities. Of course, this doesn't work in the case of binary predictions, where the tensor has shape $(b, t)$ but still needs to be normalized. The easiest fix is to split the binary and multiclass losses into their own classes; the mixin isn't really doing much here (it only avoids three lines of repeated code).
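A minimal sketch of that split in plain Python (hypothetical helpers, not Chemprop's actual classes): binary logits of shape $(b, t)$ get an element-wise sigmoid, while multiclass logits of shape $(b, t, k)$ get a softmax over the class axis, so both end up in [0, 1] before the confusion-matrix counts are formed.

```python
import math

def sigmoid(x):
    """Element-wise normalization for binary logits."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Normalization over the class axis for multiclass logits."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Binary preds, shape (b, t): one probability per task.
binary_logits = [[-0.3, 2.1], [0.5, -1.7]]
binary_probs = [[sigmoid(x) for x in row] for row in binary_logits]

# Multiclass preds, shape (b, t, k): a distribution over k classes per task.
multi_logits = [[[1.0, -0.5, 0.2]]]
multi_probs = [[softmax(ks) for ks in row] for row in multi_logits]
```

With both branches normalized, every soft confusion-matrix count is non-negative, and the square root in the MCC denominator can no longer see a negative product.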

I also agree with #380 in that we should add a small $\epsilon \leq 10^{-8}$ value for numerical stability when it's possible we may calculate $\frac 0 0$. I can't really see that happening here if we're calculating the soft MCC (i.e., using probabilities rather than predictions), but it never hurts.
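As a sketch, an epsilon guard combined with a clamp on the product might look like this (hypothetical code, not Chemprop's implementation; the clamp to zero is an assumption on top of the epsilon suggested above):

```python
import math

EPS = 1e-8  # small constant for numerical stability

def binary_mcc(TP, FP, TN, FN):
    """Soft MCC with an epsilon guard against a 0/0 denominator."""
    num = TP * TN - FP * FN
    denom = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
    # max(..., 0.0) also guards against a negative product from raw logits
    return num / math.sqrt(max(denom, 0.0) + EPS)

# A batch containing only positives: TN = FP = 0, so the denominator is 0.
print(binary_mcc(TP=3.0, FP=0.0, TN=0.0, FN=1.0))  # 0.0 instead of nan
```

This matches the #380 behavior of returning a finite value (here 0 for the all-positive batch) rather than propagating nan into the training loss.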
