
[v2 BUG]: MCC loss can give Nan #846

Open
KnathanM opened this issue Apr 29, 2024 · 2 comments
Labels
bug Something isn't working
Milestone
v2.0.1

Comments

@KnathanM (Contributor)

Describe the bug
BinaryMCCLoss can give a loss of nan.

Example(s)
The confusion matrix elements (TP, FP, TN, FN) are calculated from products of targets and predictions. At the start of training, model predictions can happen to be negative, in which case some of the matrix elements can be negative. This can cause ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).sqrt() to take the square root of a negative number and return nan.
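To illustrate, here is a minimal pure-Python sketch of the soft confusion-matrix counts (the helper name and the example values are made up, not Chemprop code). With one negative raw prediction, FP goes negative and the product under the square root can go negative too:

```python
def soft_confusion(targets, preds):
    """Soft confusion-matrix counts: each prediction contributes fractionally."""
    TP = sum(t * p for t, p in zip(targets, preds))
    FP = sum((1 - t) * p for t, p in zip(targets, preds))
    FN = sum(t * (1 - p) for t, p in zip(targets, preds))
    TN = sum((1 - t) * (1 - p) for t, p in zip(targets, preds))
    return TP, FP, TN, FN

targets = [1, 0, 1]
raw_preds = [0.2, -1.5, 0.1]  # raw (unnormalized) outputs; one is negative
TP, FP, TN, FN = soft_confusion(targets, raw_preds)
denom = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
print(denom)  # negative, so Tensor.sqrt() on it would return nan
```

Normalized probabilities in [0, 1] keep all four counts non-negative, which is why the denominator can only go negative when raw logits slip through.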

Expected behavior
I'm not sure what the expected behavior should be. Perhaps raw predictions should be forced to be positive (via a sigmoid?), or the loss should be set to zero whenever it comes out as nan?

Additional context
This was also a problem in v1 (#380). Our current implementation seems to be taken directly from v1.

@KnathanM KnathanM added the bug Something isn't working label Apr 29, 2024
@KnathanM KnathanM added this to the v2.0.1 milestone Apr 29, 2024
@shihchengli (Contributor)

Is it possible for the prediction to be negative? Should the prediction be the output of the sigmoid function? In v1, PR #309 made the multiclass MCC loss return 0 in cases where the loss was infinite; however, the binary MCC was not fixed at that time. For very imbalanced datasets, a batch may contain only positive or only negative examples, in which case the MCC is not meaningful. Using class balance might be a workaround in this case.

@davidegraff (Contributor) commented Apr 30, 2024

I believe the error is here:

def __call__(self, preds: Tensor, targets: Tensor, mask: Tensor, weights: Tensor, *args):
    if not (0 <= preds.min() and preds.max() <= 1):  # assume logits
        preds = preds.softmax(2)

This assumes the preds input to *MCCLoss will be a tensor of shape $(b, t, k)$, where $b$ is the batch size, $t$ is the number of tasks, and $k$ is the number of classes to predict, containing either logits or probabilities. Of course, this doesn't work in the case of binary predictions, where the tensor has shape $(b, t)$ but still needs to be normalized. The easiest fix is to split the binary and multiclass losses into their own classes; the mixin isn't really doing much here (it only avoids three lines of repeated code).
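A minimal sketch of that split in plain Python (hypothetical helpers, not Chemprop's actual classes): binary logits of shape $(b, t)$ get an element-wise sigmoid, while multiclass logits of shape $(b, t, k)$ get a softmax over the class axis, so both end up in [0, 1] before the confusion-matrix counts are formed.

```python
import math

def sigmoid(x):
    """Element-wise normalization for binary logits."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Normalization over the class axis for multiclass logits."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Binary preds, shape (b, t): one probability per task.
binary_logits = [[-0.3, 2.1], [0.5, -1.7]]
binary_probs = [[sigmoid(x) for x in row] for row in binary_logits]

# Multiclass preds, shape (b, t, k): a distribution over k classes per task.
multi_logits = [[[1.0, -0.5, 0.2]]]
multi_probs = [[softmax(ks) for ks in row] for row in multi_logits]
```

With both branches normalized, every soft confusion-matrix count is non-negative, and the square root in the MCC denominator can no longer see a negative product.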

I also agree with #380 in that we should add a small $\epsilon \leq 10^{-8}$ value for numerical stability when it's possible we may calculate $\frac 0 0$. I can't really see that happening here if we're calculating the soft MCC (i.e., using probabilities rather than predictions), but it never hurts.
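As a sketch, an epsilon guard combined with a clamp on the product might look like this (hypothetical code, not Chemprop's implementation; the clamp to zero is an assumption on top of the epsilon suggested above):

```python
import math

EPS = 1e-8  # small constant for numerical stability

def binary_mcc(TP, FP, TN, FN):
    """Soft MCC with an epsilon guard against a 0/0 denominator."""
    num = TP * TN - FP * FN
    denom = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
    # max(..., 0.0) also guards against a negative product from raw logits
    return num / math.sqrt(max(denom, 0.0) + EPS)

# A batch containing only positives: TN = FP = 0, so the denominator is 0.
print(binary_mcc(TP=3.0, FP=0.0, TN=0.0, FN=1.0))  # 0.0 instead of nan
```

This matches the #380 behavior of returning a finite value (here 0 for the all-positive batch) rather than propagating nan into the training loss.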
