[v2 BUG]: Cross entropy loss is too big #845

Open
KnathanM opened this issue Apr 29, 2024 · 1 comment

@KnathanM (Contributor)

chemprop.nn.MulticlassClassificationFFN sets n_tasks = n_tasks * n_classes. This is a problem because the default task weights then become task_weights = torch.ones(n_tasks), i.e. a tensor with n_tasks * n_classes entries. F.cross_entropy in CrossEntropyLoss reduces the n_tasks * n_classes predicted values down to n_tasks loss values, but L = L * self.task_weights.view(1, -1) then broadcasts those n_tasks loss values back out to n_tasks * n_classes, effectively multiplying the loss by n_classes when we take L.sum().

This doesn't break anything so I'll plan to fix it when we take another look at loss functions and metrics.
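
For reference, a minimal standalone sketch of the broadcasting (made-up shapes, plain PyTorch rather than chemprop code) showing how the summed loss gets inflated by n_classes in the default single-task case:

import torch

# assumed shapes for illustration: one task, three classes, batch of four
n_tasks, n_classes, batch_size = 1, 3, 4

# F.cross_entropy reduces the predictions to one loss value per task
L = torch.ones(batch_size, n_tasks)

# because n_tasks was overwritten as n_tasks * n_classes, the default
# task weights have n_tasks * n_classes entries
task_weights = torch.ones(n_tasks * n_classes)

# (batch_size, 1) * (1, n_classes) broadcasts to (batch_size, n_classes),
# so summing triples the loss here
weighted = L * task_weights.view(1, -1)
print(L.sum().item())         # 4.0
print(weighted.sum().item())  # 12.0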

@KnathanM KnathanM added the bug Something isn't working label Apr 29, 2024
@KnathanM KnathanM added this to the v2.0.1 milestone Apr 29, 2024
@KnathanM KnathanM self-assigned this Apr 29, 2024
@davidegraff (Contributor)

Specifically, the problem starts here, in MulticlassClassificationFFN.__init__:

def __init__(
    self,
    n_classes: int,
    n_tasks: int = 1,
    input_dim: int = DEFAULT_HIDDEN_DIM,
    hidden_dim: int = 300,
    n_layers: int = 1,
    dropout: float = 0.0,
    activation: str = "relu",
    criterion: LossFunction | None = None,
    task_weights: Tensor | None = None,
    threshold: float | None = None,
    output_transform: UnscaleTransform | None = None,
):
    super().__init__(
        n_tasks * n_classes,  # the base class receives n_tasks * n_classes as its "n_tasks"
        input_dim,
        hidden_dim,
        n_layers,
        dropout,
        activation,
        criterion,
        task_weights,
        threshold,
        output_transform,
    )
    self.n_classes = n_classes

and continues here, in _FFNPredictorBase.__init__:

def __init__(
    self,
    n_tasks: int = 1,
    input_dim: int = DEFAULT_HIDDEN_DIM,
    hidden_dim: int = 300,
    n_layers: int = 1,
    dropout: float = 0.0,
    activation: str = "relu",
    criterion: LossFunction | None = None,
    task_weights: Tensor | None = None,
    threshold: float | None = None,
    output_transform: UnscaleTransform | None = None,
):
    super().__init__()
    self.save_hyperparameters()
    self.hparams["cls"] = self.__class__
    self.ffn = MLP.build(
        input_dim, n_tasks * self.n_targets, hidden_dim, n_layers, dropout, activation
    )
    # here n_tasks has already been multiplied by n_classes, so the default
    # weight tensor gets n_tasks * n_classes entries instead of n_tasks
    task_weights = torch.ones(n_tasks) if task_weights is None else task_weights
    self.criterion = criterion or Factory.build(
        self._T_default_criterion, task_weights=task_weights, threshold=threshold
    )

IMO it will probably be easier to fix this by abstracting the multiclass FFN into its own Predictor subclass rather than abusing the _FFNPredictorBase class like we do currently.
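
For illustration only, a rough sketch of what such a dedicated subclass could look like; the class body and parameter list are assumptions pieced together from the snippets above (reusing the same chemprop names: Predictor, MLP, Factory, LossFunction), not the actual fix:

class MulticlassClassificationFFN(Predictor):
    """Sketch: owns its own MLP, so no base FFN ever sees n_tasks * n_classes as a task count."""

    def __init__(
        self,
        n_classes: int,
        n_tasks: int = 1,
        input_dim: int = DEFAULT_HIDDEN_DIM,
        hidden_dim: int = 300,
        n_layers: int = 1,
        dropout: float = 0.0,
        activation: str = "relu",
        criterion: LossFunction | None = None,
        task_weights: Tensor | None = None,
    ):
        super().__init__()
        self.n_classes = n_classes
        self.n_tasks = n_tasks
        # the output layer is still n_tasks * n_classes wide ...
        self.ffn = MLP.build(
            input_dim, n_tasks * n_classes, hidden_dim, n_layers, dropout, activation
        )
        # ... but the task weights keep length n_tasks, so the per-task losses
        # are no longer broadcast against n_tasks * n_classes weights
        task_weights = torch.ones(n_tasks) if task_weights is None else task_weights
        self.criterion = criterion or Factory.build(
            self._T_default_criterion, task_weights=task_weights
        )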
