Fix batch handling for Add::Gradient()
#3679
Merged
The current implementation of the `Add` layer (which just adds a bias to each element) does not properly handle batch sizes greater than 1. This can be seen in the example in #3666.

The `Gradient()` function expects the `error` matrix to be of size `outSize` x `batchSize`, and the gradient matrix itself is already sized to be `weightSize` x `1`. (For the `Add` layer, it is conveniently true that `weightSize == outSize`.) However, the existing implementation of `gradient = error` is only correct when `batchSize = 1`. Much like the `Linear` layer, the bias term's gradient is computed as the sum across all input points in the batch.
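A minimal sketch of the corrected computation (this is not necessarily the exact patch in this PR; the `Gradient()` signature is assumed from mlpack's layer interface):

```cpp
#include <armadillo>

// Sketch only: accumulate the bias gradient over the whole batch instead of
// assigning the raw error, which only matches shapes when batchSize == 1.
template<typename MatType>
void Gradient(const MatType& /* input */,
              const MatType& error,    // outSize x batchSize
              MatType& gradient)       // weightSize x 1 (weightSize == outSize)
{
  // Previously: gradient = error;  -- correct only for batchSize == 1.
  // Fix: sum the error across the batch dimension (one column per point).
  gradient = arma::sum(error, 1);
}
```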
I then noticed that the `Add` layer is not tested, so I added a simple test for it at a batch size of 1 and also at a larger batch size. (Prior to this PR, the second test fails.)
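For reference, the property that the batch test exercises can be illustrated with plain Armadillo (this is only a sketch of the idea, not the test added in this PR; the sizes are arbitrary):

```cpp
#include <armadillo>

int main()
{
  const arma::uword outSize = 5, batchSize = 4;

  // A fake error signal for a batch of four points.
  arma::mat error(outSize, batchSize, arma::fill::randn);

  // Bias gradient for the Add layer: sum of the per-point error columns.
  arma::mat gradient = arma::sum(error, 1);

  // Equivalent accumulation, one point at a time.
  arma::mat reference(outSize, 1, arma::fill::zeros);
  for (arma::uword i = 0; i < batchSize; ++i)
    reference += error.col(i);

  // The two must match; for batchSize == 1 both reduce to the error itself.
  return arma::approx_equal(gradient, reference, "absdiff", 1e-12) ? 0 : 1;
}
```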