
Fix batch handling for Add::Gradient() #3679

Merged 2 commits into mlpack:master on Apr 12, 2024
Conversation

@rcurtin (Member) commented on Apr 8, 2024

The current implementation of the Add layer (which just adds a bias to each element) does not properly handle batch sizes greater than 1. This can be seen in the example in #3666.

The Gradient() function expects the error matrix to be of size outSize x batchSize, and the gradient matrix itself is already sized to be weightSize x 1. (For the Add layer, it is conveniently true that weightSize == outSize.)

However, the existing implementation, gradient = error, is only correct when batchSize == 1. Much like in the Linear layer, the bias term's gradient should instead be computed as the sum of the error across all input points in the batch (see the sketch below).
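
For reference, here is a minimal sketch of the batch-aware Gradient() under that reasoning. The class name, template parameter, and exact signature follow my reading of the mlpack 4.x Add layer and may differ slightly from the actual diff:

```cpp
// Sketch of a batch-aware gradient for the Add layer (bias-only layer).
// `error` is outSize x batchSize; `gradient` must end up weightSize x 1,
// and for the Add layer weightSize == outSize.
template<typename MatType>
void AddType<MatType>::Gradient(
    const MatType& /* input */,
    const MatType& error,
    MatType& gradient)
{
  // Sum the error over all points in the batch (i.e., across the columns)
  // instead of assigning it directly, which is only valid for batchSize == 1.
  gradient = arma::sum(error, 1);
}
```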

I also noticed that the Add layer was not tested, so I added a simple test for it at a batch size of 1 and at a larger batch size. (Prior to this PR, the second test fails.)
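
To see why the summed form is the right one, here is a small standalone Armadillo example (an illustration only, not the test added in this PR): with batchSize == 1 the sum reduces to the old gradient = error behavior, while for larger batches it accumulates the per-point contributions.

```cpp
#include <armadillo>

int main()
{
  const arma::uword outSize = 3;
  const arma::uword batchSize = 4;

  // Error matrix as Gradient() receives it: one column per point in the batch.
  arma::mat error(outSize, batchSize, arma::fill::randu);

  // Batch-aware bias gradient: sum over the batch (across columns).
  arma::mat gradient = arma::sum(error, 1);  // size: outSize x 1

  // With batchSize == 1 this is identical to the old `gradient = error`.
  gradient.print("bias gradient (summed over the batch):");

  return 0;
}
```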


@mlpack-bot (bot) commented:


Second approval provided automatically after 24 hours. 👍

@rcurtin merged commit aa771da into mlpack:master on Apr 12, 2024
9 checks passed
@rcurtin deleted the add-batch-size branch on April 12, 2024, 15:37