Fix for the graphsage feedforward algorithm #1792
Open
I went into depth about the issue in #1789, so go there if you want to see more of the gory details, but I'll try to give a summary here.
Essentially, the original GraphSAGE paper first concatenates the previous layer's output with the aggregated feature vector and then applies a fully connected layer to the whole thing. The current StellarGraph implementation instead applies separate weights to each vector first and then concatenates the results.
There are two main issues with this implementation. First, the network has about half as many parameters as it should, which limits its expressiveness. Second, and much more importantly, the first half of each layer's output is based ONLY on the feature vector and the second half ONLY on the aggregated vector, so the network cannot learn anything about their interplay (at least within one layer).
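To make both issues concrete, here is a minimal NumPy sketch. It is not the StellarGraph code: the sizes, the variable names, and the helpers `old_layer` and `paper_layer` are hypothetical, and biases and activations are left out for brevity. It compares the parameter counts and shows that, under the split-then-concatenate scheme, changing the aggregated vector leaves the first half of the output untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim = 4, 6  # hypothetical layer sizes

h_self = rng.normal(size=in_dim)   # previous layer's output for the node
h_neigh = rng.normal(size=in_dim)  # aggregated neighbour vector

# Split-then-concatenate scheme: each input has its own
# (out_dim / 2, in_dim) weight matrix and the two results are concatenated.
W_self_old = rng.normal(size=(out_dim // 2, in_dim))
W_neigh_old = rng.normal(size=(out_dim // 2, in_dim))

def old_layer(h_self, h_neigh):
    return np.concatenate([W_self_old @ h_self, W_neigh_old @ h_neigh])

# Paper scheme: one (out_dim, 2 * in_dim) weight matrix applied to the
# concatenated input.
W_paper = rng.normal(size=(out_dim, 2 * in_dim))

def paper_layer(h_self, h_neigh):
    return W_paper @ np.concatenate([h_self, h_neigh])

old_params = W_self_old.size + W_neigh_old.size
paper_params = W_paper.size
print(old_params, paper_params)

# Perturb only the aggregated vector and see which outputs move.
h_neigh_shifted = h_neigh + 1.0
old_delta = old_layer(h_self, h_neigh_shifted) - old_layer(h_self, h_neigh)
paper_delta = paper_layer(h_self, h_neigh_shifted) - paper_layer(h_self, h_neigh)
print(old_delta)    # first half stays at zero
print(paper_delta)  # every entry can change
```

In the split scheme the first `out_dim / 2` outputs never see the aggregated vector, whereas in the paper's scheme every output unit can mix both inputs.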
I fixed this in a very simple way. First, I changed the output size of every weight matrix from outputdim/2 (plus some remainder terms) to just outputdim. Next, I replaced the concatenation operation with a sum. This works because if A = [X, Y] and v = [x; y], then Av = [X, Y][x; y] = Xx + Yy, so summing the two products is exactly equivalent to applying a single fully connected layer to the concatenated vector.
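The identity Av = Xx + Yy can be checked numerically. The snippet below is only an illustrative sketch with made-up shapes and random values (none of the names come from the StellarGraph code base); it builds A by stacking X and Y side by side and compares both sides.

```python
import numpy as np

rng = np.random.default_rng(42)
in_dim, out_dim = 5, 3  # hypothetical sizes

# Two weight blocks, each mapping in_dim -> out_dim.
X = rng.normal(size=(out_dim, in_dim))
Y = rng.normal(size=(out_dim, in_dim))

# Two input vectors: the node's own features and the aggregated neighbours.
x = rng.normal(size=in_dim)
y = rng.normal(size=in_dim)

# A = [X, Y] (blocks side by side) and v = [x; y] (vectors stacked).
A = np.hstack([X, Y])
v = np.concatenate([x, y])

concat_then_dense = A @ v      # paper formulation
dense_then_sum = X @ x + Y @ y  # formulation used in this fix

print(np.allclose(concat_then_dense, dense_then_sum))
```

Keeping X and Y as separate full-width matrices and summing their outputs therefore computes the same function as one matrix over the concatenated input, without having to build the concatenated vector.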
I made no other changes, so it should still work with all other uses of the GraphSAGE model.
As part of this change, I had to update some of the tests related to how GraphSAGE models are initialized (i.e., the ones that check the weight sizes of an initialized GraphSAGE model).