
Usage equivariant MLP #94

Open
vec123 opened this issue Dec 5, 2023 · 4 comments

Comments


vec123 commented Dec 5, 2023

Hi,

I am attempting to build an equivariant variational encoder-decoder framework.

For this I am using R2Conv and R3Conv layers in the encoder, with trivial-representation input and output and regular representations in between. For the decoder I would like to use equivariant MLPs. However, it is quite unclear to me how the examples map to a generic MLP.

For example, I do not understand how one could specify the input and output dimensions. Instead it seems to me that the equivariant MLP expects (just like a CNN) a 2D or 3D input, and that the output dimension is determined by the harmonic decomposition of functions on that space. In contrast, an MLP accepts a flat input, and the (flat) output dimension is a hyperparameter specified by the user.

During my learning process, I start with a rectangular input grid of shape [B,1,X,Y,Z] corresponding to a scalar field (trivial representation). I use R3Conv with one hidden regular representation and a trivial-representation output to get [B,1,X,Y,1], store [B,1,Z_encoding_size] as the encoding of Z, and continue with [B,X,Y,1] and R2Conv to obtain the encodings of X and Y in shape [B,1,X_encoding_size,Y_encoding_size].
A final linear layer maps the [B,1,X_encoding_size,Y_encoding_size,Z_encoding_size]-shaped encoding to a latent space that parametrizes the mean and variance of a distribution.

This part seems more or less clear to me; the decoder part much less so.

I really hope for some clarification. I only discovered equivariant learning a week ago, and it seems like opening Pandora's box considering all the nice but extensive theory behind it.
Sadly I do not have the time to pick it all up, nor is there anyone in my environment who knows this material.
Is it reasonable to expect to have a working model within a week?


maxxxzdn commented Mar 5, 2024

Hi,

did you see the MLP example https://github.com/QUVA-Lab/escnn/blob/master/examples/mlp.ipynb?

an equivariant MLP doesn't expect a base space (2D or 3D); it works exactly like a classic MLP and takes only a stack of feature fields:

from escnn import group, gspaces

G = group.so3_group()

# since we are building an MLP, there is no base space
gspace = gspaces.no_base_space(G)

# assume you have scalar and vector quantities in your output:
scalar_repr = gspace.trivial_repr
vector_repr = gspace.fibergroup.standard_representation()

# assume your output goes like [[scalar, vector], [scalar, vector], ..., [scalar, vector]]
channel_repr = group.directsum([scalar_repr, vector_repr])

# specify the number of channels in input and output
c_in = 1
c_out = 12
in_repr = c_in * [channel_repr]
out_repr = c_out * [channel_repr]

# define the feature field types
in_type = gspace.type(*in_repr)
out_type = gspace.type(*out_repr)

# define your MLP (e.g. the equivariant MLP built in the notebook linked above)
mlp = MLP(in_type, out_type)

As a result, you will give your MLP a "flat" stack of features (here, 1 copy of [scalar, vector]) and get back another stack of features, now with 12 copies.
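To make the "flat stack of feature fields" idea concrete, here is a minimal numpy sketch, independent of escnn (the helpers `rot_z` and `channel_rep` are illustrative, not escnn API): the group acts block-wise on the flattened feature vector, with one block per copy of the channel representation.

```python
import numpy as np

def rot_z(theta):
    # rotation about the z-axis, acting on a 3D vector field
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def channel_rep(theta):
    # direct sum of the trivial (scalar) and standard (vector) representations
    rho = np.zeros((4, 4))
    rho[0, 0] = 1.0
    rho[1:, 1:] = rot_z(theta)
    return rho

c_in = 1
# a "flat stack" of c_in copies of [scalar, vector]: a vector of length c_in * 4
x = np.arange(c_in * 4, dtype=float)

# the group element acts block-wise on each copy of the channel representation
theta = 0.5
rho = np.kron(np.eye(c_in), channel_rep(theta))
x_rot = rho @ x

assert np.isclose(x_rot[0], x[0])                       # scalar part is invariant
assert np.allclose(x_rot[1:4], rot_z(theta) @ x[1:4])   # vector part rotates
```

An equivariant linear layer between two such types maps one flat stack to another while commuting with this block-wise action, which is exactly what `mlp(in_type, out_type)` does internally.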


Danfoa commented Mar 18, 2024

Hi @maxxxzdn and @Gabri95,

Following your suggestion, say I build an equivariant MLP with input, hidden, and output equivariant linear layers and some activation function, such that the hidden layer's group representation is defined by hidden_repr = c_in * [channel_repr].

By Schur's lemma, we know there is no nonzero linear map between feature fields of different irreducible types. Therefore, this naive construction of the equivariant MLP results in a network that never mixes the signals from scalar and vector representations: it decouples into one network processing scalar fields to scalar fields and another processing vector fields to vector fields. This is clearly a bad architectural design.
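The decoupling can be checked numerically with a small numpy sketch (illustrative, not escnn): project a random matrix onto the space of equivariant maps by group averaging over SO(2) acting on a scalar ⊕ vector feature, and the scalar↔vector blocks vanish, just as Schur's lemma predicts.

```python
import numpy as np

def rho(theta):
    # scalar (trivial) ⊕ 2D vector (rotation) representation of SO(2)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s,  c]])

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))

# project W onto the space of equivariant maps by averaging over the group
thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
W_eq = np.mean([rho(-t) @ W @ rho(t) for t in thetas], axis=0)

# Schur's lemma: the scalar<->vector cross-blocks must vanish
assert np.allclose(W_eq[0, 1:], 0, atol=1e-6)
assert np.allclose(W_eq[1:, 0], 0, atol=1e-6)
```

So any equivariant *linear* layer has this block structure, and stacking such layers with irrep-wise nonlinearities keeps the fields decoupled.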

To mix fields of different types, we are required to perform the Clebsch-Gordan (CG) tensor product. However, it is not clear how to use this, and especially what good design principles are for embedding the CG tensor product in the architecture.

Any insights?

@maxxxzdn

Please note that interaction between irreps also happens in the non-linearity (e.g. QuotientELU), so using nn.TensorProductModule is not the only way.
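For intuition on how a nonlinearity can couple irreps while staying equivariant, here is a numpy sketch of a *gated* nonlinearity, a simpler relative of the quotient nonlinearities (this is not escnn's QuotientELU, just an illustration of the principle): a vector field is scaled by a sigmoid of a scalar field, so scalar information modulates vector channels, and equivariance is preserved because the gate is invariant.

```python
import numpy as np

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def gated(scalar, vec):
    # gated nonlinearity: scale the vector field by a sigmoid of a scalar field
    return vec / (1.0 + np.exp(-scalar))

rng = np.random.default_rng(2)
s = rng.standard_normal()    # invariant scalar channel
v = rng.standard_normal(2)   # frequency-1 vector channel
theta = 1.3

# equivariance: rotating the input commutes with applying the nonlinearity,
# since the scalar gate is unchanged by the rotation
rot_after = R(theta) @ gated(s, v)
rot_before = gated(s, R(theta) @ v)
assert np.allclose(rot_after, rot_before)
```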


Danfoa commented Apr 12, 2024

Hi @maxxxzdn, can you point me to a paper/lecture-note/escnn-documentation page where the action of these quotient activations is clearly explained? I am afraid I am unable to understand from the documentation how signals from different irreps are mixed in this fashion.
