The mask tensor M in script tab_network.py needs to be transformed to realize the objective stated in the paper: "γ is a relaxation parameter – when γ = 1, a feature is enforced to be used only at one decision step". #516

sciengineer opened this issue Sep 25, 2023 · 0 comments
Labels: Research (Research Ideas to improve architecture)

sciengineer commented Sep 25, 2023

Describe the bug
In the mask tensor M, the entries are consistently far from 1 and close to zero, so "self.gamma - M" and the resulting prior stay away from zero. The paper, however, states that "a feature is enforced to be used only at one decision step" when gamma equals 1. In practice this appears unachievable, which leads me to believe that tensor M needs to be transformed first.

Tensor M, a batch_size × num_features tensor, is the output of sparsemax and is interpreted as the feature weights at a decision step, so each row of M sums to 1. Consequently, with a large number of features the entries are either zero or small positive numbers close to zero; it is very unlikely for a single entry to equal 1 while all the others are zero.
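
To make the point concrete, here is a minimal sketch (not code from the repository; the mask row is hand-built rather than produced by sparsemax) showing that when every entry of a row of M is well below 1, the update prior = (gamma - M) * prior with gamma = 1 leaves every entry of prior strictly positive, so no feature is ever ruled out for later steps:

import torch

# Hand-built mask row standing in for a sparsemax output: it sums to 1, so
# with 20 equally weighted features every entry is only 0.05.
num_features = 20
M = torch.full((1, num_features), 1.0 / num_features)

gamma = 1.0
prior = torch.ones_like(M)

# The prior update from tab_network.py (self.gamma replaced by a local gamma):
prior = torch.mul(gamma - M, prior)

print(prior)  # every entry is 0.95, far from 0, so every feature stays available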

In my view, rather than M itself, the authors of the TabNet paper should have used a tensor derived from M in which the selected features are exactly 1. That transformed tensor should be the subtrahend in the line "prior = torch.mul(self.gamma - M, prior)", so as to realize the objective stated in the paper: "γ is a relaxation parameter – when γ = 1, a feature is enforced to be used only at one decision step".
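
One possible transformation, purely as an illustration of what I mean (this is my suggestion, not something prescribed by the paper or present in the repository), would be to binarize M so that every feature actually selected at the current step becomes exactly 1 before it is subtracted from gamma:

import torch

def hard_mask(M: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: mark every feature with non-zero weight as fully used.
    return (M > 0).float()

# Using it as the subtrahend in the prior update:
#     prior = torch.mul(self.gamma - hard_mask(M), prior)
# With gamma = 1 this drives the prior of every selected feature to 0, so the
# attentive transformer cannot pick that feature again at a later decision step.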

def forward(self, x, prior=None):
        x = self.initial_bn(x)

        bs = x.shape[0]  # batch size
        if prior is None:
            prior = torch.ones((bs, self.attention_dim)).to(x.device)

        M_loss = 0
        att = self.initial_splitter(x)[:, self.n_d :]
        steps_output = []
        for step in range(self.n_steps):
            M = self.att_transformers[step](prior, att)
            M_loss += torch.mean(
                torch.sum(torch.mul(M, torch.log(M + self.epsilon)), dim=1)
            )
            # update prior
            prior = torch.mul(self.gamma - M, prior)
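            # ... (remainder of the decision-step loop, e.g. masked_x and steps_output, omitted)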

What is the current behavior?
A feature is not enforced to be used at only one decision step when gamma is 1, contrary to what the paper states.
If the current behavior is a bug, please provide the steps to reproduce.

Set gamma = 1 and inspect the values of the variables M, prior, and masked_x in debug mode inside the for loop over decision steps.
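
For a self-contained alternative to stepping through with a debugger, something along these lines could be used (the data and parameter values are purely illustrative, and it assumes the usual TabNetClassifier fit/explain API, where explain returns the per-step masks):

import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

# Illustrative data only: 256 samples, 20 features, binary target.
X = np.random.rand(256, 20).astype(np.float32)
y = np.random.randint(0, 2, size=256)

clf = TabNetClassifier(n_steps=3, gamma=1.0)
clf.fit(X, y, max_epochs=5, patience=5, batch_size=64)

# explain() is expected to return the aggregated feature importances plus one
# mask per decision step. If the paper's claim held with gamma = 1, a feature
# with a non-zero mask at one step should have a zero mask at every other step.
_, masks = clf.explain(X)
for step, M_step in masks.items():
    print(step, M_step[:2])
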
Expected behavior

As the paper says, when gamma = 1, a feature is enforced to be used only at one decision step.

@sciengineer sciengineer added the "bug (Something isn't working)" label Sep 25, 2023
@sciengineer sciengineer changed the title from "The mask tensor M in script tab_network.py needs to be transformed so that "a feature is enforced to be used only at one decision step", when gamma is 1." to "The mask tensor M in script tab_network.py needs to be transformed to realize the objective stated in the paper: "γ is a relaxation parameter – when γ = 1, a feature is enforced to be used only at one decision step"." Sep 26, 2023
@Optimox Optimox added the "Research (Research Ideas to improve architecture)" label and removed the "bug (Something isn't working)" label Oct 24, 2023