
TensorFlow Implementation of SPSA seems to calculate gradient incorrectly #1087

Open
cassidylaidlaw opened this issue Jul 11, 2019 · 4 comments

Comments

@cassidylaidlaw (Contributor)

In the SPSAAdam class in cleverhans/attacks/spsa.py, the gradient is calculated with the following code:

  delta = self._delta
  delta_x = self._get_delta(x, delta)
  delta_x = tf.concat([delta_x, -delta_x], axis=0)
  loss_vals = tf.reshape(
      loss_fn(x + delta_x),
      [2 * self._num_samples] + [1] * (len(x_shape) - 1))
  avg_grad = reduce_mean(loss_vals * delta_x, axis=0) / delta

I believe the last line should instead be

  avg_grad = reduce_mean(loss_vals * tf.sign(delta_x), axis=0) / delta

or equivalently

  avg_grad = reduce_mean(loss_vals / delta_x, axis=0)

to match the description in the paper. Without the fix, the estimate carries an extra factor of delta, which is by default 0.01, so the gradient is underestimated by a factor of 100. I confirmed this while working on the PyTorch implementation, which I changed to use sign(delta_x); it then estimated the gradients correctly.
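To make the factor-of-delta discrepancy concrete, here is a small NumPy sketch (not the cleverhans code; the toy linear loss and variable names are mine) comparing the two estimators on a loss whose true gradient is known:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_samples, delta = 5, 2000, 0.01
x = rng.normal(size=dim)
w = rng.normal(size=dim)

def loss_fn(points):
    # Toy linear loss: its true gradient is w everywhere.
    return points @ w

# Rademacher perturbations with antithetic pairs, as in SPSAAdam.
v = rng.choice([-1.0, 1.0], size=(num_samples, dim))
delta_x = delta * v
delta_x = np.concatenate([delta_x, -delta_x], axis=0)
loss_vals = loss_fn(x + delta_x)[:, None]

buggy_grad = np.mean(loss_vals * delta_x, axis=0) / delta
correct_grad = np.mean(loss_vals * np.sign(delta_x), axis=0) / delta

# The buggy estimator is exactly delta times the correct one ...
assert np.allclose(buggy_grad, correct_grad * delta)
# ... and the corrected estimator recovers the true gradient w.
assert np.allclose(correct_grad, w, atol=0.3)
```

Since the two estimators differ only by the constant factor delta, the proposed fix rescales the update without changing its direction.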

This issue is also present in cleverhans/future/tf2/attacks/spsa.py. It looks like it was originally implemented by @juesato.

@juesato (Contributor) commented Jul 12, 2019

Hey, thanks! Yes, the code should read as you've proposed; otherwise, the updates move in the right direction, but by the wrong amount.

Do you want to go ahead and make the change? The default learning rate should also be adjusted so that the default behavior is the same before and after the change. For backwards compatibility, I would add a new parameter, say lr, since the semantics of the parameter will change, and add a deprecation warning telling users to switch from learning_rate to lr.

I'm also happy to fix this, though it will probably be a week until I can get to it.

@cassidylaidlaw (Contributor, Author)

Sure, I can go ahead and make the change. That sounds good regarding preserving backwards compatibility. Just to make sure I got the math right, the new learning rate should be lr = learning_rate * delta, right?

@juesato (Contributor) commented Jul 13, 2019

Great, thanks! Yes, that checks out to me.

@cassidylaidlaw (Contributor, Author)

@juesato I was working on fixing this issue today, but I realized that since SPSA uses Adam as its optimizer, the update should be agnostic to the overall magnitude of the gradient. So we can't fix it by scaling the learning rate, but it shouldn't matter anyway. Does that sound right to you?
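The scale-invariance claim can be checked directly: Adam divides the first-moment estimate by the square root of the second-moment estimate, so scaling every gradient by a constant cancels out of the update, up to the epsilon term. A minimal sketch with a hand-rolled scalar Adam (a hypothetical helper for illustration, not the cleverhans optimizer):

```python
import numpy as np

def adam_updates(grads, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Returns the sequence of Adam parameter updates for a stream of
    # scalar gradients, with standard bias correction.
    m = v = 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        out.append(lr * m_hat / (np.sqrt(v_hat) + eps))
    return np.array(out)

grads = np.array([0.3, -0.1, 0.2, 0.4])
u1 = adam_updates(grads)           # correctly scaled gradients
u2 = adam_updates(grads * 0.01)    # gradients underestimated by 100x

# The updates are essentially identical despite the 100x scale difference.
assert np.allclose(u1, u2, rtol=1e-3)
```

This supports leaving the default learning_rate untouched: after switching to sign(delta_x), Adam produces essentially the same updates as before.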

@jonasguan added and then removed the label "version < 4" (issues encountered in the soon-to-be-archived version 3 and prior versions) on Jan 19, 2021