
TensorFlow Implementation of SPSA seems to calculate gradient incorrectly #1087

Open
cassidylaidlaw opened this issue Jul 11, 2019 · 4 comments

Comments

@cassidylaidlaw (Contributor)

In the SPSAAdam class in cleverhans/attacks/spsa.py, the gradient is calculated with the following code:

  delta = self._delta
  delta_x = self._get_delta(x, delta)
  delta_x = tf.concat([delta_x, -delta_x], axis=0)
  loss_vals = tf.reshape(
      loss_fn(x + delta_x),
      [2 * self._num_samples] + [1] * (len(x_shape) - 1))
  avg_grad = reduce_mean(loss_vals * delta_x, axis=0) / delta

I believe the last line should instead be

  avg_grad = reduce_mean(loss_vals * tf.sign(delta_x), axis=0) / delta

or equivalently

  avg_grad = reduce_mean(loss_vals / delta_x, axis=0)

to match the description in the paper. Without the fix, the estimate carries an extra factor of delta, which is by default 0.01, so the gradient is underestimated by a factor of 100. I confirmed this while working on the PyTorch implementation, which I changed to use sign(delta_x); it then estimated the gradients correctly.
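To make the factor-of-delta discrepancy concrete, here is a small NumPy sketch (not the cleverhans code; the toy linear loss and variable names are mine) comparing the two estimators on a loss whose true gradient is known:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_samples, delta = 5, 2000, 0.01
x = rng.normal(size=dim)
w = rng.normal(size=dim)

def loss_fn(points):
    # Toy linear loss: its true gradient is w everywhere.
    return points @ w

# Rademacher perturbations with antithetic pairs, as in SPSAAdam.
v = rng.choice([-1.0, 1.0], size=(num_samples, dim))
delta_x = delta * v
delta_x = np.concatenate([delta_x, -delta_x], axis=0)
loss_vals = loss_fn(x + delta_x)[:, None]

buggy_grad = np.mean(loss_vals * delta_x, axis=0) / delta
correct_grad = np.mean(loss_vals * np.sign(delta_x), axis=0) / delta

# The buggy estimator is exactly delta times the correct one ...
assert np.allclose(buggy_grad, correct_grad * delta)
# ... and the corrected estimator recovers the true gradient w.
assert np.allclose(correct_grad, w, atol=0.3)
```

Since the two estimators differ only by the constant factor delta, the proposed fix rescales the update without changing its direction.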

This issue is also present in cleverhans/future/tf2/attacks/spsa.py. It looks like it was originally implemented by @juesato.

@juesato (Contributor) commented Jul 12, 2019

Hey, thanks! Yes, the code should read as you've proposed; otherwise, the updates move in the right direction, but by the wrong amount.

Do you want to go ahead and make the change? The default learning rate should also be adjusted so that the default behavior is the same before and after the change. For backwards compatibility, I would add a new parameter, say lr, since the semantics of the parameter will change, and add a deprecation warning telling users to switch from learning_rate to lr.

I'm also happy to fix this, though it will probably be a week until I can get to it.

@cassidylaidlaw (Contributor, Author)

Sure, I can go ahead and make the change. That sounds good regarding preserving backwards compatibility. Just to make sure I got the math right, the new learning rate should be lr = learning_rate * delta, right?

@juesato (Contributor) commented Jul 13, 2019

Great, thanks! Yes, that checks out to me.

@cassidylaidlaw (Contributor, Author)

@juesato I was working on fixing this issue today, but I realized that since SPSA uses Adam as its optimizer, the update should be agnostic to the overall magnitude of the gradient. So we can't fix it by scaling the learning rate, but it shouldn't matter anyway. Does that sound right to you?
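The scale-invariance claim can be checked directly: Adam divides the first-moment estimate by the square root of the second-moment estimate, so scaling every gradient by a constant cancels out of the update, up to the epsilon term. A minimal sketch with a hand-rolled scalar Adam (a hypothetical helper for illustration, not the cleverhans optimizer):

```python
import numpy as np

def adam_updates(grads, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Returns the sequence of Adam parameter updates for a stream of
    # scalar gradients, with standard bias correction.
    m = v = 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        out.append(lr * m_hat / (np.sqrt(v_hat) + eps))
    return np.array(out)

grads = np.array([0.3, -0.1, 0.2, 0.4])
u1 = adam_updates(grads)           # correctly scaled gradients
u2 = adam_updates(grads * 0.01)    # gradients underestimated by 100x

# The updates are essentially identical despite the 100x scale difference.
assert np.allclose(u1, u2, rtol=1e-3)
```

This supports leaving the default learning_rate untouched: after switching to sign(delta_x), Adam produces essentially the same updates as before.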

@jonasguan added and then removed the label "version < 4" (issues encountered in the soon-to-be-archived version 3 and prior versions) on Jan 19, 2021