Replies: 1 comment
-
I am not sure if I correctly understand your question, but let me share a few thoughts regarding batch normalisation:
Let me know if that helps or whether you have any follow-up questions.
-
I am trying to understand the BinaryNet tutorial by dividing the model into a sequence of matrix operations, so that I can check the output of each layer.
"https://docs.larq.dev/larq/tutorials/binarynet_cifar10/"
Divided into layers
In the tutorial, the trained filter weights are quantized, while the parameters of the batch normalization (BN) layers (the means, standard deviations, beta, etc.) are kept as real values.
This means the convolution and dense operations can use a quantized matrix-multiply operator (e.g. XNOR or ternary operations), but the BN layers still need real-number arithmetic.
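For concreteness, at inference time a BN layer reduces to a per-channel affine transform, which is exactly where the remaining real-valued arithmetic lives. A minimal NumPy sketch (parameter names are illustrative, not Larq's API):

```python
import numpy as np

def bn_inference(x, mean, var, gamma, beta, eps=1e-5):
    """Batch normalization at inference time.

    The batch statistics are fixed, so BN folds into one real-valued
    scale and one real-valued shift per channel.
    """
    scale = gamma / np.sqrt(var + eps)   # real-valued per-channel scale
    shift = beta - mean * scale          # real-valued per-channel shift
    return x * scale + shift

# Example with 4 channels and made-up statistics
x = np.array([1.0, -2.0, 0.5, 3.0])
mean = np.array([0.0, -1.0, 0.5, 2.0])
var = np.array([1.0, 4.0, 1.0, 1.0])
gamma = np.array([1.0, 0.5, 2.0, 1.0])
beta = np.array([0.0, 0.1, -0.2, 0.3])
y = bn_inference(x, mean, var, gamma, beta)
```

So even though the convolution itself is binarized, one real multiply and one real add per channel remain.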
I know that we can fuse a BN layer into the adjacent convolution or dense layer and then quantize the fused layer, but some real-valued biases or shifts are still left over after fusing the BN layer and the convolution layer.
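One known way around the leftover shift (used in hardware-oriented BNN work such as FINN-style accelerators) is to observe that when BN is immediately followed by a sign activation, the BN + sign pair collapses into a per-channel threshold comparison, so no real arithmetic is needed at inference. A hedged sketch, assuming nonzero gamma:

```python
import numpy as np

def bn_sign(x, mean, var, gamma, beta, eps=1e-5):
    """Reference: BN followed by a binary sign activation."""
    scale = gamma / np.sqrt(var + eps)
    shift = beta - mean * scale
    return np.where(x * scale + shift >= 0, 1, -1)

def threshold_sign(x, mean, var, gamma, beta, eps=1e-5):
    """Equivalent threshold form of BN + sign.

    Solving scale * x + shift >= 0 for x gives a per-channel
    threshold tau; the sign of `scale` decides the comparison
    direction. tau can be precomputed offline, so inference
    needs only a comparison per channel.
    """
    scale = gamma / np.sqrt(var + eps)
    tau = mean - beta / scale            # assumes gamma (hence scale) != 0
    cond = np.where(scale > 0, x >= tau, x <= tau)
    return np.where(cond, 1, -1)
```

If the BN output feeds the next binarized layer through a sign activation anyway, the threshold form removes the real-valued shift entirely; a genuinely real-valued BN is only unavoidable at layers whose output must stay real (e.g. the final logits).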
Can we either keep the BN layers and quantize all of their calculations into binarized operations, or remove the BN layers while keeping good training convergence and accuracy?