
Problem using attention wrapper. #9

Open
PratsBhatt opened this issue Jul 18, 2017 · 2 comments

@PratsBhatt

I am getting an error about a mismatch between the state and output shapes, but I am unable to figure out the cause.
I would really appreciate it if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 on a 1080 Ti GPU.

The error is as follows:

```
ValueError: Shapes (8, 522) and (8, 512) are incompatible
```

It occurs in attention_wrapper.py, in the `call` method, at line 708:

```python
cell_output, next_cell_state = self._cell(cell_inputs, cell_state)
```

I was able to figure out that the attention size is being added to the shape (512 + 10 = 522), which causes the mismatch, but I have no idea how to fix it.
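
For reference, here is a minimal sketch of where the extra 10 seems to come from, assuming the default `cell_input_fn` of `AttentionWrapper` in TF 1.2, which concatenates the decoder input with the attention vector before calling the wrapped cell (the sizes below are just my hyper-parameters, not taken from the traceback):

```python
import tensorflow as tf

# Minimal sketch, assuming AttentionWrapper's default cell_input_fn
# (concatenation of inputs and attention along the last axis).
batch_size = 8
hidden_size = 512   # decoder cell size
attn_size = 10      # attention_layer_size

inputs = tf.zeros([batch_size, hidden_size])    # decoder step input: (8, 512)
attention = tf.zeros([batch_size, attn_size])   # attention vector:   (8, 10)

# Default: cell_input_fn = lambda inputs, attention: concat([inputs, attention], -1)
cell_inputs = tf.concat([inputs, attention], -1)
print(cell_inputs.get_shape())  # (8, 522) -- this is what reaches self._cell(cell_inputs, cell_state)
```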
The code is below; the hyper-parameters are declared as follows (test values only).
```python
import tensorflow as tf
from tensorflow.python.ops import array_ops
from tensorflow.python.layers.core import Dense
from tensorflow.contrib.seq2seq.python.ops import attention_wrapper

# Hyper-parameters (test values)
batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10


def build_decoder_cell(enc_output, enc_state, source_sequence_length,
                       attn_size, batch_size):
    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell (build_single_cell() is defined elsewhere in my code)
    decoder_cell_list = [
        build_single_cell() for i in range(number_of_layers)]

    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        # if not self.attn_input_feeding:
        #     return inputs

        # Essential when use_residual=True: project [inputs, attention]
        # back to the decoder hidden size (number_of_units_per_layer).
        _input_layer = Dense(number_of_units_per_layer, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps an RNNCell with the attention_mechanism.
    # Note: the attention mechanism is applied only on the top decoder layer.
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        # cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder's last state for
    # the top layer has to be converted into the AttentionWrapperState form.
    # We can do this by calling AttentionWrapper.zero_state.

    # Also, if beam-search decoding is used, the batch_size argument in
    # .zero_state should be decoder_beam_width times the original batch_size:
    # batch_size = self.batch_size if not self.use_beamsearch_decode \
    #     else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]
    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state
```
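
For completeness, this is roughly what the commented-out `cell_input_fn` would look like if enabled; I am not sure whether this is the correct fix, but it would project the concatenated `[inputs, attention]` back to 512 before it reaches the wrapped cell:

```python
# Sketch only, not a confirmed fix: pass the input-feeding projection as
# cell_input_fn so the wrapped cell always sees inputs of width 512.
decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
    cell=decoder_cell_list[-1],
    attention_mechanism=attention_mechanism,
    attention_layer_size=attn_size,
    cell_input_fn=attn_decoder_input_fn,   # enabled here
    initial_cell_state=encoder_last_state[-1],
    alignment_history=False,
    name='Attention_Wrapper')
```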

Thank you once again.

@hugddygff

Did you solve this problem?
I am running into the same issue, thanks!

@shivam13juna

@hugddygff Have either of you solved it yet? I'm stuck here as well.
