
Seq2seq-attention Peeping Into The Encoder-states Bypasses Last Encoder-hidden-state

In the seq2seq model, I want to use the hidden state at the end of encoding to read out further information from the input sequence, so I return the hidden state and build a new sub-net on top of it.

Solution 1:

Bottom line: you should try different approaches and see which model works best for your data. Without knowing anything about your data or running some tests, it is impossible to say whether an attention mechanism, a CNN, etc. provides any benefit.

However, if you are using the tensorflow seq2seq models available in tensorflow/tensorflow/python/ops/seq2seq.py, let me share some observations about the attention mechanism as implemented in embedding_attention_seq2seq() and attention_decoder() that relate to your question(s):

  1. The hidden state of the decoder is initialized with the final state of the encoder, so attention does not "effectively bypass the hidden state at end of encoding", IMHO.

The following code in embedding_attention_seq2seq() passes the last-time-step encoder_state as the initial_state in the 2nd argument:

return embedding_attention_decoder(
      decoder_inputs, encoder_state, attention_states, cell,
      num_decoder_symbols, embedding_size, num_heads=num_heads,
      output_size=output_size, output_projection=output_projection,
      feed_previous=feed_previous,
      initial_state_attention=initial_state_attention)

And you can see that initial_state is used directly in attention_decoder() without going through any kind of attention states:

state = initial_state

...

for i, inp in enumerate(decoder_inputs):
  if i > 0:
    variable_scope.get_variable_scope().reuse_variables()
  # If loop_function is set, we use it instead of decoder_inputs.
  if loop_function is not None and prev is not None:
    with variable_scope.variable_scope("loop_function", reuse=True):
      inp = loop_function(prev, i)
  # Merge input and previous attentions into one vector of the right size.
  input_size = inp.get_shape().with_rank(2)[1]
  if input_size.value is None:
    raise ValueError("Could not infer input size from input: %s" % inp.name)
  x = linear([inp] + attns, input_size, True)
  # Run the RNN.
  cell_output, state = cell(x, state)
  ...
  2. Attention states are combined with decoder inputs via learned linear combinations

    x = linear([inp] + attns, input_size, True)
    # Run the RNN.
    cell_output, state = cell(x, state)

...the linear() call applies the W, b matrix operations that project the concatenated input + attentions down to the decoder input_size. The model learns the values of W and b during training.
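Conceptually, that line amounts to a single learned affine projection of the concatenated vectors. A minimal NumPy sketch of the idea (the shapes and variable names are illustrative assumptions, not the library code):

import numpy as np

batch, input_size, attn_size = 32, 128, 256               # example shapes (assumed)
inp = np.random.randn(batch, input_size)                  # current decoder input
attn = np.random.randn(batch, attn_size)                  # attention read-out (attns holds one per head)

W = np.random.randn(input_size + attn_size, input_size)   # learned weight matrix
b = np.zeros(input_size)                                  # learned bias

x = np.concatenate([inp, attn], axis=1) @ W + b           # shape (batch, input_size)
# x is then fed to the decoder cell: cell_output, state = cell(x, state)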

Summary: the attention states are combined with the inputs to the decoder, but the last hidden state of the encoder is fed in as the initial hidden state of the decoder without going through attention.

Finally, the attention mechanism still has the last encoding state at its disposal and would only "bypass" it if it learned during training that doing so was the best thing to do.
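If you do want to read out further information from the final encoder state, as in the question, one option is to keep passing it to the decoder as the initial state and additionally branch a small sub-net off that same state. Here is a minimal sketch using tf.keras rather than the legacy seq2seq.py API; the layer sizes, the auxiliary Dense read-out, and the omission of the attention layer are illustrative assumptions, not a drop-in implementation:

import tensorflow as tf

units, vocab_size, aux_dim = 256, 10000, 32   # assumed sizes

enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")

# Encoder: return both the per-step outputs and the final state.
enc_emb = tf.keras.layers.Embedding(vocab_size, units)(enc_inputs)
enc_outputs, enc_h, enc_c = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True)(enc_emb)

# Decoder is initialized with the encoder's final state, as in attention_decoder().
dec_emb = tf.keras.layers.Embedding(vocab_size, units)(dec_inputs)
dec_outputs, _, _ = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[enc_h, enc_c])
logits = tf.keras.layers.Dense(vocab_size)(dec_outputs)

# Extra sub-net on top of the final encoder hidden state to read out further info.
aux_readout = tf.keras.layers.Dense(aux_dim, activation="relu")(enc_h)

model = tf.keras.Model([enc_inputs, dec_inputs], [logits, aux_readout])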
