allennlp.modules.seq2seq_decoders

Modules that transform a sequence of encoded vectors into a sequence of output vectors.

The available Seq2Seq decoders are documented below.

class allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder(target_embedder: allennlp.modules.token_embedders.embedding.Embedding)[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

SeqDecoder is an abstract class representing the entire decoder (embedding and neural network) of a Seq2Seq architecture. It is meant to be used with allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq.

Implementations of this abstract class should ideally use a decoder neural net, allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet, for decoding.

The default implementation, allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder, covers most use cases, so you will more likely use it than write a new implementation.

Parameters
target_embedder : Embedding

Embedder for target tokens. Needed in the base class to enable weight tying.

default_implementation = 'auto_regressive_seq_decoder'
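
As a rough illustration of what a default implementation means for a registrable class, the sketch below uses a hypothetical, stdlib-only MiniRegistrable (it is not AllenNLP's Registrable) to show a lookup that falls back to the default name when a configuration omits the type. Every class and method name in it is an assumption made for illustration.

```python
from typing import Callable, Dict, Optional


class MiniRegistrable:
    """Hypothetical stand-in for a registrable base class (not AllenNLP's API)."""

    _registry: Dict[str, type] = {}
    default_implementation: Optional[str] = None

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def by_name(cls, name: Optional[str] = None) -> type:
        # Fall back to the default implementation when no type is given.
        key = name if name is not None else cls.default_implementation
        if key is None or key not in cls._registry:
            raise KeyError(f"no implementation registered under {key!r}")
        return cls._registry[key]


class ToySeqDecoder(MiniRegistrable):
    default_implementation = "auto_regressive_seq_decoder"


@ToySeqDecoder.register("auto_regressive_seq_decoder")
class ToyAutoRegressiveSeqDecoder(ToySeqDecoder):
    pass


# A lookup without an explicit name resolves to the default implementation.
resolved = ToySeqDecoder.by_name(None)
```
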
forward(self, encoder_out:Dict[str, torch.LongTensor], target_tokens:Union[Dict[str, torch.LongTensor], NoneType]=None) → Dict[str, torch.Tensor][source]

Decodes from encoded states to a sequence of outputs. Also computes the loss if target_tokens are given.

Parameters
encoder_out : Dict[str, torch.LongTensor], required

Dictionary with encoded state, ideally containing the encoded vectors and the source mask.

target_tokens : Dict[str, torch.LongTensor], optional

The output of TextField.as_array() applied on the target TextField.

get_metrics(self, reset:bool=False) → Dict[str, float][source]

The decoder is responsible for computing metrics using the target tokens.

get_output_dim(self) → int[source]

The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.
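
To make the weight-tying requirement concrete, here is a minimal sketch in plain Python (no torch). It assumes that tying means reusing the target embedding matrix, of shape (vocab_size, embedding_dim), as the final projection, which only works when the decoder output dimension equals embedding_dim. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def check_tying_compatible(embedding: Matrix, decoder_output_dim: int) -> None:
    """Raise if the embedding cannot double as the output projection."""
    embedding_dim = len(embedding[0])
    if embedding_dim != decoder_output_dim:
        raise ValueError(
            f"cannot tie weights: embedding_dim={embedding_dim}, "
            f"decoder_output_dim={decoder_output_dim}"
        )


def tied_logits(hidden: List[float], embedding: Matrix) -> List[float]:
    """Score every vocabulary item as a dot product with its embedding row."""
    check_tying_compatible(embedding, len(hidden))
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]


# Toy vocabulary of 3 tokens with 2-dimensional embeddings.
toy_embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = tied_logits([2.0, 3.0], toy_embedding)
```

A hidden state of any other size fails the check, which is the situation get_output_dim lets a model detect up front.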

post_process(self, output_dict:Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source]

Post-processing that converts raw outputs to predictions during inference. Composing models such as allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq can call this method when decode is called.

class allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder(vocab: allennlp.data.vocabulary.Vocabulary, decoder_net: allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet, max_decoding_steps: int, target_embedder: allennlp.modules.token_embedders.embedding.Embedding, target_namespace: str = 'tokens', tie_output_embedding: bool = False, scheduled_sampling_ratio: float = 0, label_smoothing_ratio: Optional[float] = None, beam_size: int = 4, tensor_based_metric: allennlp.training.metrics.metric.Metric = None, token_based_metric: allennlp.training.metrics.metric.Metric = None)[source]

An autoregressive decoder that can be used for most seq2seq tasks.

Parameters
vocab : Vocabulary, required

Vocabulary containing source and target vocabularies. They may be under the same namespace (tokens) or the target tokens can have a different namespace, in which case it needs to be specified as target_namespace.

decoder_net : DecoderNet, required

Module that contains the implementation of the neural network for decoding output elements.

max_decoding_steps : int

Maximum length of decoded sequences.

target_embedder : Embedding

Embedder for target tokens.

target_namespace : str, optional (default = ‘tokens’)

If the target side vocabulary is different from the source side’s, you need to specify the target’s namespace here. If not, we’ll assume it is “tokens”, which is also the default choice for the source side, and this might cause them to share vocabularies.

beam_size : int, optional (default = 4)

Width of the beam for beam search.

tensor_based_metric : Metric, optional (default = None)

A metric to track on validation data that takes raw tensors when it is called. This metric must accept two arguments when called: a batched tensor of predicted token indices, and a batched tensor of gold token indices.

token_based_metric : Metric, optional (default = None)

A metric to track on validation data that takes lists of lists of tokens as input. This metric must accept two arguments when called, both of type List[List[str]]. The first is a predicted sequence for each item in the batch and the second is a gold sequence for each item in the batch.

scheduled_sampling_ratio : float, optional (default = 0)

Defines the ratio between teacher-forced training and use of the model's own outputs. If it is zero (teacher forcing only) and decoder_net supports parallel decoding, the output predictions are obtained in a single forward pass of the decoder_net.
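
A minimal sketch of how scheduled_sampling_ratio could govern the choice of the next decoder input. It assumes the ratio is the probability of feeding back the model's own previous prediction instead of the gold token, so that a ratio of zero is pure teacher forcing. The function name and the use of random.Random are assumptions for illustration, not the library's code.

```python
import random
from typing import List


def choose_decoder_inputs(
    gold_tokens: List[int],
    model_predictions: List[int],
    scheduled_sampling_ratio: float,
    rng: random.Random,
) -> List[int]:
    """Pick, position by position, which token is fed to the decoder.

    With probability `scheduled_sampling_ratio` the model's own prediction
    is used; otherwise the gold token is used (teacher forcing).
    """
    inputs = []
    for gold, predicted in zip(gold_tokens, model_predictions):
        use_model_output = rng.random() < scheduled_sampling_ratio
        inputs.append(predicted if use_model_output else gold)
    return inputs


gold = [5, 6, 7, 8]
predicted = [5, 9, 7, 1]
# Ratio 0 is pure teacher forcing; ratio 1 always feeds back predictions.
teacher_forced = choose_decoder_inputs(gold, predicted, 0.0, random.Random(0))
free_running = choose_decoder_inputs(gold, predicted, 1.0, random.Random(0))
```

With a ratio of zero no step depends on an earlier prediction, which is what makes decoding all steps in a single pass possible.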

forward(self, encoder_out:Dict[str, torch.LongTensor], target_tokens:Dict[str, torch.LongTensor]=None) → Dict[str, torch.Tensor][source]

Decodes from encoded states to a sequence of outputs. Also computes the loss if target_tokens are given.

Parameters
encoder_out : Dict[str, torch.LongTensor], required

Dictionary with encoded state, ideally containing the encoded vectors and the source mask.

target_tokens : Dict[str, torch.LongTensor], optional

The output of TextField.as_array() applied on the target TextField.
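
When target tokens are available, a loss can be computed from the decoder's per-step predictions. The sketch below shows one common choice, the average negative log-likelihood of the gold tokens with padded positions ignored. It is an assumption about the general technique written with plain lists, not a transcription of the loss this class computes, and masked_nll is a hypothetical name.

```python
import math
from typing import List


def masked_nll(
    log_probs: List[List[float]],  # (num_steps, num_classes) log-probabilities
    targets: List[int],            # gold class index per step
    mask: List[int],               # 1 for real tokens, 0 for padding
) -> float:
    """Average negative log-likelihood over the unmasked steps."""
    total = 0.0
    count = 0
    for step_log_probs, target, keep in zip(log_probs, targets, mask):
        if keep:
            total -= step_log_probs[target]
            count += 1
    return total / count if count else 0.0


step_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
    [math.log(0.9), math.log(0.1)],  # padded step, ignored by the mask
]
loss = masked_nll(step_log_probs, targets=[0, 1, 0], mask=[1, 1, 0])
```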

get_metrics(self, reset:bool=False) → Dict[str, float][source]

The decoder is responsible for computing metrics using the target tokens.
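
As an illustration of the kind of object that could be passed as a token-based metric, the sketch below accepts two lists of token lists (predicted and gold) and reports the fraction of sequences that match exactly. It is a hypothetical stand-alone class: it does not subclass the library's Metric, and the method names are assumptions chosen to resemble a typical metric interface.

```python
from typing import Dict, List


class ExactMatch:
    """Hypothetical token-based metric: fraction of fully correct sequences."""

    def __init__(self) -> None:
        self._correct = 0
        self._total = 0

    def __call__(self, predicted: List[List[str]], gold: List[List[str]]) -> None:
        # Accumulate counts over one batch of predicted and gold sequences.
        for predicted_tokens, gold_tokens in zip(predicted, gold):
            self._correct += int(predicted_tokens == gold_tokens)
            self._total += 1

    def get_metric(self, reset: bool = False) -> Dict[str, float]:
        value = self._correct / self._total if self._total else 0.0
        if reset:
            self.reset()
        return {"exact_match": value}

    def reset(self) -> None:
        self._correct = 0
        self._total = 0


metric = ExactMatch()
metric([["a", "b"], ["c"]], [["a", "b"], ["d"]])
metrics = metric.get_metric(reset=True)
```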

get_output_dim(self)[source]

The dimension of each timestep of the hidden state in the layer before the final softmax. Needed to check whether the model is compatible with embedding-final layer weight tying.

post_process(self, output_dict:Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source]

This method trims the output predictions to the first end symbol, replaces indices with corresponding tokens, and adds a field called predicted_tokens to the output_dict.
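
The trimming and index-to-token steps described above can be sketched with plain Python lists. The vocabulary mapping, the end-symbol index and the function name are toy assumptions. The real method works on the model's output_dict and the Vocabulary object.

```python
from typing import Dict, List


def trim_and_detokenize(
    predicted_indices: List[int],
    index_to_token: Dict[int, str],
    end_index: int,
) -> List[str]:
    """Cut a prediction at the first end symbol and map indices to tokens."""
    if end_index in predicted_indices:
        predicted_indices = predicted_indices[: predicted_indices.index(end_index)]
    return [index_to_token[index] for index in predicted_indices]


# Toy vocabulary; index 1 plays the role of the end symbol.
toy_vocab = {0: "@start@", 1: "@end@", 2: "hello", 3: "world"}
predicted_tokens = trim_and_detokenize([2, 3, 1, 3, 3], toy_vocab, end_index=1)
```

Anything generated after the first end symbol is discarded, and a sequence with no end symbol is kept whole.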

take_step(self, last_predictions:torch.Tensor, state:Dict[str, torch.Tensor]) → Tuple[torch.Tensor, Dict[str, torch.Tensor]][source]

Take a decoding step. This is called by the beam search class.

Parameters
last_predictions : torch.Tensor

A tensor of shape (group_size,), which gives the indices of the predictions during the last time step.

state : Dict[str, torch.Tensor]

A dictionary of tensors that contain the current state information needed to predict the next step, which includes the encoder outputs, the source mask, and the decoder hidden state and context. Each of these tensors has shape (group_size, *), where * can be any other number of dimensions.

Returns
Tuple[torch.Tensor, Dict[str, torch.Tensor]]

A tuple of (log_probabilities, updated_state), where log_probabilities is a tensor of shape (group_size, num_classes) containing the predicted log probability of each class for the next step, for each item in the group, while updated_state is a dictionary of tensors containing the encoder outputs, source mask, and updated decoder hidden state and context.

Notes

We treat the inputs as a batch, even though group_size is not necessarily equal to batch_size, since the group may contain multiple states for each source sentence in the batch.
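
The contract described above can be illustrated with a toy step function that uses Python lists in place of tensors. The model inside it is invented for the example (it simply prefers the class after the last prediction), and the "steps_taken" state key is hypothetical. What the sketch is meant to show is that every input and output has a leading dimension of group_size, which here is batch_size * beam_size.

```python
import math
from typing import Dict, List, Tuple

State = Dict[str, List[int]]


def toy_take_step(
    last_predictions: List[int], state: State
) -> Tuple[List[List[float]], State]:
    """Toy step function following the (log_probabilities, updated_state) contract."""
    num_classes = 3
    log_probs = []
    for last in last_predictions:
        # Hypothetical model: strongly prefer the class after the last one.
        favourite = (last + 1) % num_classes
        row = [
            math.log(0.8) if c == favourite else math.log(0.1)
            for c in range(num_classes)
        ]
        log_probs.append(row)
    updated_state = {"steps_taken": [n + 1 for n in state["steps_taken"]]}
    return log_probs, updated_state


batch_size, beam_size = 2, 2
group_size = batch_size * beam_size  # beams are flattened into one "batch"
last_predictions = [0, 1, 2, 0]      # one index per (sentence, beam) pair
state = {"steps_taken": [0] * group_size}
log_probs, state = toy_take_step(last_predictions, state)
best_next = [row.index(max(row)) for row in log_probs]
```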

class allennlp.modules.seq2seq_decoders.decoder_net.DecoderNet(decoding_dim: int, target_embedding_dim: int, decodes_parallel: bool)[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

This class abstracts the neural architectures for decoding the encoded states and embedded previous step prediction vectors into a new sequence of output vectors.

Implementations of DecoderNet are used by implementations of allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder, such as allennlp.modules.seq2seq_decoders.auto_regressive_seq_decoder.AutoRegressiveSeqDecoder.

The outputs of this module are typically used by allennlp.modules.seq2seq_decoders.seq_decoder.SeqDecoder to apply the final output feedforward layer and softmax.

Parameters
decoding_dim : int, required

Defines the dimensionality of the output vectors.

target_embedding_dim : int, required

Defines the dimensionality of the target embeddings. Since this model takes its output from the previous step as input to the following step, this is also the input dimensionality.

decodes_parallel : bool, required

Defines whether the decoder generates multiple next-step predictions in a single forward pass.

forward(self, previous_state:Dict[str, torch.Tensor], encoder_outputs:torch.Tensor, source_mask:torch.Tensor, previous_steps_predictions:torch.Tensor, previous_steps_mask:Union[torch.Tensor, NoneType]=None) → Tuple[Dict[str, torch.Tensor], torch.Tensor][source]

Performs a decoding step and returns a dictionary with the decoder hidden state or cache, together with the decoder output. The decoder output is a 3D tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True; otherwise it is a 2D tensor of shape (group_size, decoder_output_dim).

Parameters
previous_steps_predictions : torch.Tensor, required

Embeddings of the predictions from previous steps. Shape: (group_size, steps_count, decoder_output_dim)

encoder_outputs : torch.Tensor, required

Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)

source_mask : torch.Tensor, required

Mask for each input sequence. Shape: (group_size, max_input_sequence_length)

previous_state : Dict[str, torch.Tensor], required

Previous state of the decoder.

Returns
Tuple[Dict[str, torch.Tensor], torch.Tensor]

Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
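
The two output shapes that forward can produce are easy to mix up, so here is a small sketch that computes the expected shape from decodes_parallel. The helper is hypothetical and only mirrors the shapes stated in the description of forward.

```python
from typing import Tuple


def expected_output_shape(
    group_size: int,
    steps_count: int,
    decoder_output_dim: int,
    decodes_parallel: bool,
) -> Tuple[int, ...]:
    """Shape of the decoder output for a parallel or step-wise decoder net."""
    if decodes_parallel:
        # One output vector per step, produced in a single pass.
        return (group_size, steps_count, decoder_output_dim)
    # Only the output for the next step is returned.
    return (group_size, decoder_output_dim)


parallel_shape = expected_output_shape(4, 7, 16, decodes_parallel=True)
stepwise_shape = expected_output_shape(4, 7, 16, decodes_parallel=False)
# In both cases the last element is the per-vector output dimension.
output_dim = parallel_shape[-1]
```
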
get_output_dim(self) → int[source]

Returns the dimension of each vector in the sequence output by this DecoderNet. This is not the shape of the returned tensor, but the last element of that shape.

init_decoder_state(self, encoder_out:Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor][source]

Initialize the encoded state to be passed to the first decoding time step.

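Parameters
encoder_out : Dict[str, torch.LongTensor]

Dictionary with encoded state, ideally containing the encoded vectors and the source mask. Implementations can derive the batch size and the last state of the encoder from it.

Returns
Dict[str, torch.Tensor]

Initial state.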
class allennlp.modules.seq2seq_decoders.lstm_cell_decoder_net.LstmCellDecoderNet(decoding_dim: int, target_embedding_dim: int, attention: Optional[allennlp.modules.attention.attention.Attention] = None, bidirectional_input: bool = False)[source]

This decoder net implements a simple decoding network with LSTMCell and Attention.

Parameters
decoding_dim : int, required

Defines the dimensionality of the output vectors.

target_embedding_dim : int, required

Defines the dimensionality of the input target embeddings. Since this model takes its output from the previous step as input to the following step, this is also the input dimensionality.

attention : Attention, optional (default = None)

If you want to use attention to get a dynamic summary of the encoder outputs at each step of decoding, this is the function used to compute similarity between the decoder hidden state and encoder outputs.
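
To show what a "dynamic summary of the encoder outputs" looks like, the sketch below computes attention for a single decoding step with plain lists. It assumes dot-product similarity and a masked softmax, which is only one possible choice. The Attention module passed to this class may compute similarity differently. The function name is hypothetical, and at least one position must be unmasked.

```python
import math
from typing import List


def dot_product_attention(
    decoder_hidden: List[float],
    encoder_outputs: List[List[float]],
    source_mask: List[int],
) -> List[float]:
    """Return a weighted summary of the encoder outputs for one decoding step."""
    # Similarity between the decoder state and each encoder output.
    scores = [
        sum(h * e for h, e in zip(decoder_hidden, output))
        for output in encoder_outputs
    ]
    # Masked softmax: padded positions receive zero weight.
    max_score = max(s for s, keep in zip(scores, source_mask) if keep)
    exps = [
        math.exp(s - max_score) if keep else 0.0
        for s, keep in zip(scores, source_mask)
    ]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_outputs[0])
    return [
        sum(w * output[d] for w, output in zip(weights, encoder_outputs))
        for d in range(dim)
    ]


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # last row is padding
context = dot_product_attention([0.0, 0.0], encoder_outputs, source_mask=[1, 1, 0])
```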

forward(self, previous_state:Dict[str, torch.Tensor], encoder_outputs:torch.Tensor, source_mask:torch.Tensor, previous_steps_predictions:torch.Tensor, previous_steps_mask:Union[torch.Tensor, NoneType]=None) → Tuple[Dict[str, torch.Tensor], torch.Tensor][source]

Performs a decoding step and returns a dictionary with the decoder hidden state or cache, together with the decoder output. The decoder output is a 3D tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True; otherwise it is a 2D tensor of shape (group_size, decoder_output_dim).

Parameters
previous_steps_predictions : torch.Tensor, required

Embeddings of the predictions from previous steps. Shape: (group_size, steps_count, decoder_output_dim)

encoder_outputs : torch.Tensor, required

Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)

source_mask : torch.Tensor, required

Mask for each input sequence. Shape: (group_size, max_input_sequence_length)

previous_state : Dict[str, torch.Tensor], required

Previous state of the decoder.

Returns
Tuple[Dict[str, torch.Tensor], torch.Tensor]

Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out:Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor][source]

Initialize the encoded state to be passed to the first decoding time step.

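Parameters
encoder_out : Dict[str, torch.LongTensor]

Dictionary with encoded state, ideally containing the encoded vectors and the source mask. Implementations can derive the batch size and the last state of the encoder from it.

Returns
Dict[str, torch.Tensor]

Initial state.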
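
One plausible way to build an initial state from the encoded input is sketched below with nested lists instead of tensors. It assumes that the encoded dictionary carries "encoder_outputs" and "source_mask" entries, that sequences are right-padded, and that the initial hidden state is the encoder output at the last real token while the context starts at zero. The key names and both helper functions are assumptions for illustration, not the library's implementation.

```python
from typing import Dict, List


def final_encoder_outputs(
    encoder_outputs: List[List[List[float]]],  # (batch, length, dim)
    source_mask: List[List[int]],              # (batch, length), 1 = real token
) -> List[List[float]]:
    """Pick the output at the last unmasked position of each sequence."""
    finals = []
    for outputs, mask in zip(encoder_outputs, source_mask):
        last_index = sum(mask) - 1  # assumes right-padded sequences
        finals.append(outputs[last_index])
    return finals


def init_state(encoder_out: Dict[str, list]) -> Dict[str, list]:
    """Hypothetical initial state: hidden = final encoder output, context = zeros."""
    hidden = final_encoder_outputs(
        encoder_out["encoder_outputs"], encoder_out["source_mask"]
    )
    context = [[0.0] * len(vector) for vector in hidden]
    return {"decoder_hidden": hidden, "decoder_context": context}


toy_encoder_out = {
    "encoder_outputs": [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [0.0, 0.0]]],
    "source_mask": [[1, 1], [1, 0]],
}
initial_state = init_state(toy_encoder_out)
```
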
class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.Decoder(layer: torch.nn.modules.module.Module, num_layers: int)[source]

Bases: torch.nn.modules.module.Module

Transformer N layer decoder with masking. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html

forward(self, x:torch.Tensor, memory:torch.Tensor, src_mask:torch.Tensor, tgt_mask:torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
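
In a Transformer decoder, masking usually means preventing each target position from attending to later positions. The sketch below builds such a causal mask as nested lists. It assumes tgt_mask follows this convention (1 = may attend, 0 = hidden), and the helper name is chosen for the example.

```python
from typing import List


def subsequent_mask(size: int) -> List[List[int]]:
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [
        [1 if column <= row else 0 for column in range(size)]
        for row in range(size)
    ]


tgt_mask = subsequent_mask(3)
```

Row i of the mask lists which positions are visible when predicting from position i, so the first row sees only itself and the last row sees the whole prefix.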

class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.DecoderLayer(size: int, self_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, src_attn: allennlp.modules.seq2seq_encoders.bidirectional_language_model_transformer.MultiHeadedAttention, feed_forward: torch.nn.functional, dropout: float)[source]

Bases: torch.nn.modules.module.Module

A single layer of transformer decoder. Code taken from http://nlp.seas.harvard.edu/2018/04/03/attention.html

forward(self, x:torch.Tensor, memory:torch.Tensor, src_mask:torch.Tensor, tgt_mask:torch.Tensor) → torch.Tensor[source]

Follow Figure 1 (right) for connections.

class allennlp.modules.seq2seq_decoders.stacked_self_attention_decoder_net.StackedSelfAttentionDecoderNet(decoding_dim: int, target_embedding_dim: int, feedforward_hidden_dim: int, num_layers: int, num_attention_heads: int, use_positional_encoding: bool = True, positional_encoding_max_steps: int = 5000, dropout_prob: float = 0.1, residual_dropout_prob: float = 0.2, attention_dropout_prob: float = 0.1)[source]

A stacked self-attention decoder implementation.

Parameters
decoding_dim : int, required

Defines the dimensionality of the output vectors.

target_embedding_dim : int, required

Defines the dimensionality of the input target embeddings. Since this model takes its output from the previous step as input to the following step, this is also the input dimensionality.

feedforward_hidden_dim : int, required

The middle dimension of the FeedForward network. The input and output dimensions are fixed to ensure sizes match up for the self attention layers.

num_layers : int, required

The number of stacked self attention -> feedforward -> layer normalisation blocks.

num_attention_heads : int, required

The number of attention heads to use per layer.

use_positional_encoding : bool, optional (default = True)

Whether to add sinusoidal frequencies to the input tensor. This is strongly recommended, as without this feature, the self attention layers have no idea of absolute or relative position (as they are just computing pairwise similarity between vectors of elements), which can be important features for many tasks.

dropout_prob : float, optional (default = 0.1)

The dropout probability for the feedforward network.

residual_dropout_prob : float, optional (default = 0.2)

The dropout probability for the residual connections.

attention_dropout_prob : float, optional (default = 0.1)

The dropout probability for the attention distributions in each attention layer.
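
For readers unfamiliar with the sinusoidal frequencies mentioned under use_positional_encoding, the sketch below builds a table of positional features using the common formulation with sine on even channels and cosine on odd channels. This is an assumption about the general technique. The library's own implementation may arrange the channels or timescales differently, and the function name is hypothetical.

```python
import math
from typing import List


def sinusoidal_encoding(num_steps: int, dim: int) -> List[List[float]]:
    """Positional features: sin on even channels, cos on odd channels."""
    table = []
    for position in range(num_steps):
        row = []
        for channel in range(dim):
            # Channels 2i and 2i+1 share the frequency 1 / 10000^(2i / dim).
            frequency = 1.0 / (10000 ** ((channel // 2) * 2 / dim))
            angle = position * frequency
            row.append(math.sin(angle) if channel % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


encoding = sinusoidal_encoding(num_steps=4, dim=6)
```

Each row would be added to the input vector at the same position, giving otherwise order-blind self-attention layers a signal about where each element sits in the sequence.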

forward(self, previous_state:Dict[str, torch.Tensor], encoder_outputs:torch.Tensor, source_mask:torch.Tensor, previous_steps_predictions:torch.Tensor, previous_steps_mask:Union[torch.Tensor, NoneType]=None) → Tuple[Dict[str, torch.Tensor], torch.Tensor][source]

Performs a decoding step and returns a dictionary with the decoder hidden state or cache, together with the decoder output. The decoder output is a 3D tensor of shape (group_size, steps_count, decoder_output_dim) if self.decodes_parallel is True; otherwise it is a 2D tensor of shape (group_size, decoder_output_dim).

Parameters
previous_steps_predictions : torch.Tensor, required

Embeddings of the predictions from previous steps. Shape: (group_size, steps_count, decoder_output_dim)

encoder_outputs : torch.Tensor, required

Vectors of all encoder outputs. Shape: (group_size, max_input_sequence_length, encoder_output_dim)

source_mask : torch.Tensor, required

Mask for each input sequence. Shape: (group_size, max_input_sequence_length)

previous_state : Dict[str, torch.Tensor], required

Previous state of the decoder.

Returns
Tuple[Dict[str, torch.Tensor], torch.Tensor]

Tuple of the new decoder state and the decoder output. The output should be used to generate output sequence elements.
init_decoder_state(self, encoder_out:Dict[str, torch.LongTensor]) → Dict[str, torch.Tensor][source]

Initialize the encoded state to be passed to the first decoding time step.

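Parameters
encoder_out : Dict[str, torch.LongTensor]

Dictionary with encoded state, ideally containing the encoded vectors and the source mask. Implementations can derive the batch size and the last state of the encoder from it.

Returns
Dict[str, torch.Tensor]

Initial state.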