allennlp.models.encoder_decoders

class allennlp.models.encoder_decoders.simple_seq2seq.SimpleSeq2Seq(vocab: allennlp.data.vocabulary.Vocabulary, source_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, max_decoding_steps: int, target_namespace: str = 'tokens', target_embedding_dim: int = None, attention_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None, scheduled_sampling_ratio: float = 0.0) → None

Bases: allennlp.models.model.Model

This SimpleSeq2Seq class is a Model which takes a sequence, encodes it, and then uses the encoded representations to decode another sequence. You can use this as the basis for a neural machine translation system, an abstractive summarization system, or any other common seq2seq problem. The model here is simple, but should be a decent starting place for implementing recent models for these tasks.

This SimpleSeq2Seq model takes an encoder (Seq2SeqEncoder) as an input and implements the functionality of the decoder. In this implementation, the decoder uses the encoder’s outputs in two ways: the hidden state of the decoder is initialized with the output from the final time step of the encoder, and, when attention is used, a weighted average of the encoder outputs is concatenated to the inputs of the decoder at every time step.
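The attention path described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes dot-product similarity in place of the configured SimilarityFunction, handles a single unbatched sequence, and ignores padding masks. The helper names (dot, softmax, attend, decoder_input) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (an assumed stand-in for the SimilarityFunction)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_hidden: Vector, encoder_outputs: List[Vector]) -> Vector:
    """Weighted average of encoder outputs, weighted by similarity to the decoder state."""
    weights = softmax([dot(decoder_hidden, h) for h in encoder_outputs])
    dim = len(encoder_outputs[0])
    return [sum(w * h[i] for w, h in zip(weights, encoder_outputs)) for i in range(dim)]


def decoder_input(target_embedding: Vector, decoder_hidden: Vector,
                  encoder_outputs: List[Vector]) -> Vector:
    # With attention, the attended summary is concatenated to the embedded input token.
    return target_embedding + attend(decoder_hidden, encoder_outputs)


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The decoder hidden state starts from the encoder's final time step.
hidden = encoder_outputs[-1]
step_input = decoder_input([0.5, 0.5, 0.5], hidden, encoder_outputs)
print(len(step_input))  # 5
```

The concatenated input is wider than the target embedding alone (here 3 + 2 = 5), so the decoder cell's input size has to account for the encoder output dimension whenever attention is enabled.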

Parameters:

vocab : Vocabulary, required

Vocabulary containing the source and target vocabularies. Both may share the same namespace (“tokens”), or the target tokens may live in a different namespace, in which case that namespace must be passed as target_namespace.

source_embedder : TextFieldEmbedder, required

Embedder for source-side sequences.

encoder : Seq2SeqEncoder, required

The encoder of the “encoder/decoder” model

max_decoding_steps : int, required

Maximum length of decoded sequences; this is the number of decoding steps run when no target sequence is provided.

target_namespace : str, optional (default = ‘tokens’)

If the target-side vocabulary differs from the source side’s, specify the target’s namespace here. Otherwise we assume it is “tokens”, which is also the default namespace for the source side, so the two sides may end up sharing a vocabulary.

target_embedding_dim : int, optional (default = source_embedding_dim)

You can specify an embedding dimensionality for the target side. If you do not, we use the output dimensionality of the source embedder.

attention_function : SimilarityFunction, optional (default = None)

If you want to use attention to get a dynamic summary of the encoder outputs at each step of decoding, this is the function used to compute similarity between the decoder hidden state and encoder outputs.

scheduled_sampling_ratio : float, optional (default = 0.0)

At each time step during training, we draw a random number between 0 and 1. If it is not less than this value, we use the ground-truth labels for the whole batch; otherwise, we use the predictions from the previous time step for the whole batch. A value of 0.0 (the default) therefore corresponds to teacher forcing, and a value of 1.0 corresponds to never using target-side ground-truth labels. For more information, see Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, Bengio et al., 2015.
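The per-step choice can be sketched as follows. This is a hypothetical helper that operates on plain Python lists rather than tensors; it follows the rule above (one random draw decides for the whole batch) and additionally assumes that, outside training, the previous predictions are always fed back.

```python
import random
from typing import List, Sequence


def choose_step_inputs(ratio: float,
                       gold_tokens: Sequence[int],
                       previous_predictions: Sequence[int],
                       rng: random.Random,
                       training: bool = True) -> List[int]:
    """Choose the decoder inputs for the whole batch at one time step.

    A single draw from [0.0, 1.0) is shared by the batch: if it is not less
    than `ratio`, the ground-truth tokens are fed in; otherwise the previous
    step's predictions are. Outside training (an assumption here), the
    predictions are always fed back.
    """
    if training and rng.random() >= ratio:
        return list(gold_tokens)
    return list(previous_predictions)


rng = random.Random(0)
gold, predicted = [4, 5], [7, 8]
# ratio 0.0: every draw is >= 0.0, so this is pure teacher forcing.
teacher_forced = choose_step_inputs(0.0, gold, predicted, rng)
# ratio 1.0: no draw reaches 1.0, so the model always consumes its own predictions.
free_running = choose_step_inputs(1.0, gold, predicted, rng)
```

With ratio=0.25, roughly three quarters of training steps would use the ground truth, because random.random() returns values in [0.0, 1.0).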

decode(output_dict: typing.Dict[str, torch.FloatTensor]) → typing.Dict[str, torch.FloatTensor]

This method overrides Model.decode, which is called after Model.forward at test time to finalize predictions. The decoder logic of the encoder-decoder itself lives in the forward method.

This method trims the output predictions to the first end symbol, replaces indices with corresponding tokens, and adds a field called predicted_tokens to the output_dict.
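The trimming and lookup step can be sketched as below. The index-to-token dictionary, the "predictions" input key, and the helper names are hypothetical stand-ins for the model's vocabulary lookup over the target namespace; only the predicted_tokens field comes from the description above.

```python
from typing import Dict, List, Sequence


def trim_and_lookup(predicted_indices: Sequence[int],
                    index_to_token: Dict[int, str],
                    end_index: int) -> List[str]:
    """Cut a predicted index sequence at the first end symbol and map it to tokens."""
    indices = list(predicted_indices)
    if end_index in indices:
        indices = indices[:indices.index(end_index)]
    return [index_to_token[i] for i in indices]


def decode(output_dict: Dict[str, list],
           index_to_token: Dict[int, str],
           end_index: int) -> Dict[str, list]:
    """Add a `predicted_tokens` field with one token list per batch row."""
    output_dict["predicted_tokens"] = [
        trim_and_lookup(row, index_to_token, end_index)
        for row in output_dict["predictions"]
    ]
    return output_dict


# Hypothetical target vocabulary; index 3 plays the role of the end symbol.
index_to_token = {0: "@@PADDING@@", 1: "@@UNKNOWN@@", 2: "@start@",
                  3: "@end@", 4: "hello", 5: "world"}
result = decode({"predictions": [[4, 5, 3, 0], [5, 5, 5, 5]]}, index_to_token, end_index=3)
print(result["predicted_tokens"])  # [['hello', 'world'], ['world', 'world', 'world', 'world']]
```

A row that never produces the end symbol is kept at full length, which is why the second row above still has four tokens.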

forward(source_tokens: typing.Dict[str, torch.LongTensor], target_tokens: typing.Dict[str, torch.LongTensor] = None) → typing.Dict[str, torch.FloatTensor]

Decoder logic for producing the entire target sequence.

Parameters:

source_tokens : Dict[str, torch.LongTensor]

The output of TextField.as_array() applied to the source TextField. This is passed through a TextFieldEmbedder and then through the encoder.

target_tokens : Dict[str, torch.LongTensor], optional (default = None)

The output of TextField.as_array() applied to the target TextField. We assume that the target tokens are also represented as a TextField.
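A plain-Python sketch of the greedy loop that forward runs when no target is given. It assumes, going beyond the parameter descriptions above, that a target of length T yields T - 1 decoding steps (the last target token is never fed in as an input) and that max_decoding_steps is used otherwise. toy_step is a hypothetical, stateless stand-in for one decoder step followed by the output projection; a real decoder also carries its hidden state between steps.

```python
from typing import Callable, List, Optional


def num_decoding_steps(target_length: Optional[int], max_decoding_steps: int) -> int:
    """Steps to run: T - 1 for a target of length T, else max_decoding_steps (assumed)."""
    if target_length is not None:
        # The last target token (end symbol or padding) is never used as an input.
        return target_length - 1
    return max_decoding_steps


def greedy_decode(step: Callable[[int], List[float]],
                  start_index: int,
                  steps: int) -> List[int]:
    """Feed each step's argmax prediction back in as the next input."""
    predictions: List[int] = []
    current = start_index
    for _ in range(steps):
        scores = step(current)
        current = max(range(len(scores)), key=scores.__getitem__)
        predictions.append(current)
    return predictions


def toy_step(token: int, vocab_size: int = 6) -> List[float]:
    """Hypothetical decoder step: always prefers the next index, wrapping around."""
    scores = [0.0] * vocab_size
    scores[(token + 1) % vocab_size] = 1.0
    return scores


print(greedy_decode(toy_step, start_index=2, steps=num_decoding_steps(None, 3)))  # [3, 4, 5]
```

Because the loop runs a fixed number of steps rather than stopping at the end symbol, anything generated after the end symbol is left for decode to trim.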

classmethod from_params(vocab, params: allennlp.common.params.Params) → allennlp.models.encoder_decoders.simple_seq2seq.SimpleSeq2Seq
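A configuration for from_params might look like the dictionary below. The top-level keys mirror the constructor parameters documented above; the registered type names ("simple_seq2seq", "embedding", "lstm", "dot_product") and the nested embedder and encoder keys are assumptions about the surrounding AllenNLP version, not something this page documents. target_embedding_dim is omitted, so it would fall back to the source embedder's dimensionality.

```python
# A hypothetical experiment configuration for SimpleSeq2Seq.
# Registered names ("simple_seq2seq", "embedding", "lstm", "dot_product") are assumptions.
config = {
    "type": "simple_seq2seq",
    "source_embedder": {
        "tokens": {"type": "embedding", "embedding_dim": 100}
    },
    # The encoder's input size must match the source embedding size.
    "encoder": {"type": "lstm", "input_size": 100, "hidden_size": 100},
    "max_decoding_steps": 20,
    # A separate namespace keeps the target vocabulary apart from the source's "tokens".
    "target_namespace": "target_tokens",
    "attention_function": {"type": "dot_product"},
    "scheduled_sampling_ratio": 0.25,
}

# With AllenNLP installed and a Vocabulary `vocab` already built, something like:
# from allennlp.common.params import Params
# from allennlp.models.encoder_decoders.simple_seq2seq import SimpleSeq2Seq
# params = Params({k: v for k, v in config.items() if k != "type"})
# model = SimpleSeq2Seq.from_params(vocab, params)
```

The "type" key is stripped before calling SimpleSeq2Seq.from_params directly, since it is only meaningful when dispatching through the generic Model registry.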