allennlp.models.bidirectional_lm

class allennlp.models.bidirectional_lm.BidirectionalLanguageModel(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, contextualizer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, layer_norm: typing.Union[allennlp.modules.masked_layer_norm.MaskedLayerNorm, NoneType] = None, dropout: float = None, loss_scale: typing.Union[float, str] = 1.0, remove_bos_eos: bool = True, num_samples: int = None, sparse_embeddings: bool = False) → None

Bases: allennlp.models.model.Model

The BidirectionalLanguageModel applies a bidirectional “contextualizing” Seq2SeqEncoder to uncontextualized embeddings, using the SoftmaxLoss defined in this module to compute the language modeling loss.

It is IMPORTANT that your bidirectional Seq2SeqEncoder does not do any “peeking ahead”. That is, for its forward direction it should only consider embeddings at previous timesteps, and for its backward direction only embeddings at subsequent timesteps. If this condition is not met, your language model is cheating.
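
As a concrete illustration, here is a minimal construction sketch (not part of the documented API above). The component choices are assumptions for illustration only: a basic word embedder and a single-layer bidirectional LSTM, whose forward pass sees only earlier timesteps and whose backward pass sees only later ones, so it satisfies the no-peeking requirement. Exact import paths and signatures may differ across AllenNLP versions.

    import torch
    from allennlp.data import Vocabulary
    from allennlp.models.bidirectional_lm import BidirectionalLanguageModel
    from allennlp.modules.seq2seq_encoders import PytorchSeq2SeqWrapper
    from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
    from allennlp.modules.token_embedders import Embedding

    vocab = Vocabulary()  # in practice, built from your training instances

    # Uncontextualized token embeddings for the "tokens" indexer namespace.
    embedder = BasicTextFieldEmbedder(
        {"tokens": Embedding(num_embeddings=vocab.get_vocab_size("tokens"),
                             embedding_dim=128)}
    )

    # A single-layer bidirectional LSTM does not peek within either direction;
    # stacking layers inside torch.nn.LSTM would mix directions and break this.
    contextualizer = PytorchSeq2SeqWrapper(
        torch.nn.LSTM(input_size=128, hidden_size=64,
                      bidirectional=True, batch_first=True)
    )

    model = BidirectionalLanguageModel(
        vocab=vocab,
        text_field_embedder=embedder,
        contextualizer=contextualizer,
    )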

Parameters:
vocab: ``Vocabulary``
text_field_embedder: ``TextFieldEmbedder``

Used to embed the indexed tokens we get in forward.

contextualizer: ``Seq2SeqEncoder``

Used to “contextualize” the embeddings. As described above, this encoder must not cheat by peeking ahead.

layer_norm: ``MaskedLayerNorm``, optional (default: None)

If provided, this is applied to the uncontextualized embeddings before they are fed to the contextualizer.

dropout: ``float``, optional (default: None)

If specified, dropout is applied to the contextualized embeddings.

loss_scale: ``Union[float, str]``, optional (default: 1.0)

This scaling factor is applied to the average language model loss. You can also specify "n_samples", in which case the total loss over all predictions is used instead of the average.

remove_bos_eos: ``bool``, optional (default: True)

Typically the provided token indexes will be augmented with begin-sentence and end-sentence tokens. If this flag is True, the corresponding embeddings will be removed from the return values.

num_samples: ``int``, optional (default: None)

If provided, the model will use SampledSoftmaxLoss with the specified number of samples. Otherwise, it will use the full _SoftmaxLoss defined in this module.

sparse_embeddings: ``bool``, optional (default: False)

Passed on to SampledSoftmaxLoss if True.
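
Continuing the sketch above, the optional constructor arguments map one-to-one onto the parameters documented here. The values below are illustrative only and assume a realistically large vocabulary, not a recommendation.

    model = BidirectionalLanguageModel(
        vocab=vocab,
        text_field_embedder=embedder,
        contextualizer=contextualizer,
        dropout=0.1,              # applied to the contextualized embeddings
        loss_scale=1.0,           # or "n_samples" for the total (unaveraged) loss
        remove_bos_eos=True,      # strip begin/end-sentence positions from the outputs
        num_samples=8192,         # switch to SampledSoftmaxLoss for large vocabularies
        sparse_embeddings=False,  # forwarded to SampledSoftmaxLoss
    )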

forward(source: typing.Dict[str, torch.LongTensor]) → typing.Dict[str, torch.Tensor]

Computes the averaged forward and backward LM loss from the batch.

By convention, the input dict is required to have at least a "tokens" entry that’s the output of a SingleIdTokenIndexer, which is used to compute the language model targets.

If the model was instantiated with remove_bos_eos=True, then it is expected that each of the input sentences was augmented with begin-sentence and end-sentence tokens.

Parameters:
source: ``Dict[str, torch.LongTensor]``, required.

The dictionary of indexed tokens for a batch of sentences, as produced by Batch.as_tensor_dict().
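
For illustration, here is a hedged sketch of how such an input dict might be built with the standard data pipeline. The field name "source" matches the forward() signature above; the module paths and the literal begin/end-sentence markers are assumptions that may differ across AllenNLP versions and configurations.

    from allennlp.data import Instance
    from allennlp.data.dataset import Batch
    from allennlp.data.fields import TextField
    from allennlp.data.token_indexers import SingleIdTokenIndexer
    from allennlp.data.tokenizers import Token

    # Sentence already augmented with begin/end-sentence markers (illustrative).
    tokens = [Token(t) for t in ["<S>", "The", "cat", "sat", "</S>"]]
    field = TextField(tokens, {"tokens": SingleIdTokenIndexer()})
    instance = Instance({"source": field})

    batch = Batch([instance])
    batch.index_instances(vocab)          # vocab from the construction sketch above
    tensor_dict = batch.as_tensor_dict()  # {"source": {"tokens": <LongTensor>}}

    output_dict = model(**tensor_dict)    # equivalent to model.forward(source=...)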

Returns:
Dict with keys:
``'loss'``: ``torch.Tensor``

averaged forward/backward negative log likelihood

``'forward_loss'``: ``torch.Tensor``

forward direction negative log likelihood

``'backward_loss'``: ``torch.Tensor``

backward direction negative log likelihood

``'lm_embeddings'``: ``torch.Tensor``

(batch_size, timesteps, embed_dim) tensor of top layer contextual representations

``'mask'``: ``torch.Tensor``

(batch_size, timesteps) mask for the embeddings
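
A brief usage note, continuing the sketches above: the returned dictionary can be consumed directly, for example inside a training loop.

    loss = output_dict["loss"]                    # averaged forward/backward NLL
    lm_embeddings = output_dict["lm_embeddings"]  # (batch_size, timesteps, embed_dim)
    mask = output_dict["mask"]                    # (batch_size, timesteps)

    loss.backward()  # standard PyTorch backpropagation for a training step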