allennlp.modules.elmo

class allennlp.modules.elmo.Elmo(options_file: str, weight_file: str, num_output_representations: int, do_layer_norm: bool = False, dropout: float = 0.5) → None[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

Compute ELMo representations using a pre-trained bidirectional language model.

See “Deep contextualized word representations”, Peters et al. for details.

This module takes character id input and computes num_output_representations different layers of ELMo representations. Typically num_output_representations is 1 or 2. For example, for the SRL model in the paper above, num_output_representations=1, since ELMo was included only at the input token representation layer. For the SQuAD model, num_output_representations=2, since ELMo was also included at the GRU output layer.

In the implementation below, we learn separate scalar weights for each output layer, but only run the biLM once on each input sequence for efficiency.

Parameters:

options_file : str, required.

ELMo JSON options file

weight_file : str, required.

ELMo hdf5 weight file

num_output_representations : int, required.

The number of ELMo representation layers to output.

do_layer_norm : bool, optional (default=False).

Should we apply layer normalization (passed to ScalarMix)?

dropout : float, optional (default=0.5).

The dropout to be applied to the ELMo representations.
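A minimal construction sketch; the options and weight file paths below are hypothetical placeholders for the pre-trained ELMo files:

    from allennlp.modules.elmo import Elmo

    # Hypothetical paths to the pre-trained ELMo options and weight files.
    options_file = "elmo_options.json"
    weight_file = "elmo_weights.hdf5"

    # Two output representations, e.g. for a SQuAD-style model that mixes
    # ELMo at both the input layer and the GRU output layer.
    elmo = Elmo(options_file, weight_file, num_output_representations=2, dropout=0.5)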

forward(inputs: torch.FloatTensor) → typing.Dict[str, typing.Union[torch.FloatTensor, typing.List[torch.FloatTensor]]][source]
Parameters:

inputs : torch.autograd.Variable

Shape (batch_size, timesteps, 50) of character ids representing the current batch. We also accept tensors with additional optional dimensions: (batch_size, dim0, dim1, ..., dimn, timesteps, 50)

Returns:

Dict with keys:

'elmo_representations': List[torch.autograd.Variable]

A list of num_output_representations ELMo representations of the input sequence. Each representation has shape (batch_size, timesteps, embedding_dim).

'mask': torch.autograd.Variable

Shape (batch_size, timesteps) long tensor with sequence mask.
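A sketch of a forward pass, assuming the batch_to_ids helper from this module is available to build the character id tensor (otherwise any (batch_size, timesteps, 50) tensor of character ids works); file paths are hypothetical placeholders:

    from allennlp.modules.elmo import Elmo, batch_to_ids

    # Hypothetical paths to the pre-trained ELMo options and weight files.
    elmo = Elmo("elmo_options.json", "elmo_weights.hdf5", num_output_representations=2)

    sentences = [["First", "sentence", "."], ["Another", "one"]]
    character_ids = batch_to_ids(sentences)            # shape (2, 3, 50)

    outputs = elmo(character_ids)
    representations = outputs['elmo_representations']  # list of 2 tensors
    mask = outputs['mask']                             # shape (2, 3)
    # Each representation has shape (batch_size, timesteps, embedding_dim).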

classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.elmo.Elmo[source]
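A rough sketch of constructing Elmo from a Params object, assuming the parameter keys mirror the constructor arguments; the file paths are hypothetical placeholders:

    from allennlp.common.params import Params
    from allennlp.modules.elmo import Elmo

    params = Params({
        "options_file": "elmo_options.json",   # hypothetical path
        "weight_file": "elmo_weights.hdf5",    # hypothetical path
        "num_output_representations": 1,
        "do_layer_norm": False,
        "dropout": 0.5,
    })
    elmo = Elmo.from_params(params)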