Elmo(options_file: str, weight_file: str, num_output_representations: int, do_layer_norm: bool = False, dropout: float = 0.5) → None¶
Compute ELMo representations using a pre-trained bidirectional language model.
See “Deep contextualized word representations”, Peters et al. for details.
This module takes character id input and computes ``num_output_representations`` different layers of ELMo representations. Typically ``num_output_representations`` is 1 or 2. For example, in the case of the SRL model in the above paper, ``num_output_representations=1`` where ELMo was included at the input token representation layer. In the case of the SQuAD model, ``num_output_representations=2`` as ELMo was also included at the GRU output layer.
In the implementation below, we learn separate scalar weights for each output layer, but only run the biLM once on each input sequence for efficiency.
Parameters
----------
options_file : ``str``, required.
    ELMo JSON options file
weight_file : ``str``, required.
    ELMo hdf5 weight file
num_output_representations : ``int``, required.
    The number of ELMo representation layers to output.
do_layer_norm : ``bool``, optional, (default=False).
    Should we apply layer normalization (passed to ``ScalarMix``)?
dropout : ``float``, optional, (default=0.5).
    The dropout to be applied to the ELMo representations.
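A minimal usage sketch (the options and weight file paths below are placeholders for pre-trained biLM files you would supply)::

    from allennlp.modules.elmo import Elmo

    # Placeholder paths to the pre-trained biLM files.
    options_file = "elmo_options.json"
    weight_file = "elmo_weights.hdf5"

    # Two output representations, e.g. one mixed into the input layer and one
    # into the output layer of a downstream model, as in the SQuAD example above.
    elmo = Elmo(options_file, weight_file, num_output_representations=2, dropout=0.5)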
forward(inputs: torch.FloatTensor) → typing.Dict[str, typing.Union[torch.FloatTensor, typing.List[torch.FloatTensor]]]¶
Parameters
----------
inputs : ``torch.FloatTensor``, required.
    Shape ``(batch_size, timesteps, 50)`` of character ids representing the current batch.
    We also accept tensors with additional optional dimensions:
    ``(batch_size, dim0, dim1, ..., dimn, timesteps, 50)``

Returns
-------
Dict with keys:

``'elmo_representations'``
    A ``num_output_representations`` list of ELMo representations for the input sequence.
    Each representation is shape ``(batch_size, timesteps, embedding_dim)``.
``'mask'``
    Shape ``(batch_size, timesteps)`` long tensor with sequence mask.
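For illustration, a sketch of a forward pass, assuming the ``batch_to_ids`` helper from ``allennlp.modules.elmo`` (which converts a batch of tokenized sentences into the ``(batch_size, timesteps, 50)`` character-id tensor expected here) and the ``elmo`` instance constructed above::

    from allennlp.modules.elmo import batch_to_ids

    # Two tokenized sentences; the shorter one is padded internally.
    sentences = [["The", "cat", "sat", "."], ["ELMo", "is", "contextual"]]

    # character_ids: (batch_size, timesteps, 50)
    character_ids = batch_to_ids(sentences)

    outputs = elmo(character_ids)

    # A list of num_output_representations tensors, each of shape
    # (batch_size, timesteps, embedding_dim).
    elmo_representations = outputs["elmo_representations"]

    # Long tensor of shape (batch_size, timesteps).
    mask = outputs["mask"]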
from_params(params: allennlp.common.params.Params) → allennlp.modules.elmo.Elmo¶
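A sketch of construction via ``from_params``, assuming a ``Params`` object that holds the constructor arguments under the same keys (paths again are placeholders)::

    from allennlp.common.params import Params
    from allennlp.modules.elmo import Elmo

    params = Params({
        "options_file": "elmo_options.json",   # placeholder path
        "weight_file": "elmo_weights.hdf5",    # placeholder path
        "num_output_representations": 1,
    })
    elmo = Elmo.from_params(params)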