A stacked LSTM with LSTM layers which alternate between going forwards over the sequence and going backwards.

class allennlp.modules.stacked_alternating_lstm.StackedAlternatingLstm(input_size: int, hidden_size: int, num_layers: int, recurrent_dropout_probability: float = 0.0, use_highway: bool = True, use_input_projection_bias: bool = True)

Bases: torch.nn.modules.module.Module

A stacked LSTM whose layers alternate between running forwards over the sequence and running backwards. This implementation is based on the description in Deep Semantic Role Labelling - What works and what's next.
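The alternation can be illustrated with a toy in plain Python. This is not AllenNLP's implementation: a hypothetical `running_sum` function stands in for an LSTM cell, sequences are lists of floats, and the sketch assumes the first layer runs forwards. A backward layer reads the sequence from the last timestep to the first but writes each output back at its original position, so every layer's outputs stay aligned in time.

```python
from typing import Callable, List

# (previous_state, input) -> new_state; a stand-in for a real LSTM cell.
Cell = Callable[[float, float], float]


def running_sum(state: float, x: float) -> float:
    """Toy recurrent cell: the new state is the old state plus the input."""
    return state + x


def run_layer(cell: Cell, inputs: List[float], go_forward: bool) -> List[float]:
    """Run one recurrent layer over the sequence, forwards or backwards."""
    order = range(len(inputs)) if go_forward else range(len(inputs) - 1, -1, -1)
    outputs = [0.0] * len(inputs)
    state = 0.0
    for t in order:
        state = cell(state, inputs[t])
        outputs[t] = state  # keep each output at its original timestep
    return outputs


def stacked_alternating(cell: Cell, inputs: List[float], num_layers: int) -> List[float]:
    """Layer 0 runs forwards, layer 1 backwards, layer 2 forwards, and so on."""
    sequence = inputs
    for layer_index in range(num_layers):
        go_forward = layer_index % 2 == 0  # assumption: the first layer runs forwards
        sequence = run_layer(cell, sequence, go_forward)
    return sequence


if __name__ == "__main__":
    print(stacked_alternating(running_sum, [1.0, 2.0, 3.0], num_layers=1))  # [1.0, 3.0, 6.0]
    print(stacked_alternating(running_sum, [1.0, 2.0, 3.0], num_layers=2))  # [10.0, 9.0, 6.0]
```

With two layers, the output at timestep 0 already depends on every input, because the backward layer sees the forward layer's prefix sums in reverse order.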

input_size: int, required

The dimension of the inputs to the LSTM.

hidden_size: int, required

The dimension of the outputs of the LSTM.

num_layers: int, required

The number of stacked LSTMs to use.

recurrent_dropout_probability: float, optional (default = 0.0)

The dropout probability to use in the dropout scheme described in A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.

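use_highway: bool, optional (default = True)

Whether or not to use highway connections between layers.

use_input_projection_bias: bool, optional (default = True)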

Whether or not to use a bias on the input projection layer. This is mainly here for backwards compatibility reasons and will be removed (and set to False) in future releases.
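The central idea of the dropout scheme cited for recurrent_dropout_probability is that a single dropout mask is sampled per sequence and reused at every timestep, rather than being resampled at each step. The sketch below shows only that masking logic in plain Python, with nested lists standing in for (batch_size, hidden_size) tensors. The helper names are hypothetical, and applying the mask to the hidden state, as done here, is an assumption about where the module uses it.

```python
import random
from typing import List

Matrix = List[List[float]]  # stands in for a (batch_size, hidden_size) tensor


def sample_recurrent_dropout_mask(
    batch_size: int, hidden_size: int, p: float, rng: random.Random
) -> Matrix:
    """Sample one inverted-dropout mask per sequence; kept units are scaled by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [
        [1.0 / keep if rng.random() < keep else 0.0 for _ in range(hidden_size)]
        for _ in range(batch_size)
    ]


def apply_mask(hidden: Matrix, mask: Matrix) -> Matrix:
    """Multiply a (batch_size, hidden_size) hidden state elementwise by the mask."""
    return [[h * m for h, m in zip(h_row, m_row)] for h_row, m_row in zip(hidden, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_recurrent_dropout_mask(batch_size=2, hidden_size=4, p=0.5, rng=rng)
    # Hypothetical hidden states for three timesteps. The SAME mask is applied at
    # every step, so the same units are dropped throughout the sequence.
    hidden_states = [[[float(t + 1)] * 4 for _ in range(2)] for t in range(3)]
    for t, hidden in enumerate(hidden_states):
        print(t, apply_mask(hidden, mask))
```

Standard dropout would draw a fresh mask inside the loop; keeping it outside is what makes the noise consistent across timesteps.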

Returns: the outputs of the interleaved LSTMs at each timestep, as a tensor of shape (batch_size, max_timesteps, hidden_size). For each batch element, all outputs past that element's sequence length are zero tensors.
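The zero-padding rule can be made concrete with a hypothetical helper that applies it to nested lists: every timestep at or beyond an element's true length is replaced by zeros. If the module hands back a PackedSequence, torch.nn.utils.rnn.pad_packed_sequence(..., batch_first=True) produces a padded tensor whose padding positions hold its padding_value argument, which defaults to 0.0.

```python
from typing import List

Sequence = List[List[float]]  # (max_timesteps, hidden_size) for one batch element


def zero_past_lengths(outputs: List[Sequence], lengths: List[int]) -> List[Sequence]:
    """Zero every timestep at or beyond each batch element's true length."""
    masked = []
    for sequence, length in zip(outputs, lengths):
        masked.append([
            step if t < length else [0.0] * len(step)
            for t, step in enumerate(sequence)
        ])
    return masked


if __name__ == "__main__":
    outputs = [
        [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # length 3: nothing to zero
        [[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]],  # length 1: timesteps 1 and 2 are padding
    ]
    print(zero_past_lengths(outputs, [3, 1])[1])  # [[4.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
```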

forward(self, inputs: torch.nn.utils.rnn.PackedSequence, initial_state: Optional[Tuple[torch.Tensor, torch.Tensor]] = None) → Tuple[Union[torch.Tensor, torch.nn.utils.rnn.PackedSequence], Tuple[torch.Tensor, torch.Tensor]]

inputs: PackedSequence, required

A batch-first PackedSequence to run the stacked LSTM over.

initial_state: Tuple[torch.Tensor, torch.Tensor], optional (default = None)

A tuple (state, memory) representing the initial hidden state and memory of the LSTM. Each tensor has shape (num_layers, batch_size, hidden_size), with one slice per layer along the first dimension.
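A PackedSequence stores a batch time-major: the elements of every sequence at timestep 0, then at timestep 1, and so on, together with a batch_sizes list counting how many sequences are still active at each step. In practice one is built with torch.nn.utils.rnn.pack_padded_sequence(padded, lengths, batch_first=True). The plain-Python sketch below, using a hypothetical pack_sequences helper, reproduces only that layout, and assumes the sequences are already sorted by decreasing length.

```python
from typing import List, Tuple


def pack_sequences(sequences: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (data, batch_sizes) in the time-major layout a PackedSequence uses."""
    lengths = [len(s) for s in sequences]
    if lengths != sorted(lengths, reverse=True):
        raise ValueError("sequences must be sorted by decreasing length")
    data: List[int] = []
    batch_sizes: List[int] = []
    max_len = lengths[0] if lengths else 0
    for t in range(max_len):
        # Only sequences that are still active at timestep t contribute an element.
        active = [s[t] for s in sequences if len(s) > t]
        data.extend(active)
        batch_sizes.append(len(active))
    return data, batch_sizes


if __name__ == "__main__":
    data, batch_sizes = pack_sequences([[1, 2, 3], [4, 5], [6]])
    print(data)         # [1, 4, 6, 2, 5, 3]
    print(batch_sizes)  # [3, 2, 1]
```

Because the sequences are sorted, the active sequences at each step always form a prefix of the batch, which is what lets an LSTM process each timestep on a shrinking slice.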

Returns: a tuple of the encoded sequence, of shape (batch_size, sequence_length, hidden_size), and final_states.

final_states: Tuple[torch.Tensor, torch.Tensor]

The per-layer final (state, memory) states of the LSTM, each with shape (num_layers, batch_size, hidden_size).
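The final states stack one (state, memory) pair per layer along the leading num_layers dimension. The sketch below shows that bookkeeping with nested lists and hypothetical helper names; with real tensors, per-layer (1, batch_size, hidden_size) slices would typically be joined with torch.cat(..., dim=0), though that is an assumption about the implementation. It also encodes a practical consequence of alternation, assuming the first layer runs forwards: when num_layers is even, the top layer runs backwards, so final_states[0][-1] is the state reached after reading the sequence from its last timestep to its first.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # one layer's (batch_size, hidden_size) state


def stack_final_states(
    per_layer: List[Tuple[Matrix, Matrix]],
) -> Tuple[List[Matrix], List[Matrix]]:
    """Stack per-layer (state, memory) pairs into two num_layers-long stacks."""
    states = [state for state, _ in per_layer]
    memories = [memory for _, memory in per_layer]
    return states, memories


def top_layer_runs_forward(num_layers: int) -> bool:
    """Assumes layer 0 runs forwards and each later layer flips direction."""
    return num_layers % 2 == 1


if __name__ == "__main__":
    layer0 = ([[0.1, 0.2]], [[1.0, 2.0]])  # batch_size = 1, hidden_size = 2
    layer1 = ([[0.3, 0.4]], [[3.0, 4.0]])
    states, memories = stack_final_states([layer0, layer1])
    print(len(states), len(states[0]), len(states[0][0]))  # 2 1 2
    print(states[-1], top_layer_runs_forward(2))  # [[0.3, 0.4]] False
```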