allennlp.state_machines.transition_functions

This module contains TransitionFunctions for state-machine-based decoders. The TransitionFunction parameterizes transitions between States. These TransitionFunctions are all PyTorch Modules that have trainable parameters. The BasicTransitionFunction is simply an LSTM decoder with attention over an input utterance, and the other classes typically subclass this and add functionality to it.

class allennlp.state_machines.transition_functions.transition_function.TransitionFunction

Bases: torch.nn.modules.module.Module, typing.Generic

A TransitionFunction is a module that assigns scores to state transitions in a transition-based decoder.

The TransitionFunction takes a State and outputs a ranked list of next states, ordered by each state’s score.

The intention with this class is that a model will implement a subclass of TransitionFunction that defines how exactly you want to handle the input and what computations get done at each step of decoding, and how states are scored. This subclass then gets passed to a DecoderTrainer to have its parameters trained.

take_step(state: StateType, max_actions: int = None, allowed_actions: typing.List[typing.Set] = None) → typing.List[StateType]

The main method in the TransitionFunction API. This function defines the computation done at each step of decoding and returns a ranked list of next states.

The input state is grouped, to allow for efficient computation, but the output states should all have a group_size of 1, to make things easier on the decoding algorithm. They will get regrouped later as needed.

Because of the way we handle grouping in the decoder states, constructing a new state is actually a relatively expensive operation. If you know a priori that only some of the states will be needed (either because you have a set of gold action sequences, or you have a fixed beam size), passing that information into this function will keep us from constructing more states than we need, which will greatly speed up your computation.

IMPORTANT: This method must return states already sorted by their score, otherwise BeamSearch and other methods will break. For efficiency, we do not perform an additional sort in those methods.

ALSO IMPORTANT: When allowed_actions is given and max_actions is not, we assume you want to evaluate all possible states and do not need any sorting (e.g., this is true for maximum marginal likelihood training that does not use a beam search). In this case, we may skip the sorting step for efficiency reasons.

Parameters:
state : State

The current state of the decoder, which we will take a step from. We may be grouping together computation for several states here. Because we can have several states for each instance in the original batch being evaluated at the same time, we use group_size for this kind of batching, and batch_size for the original batch in model.forward.

max_actions : int, optional

If you know that you will only need a certain number of states out of this (e.g., in a beam search), you can pass in the max number of actions that you need, and we will only construct that many states (for each batch instance - not for each group instance!). This can save a whole lot of computation if you have an action space that’s much larger than your beam size.

allowed_actions : List[Set], optional

If the DecoderTrainer has constraints on which actions need to be evaluated (e.g., maximum marginal likelihood only needs to evaluate action sequences in a given set), you can pass those constraints here, to avoid constructing state objects unnecessarily. If there are no constraints from the trainer, passing a value of None here will allow all actions to be considered.

This is a list because it is batched - every instance in the batch has a set of allowed actions. Note that the size of this list is the group_size in the State, not the batch_size of model.forward. The training algorithm needs to convert from the batched allowed action sequences that it has to a grouped allowed action sequence list.

Returns:
next_states : List[State]

A list of next states, ordered by score.
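The contract described above can be illustrated with a minimal sketch in plain Python. Nothing here is AllenNLP code: ToyState, toy_take_step, and the action_log_probs argument are hypothetical names introduced only for illustration, and the log-probabilities are made-up numbers rather than model outputs. The sketch shows three points from the text: output states are returned best-first, max_actions is counted per batch instance rather than per group element, and allowed_actions holds one set per group element.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class ToyState(NamedTuple):
    """Hypothetical stand-in for a decoder state with a group_size of 1."""
    batch_index: int
    action_history: Tuple[int, ...]
    score: float


def toy_take_step(group: List[ToyState],
                  action_log_probs: List[List[float]],
                  max_actions: Optional[int] = None,
                  allowed_actions: Optional[List[Set[int]]] = None) -> List[ToyState]:
    # Collect cheap (score, group_index, action) tuples first, so that state
    # objects are only built for the candidates we actually keep.
    candidates: List[Tuple[float, int, int]] = []
    for group_index, state in enumerate(group):
        for action, log_prob in enumerate(action_log_probs[group_index]):
            # allowed_actions has one entry per group element, not per batch instance.
            if allowed_actions is not None and action not in allowed_actions[group_index]:
                continue
            candidates.append((state.score + log_prob, group_index, action))

    # Best-first order. This sketch always sorts, for simplicity.
    candidates.sort(key=lambda candidate: candidate[0], reverse=True)

    next_states: List[ToyState] = []
    kept_per_instance: Dict[int, int] = {}
    for new_score, group_index, action in candidates:
        parent = group[group_index]
        kept = kept_per_instance.get(parent.batch_index, 0)
        # max_actions limits states per batch instance, not per group element.
        if max_actions is not None and kept >= max_actions:
            continue
        kept_per_instance[parent.batch_index] = kept + 1
        next_states.append(ToyState(parent.batch_index,
                                    parent.action_history + (action,),
                                    new_score))
    return next_states


# Three group elements: two belong to batch instance 0, one to batch instance 1.
group = [ToyState(0, (), 0.0), ToyState(0, (), -1.0), ToyState(1, (), 0.0)]
log_probs = [[-0.1, -2.0], [-0.5, -0.2], [-1.0, -0.3]]
beam = toy_take_step(group, log_probs, max_actions=1)
constrained = toy_take_step(group, log_probs, allowed_actions=[{1}, {0}, {0}])
```

With max_actions=1 the sketch keeps one state for each of the two batch instances, even though the group has three elements.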

class allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), predict_start_type_separately: bool = True, num_start_types: int = None, add_action_bias: bool = True, dropout: float = 0.0, num_layers: int = 1) → None

Bases: allennlp.state_machines.transition_functions.transition_function.TransitionFunction

This is a typical transition function for a state-based decoder. We use an LSTM to track decoder state, and at every timestep we compute an attention over the input question/utterance to help in selecting the action. All actions have an embedding, and we use a dot product between a predicted action embedding and the allowed actions to compute a distribution over actions at each timestep.
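As a rough illustration of the scoring step described above, the following plain-Python sketch takes a dot product between a predicted action embedding and each allowed action's embedding, then normalizes the resulting logits with a softmax. The function names and the example vectors are assumptions made for this sketch; the actual module operates on batched tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [value / total for value in exps]


def action_distribution(predicted: List[float],
                        action_embeddings: List[List[float]]) -> List[float]:
    # One logit per allowed action: dot(predicted, embedding).
    logits = [sum(p * e for p, e in zip(predicted, embedding))
              for embedding in action_embeddings]
    return softmax(logits)


predicted_embedding = [1.0, 0.0]
allowed_embeddings = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
probs = action_distribution(predicted_embedding, allowed_embeddings)
```

The action whose embedding points in the same direction as the predicted embedding receives the largest probability.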

We allow the first action to be predicted separately from everything else. This is optional, and is because that’s how the original WikiTableQuestions semantic parser was written. The intuition is that maybe you want to predict the type of your output program outside of the typical LSTM decoder (or maybe Jayant just didn’t realize this could be treated as another action...).

Parameters:
encoder_output_dim : int
action_embedding_dim : int
input_attention : Attention
activation : Activation, optional (default=relu)

The activation that gets applied to the decoder LSTM input and to the action query.

predict_start_type_separately : bool, optional (default=True)

If True, we will predict the initial action (which is typically the base type of the logical form) using a different mechanism than our typical action decoder. We basically just do a projection of the hidden state, and don’t update the decoder RNN.

num_start_types : int, optional (default=None)

If predict_start_type_separately is True, this is the number of start types that are in the grammar. We need this so we can construct parameters with the right shape. This is unused if predict_start_type_separately is False.

add_action_bias : bool, optional (default=True)

If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.

dropout : float, optional (default=0.0)
num_layers : int, optional (default=1)

The number of layers in the decoder LSTM.
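To make the add_action_bias description concrete, here is a small plain-Python sketch. It assumes that the last component of each action embedding is the bias dimension; appending a one to the predicted action vector then lets that component contribute a per-action bias to the dot product. The function name and the numbers are hypothetical.

```python
from typing import List


def score_with_bias(predicted: List[float],
                    action_embedding_with_bias: List[float]) -> float:
    # Append a one so the final embedding component acts as a bias term.
    augmented = list(predicted) + [1.0]
    return sum(p * e for p, e in zip(augmented, action_embedding_with_bias))


# A 2-dimensional prediction scored against a 3-dimensional embedding,
# whose last entry (0.25) plays the role of the bias.
score = score_with_bias([0.5, -1.0], [2.0, 1.0, 0.25])
```

Here the dot product of the first two components is zero, so the score equals the bias entry alone.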

attend_on_question(query: torch.Tensor, encoder_outputs: torch.Tensor, encoder_output_mask: torch.Tensor) → typing.Tuple[torch.Tensor, torch.Tensor]

Given a query (which is typically the decoder hidden state), compute an attention over the output of the question encoder, and return a weighted sum of the question representations given this attention. We also return the attention weights themselves.

This is a simple computation, but we have it as a separate method so that the forward method on the main parser module can call it on the initial hidden state, to simplify the logic in take_step.
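The computation can be sketched in plain Python for a single, unbatched query. This sketch assumes dot-product similarity, whereas the real method uses whichever Attention module was passed as input_attention; the function name and example values are likewise illustrative only.

```python
import math
from typing import List, Tuple


def attend(query: List[float],
           encoder_outputs: List[List[float]],
           mask: List[int]) -> Tuple[List[float], List[float]]:
    # Similarity between the query and each encoder position (assumed dot product).
    scores = [sum(q * h for q, h in zip(query, hidden)) for hidden in encoder_outputs]
    # Masked softmax: padded positions (mask == 0) receive zero weight.
    top = max(score for score, keep in zip(scores, mask) if keep)
    exps = [math.exp(score - top) if keep else 0.0 for score, keep in zip(scores, mask)]
    total = sum(exps)
    weights = [value / total for value in exps]
    # Weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    attended = [sum(w * hidden[d] for w, hidden in zip(weights, encoder_outputs))
                for d in range(dim)]
    return attended, weights


attended, weights = attend(query=[1.0, 0.0],
                           encoder_outputs=[[1.0, 0.0], [1.0, 2.0], [9.0, 9.0]],
                           mask=[1, 1, 0])
```

The third position is masked out, so it contributes nothing even though its raw score is the highest.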

take_step(state: allennlp.state_machines.states.grammar_based_state.GrammarBasedState, max_actions: int = None, allowed_actions: typing.List[typing.Set[int]] = None) → typing.List[allennlp.state_machines.states.grammar_based_state.GrammarBasedState]

The main method in the TransitionFunction API. This function defines the computation done at each step of decoding and returns a ranked list of next states.

The input state is grouped, to allow for efficient computation, but the output states should all have a group_size of 1, to make things easier on the decoding algorithm. They will get regrouped later as needed.

Because of the way we handle grouping in the decoder states, constructing a new state is actually a relatively expensive operation. If you know a priori that only some of the states will be needed (either because you have a set of gold action sequences, or you have a fixed beam size), passing that information into this function will keep us from constructing more states than we need, which will greatly speed up your computation.

IMPORTANT: This method must return states already sorted by their score, otherwise BeamSearch and other methods will break. For efficiency, we do not perform an additional sort in those methods.

ALSO IMPORTANT: When allowed_actions is given and max_actions is not, we assume you want to evaluate all possible states and do not need any sorting (e.g., this is true for maximum marginal likelihood training that does not use a beam search). In this case, we may skip the sorting step for efficiency reasons.

Parameters:
state : State

The current state of the decoder, which we will take a step from. We may be grouping together computation for several states here. Because we can have several states for each instance in the original batch being evaluated at the same time, we use group_size for this kind of batching, and batch_size for the original batch in model.forward.

max_actions : int, optional

If you know that you will only need a certain number of states out of this (e.g., in a beam search), you can pass in the max number of actions that you need, and we will only construct that many states (for each batch instance - not for each group instance!). This can save a whole lot of computation if you have an action space that’s much larger than your beam size.

allowed_actions : List[Set], optional

If the DecoderTrainer has constraints on which actions need to be evaluated (e.g., maximum marginal likelihood only needs to evaluate action sequences in a given set), you can pass those constraints here, to avoid constructing state objects unnecessarily. If there are no constraints from the trainer, passing a value of None here will allow all actions to be considered.

This is a list because it is batched - every instance in the batch has a set of allowed actions. Note that the size of this list is the group_size in the State, not the batch_size of model.forward. The training algorithm needs to convert from the batched allowed action sequences that it has to a grouped allowed action sequence list.

Returns:
next_states : List[State]

A list of next states, ordered by score.
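The conversion from batched to grouped allowed actions mentioned under allowed_actions might look like the following plain-Python sketch. It assumes the trainer stores, for each batch instance, a table from action-history prefix to the set of actions that continue some gold sequence; build_prefix_table and grouped_allowed_actions are hypothetical helpers, not part of the library.

```python
from typing import Dict, List, Set, Tuple

History = Tuple[int, ...]


def build_prefix_table(gold_sequences: List[List[int]]) -> Dict[History, Set[int]]:
    # Map every prefix of every gold sequence to the actions that may follow it.
    table: Dict[History, Set[int]] = {}
    for sequence in gold_sequences:
        for step, action in enumerate(sequence):
            table.setdefault(tuple(sequence[:step]), set()).add(action)
    return table


def grouped_allowed_actions(batch_indices: List[int],
                            action_histories: List[List[int]],
                            batched_tables: List[Dict[History, Set[int]]]) -> List[Set[int]]:
    # One set per group element, looked up through that element's batch index.
    return [batched_tables[batch_index][tuple(history)]
            for batch_index, history in zip(batch_indices, action_histories)]


# batch_size is 2: one prefix table per original instance.
tables = [build_prefix_table([[3, 1], [3, 2]]), build_prefix_table([[4, 0]])]
# group_size is 3: two group elements come from instance 0, one from instance 1.
allowed = grouped_allowed_actions(batch_indices=[0, 1, 0],
                                  action_histories=[[3], [4], [3]],
                                  batched_tables=tables)
```

The resulting list has length group_size (3 here), not batch_size (2).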

class allennlp.state_machines.transition_functions.linking_transition_function.LinkingTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), predict_start_type_separately: bool = True, num_start_types: int = None, add_action_bias: bool = True, mixture_feedforward: allennlp.modules.feedforward.FeedForward = None, dropout: float = 0.0, num_layers: int = 1) → None

Bases: allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction

This transition function adds the ability to consider linked actions to the BasicTransitionFunction (which is just an LSTM decoder with attention). These actions are potentially unseen at training time, so we need to handle them without requiring the action to have an embedding. Instead, we rely on a linking score between each action and the words in the question/utterance, and use these scores, along with the attention, to do something similar to a copy mechanism when producing these actions.

When both linked and global (embedded) actions are available, we need some way to compare the scores for these two sets of actions. The original WikiTableQuestions semantic parser just concatenated the logits together before doing a joint softmax, but this is quite brittle, because the logits might have quite different scales. So we have the option here of predicting a mixture probability between two independently normalized distributions.
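The difference between the two options can be seen in a small plain-Python sketch. The logits below are made up to have very different scales, and the mixture probability is a fixed number here, whereas in the module it would be predicted from the hidden state. Treating the mixture probability as the weight on the linked actions is an assumption of this sketch.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [value / total for value in exps]


def joint_softmax(global_logits: List[float], linked_logits: List[float]) -> List[float]:
    # Concatenate the logits and normalize them together.
    return softmax(global_logits + linked_logits)


def mixture(global_logits: List[float], linked_logits: List[float],
            linked_prob: float) -> List[float]:
    # Normalize each set independently, then mix the two distributions.
    global_probs = softmax(global_logits)
    linked_probs = softmax(linked_logits)
    return ([(1.0 - linked_prob) * p for p in global_probs]
            + [linked_prob * p for p in linked_probs])


global_logits = [10.0, 9.0]   # large scale
linked_logits = [0.1, 0.0]    # small scale
joint = joint_softmax(global_logits, linked_logits)
mixed = mixture(global_logits, linked_logits, linked_prob=0.5)
```

Under the joint softmax the linked actions receive almost no probability mass purely because of the scale mismatch, while the mixture gives them exactly the share set by the mixture probability.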

Parameters:
encoder_output_dim : int
action_embedding_dim : int
input_attention : Attention
activation : Activation, optional (default=relu)

The activation that gets applied to the decoder LSTM input and to the action query.

predict_start_type_separately : bool, optional (default=True)

If True, we will predict the initial action (which is typically the base type of the logical form) using a different mechanism than our typical action decoder. We basically just do a projection of the hidden state, and don’t update the decoder RNN.

num_start_types : int, optional (default=None)

If predict_start_type_separately is True, this is the number of start types that are in the grammar. We need this so we can construct parameters with the right shape. This is unused if predict_start_type_separately is False.

add_action_bias : bool, optional (default=True)

If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.

mixture_feedforward : FeedForward, optional (default=None)

If given, we’ll use this to compute a mixture probability between global actions and linked actions given the hidden state at every timestep of decoding, instead of concatenating the logits for both (where the logits may not be compatible with each other).

dropout : float, optional (default=0.0)
num_layers : int, optional (default=1)

The number of layers in the decoder LSTM.

class allennlp.state_machines.transition_functions.coverage_transition_function.CoverageTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), predict_start_type_separately: bool = True, num_start_types: int = None, add_action_bias: bool = True, dropout: float = 0.0) → None

Bases: allennlp.state_machines.transition_functions.basic_transition_function.BasicTransitionFunction

Adds a coverage penalty to the BasicTransitionFunction (which is just an LSTM decoder with attention). This coverage penalty is on the output action sequence, and requires an externally-computed agenda of actions that are expected to be produced during decoding, and encourages the model to select actions on that agenda.

The way that we encourage the model to select actions on the agenda is that we add the embeddings for actions on the agenda (that are available at this decoding step and haven’t yet been taken) to the predicted action embedding. We weight that addition by a learned multiplier that gets initialized to 1.
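A minimal plain-Python sketch of this boosting step follows. The function name, the dictionary of embeddings, and the example values are hypothetical, and the multiplier is a plain float here rather than a learned parameter.

```python
from typing import Dict, List, Set


def add_agenda_boost(predicted: List[float],
                     action_embeddings: Dict[int, List[float]],
                     agenda: List[int],
                     available: Set[int],
                     taken: Set[int],
                     multiplier: float = 1.0) -> List[float]:
    boosted = list(predicted)
    for action in agenda:
        # Only agenda actions that are available now and not yet taken count.
        if action in available and action not in taken:
            for dim, value in enumerate(action_embeddings[action]):
                boosted[dim] += multiplier * value
    return boosted


def dot(left: List[float], right: List[float]) -> float:
    return sum(l * r for l, r in zip(left, right))


embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
predicted = [0.5, 0.5]
# Action 2 is on the agenda but not available at this step, so only action 1 is added.
boosted = add_agenda_boost(predicted, embeddings, agenda=[1, 2],
                           available={0, 1}, taken=set())
```

Before the boost, actions 0 and 1 score equally against the predicted embedding; afterwards, the agenda action 1 scores higher.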

Parameters:
encoder_output_dim : int
action_embedding_dim : int
input_attention : Attention
activation : Activation, optional (default=relu)

The activation that gets applied to the decoder LSTM input and to the action query.

predict_start_type_separately : bool, optional (default=True)

If True, we will predict the initial action (which is typically the base type of the logical form) using a different mechanism than our typical action decoder. We basically just do a projection of the hidden state, and don’t update the decoder RNN.

num_start_types : int, optional (default=None)

If predict_start_type_separately is True, this is the number of start types that are in the grammar. We need this so we can construct parameters with the right shape. This is unused if predict_start_type_separately is False.

add_action_bias : bool, optional (default=True)

If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.

dropout : float, optional (default=0.0)

class allennlp.state_machines.transition_functions.linking_coverage_transition_function.LinkingCoverageTransitionFunction(encoder_output_dim: int, action_embedding_dim: int, input_attention: allennlp.modules.attention.attention.Attention, activation: allennlp.nn.activations.Activation = ReLU(), predict_start_type_separately: bool = True, num_start_types: int = None, add_action_bias: bool = True, mixture_feedforward: allennlp.modules.feedforward.FeedForward = None, dropout: float = 0.0) → None

Bases: allennlp.state_machines.transition_functions.coverage_transition_function.CoverageTransitionFunction

Combines both linking and coverage on top of the BasicTransitionFunction (which is just an LSTM decoder with attention). This adds the ability to consider linked actions in addition to global (embedded) actions, and it adds a coverage penalty over the output action sequence, combining the LinkingTransitionFunction with the CoverageTransitionFunction.

The one thing that’s unique to this class is how the coverage penalty interacts with linked actions. Instead of boosting the action’s embedding, as we do in the CoverageTransitionFunction, we boost the action’s logit directly (as there is no action embedding for linked actions).
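For linked actions, the boost could be sketched as follows in plain Python. This is an assumption about the general shape of the computation, not the library's implementation: the function name is hypothetical, and adding a single multiplier to each qualifying logit is the simplest form consistent with the description above.

```python
from typing import List, Set


def boost_linked_logits(linked_logits: List[float],
                        linked_actions: List[int],
                        agenda: Set[int],
                        taken: Set[int],
                        multiplier: float = 1.0) -> List[float]:
    # Linked actions have no embedding, so the boost is added to the logit itself.
    return [logit + multiplier if (action in agenda and action not in taken) else logit
            for logit, action in zip(linked_logits, linked_actions)]


boosted_logits = boost_linked_logits(linked_logits=[0.25, -0.5, 1.0],
                                     linked_actions=[7, 8, 9],
                                     agenda={8},
                                     taken=set())
```

Only the logit for action 8, which is on the agenda and has not been taken, changes.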

Parameters:
encoder_output_dim : int
action_embedding_dim : int
input_attention : Attention
activation : Activation, optional (default=relu)

The activation that gets applied to the decoder LSTM input and to the action query.

predict_start_type_separately : bool, optional (default=True)

If True, we will predict the initial action (which is typically the base type of the logical form) using a different mechanism than our typical action decoder. We basically just do a projection of the hidden state, and don’t update the decoder RNN.

num_start_types : int, optional (default=None)

If predict_start_type_separately is True, this is the number of start types that are in the grammar. We need this so we can construct parameters with the right shape. This is unused if predict_start_type_separately is False.

add_action_bias : bool, optional (default=True)

If True, there has been a bias dimension added to the embedding of each action, which gets used when predicting the next action. We add a dimension of ones to our predicted action vector in this case to account for that.

dropout : float, optional (default=0.0)