Reading comprehension is loosely defined as follows: given a question and a passage of text that contains the answer, answer the question.
These submodules contain models that are predominantly focused on reading comprehension.
BidirectionalAttentionFlow(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, num_highway_layers: int, phrase_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, attention_similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction, modeling_layer: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, span_end_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, dropout: float = 0.2, mask_lstms: bool = True, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: typing.Union[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator, NoneType] = None) → None
This class implements Minjoon Seo’s Bidirectional Attention Flow model for answering reading comprehension questions (ICLR 2017).
The basic layout is pretty simple: encode words as a combination of word embeddings and a character-level encoder, pass the word representations through a bi-LSTM/GRU, use a matrix of attentions to put question information into the passage word representations (this is the only part that is at all non-standard), pass this through another few layers of bi-LSTMs/GRUs, and do a softmax over span start and span end.
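As a rough illustration of the attention step described above (the only non-standard part), here is a minimal sketch in plain Python, assuming a precomputed passage-question similarity matrix. The function names and list-based representation are illustrative assumptions, not the model's actual internals:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bidirectional_attention(similarity):
    """similarity[i][j] is the similarity of passage word i to question word j.

    Returns (p2q, q2p):
      p2q: for each passage word, attention weights over question words
           (used to build a question-aware passage representation);
      q2p: attention weights over passage words, computed from each passage
           word's maximum similarity to any question word.
    """
    # Passage-to-question: normalise each row over the question words.
    p2q = [softmax(row) for row in similarity]
    # Question-to-passage: softmax over passage positions of the per-row max.
    q2p = softmax([max(row) for row in similarity])
    return p2q, q2p
```

In the full model, these weights are used to take weighted sums of the question and passage encodings, which are then concatenated with the passage representation before the modeling layer.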
text_field_embedder : TextFieldEmbedder
Used to embed the TextFields we get as input to the model.
num_highway_layers : int
The number of highway layers to use in between embedding the input and passing it through the phrase layer.
phrase_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between embedding tokens and doing the bidirectional attention.
attention_similarity_function : SimilarityFunction
The similarity function that we will use when comparing encoded passage and question representations.
modeling_layer : Seq2SeqEncoder
The encoder (with its own internal stacking) that we will use in between the bidirectional attention and predicting span start and end.
span_end_encoder : Seq2SeqEncoder
The encoder that we will use to incorporate span start predictions into the passage state before predicting span end.
dropout : float, optional (default=0.2)
If greater than 0, we will apply dropout with this probability after all encoders (pytorch LSTMs do not apply dropout to their last layer).
mask_lstms : bool, optional (default=True)
If False, we will skip passing the mask to the LSTM layers. This gives a ~2x speedup, with only a slight performance decrease, if any. We haven't experimented much with this yet, but have confirmed that we still get very similar performance with much faster training times. We still use the mask for all softmaxes, but avoid the shuffling that's required when using masking with pytorch LSTMs.
initializer : InitializerApplicator, optional (default=``InitializerApplicator()``)
Used to initialize the model parameters.
regularizer : RegularizerApplicator, optional (default=``None``)
If provided, will be used to calculate the regularization penalty during training.
forward(question: typing.Dict[str, torch.LongTensor], passage: typing.Dict[str, torch.LongTensor], span_start: torch.IntTensor = None, span_end: torch.IntTensor = None, metadata: typing.List[typing.Dict[str, typing.Any]] = None) → typing.Dict[str, torch.FloatTensor]
question : Dict[str, torch.LongTensor]
passage : Dict[str, torch.LongTensor]
From a TextField. The model assumes that this passage contains the answer to the question, and predicts the beginning and ending positions of the answer within the passage.
span_start : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the beginning position of the answer within the passage. This is an inclusive index. If this is given, we will compute a loss that gets included in the output dictionary.
span_end : torch.IntTensor, optional
From an IndexField. This is one of the things we are trying to predict - the ending position of the answer within the passage. This is an inclusive index. If this is given, we will compute a loss that gets included in the output dictionary.
metadata : List[Dict[str, Any]], optional
If present, this should contain the question ID, original passage text, and token offsets into the passage for each instance in the batch. We use this for computing official metrics using the official SQuAD evaluation script. The length of this list should be the batch size, and each dictionary should have the keys question_id, original_passage, and token_offsets. If you only want the best span string and don't care about official metrics, you can omit the question_id key.
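When metadata is supplied, mapping a predicted token span back to a passage string is just a matter of character offsets. A minimal sketch, assuming each entry of token_offsets is a (start_char, end_char) pair for one token with an exclusive end suitable for slicing; the helper name is hypothetical:

```python
def span_to_string(original_passage, token_offsets, span_start, span_end):
    """Recover the answer string for an inclusive token span.

    original_passage: the raw passage text.
    token_offsets: per-token (start_char, end_char) pairs into that text.
    span_start, span_end: inclusive token indices of the predicted span.
    """
    start_char = token_offsets[span_start][0]
    end_char = token_offsets[span_end][1]
    return original_passage[start_char:end_char]
```

For example, with the passage "The cat sat on the mat." and per-token offsets, the token span (1, 2) recovers "cat sat".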
An output dictionary consisting of:
span_start_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalised log probabilities of the span start position.
span_start_probs : torch.FloatTensor
The result of applying a softmax to span_start_logits, giving a probability distribution over span start positions.
span_end_logits : torch.FloatTensor
A tensor of shape (batch_size, passage_length) representing unnormalised log probabilities of the span end position (inclusive).
span_end_probs : torch.FloatTensor
The result of applying a softmax to span_end_logits, giving a probability distribution over span end positions.
best_span : torch.IntTensor
The result of a constrained inference over span_start_logits and span_end_logits to find the most probable span. Shape is (batch_size, 2), giving the inclusive start and end token indices of the predicted span.
loss : torch.FloatTensor, optional
A scalar loss to be optimised.
best_span_str : List[str]
If sufficient metadata was provided for the instances in the batch, we also return the string from the original passage that the model thinks is the best answer to the question.
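The constrained inference that produces best_span (the highest-scoring span with end >= start) can be done in a single linear pass by tracking, for each candidate end position, the best start position seen so far. A sketch over plain Python lists for one instance, not the library's actual implementation:

```python
def find_best_span(span_start_logits, span_end_logits):
    """Return the (start, end) pair with end >= start that maximises
    span_start_logits[start] + span_end_logits[end]. Both indices inclusive."""
    best_start = 0
    best_score = float("-inf")
    best_span = (0, 0)
    for end, end_logit in enumerate(span_end_logits):
        # A span may start anywhere up to and including `end`, so it is
        # enough to remember the best start logit seen so far.
        if span_start_logits[end] > span_start_logits[best_start]:
            best_start = end
        score = span_start_logits[best_start] + end_logit
        if score > best_score:
            best_score = score
            best_span = (best_start, end)
    return best_span
```

This avoids the quadratic enumeration of all (start, end) pairs while respecting the end >= start constraint.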
from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → allennlp.models.reading_comprehension.bidaf.BidirectionalAttentionFlow
get_metrics(reset: bool = False) → typing.Dict[str, float]