SrlBert(vocab: allennlp.data.vocabulary.Vocabulary, bert_model: Union[str, pytorch_pretrained_bert.modeling.BertModel], embedding_dropout: float = 0.0, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None, label_smoothing: float = None, ignore_span_metric: bool = False, srl_eval_path: str = DEFAULT_SRL_EVAL_PATH)
- vocab: ``Vocabulary``, required
A Vocabulary, required in order to compute sizes for input/output projections.
- bert_model: ``Union[str, BertModel]``, required.
A string describing the BERT model to load or an already constructed BertModel.
- initializer: ``InitializerApplicator``, optional (default=``InitializerApplicator()``)
Used to initialize the model parameters.
- regularizer: ``RegularizerApplicator``, optional (default=``None``)
If provided, will be used to calculate the regularization penalty during training.
- label_smoothing: ``float``, optional (default = None)
The amount of label smoothing to apply to the labels when computing cross entropy loss. If None, no smoothing is applied.
- ignore_span_metric: ``bool``, optional (default = False)
Whether to skip calculating the span-based metric, which is irrelevant when predicting BIO tags for Open Information Extraction.
- srl_eval_path: ``str``, optional (default=``DEFAULT_SRL_EVAL_PATH``)
The path to the srl-eval.pl script. By default, the srl-eval.pl included with allennlp is used, which is located at allennlp/tools/srl-eval.pl. If None, srl-eval.pl is not used.
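To illustrate what label_smoothing changes in the training target, here is a minimal sketch in plain Python. It assumes the common formulation in which every class receives a uniform share of `label_smoothing / num_classes` and the gold class additionally receives `1 - label_smoothing`. The helper names are hypothetical and this is not the library's implementation.

```python
import math
from typing import List


def smoothed_targets(gold_index: int, num_classes: int, label_smoothing: float) -> List[float]:
    """Build a smoothed target distribution for one token (hypothetical helper)."""
    uniform_share = label_smoothing / num_classes
    targets = [uniform_share] * num_classes
    # The gold class keeps the remaining probability mass.
    targets[gold_index] += 1.0 - label_smoothing
    return targets


def cross_entropy(logits: List[float], targets: List[float]) -> float:
    """Cross entropy between a target distribution and softmax(logits)."""
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_norm for x in logits]
    return -sum(t * lp for t, lp in zip(targets, log_probs))


logits = [2.0, 0.5, -1.0]  # illustrative scores for three tag classes
hard_loss = cross_entropy(logits, smoothed_targets(0, 3, 0.0))
smooth_loss = cross_entropy(logits, smoothed_targets(0, 3, 0.1))
```

With a smoothing value of 0.0 the target is the usual one-hot vector. A positive value moves some probability mass to the other classes, which penalises over-confident predictions.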
decode(self, output_dict: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]
Does constrained Viterbi decoding on the class probabilities output by forward(). The constraint simply specifies that the output tags must be a valid BIO sequence. We add a "tags" key to the dictionary with the result.
NOTE: First, we decode a BIO sequence on top of the wordpieces. This is important; Viterbi decoding produces low quality output if you decode on top of word representations directly, because the model gets confused by the ‘missing’ positions (which is sensible, as it is trained to perform tagging on wordpieces, not words).
Secondly, it’s important that the indices we use to recover words from the wordpieces are the start_offsets (i.e. offsets which correspond to using the first wordpiece of words which are tokenized into multiple wordpieces), as otherwise we might get an ill-formed BIO sequence when we select out the word tags from the wordpiece tags. This happens when a word is split into multiple wordpieces and we take the last tag of the word, which might correspond to, e.g., I-V, which would not be allowed as it is not preceded by a B tag.
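The point about start offsets can be seen in a small, self-contained sketch. The sentence, the wordpiece split and the tags below are made up for illustration, and special tokens such as [CLS] and [SEP] are left out for brevity. The helper names are hypothetical.

```python
from typing import List

# Hypothetical example: "devoured" is split into three wordpieces,
# and tags have been decoded per wordpiece.
wordpieces = ["the", "cat", "dev", "##our", "##ed", "it"]
wordpiece_tags = ["B-ARG0", "I-ARG0", "B-V", "I-V", "I-V", "B-ARG1"]

start_offsets = [0, 1, 2, 5]  # index of each word's first wordpiece
end_offsets = [0, 1, 4, 5]    # index of each word's last wordpiece


def select_tags(tags: List[str], offsets: List[int]) -> List[str]:
    """Pick one wordpiece tag per word using the given offsets."""
    return [tags[i] for i in offsets]


def is_valid_bio(tags: List[str]) -> bool:
    """An I-XXX tag must follow a B-XXX or I-XXX tag with the same XXX."""
    previous = "O"
    for tag in tags:
        if tag.startswith("I-") and (previous == "O" or previous[2:] != tag[2:]):
            return False
        previous = tag
    return True


tags_from_start = select_tags(wordpiece_tags, start_offsets)
tags_from_end = select_tags(wordpiece_tags, end_offsets)
```

Selecting by start offsets keeps the B-V tag of the verb, so the word-level sequence stays well formed. Selecting by end offsets picks up I-V directly after I-ARG0, which is not a valid BIO sequence.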
forward(self, tokens: Dict[str, torch.Tensor], verb_indicator: torch.Tensor, metadata: List[Any], tags: torch.LongTensor = None)
- tokens: Dict[str, torch.LongTensor], required
The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. For this model, this must be a SingleIdTokenIndexer which indexes wordpieces from the BERT vocabulary.
- verb_indicator: torch.LongTensor, required.
A SequenceFeatureField representation of the position of the verb in the sentence. This should have shape (batch_size, num_tokens) and, importantly, can be all zeros in the case that the sentence has no verbal predicate.
- tags: torch.LongTensor, optional (default = None)
A torch tensor representing the sequence of integer gold class labels of shape (batch_size, num_tokens).
- metadata: List[Dict[str, Any]], optional (default = None)
Metadata containing the original words in the sentence, the verb to compute the frame for, and start offsets for converting wordpieces back to a sequence of words, under ‘words’, ‘verb’ and ‘offsets’ keys, respectively.
- An output dictionary consisting of:
- logits: torch.FloatTensor
A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing unnormalised log probabilities of the tag classes.
- class_probabilities: torch.FloatTensor
A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing a distribution of the tag classes per word.
- loss: torch.FloatTensor, optional
A scalar loss to be optimised.
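The relationship between unnormalised scores and a per-token distribution over tag classes can be sketched with a plain-Python softmax. The numbers are illustrative, nested lists stand in for tensors, and the batch dimension is omitted. This is a sketch of the general idea, not the model's code.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalise one token's scores into a probability distribution."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two tokens, three tag classes: shape (num_tokens, tag_vocab_size).
logits = [[2.0, 0.0, -1.0], [0.1, 0.1, 0.1]]
class_probabilities = [softmax(row) for row in logits]
```

Each row of the result sums to one, and the ordering of classes within a row is the same as in the corresponding row of scores.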
get_metrics(self, reset: bool = False)
Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialization. We return an empty dictionary here rather than raising as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. This is also compatible with Metric objects: metrics should be populated during the call to forward, with the Metric handling the accumulation of the metric until this method is called.
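The accumulate-then-reset pattern described here can be sketched without the library. The classes below are hypothetical stand-ins for a metric object and a model. They only illustrate how state is populated during forward and cleared when reset is true.

```python
from typing import Dict


class AccuracyAccumulator:
    """Hypothetical stand-in for a metric object that keeps running counts."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __call__(self, predicted: str, gold: str) -> None:
        self.correct += int(predicted == gold)
        self.total += 1

    def get_metric(self, reset: bool = False) -> float:
        value = self.correct / self.total if self.total else 0.0
        if reset:
            # Clear the state so the next epoch starts from scratch.
            self.correct = 0
            self.total = 0
        return value


class ToyModel:
    """Hypothetical model that populates its metric during forward."""

    def __init__(self) -> None:
        self.accuracy = AccuracyAccumulator()

    def forward(self, predicted: str, gold: str) -> None:
        self.accuracy(predicted, gold)

    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {"accuracy": self.accuracy.get_metric(reset=reset)}


model = ToyModel()
model.forward("B-V", "B-V")
model.forward("O", "B-ARG0")
epoch_metrics = model.get_metrics(reset=True)  # read and clear at epoch end
after_reset = model.get_metrics()
```

Calling get_metrics with reset left at False reads the running value without clearing it, which is useful for progress reporting within an epoch.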
get_start_transitions(self)
In the BIO sequence, we cannot start the sequence with an I-XXX tag. This transition sequence is passed to viterbi_decode to specify this constraint.
Returns the pairwise potentials between a START token and the first token of the sequence.
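A minimal sketch of such start potentials, assuming a small made-up label set and using a plain list in place of a tensor: every I-XXX label gets a potential of -inf, everything else gets 0. The function name is hypothetical.

```python
from typing import List

labels = ["O", "B-ARG0", "I-ARG0", "B-V", "I-V"]  # illustrative label set


def start_potentials(all_labels: List[str]) -> List[float]:
    """Forbid I-XXX labels in the first position by giving them -inf."""
    return [float("-inf") if label.startswith("I-") else 0.0 for label in all_labels]


potentials = start_potentials(labels)
```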
get_viterbi_pairwise_potentials(self)
Generate a matrix of pairwise transition potentials for the BIO labels. The only constraint implemented here is that I-XXX labels must be preceded by either an identical I-XXX tag or a B-XXX tag. In order to achieve this constraint, pairs of labels which do not satisfy this constraint have a pairwise potential of -inf.
Returns a (num_labels, num_labels) matrix of pairwise potentials.
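The constraint can be sketched in plain Python together with a small Viterbi decoder that consumes it. The label set and scores are made up, nested lists stand in for tensors, and the decoder is a generic textbook version written for this example, not the library's viterbi_decode.

```python
from typing import List

NEG_INF = float("-inf")
labels = ["O", "B-ARG0", "I-ARG0", "B-V", "I-V"]  # illustrative label set


def pairwise_potentials(all_labels: List[str]) -> List[List[float]]:
    """matrix[i][j] is the potential of moving from label i to label j."""
    n = len(all_labels)
    matrix = [[0.0] * n for _ in range(n)]
    for i, previous in enumerate(all_labels):
        for j, current in enumerate(all_labels):
            # I-XXX may only follow B-XXX or I-XXX with the same XXX.
            if current.startswith("I-") and previous[1:] != current[1:]:
                matrix[i][j] = NEG_INF
    return matrix


def viterbi(scores: List[List[float]], transitions: List[List[float]],
            start: List[float]) -> List[int]:
    """Return the highest-scoring label index path under the constraints."""
    n = len(start)
    best = [start[j] + scores[0][j] for j in range(n)]
    backpointers = []
    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for j in range(n):
            candidates = [best[i] + transitions[i][j] for i in range(n)]
            i_best = max(range(n), key=candidates.__getitem__)
            new_best.append(candidates[i_best] + scores[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    path = [max(range(n), key=best.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


transitions = pairwise_potentials(labels)
start = [NEG_INF if label.startswith("I-") else 0.0 for label in labels]
# Per-token scores whose independent argmax would be O, I-V, O (invalid BIO).
scores = [
    [0.0, -2.0, -3.0, -1.0, -3.0],
    [-2.0, -3.0, -3.0, -1.0, -0.5],
    [-0.1, -3.0, -3.0, -3.0, -2.0],
]
decoded = [labels[i] for i in viterbi(scores, transitions, start)]
```

In this example the per-token argmax would place I-V directly after O. The -inf potentials rule that path out, so the decoder settles on the best sequence that is still valid BIO.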