allennlp.models.semantic_role_labeler

class allennlp.models.semantic_role_labeler.SemanticRoleLabeler(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, stacked_encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, binary_feature_dim: int, embedding_dropout: float = 0.0, initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: typing.Union[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator, NoneType] = None) → None

Bases: allennlp.models.model.Model

This model performs semantic role labeling using BIO tags and PropBank semantic roles. Specifically, it is an implementation of Deep Semantic Role Labeling - What works and what’s next.

This implementation is effectively a series of stacked interleaved LSTMs with highway connections, applied to embedded sequences of words. Each word embedding is concatenated with a binary indicator that marks whether or not the word is the verbal predicate for which predictions are generated. Additionally, during inference, Viterbi decoding is applied to constrain the predictions to valid BIO sequences.
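As a rough illustration of the binary indicator described above, the following pure-Python sketch builds one 0/1 feature per token. The helper name `make_verb_indicator` is hypothetical and not part of AllenNLP; the model itself consumes this feature as a tensor of shape (batch_size, num_tokens).

```python
from typing import List, Optional


def make_verb_indicator(num_tokens: int, verb_index: Optional[int]) -> List[int]:
    """Return a 0/1 list marking the verbal predicate (hypothetical helper)."""
    indicator = [0] * num_tokens
    # Compare against None explicitly so that a verb at position 0 is still marked.
    if verb_index is not None:
        indicator[verb_index] = 1
    return indicator


sentence = ["The", "cat", "sat", "on", "the", "mat"]
with_verb = make_verb_indicator(len(sentence), 2)        # "sat" is the predicate
without_verb = make_verb_indicator(len(sentence), None)  # no predicate: all zeros
print(with_verb, without_verb)
```

A sentence with several predicates would yield one such indicator per predicate, since each set of predictions is generated for a single verb.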

Parameters:

vocab : Vocabulary, required

A Vocabulary, required in order to compute sizes for input/output projections.

text_field_embedder : TextFieldEmbedder, required

Used to embed the tokens TextField we get as input to the model.

stacked_encoder : Seq2SeqEncoder, required

The encoder (with its own internal stacking) that we will use in between embedding tokens and predicting output tags.

binary_feature_dim : int, required.

The dimensionality of the embedding of the binary verb predicate features.

initializer : InitializerApplicator, optional (default=``InitializerApplicator()``)

Used to initialize the model parameters.

regularizer : RegularizerApplicator, optional (default=``None``)

If provided, will be used to calculate the regularization penalty during training.

decode(output_dict: typing.Dict[str, torch.FloatTensor]) → typing.Dict[str, torch.FloatTensor]

Does constrained Viterbi decoding on the class probabilities output by forward(). The constraint simply specifies that the output tags must form a valid BIO sequence. The result is added to the dictionary under the "tags" key.
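The idea behind constrained decoding can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `viterbi_decode`, the use of additive scores, and the convention that `transitions[i][j]` scores a move from label i to label j are all assumptions made for this example.

```python
from typing import List

NEG_INF = float("-inf")


def viterbi_decode(scores: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of label indices.

    scores[t][j] is the score of label j at position t; transitions[i][j]
    is the potential for moving from label i to label j (assumed layout).
    """
    num_labels = len(transitions)
    best = list(scores[0])  # best path score ending in each label so far
    backpointers = []
    for step_scores in scores[1:]:
        new_best, pointers = [], []
        for j in range(num_labels):
            candidates = [best[i] + transitions[i][j] for i in range(num_labels)]
            i_best = max(range(num_labels), key=candidates.__getitem__)
            new_best.append(candidates[i_best] + step_scores[j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final label to recover the path.
    path = [max(range(num_labels), key=best.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-ARG0", "I-ARG0"]
transitions = [
    [0.0, 0.0, NEG_INF],  # from O: moving to I-ARG0 is forbidden
    [0.0, 0.0, 0.0],      # from B-ARG0
    [0.0, 0.0, 0.0],      # from I-ARG0
]
# Picking the best label per position independently would give O, I-ARG0,
# which is not a valid BIO sequence.
scores = [[2.0, 1.5, 0.0], [0.0, 0.5, 2.0]]
decoded = viterbi_decode(scores, transitions)
print([labels[i] for i in decoded])  # ['B-ARG0', 'I-ARG0']
```

Because the forbidden transition carries a potential of -inf, any path that uses it can never be the maximum, so the decoder falls back to the best valid alternative.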

forward(tokens: typing.Dict[str, torch.LongTensor], verb_indicator: torch.LongTensor, tags: torch.LongTensor = None) → typing.Dict[str, torch.FloatTensor]

Parameters:

tokens : Dict[str, torch.LongTensor], required

The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. This output is a dictionary mapping keys to TokenIndexer tensors. At its most basic, using a SingleIdTokenIndexer this is: {"tokens": Tensor(batch_size, num_tokens)}. This dictionary will have the same keys as were used for the TokenIndexers when you created the TextField representing your sequence. The dictionary is designed to be passed directly to a TextFieldEmbedder, which knows how to combine different word representations into a single vector per token in your input.

verb_indicator : torch.LongTensor, required.

An integer SequenceFeatureField representation of the position of the verb in the sentence. This should have shape (batch_size, num_tokens) and, importantly, can be all zeros if the sentence has no verbal predicate.

tags : torch.LongTensor, optional (default = None)

A torch tensor representing the sequence of integer gold class labels, of shape (batch_size, num_tokens).

Returns:

An output dictionary consisting of:

logits : torch.FloatTensor

A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing unnormalised log probabilities of the tag classes.

class_probabilities : torch.FloatTensor

A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing a distribution of the tag classes per word.

loss : torch.FloatTensor, optional

A scalar loss to be optimised.

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → allennlp.models.semantic_role_labeler.SemanticRoleLabeler

get_metrics(reset: bool = False)

get_viterbi_pairwise_potentials()

Generate a matrix of pairwise transition potentials for the BIO labels. The only constraint implemented here is that I-XXX labels must be preceded by either an identical I-XXX tag or a B-XXX tag. In order to achieve this constraint, pairs of labels which do not satisfy this constraint have a pairwise potential of -inf.

Returns:

transition_matrix : torch.Tensor

A (num_labels, num_labels) matrix of pairwise potentials.
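The constraint described above can be sketched with nested lists instead of a tensor. The helper name `bio_transition_matrix`, the row/column orientation (rows are the previous label, columns the current one), and the use of 0.0 for allowed pairs are assumptions made for illustration.

```python
from typing import List

NEG_INF = float("-inf")


def bio_transition_matrix(labels: List[str]) -> List[List[float]]:
    """Entry [i][j] is the potential for moving from labels[i] to labels[j]."""
    matrix = []
    for previous in labels:
        row = []
        for current in labels:
            if current.startswith("I-"):
                # I-XXX may only follow B-XXX or I-XXX with the same role.
                role = current[2:]
                allowed = previous in ("B-" + role, "I-" + role)
            else:
                allowed = True
            row.append(0.0 if allowed else NEG_INF)
        matrix.append(row)
    return matrix


labels = ["O", "B-ARG0", "I-ARG0", "B-ARG1", "I-ARG1"]
potentials = bio_transition_matrix(labels)
for label, row in zip(labels, potentials):
    print(label.ljust(8), row)
```

Only the columns for I-XXX labels contain -inf entries; transitions into O or any B-XXX label are left unconstrained, matching the single constraint stated above.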

allennlp.models.semantic_role_labeler.convert_bio_tags_to_conll_format(labels: typing.List[str])

Converts BIO formatted SRL tags to the format required for evaluation with the official CONLL 2005 Perl script. Spans are represented by bracketed labels, with the labels of words inside spans being the same as those outside spans. Beginning spans always have an opening bracket and a closing asterisk (e.g. “(ARG-1*”) and closing spans always have a closing bracket (e.g. “*)”). This applies even for length 1 spans (e.g. “(ARG-0*)”).

A full example of the conversion performed:

[B-ARG-1, I-ARG-1, I-ARG-1, I-ARG-1, I-ARG-1, O] → [“(ARG-1*”, “*”, “*”, “*”, “*)”, “*”]

Parameters:

labels : List[str], required.

A list of BIO tags to convert to the CONLL span based format.

Returns:

A list of labels in the CONLL span based format.
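The conversion can be sketched as follows. This is an illustrative reimplementation, not the library's code: the function name `bio_to_conll` is hypothetical, and it assumes the standard CONLL 2005 bracket notation in which every token carries an asterisk, spans open with "(ROLE*" and close with "*)".

```python
from typing import List


def bio_to_conll(labels: List[str]) -> List[str]:
    """Convert BIO tags to bracketed CONLL-style span labels (illustrative)."""
    converted = []
    for index, label in enumerate(labels):
        if label == "O":
            converted.append("*")
            continue
        role = label[2:]
        piece = "*"
        previous = labels[index - 1] if index > 0 else "O"
        # Open a bracket on B-XXX, or on an I-XXX that does not continue a span.
        if label.startswith("B-") or previous[2:] != role:
            piece = "(" + role + piece
        following = labels[index + 1] if index + 1 < len(labels) else "O"
        # Close the bracket when the next tag does not continue this span.
        if following != "I-" + role:
            piece += ")"
        converted.append(piece)
    return converted


tags = ["B-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "I-ARG-1", "O"]
print(bio_to_conll(tags))
print(bio_to_conll(["B-ARG-0"]))
```

A length 1 span both opens and closes on the same token, which is why the second call yields a single label with both brackets.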

allennlp.models.semantic_role_labeler.write_to_conll_eval_file(prediction_file: typing.TextIO, gold_file: typing.TextIO, verb_index: typing.Union[int, NoneType], sentence: typing.List[str], prediction: typing.List[str], gold_labels: typing.List[str])

Prints predicate argument predictions and gold labels for a single verbal predicate in a sentence to two provided file references.

Parameters:

prediction_file : TextIO, required.

A file reference to print predictions to.

gold_file : TextIO, required.

A file reference to print gold labels to.

verb_index : Optional[int], required.

The index of the verbal predicate in the sentence which the gold labels are the arguments for, or None if the sentence contains no verbal predicate.

sentence : List[str], required.

The word tokens.

prediction : List[str], required.

The predicted BIO labels.

gold_labels : List[str], required.

The gold BIO labels.