class allennlp.models.crf_tagger.CrfTagger(vocab:, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, label_namespace: str = 'labels', initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>, regularizer: typing.Union[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator, NoneType] = None) → None

Bases: allennlp.models.model.Model

The CrfTagger encodes a sequence of text with a Seq2SeqEncoder, then uses a Conditional Random Field model to predict a tag for each token in the sequence.


vocab : Vocabulary, required

A Vocabulary, required in order to compute sizes for input/output projections.

text_field_embedder : TextFieldEmbedder, required

Used to embed the tokens TextField we get as input to the model.

encoder : Seq2SeqEncoder

The encoder that we will use in between embedding tokens and predicting output tags.

label_namespace : str, optional (default=``labels``)

This is needed to compute the SpanBasedF1Measure metric. Unless you did something unusual, the default value should be what you want.

initializer : InitializerApplicator, optional (default=``InitializerApplicator()``)

Used to initialize the model parameters.

regularizer : RegularizerApplicator, optional (default=``None``)

If provided, will be used to calculate the regularization penalty during training.
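
Since the class also exposes a from_params classmethod, the constructor arguments above can presumably be supplied as a nested parameter dictionary. The sketch below only builds such a dictionary in plain Python and checks it against the documented parameter names; it does not call the library. The "type" values and the nested keys (embedding_dim, input_size, hidden_size) are assumptions chosen for illustration and should be checked against the installed version.

```python
# Parameter names taken from the constructor signature documented above.
# vocab is omitted because from_params receives it as a separate argument.
DOCUMENTED_PARAMETERS = {
    "text_field_embedder",
    "encoder",
    "label_namespace",
    "initializer",
    "regularizer",
}

# Illustrative configuration. The "type" strings and the nested keys are
# assumptions, not values confirmed by this page.
crf_tagger_params = {
    "type": "crf_tagger",
    "text_field_embedder": {
        "tokens": {"type": "embedding", "embedding_dim": 50},
    },
    "encoder": {"type": "lstm", "input_size": 50, "hidden_size": 25},
    "label_namespace": "labels",
}

# Every key other than "type" should be a documented constructor parameter.
unknown_keys = set(crf_tagger_params) - DOCUMENTED_PARAMETERS - {"type"}

# The encoder consumes the embedder's output, so the sizes should agree.
embedding_dim = crf_tagger_params["text_field_embedder"]["tokens"]["embedding_dim"]
sizes_match = crf_tagger_params["encoder"]["input_size"] == embedding_dim

print(unknown_keys, sizes_match)
```

The size check reflects the data flow described above: the encoder sits between the embedded tokens and the tag predictions, so its input size has to equal the embedding size.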

decode(output_dict: typing.Dict[str, torch.FloatTensor]) → typing.Dict[str, torch.FloatTensor]

Converts the tag ids to the actual tags. output_dict["tags"] is a list of lists of tag_ids, so we use an ugly nested list comprehension.
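
The conversion described here can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the index_to_label dictionary and the decode_tags function are hypothetical stand-ins for the lookup that the model would perform against the Vocabulary under label_namespace.

```python
from typing import Dict, List

# Hypothetical index-to-label mapping. In the real model this lookup would
# go through the Vocabulary, using the configured label namespace.
index_to_label: Dict[int, str] = {0: "O", 1: "B-PER", 2: "I-PER"}


def decode_tags(output_dict: Dict[str, List[List[int]]]) -> Dict[str, List[List[str]]]:
    """Replace every tag id in output_dict["tags"] with its string label."""
    output_dict["tags"] = [
        [index_to_label[tag_id] for tag_id in instance_tags]  # one sequence
        for instance_tags in output_dict["tags"]              # whole batch
    ]
    return output_dict


decoded = decode_tags({"tags": [[1, 2, 0], [0, 1]]})
print(decoded["tags"])
```

The outer comprehension walks over the instances in the batch and the inner one over the tags of a single instance, which is why the sequences may have different lengths.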

forward(tokens: typing.Dict[str, torch.LongTensor], tags: torch.LongTensor = None) → typing.Dict[str, torch.FloatTensor]

tokens : Dict[str, torch.LongTensor], required

The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. This output is a dictionary mapping keys to TokenIndexer tensors. At its most basic, using a SingleIdTokenIndexer this is: {"tokens": Tensor(batch_size, num_tokens)}. This dictionary will have the same keys as were used for the TokenIndexers when you created the TextField representing your sequence. The dictionary is designed to be passed directly to a TextFieldEmbedder, which knows how to combine different word representations into a single vector per token in your input.

tags : torch.LongTensor, optional (default = None)

A torch tensor representing the sequence of integer gold class labels of shape (batch_size, num_tokens).
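
To make the expected shapes concrete, the sketch below lays out a batch of two sequences using nested Python lists in place of tensors. It assumes that index 0 marks padding, which is not stated on this page, and the text_field_mask helper is a hypothetical name introduced for illustration.

```python
from typing import Dict, List

PADDING_ID = 0  # assumption: index 0 is reserved for padding

# With a single-id indexer registered under the key "tokens", the input is
# one entry of shape (batch_size, num_tokens). Here batch_size = 2 and
# num_tokens = 4, with the shorter sequence padded on the right.
tokens: Dict[str, List[List[int]]] = {
    "tokens": [
        [7, 3, 9, 5],
        [4, 8, 0, 0],
    ]
}

# Gold labels have the same (batch_size, num_tokens) shape as the token ids.
tags: List[List[int]] = [
    [1, 2, 0, 0],
    [0, 1, 0, 0],
]


def text_field_mask(token_ids: List[List[int]]) -> List[List[int]]:
    """Return 1 for real tokens and 0 for padding positions."""
    return [[int(token_id != PADDING_ID) for token_id in row] for row in token_ids]


mask = text_field_mask(tokens["tokens"])
print(mask)
```

The mask has the same shape as the token ids and the gold tags, so padded positions can be excluded from both the loss and the predicted tag sequence.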


An output dictionary consisting of:

logits : torch.FloatTensor

The logits produced by the tag_projection_layer.

mask : torch.LongTensor

The text field mask for the input tokens.

tags : List[List[str]]

The predicted tags using the Viterbi algorithm.

loss : torch.FloatTensor, optional

A scalar loss to be optimised. Only computed if gold label tags are provided.
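
The tags entry is described as coming from the Viterbi algorithm. The following is a minimal pure-Python sketch of Viterbi decoding for a single unpadded sequence, given per-token tag scores (logits) and a tag-to-tag transition score matrix. It is an illustration under simplifying assumptions, not the library's code: start and end transitions, masking, and batching are all omitted, and the viterbi function and its arguments are hypothetical names.

```python
from typing import List, Tuple


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][k] is the score of tag k at position t, and
    transitions[j][k] is the score of moving from tag j to tag k.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])          # best score of a path ending in each tag
    backpointers: List[List[int]] = []   # best previous tag per position and tag

    for step in emissions[1:]:
        new_scores: List[float] = []
        pointers: List[int] = []
        for cur in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda prev: scores[prev] + transitions[prev][cur])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][cur] + step[cur])
        scores = new_scores
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag to recover the path.
    best_last = max(range(num_tags), key=lambda tag: scores[tag])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


emissions = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
path, score = viterbi(emissions, transitions)
print(path, score)
```

In this example the per-token argmax of the emissions alone would be tags 0, 1, 0, but the transition scores penalise switching tags, so the best joint path keeps tag 0 throughout. This is the practical difference between CRF decoding and choosing each tag independently.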

classmethod from_params(vocab:, params: allennlp.common.params.Params) → allennlp.models.crf_tagger.CrfTagger

get_metrics(reset: bool = False) → typing.Dict[str, float]

Returns a dictionary of metrics. This method will be called by the trainer in order to compute and use model metrics for early stopping and model serialisation. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean ``reset`` parameter is passed, as a metric accumulator will frequently have some state which should be reset between epochs. This is also compatible with Metric objects. Metrics should be populated during the call to ``forward``, with the Metric handling the accumulation of the metric until this method is called.
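
The accumulate-then-reset pattern described above can be sketched without the library. TokenAccuracy and ToyTagger below are hypothetical classes written for illustration. They mimic the division of labour in which forward populates a metric and get_metrics reads it, optionally clearing its state.

```python
from typing import Dict, List


class TokenAccuracy:
    """Hypothetical accumulator; stands in for a real metric object."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __call__(self, predicted: List[str], gold: List[str]) -> None:
        # Accumulate counts across calls rather than storing per-batch values.
        for predicted_tag, gold_tag in zip(predicted, gold):
            self.correct += int(predicted_tag == gold_tag)
            self.total += 1

    def get_metric(self, reset: bool = False) -> float:
        value = self.correct / self.total if self.total else 0.0
        if reset:
            self.correct = 0
            self.total = 0
        return value


class ToyTagger:
    """Hypothetical model showing where the metric is populated and read."""

    def __init__(self) -> None:
        self.accuracy = TokenAccuracy()

    def forward(self, predicted: List[str], gold: List[str]) -> None:
        self.accuracy(predicted, gold)  # populated on every forward call

    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {"accuracy": self.accuracy.get_metric(reset)}


model = ToyTagger()
model.forward(["O", "B-PER"], ["O", "O"])
model.forward(["O", "O"], ["O", "O"])
mid_epoch = model.get_metrics()             # state is kept
end_epoch = model.get_metrics(reset=True)   # same value, then state is cleared
next_epoch = model.get_metrics()            # nothing accumulated yet
print(mid_epoch, end_epoch, next_epoch)
```

Calling get_metrics without reset lets a progress display read running values during an epoch, while reset=True at the end of the epoch ensures the next epoch starts from a clean state.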