allennlp.models.simple_tagger
class allennlp.models.simple_tagger.SimpleTagger(vocab: allennlp.data.vocabulary.Vocabulary, text_field_embedder: allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder, encoder: allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder, initializer: allennlp.nn.initializers.InitializerApplicator = InitializerApplicator(), regularizer: typing.Optional[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator] = None) → None

Bases: allennlp.models.model.Model

This SimpleTagger simply encodes a sequence of text with a stacked Seq2SeqEncoder, then predicts a tag for each token in the sequence.

Parameters:

vocab : Vocabulary, required
    A Vocabulary, required in order to compute sizes for input/output projections.
text_field_embedder : TextFieldEmbedder, required
    Used to embed the tokens TextField we get as input to the model.
encoder : Seq2SeqEncoder
    The encoder (with its own internal stacking) that we will use in between embedding tokens and predicting output tags.
initializer : InitializerApplicator, optional (default=InitializerApplicator())
    Used to initialize the model parameters.
regularizer : RegularizerApplicator, optional (default=None)
    If provided, will be used to calculate the regularization penalty during training.
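A minimal construction sketch follows; the vocabulary contents and layer sizes are hypothetical, and BasicTextFieldEmbedder plus an LSTM wrapped in PytorchSeq2SeqWrapper is just one valid embedder/encoder combination:

    import torch
    from allennlp.data.vocabulary import Vocabulary
    from allennlp.models.simple_tagger import SimpleTagger
    from allennlp.modules.seq2seq_encoders import PytorchSeq2SeqWrapper
    from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
    from allennlp.modules.token_embedders import Embedding

    # Hypothetical vocabulary; in practice it is built from dataset instances.
    vocab = Vocabulary()
    for word in ["the", "cat", "sat"]:
        vocab.add_token_to_namespace(word, namespace="tokens")
    for tag in ["DET", "NOUN", "VERB"]:
        vocab.add_token_to_namespace(tag, namespace="labels")

    # Map each token id in the "tokens" namespace to a 50-dimensional vector.
    embedder = BasicTextFieldEmbedder({
        "tokens": Embedding(num_embeddings=vocab.get_vocab_size("tokens"),
                            embedding_dim=50)})

    # Any Seq2SeqEncoder works between the embedder and the tag projection;
    # here, a single-layer LSTM.
    encoder = PytorchSeq2SeqWrapper(
        torch.nn.LSTM(input_size=50, hidden_size=25, batch_first=True))

    model = SimpleTagger(vocab=vocab,
                         text_field_embedder=embedder,
                         encoder=encoder)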
decode(output_dict: typing.Dict[str, torch.FloatTensor]) → typing.Dict[str, torch.FloatTensor]

Does a simple position-wise argmax over each token, converts indices to string labels, and adds a "tags" key to the dictionary with the result.
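The behaviour is roughly the following sketch, reusing vocab from the construction example above and assuming output_dict came from forward() and that tag labels live in the "labels" vocabulary namespace:

    # Position-wise argmax over the class dimension, then index -> label string.
    probabilities = output_dict["class_probabilities"]  # (batch_size, num_tokens, tag_vocab_size)
    argmax_indices = probabilities.argmax(dim=-1)       # (batch_size, num_tokens)
    output_dict["tags"] = [
        [vocab.get_token_from_index(int(index), namespace="labels")
         for index in sequence]
        for sequence in argmax_indices]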
forward(tokens: typing.Dict[str, torch.LongTensor], tags: torch.LongTensor = None) → typing.Dict[str, torch.FloatTensor]

Parameters:

tokens : Dict[str, torch.LongTensor], required
    The output of TextField.as_array(), which should typically be passed directly to a TextFieldEmbedder. This output is a dictionary mapping keys to TokenIndexer tensors. At its most basic, using a SingleIdTokenIndexer, this is: {"tokens": Tensor(batch_size, num_tokens)}. This dictionary will have the same keys as were used for the TokenIndexers when you created the TextField representing your sequence. The dictionary is designed to be passed directly to a TextFieldEmbedder, which knows how to combine different word representations into a single vector per token in your input.
tags : torch.LongTensor, optional (default = None)
    A torch tensor representing the sequence of integer gold class labels of shape (batch_size, num_tokens).

Returns:

An output dictionary consisting of:

logits : torch.FloatTensor
    A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing unnormalised log probabilities of the tag classes.
class_probabilities : torch.FloatTensor
    A tensor of shape (batch_size, num_tokens, tag_vocab_size) representing a distribution of the tag classes per word.
loss : torch.FloatTensor, optional
    A scalar loss to be optimised.
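A hypothetical call, reusing the model built in the construction sketch above (the all-ones batch is a placeholder; any valid token ids of shape (batch_size, num_tokens) would do):

    # batch_size=4, num_tokens=7; index 1 is a valid, non-padding token id.
    batch = {"tokens": torch.ones(4, 7).long()}
    gold_tags = torch.zeros(4, 7).long()  # integer gold labels, same shape

    output = model(tokens=batch, tags=gold_tags)
    print(output["logits"].size())               # (4, 7, tag_vocab_size)
    print(output["class_probabilities"].size())  # (4, 7, tag_vocab_size)
    print(output["loss"])                        # scalar; present because tags were given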
classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → allennlp.models.simple_tagger.SimpleTagger
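This is what lets the model be constructed from a configuration file. A hypothetical Params blob mirroring the constructor arguments (the exact key schema varies between AllenNLP versions, so treat this as a sketch):

    from allennlp.common.params import Params

    params = Params({
        "text_field_embedder": {
            "tokens": {"type": "embedding", "embedding_dim": 50}},
        "encoder": {"type": "lstm", "input_size": 50, "hidden_size": 25}})
    model = SimpleTagger.from_params(vocab=vocab, params=params)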
get_metrics(reset: bool = False) → typing.Dict[str, float]

Returns a dictionary of metrics. This method will be called by allennlp.training.Trainer in order to compute and use model metrics for early stopping and model serialisation. We return an empty dictionary here rather than raising, as it is not required to implement metrics for a new model. A boolean reset parameter is passed, as frequently a metric accumulator will have some state which should be reset between epochs. Metrics should be populated during the call to forward(), with the Metric handling the accumulation of the metric until this method is called.
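The accumulate-then-report pattern this describes looks roughly like the following, using CategoricalAccuracy as an example Metric; the surrounding class is a hypothetical stand-in for a Model:

    from typing import Dict

    import torch
    from allennlp.training.metrics import CategoricalAccuracy

    class MetricsDemo:
        """Hypothetical holder showing how a Metric accumulates across calls."""

        def __init__(self) -> None:
            self.accuracy = CategoricalAccuracy()

        def forward_step(self, logits: torch.Tensor, gold: torch.Tensor) -> None:
            # Each call during forward() adds to the metric's running counts.
            self.accuracy(logits, gold)

        def get_metrics(self, reset: bool = False) -> Dict[str, float]:
            # The Trainer calls this; reset=True clears state between epochs.
            return {"accuracy": self.accuracy.get_metric(reset)}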