# allennlp.modules.token_embedders

A TokenEmbedder is a Module that embeds one-hot-encoded tokens as vectors.

class allennlp.modules.token_embedders.token_embedder.TokenEmbedder

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

A TokenEmbedder is a Module that takes as input a tensor with integer ids that have been output from a TokenIndexer and outputs a vector per token in the input. The input typically has shape (batch_size, num_tokens) or (batch_size, num_tokens, num_characters), and the output is of shape (batch_size, num_tokens, output_dim). The simplest TokenEmbedder is just an embedding layer, but for character-level input, it could also be some kind of character encoder.

We add a single method to the basic Module API: get_output_dim(). This lets us more easily compute output dimensions for the TextFieldEmbedder, which we might need when defining model parameters such as LSTMs or linear layers, which need to know their input dimension before the layers are called.

default_implementation = 'embedding'
get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.
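The contract above can be shown without AllenNLP or PyTorch: there is one vector per token, and get_output_dim() reports only the last dimension of the output. The sketch below is hypothetical and dependency-free. ToyLookupEmbedder is not part of AllenNLP, and nested lists stand in for tensors. A real TokenEmbedder subclasses torch.nn.Module and operates on tensors.

```python
from typing import List


class ToyLookupEmbedder:
    """Hypothetical, dependency-free stand-in for a TokenEmbedder.

    Maps integer ids of shape (batch_size, num_tokens) to vectors of shape
    (batch_size, num_tokens, output_dim) using a plain lookup table.
    """

    def __init__(self, table: List[List[float]]) -> None:
        self.table = table
        self.output_dim = len(table[0])

    def forward(self, ids: List[List[int]]) -> List[List[List[float]]]:
        # One vector per token id; the batch and token structure is preserved.
        return [[list(self.table[i]) for i in row] for row in ids]

    def get_output_dim(self) -> int:
        # Only the last dimension of the output, not the full shape.
        return self.output_dim


table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
embedder = ToyLookupEmbedder(table)
out = embedder.forward([[1, 2, 0]])
print(len(out), len(out[0]), len(out[0][0]))  # 1 3 2
print(embedder.get_output_dim())  # 2
```

Downstream modules such as an LSTM can size their input layer from get_output_dim() before any data has been seen. This is the motivation given above.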

class allennlp.modules.token_embedders.embedding.Embedding(num_embeddings: int, embedding_dim: int, projection_dim: int = None, weight: torch.FloatTensor = None, padding_index: int = None, trainable: bool = True, max_norm: float = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, vocab_namespace: str = None, pretrained_file: str = None)

A more featureful embedding module than the default in PyTorch. Adds the ability to:

1. embed higher-order inputs

2. pre-specify the weight matrix

3. use a non-trainable embedding

4. project the resultant embeddings to some other dimension (which only makes sense with non-trainable embeddings).

5. build all of this easily with from_params

Note that if you are using our data API and are trying to embed a TextField, you should use a TextFieldEmbedder instead of using this directly.

Parameters
num_embeddings: int

Size of the dictionary of embeddings (vocabulary size).

embedding_dim: int

The size of each embedding vector.

projection_dim: int, (optional, default=None)

If given, we add a projection layer after the embedding layer. This really only makes sense if trainable is False.

weight: torch.FloatTensor, (optional, default=None)

A pre-initialised weight matrix for the embedding lookup, allowing the use of pretrained vectors.

padding_index: int, (optional, default=None)

If given, pads the output with zeros whenever it encounters the index.

trainable: bool, (optional, default=True)

Whether or not to optimize the embedding parameters.

max_norm: float, (optional, default=None)

If given, embeddings are renormalized so that their norm is always at most this value.

norm_type: float, (optional, default=2)

The p of the p-norm to compute for the max_norm option.

scale_grad_by_freq: bool, (optional, default=False)

If True, gradients are scaled by the inverse frequency of the words in the mini-batch.

sparse: bool, (optional, default=False)

Whether or not the PyTorch backend should use a sparse representation of the embedding weight.

vocab_namespace: str, (optional, default=None)

In case of fine-tuning/transfer learning, the model’s embedding matrix needs to be extended according to the size of the extended vocabulary. To know how much to extend the embedding matrix, it’s necessary to know which vocab_namespace was used to construct it in the original training. We store the vocab_namespace used during the original training as an attribute, so that it can be retrieved during fine-tuning.

pretrained_file: str, (optional, default=None)

Used to keep track of the source of the weights and to load more embeddings at test time. It does not load the weights from this pretrained_file; for that purpose, use Embedding.from_params.

Returns
An Embedding module.
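The max_norm and norm_type options combine as follows. A looked-up vector whose p-norm exceeds max_norm is rescaled so that its norm equals max_norm, and all other vectors are left untouched. The function below is a simplified, hypothetical sketch of that rule on a single Python list. PyTorch applies the rule in place to the rows of the weight matrix that are actually looked up.

```python
from typing import List


def renorm(vector: List[float], max_norm: float, norm_type: float = 2.0) -> List[float]:
    """Rescale `vector` so its p-norm (p = norm_type) is at most `max_norm`.

    Simplified sketch of the max_norm/norm_type behaviour described above.
    """
    norm = sum(abs(x) ** norm_type for x in vector) ** (1.0 / norm_type)
    if norm <= max_norm:
        return list(vector)  # already within the limit: unchanged
    scale = max_norm / norm
    return [x * scale for x in vector]


print(renorm([3.0, 4.0], max_norm=1.0))  # 2-norm 5.0 is scaled down to 1.0, roughly [0.6, 0.8]
print(renorm([0.3, 0.4], max_norm=1.0))  # 2-norm 0.5 is already small enough: unchanged
```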
extend_vocab(self, extended_vocab: allennlp.data.vocabulary.Vocabulary, vocab_namespace: str = None, extension_pretrained_file: str = None, model_path: str = None)

Extends the embedding matrix according to the extended vocabulary. If extension_pretrained_file is available, it is used to initialize the embeddings of the new words in the extended vocabulary; otherwise, the _pretrained_file attribute is used if it is set. If neither is available, the new embeddings are initialized with Xavier uniform initialization.

Parameters
extended_vocab: Vocabulary

Vocabulary extended from the original vocabulary used to construct this Embedding.

vocab_namespace: str, (optional, default=None)

In case you know what vocab_namespace should be used for extension, you can pass it. If it is not passed, the vocab_namespace used when this Embedding was constructed is used, if available; otherwise, extend_vocab will be a no-op.

extension_pretrained_file: str, (optional, default=None)

A file containing pretrained embeddings can be specified here. It can be the path to a local file or a URL of a (cached) remote file. See Embedding.from_params for format details.

model_path: str, (optional, default=None)

Path traversing the model attributes up to this embedding module, e.g. “_text_field_embedder.token_embedder_tokens”. This is only used to give a helpful error message when extend_vocab is implicitly called by fine-tune or any other command.
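When no pretrained file is available, extend_vocab keeps the original rows and appends new rows for the new vocabulary items. The sketch below shows that idea on plain lists. It is hypothetical: extend_embedding_matrix is not an AllenNLP function. It also assumes that the new rows form an (extra_rows, embedding_dim) matrix drawn with Xavier (Glorot) uniform initialization, following the description above.

```python
import math
import random
from typing import List


def extend_embedding_matrix(weight: List[List[float]], new_vocab_size: int,
                            seed: int = 0) -> List[List[float]]:
    """Hypothetical sketch: keep existing rows, append Xavier-uniform rows."""
    old_size, dim = len(weight), len(weight[0])
    extra = new_vocab_size - old_size
    if extra <= 0:
        return [list(row) for row in weight]  # nothing to extend
    # Xavier uniform bound for an (extra, dim) matrix: sqrt(6 / (fan_in + fan_out)).
    bound = math.sqrt(6.0 / (dim + extra))
    rng = random.Random(seed)
    new_rows = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(extra)]
    return [list(row) for row in weight] + new_rows


weight = [[0.1, 0.2], [0.3, 0.4]]  # original 2-word vocabulary, embedding_dim 2
extended = extend_embedding_matrix(weight, new_vocab_size=4)
print(len(extended), extended[:2] == weight)  # 4 True
```

Because the old rows are copied unchanged, indices assigned by the original vocabulary keep their trained vectors. Only the appended indices get fresh values.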

forward(self, inputs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'Embedding'

We need the vocabulary here to know how many items we need to embed, and we look for a vocab_namespace key in the parameter dictionary to know which vocabulary to use. If you know beforehand exactly how many embeddings you need, or aren’t using a vocabulary mapping for the things getting embedded here, then you can pass in the num_embeddings key directly, and the vocabulary will be ignored.
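The precedence described above can be sketched as follows: an explicit num_embeddings wins, and otherwise the size of the named vocabulary namespace is used. In this hypothetical sketch, a plain dict stands in for Params and a namespace-to-size dict stands in for Vocabulary. "tokens" is assumed as the default namespace when vocab_namespace is absent.

```python
from typing import Any, Dict, Optional


def resolve_num_embeddings(params: Dict[str, Any], vocab_sizes: Dict[str, int]) -> int:
    """Hypothetical sketch of choosing num_embeddings for an Embedding.

    `params` stands in for the Embedding config block; `vocab_sizes` maps a
    vocabulary namespace to its size. "tokens" is an assumed default namespace.
    """
    num_embeddings: Optional[int] = params.get("num_embeddings")
    if num_embeddings is not None:
        return num_embeddings  # explicit size wins; the vocabulary is ignored
    namespace = params.get("vocab_namespace", "tokens")
    return vocab_sizes[namespace]


sizes = {"tokens": 10000, "token_characters": 262}
print(resolve_num_embeddings({"embedding_dim": 100}, sizes))  # 10000
print(resolve_num_embeddings({"vocab_namespace": "token_characters"}, sizes))  # 262
print(resolve_num_embeddings({"num_embeddings": 50, "vocab_namespace": "tokens"}, sizes))  # 50
```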

In the configuration file, a file containing pretrained embeddings can be specified using the parameter "pretrained_file". It can be the path to a local file or a URL of a (cached) remote file. Two formats are supported:

• hdf5 file - containing an embedding matrix in the form of a torch.Tensor;

• text file - a UTF-8 encoded text file with space-separated fields:

[word] [dim 1] [dim 2] ...
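Each line of the text format holds a word followed by its vector components, separated by spaces. The parser below is a minimal, hypothetical sketch of reading such lines into a dictionary; it is not AllenNLP's loader. It assumes that lines whose field count does not match embedding_dim should be skipped, for example a "count dim" header line. It also assumes that words contain no spaces. In practice the lines would come from a file opened with encoding="utf-8".

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str], embedding_dim: int) -> Dict[str, List[float]]:
    """Parse "[word] [dim 1] [dim 2] ..." lines into a word -> vector dict.

    Lines with the wrong number of fields are skipped (an assumption of this sketch).
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        fields = line.rstrip().split(" ")
        if len(fields) - 1 != embedding_dim:
            continue  # header line, blank line, or malformed entry
        word, values = fields[0], fields[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


sample = ["2 3", "the 0.1 0.2 0.3", "cat -1.0 0.5 2.0"]
table = parse_embedding_lines(sample, embedding_dim=3)
print(sorted(table))  # ['cat', 'the']
```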


The text file can optionally be compressed with gzip, bz2, lzma or zip. You can even select a single file inside an archive containing multiple files using the URI:

"(archive_uri)#file_path_inside_the_archive"


where archive_uri can be a file system path or a URL. For example:

"(https://nlp.stanford.edu/data/glove.twitter.27B.zip)#glove.twitter.27B.200d.txt"

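The "(archive_uri)#file_path_inside_the_archive" convention can be built and taken apart with a small regular expression. The module exposes format_embeddings_file_uri and parse_embeddings_file_uri for this purpose. The sketch below is a hypothetical re-implementation of the documented format and not their source: ArchiveFileURI, format_uri, and parse_uri are illustrative names. ArchiveFileURI only mirrors the field names of EmbeddingsFileURI.

```python
import re
from typing import NamedTuple, Optional


class ArchiveFileURI(NamedTuple):
    main_file_uri: str
    path_inside_archive: Optional[str]


# "(archive_uri)#file_path_inside_the_archive"; anything else is a plain path or URL.
_ARCHIVE_URI = re.compile(r"^\((.*)\)#(.*)$")


def format_uri(main_file_uri: str, path_inside_archive: Optional[str] = None) -> str:
    if path_inside_archive:
        return f"({main_file_uri})#{path_inside_archive}"
    return main_file_uri


def parse_uri(uri: str) -> ArchiveFileURI:
    match = _ARCHIVE_URI.match(uri)
    if match:
        return ArchiveFileURI(match.group(1), match.group(2))
    return ArchiveFileURI(uri, None)


uri = format_uri("https://nlp.stanford.edu/data/glove.twitter.27B.zip", "glove.twitter.27B.200d.txt")
print(uri)  # (https://nlp.stanford.edu/data/glove.twitter.27B.zip)#glove.twitter.27B.200d.txt
print(parse_uri(uri).path_inside_archive)  # glove.twitter.27B.200d.txt
```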
get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.embedding.EmbeddingsFileURI(main_file_uri, path_inside_archive)

Bases: tuple

property main_file_uri

Alias for field number 0

property path_inside_archive

Alias for field number 1

class allennlp.modules.token_embedders.embedding.EmbeddingsTextFile(file_uri: str, encoding: str = 'utf-8', cache_dir: str = None)

Bases: typing.Iterator

Utility class for opening embeddings text files. Handles various compression formats, as well as context management.

Parameters
file_uri: str

It can be:

• a file system path or a URL of an optionally compressed text file, or of a zip/tar archive containing a single file.

• URI of the type (archive_path_or_url)#file_path_inside_archive if the text file is contained in a multi-file archive.

encoding: str
cache_dir: str
DEFAULT_ENCODING = 'utf-8'
close(self) → None
read(self) → str
readline(self) → str
allennlp.modules.token_embedders.embedding.format_embeddings_file_uri(main_file_path_or_url: str, path_inside_archive: Union[str, NoneType] = None) → str
allennlp.modules.token_embedders.embedding.parse_embeddings_file_uri(uri: str) → 'EmbeddingsFileURI'

class allennlp.modules.token_embedders.token_characters_encoder.TokenCharactersEncoder(embedding: allennlp.modules.token_embedders.embedding.Embedding, encoder: allennlp.modules.seq2vec_encoders.seq2vec_encoder.Seq2VecEncoder, dropout: float = 0.0)

A TokenCharactersEncoder takes the output of a TokenCharactersIndexer, which is a tensor of shape (batch_size, num_tokens, num_characters), embeds the characters, runs a token-level encoder, and returns the result, which is a tensor of shape (batch_size, num_tokens, encoding_dim). We also optionally apply dropout after the token-level encoder.

We take the embedding and encoding modules as input, so this class is itself quite simple.
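The data flow through this class is: embed every character id, then reduce each token's character vectors to a single vector. The sketch below is hypothetical and dependency-free. A dict plays the role of the character Embedding, and a mean over non-padding characters stands in for the Seq2VecEncoder. The padding id 0 is an assumption. The real module accepts any Seq2VecEncoder (for example a CNN) and can also apply dropout.

```python
from typing import Dict, List

Vector = List[float]


def encode_token_characters(char_ids: List[List[List[int]]],
                            char_table: Dict[int, Vector]) -> List[List[Vector]]:
    """Hypothetical sketch of the TokenCharactersEncoder data flow.

    char_ids: shape (batch_size, num_tokens, num_characters); id 0 is assumed
    to be padding. Returns shape (batch_size, num_tokens, encoding_dim).
    """
    dim = len(next(iter(char_table.values())))
    batch_out = []
    for sentence in char_ids:
        sentence_out = []
        for token in sentence:
            chars = [char_table[c] for c in token if c != 0]  # embed, skipping padding
            if not chars:
                sentence_out.append([0.0] * dim)  # all-padding token
                continue
            # Mean over characters stands in for the Seq2VecEncoder.
            sentence_out.append([sum(col) / len(chars) for col in zip(*chars)])
        batch_out.append(sentence_out)
    return batch_out


table = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
out = encode_token_characters([[[1, 2, 0], [3, 0, 0]]], table)
print(out)  # [[[0.5, 0.5], [1.0, 1.0]]]
```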

forward(self, token_characters: torch.Tensor) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'TokenCharactersEncoder'

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.

get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.elmo_token_embedder.ElmoTokenEmbedder(options_file: str, weight_file: str, do_layer_norm: bool = False, dropout: float = 0.5, requires_grad: bool = False, projection_dim: int = None, vocab_to_cache: List[str] = None, scalar_mix_parameters: List[float] = None)

Compute a single layer of ELMo representations.

This class serves as a convenience when you only want to use one layer of ELMo representations at the input of your network. It’s essentially a wrapper around Elmo(num_output_representations=1, …)

Parameters
options_file: str, required.

An ELMo JSON options file.

weight_file: str, required.

An ELMo hdf5 weight file.

do_layer_norm: bool, optional.

Should we apply layer normalization (passed to ScalarMix)?

dropout: float, optional, (default = 0.5).

The dropout value to be applied to the ELMo representations.

requires_grad: bool, optional

If True, compute gradients of ELMo parameters for fine-tuning.

projection_dim: int, optional

If given, we will project the ELMo embedding down to this dimension. We recommend that you try using ELMo with a lot of dropout and no projection first, but we have found a few cases where projection helps (particularly where there is very limited training data).

vocab_to_cache: List[str], optional.

A list of words to pre-compute and cache character convolutions for. If you use this option, the ElmoTokenEmbedder expects that you pass word indices of shape (batch_size, timesteps) to forward, instead of character indices. If you use this option and pass a word which wasn’t pre-cached, this will break.

scalar_mix_parameters: List[float], optional, (default=None)

If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training. The mixing weights here should be the unnormalized (i.e., pre-softmax) weights. So, if you wanted to use only the 1st layer of a 2-layer ELMo (which yields three representation layers: the character-based token layer plus the two LSTM layers), you can set this to [-9e10, 1, -9e10].
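The unnormalized weights are passed through a softmax and then used for a weighted sum of the layers. A large negative entry such as -9e10 therefore receives a weight of effectively zero. The sketch below illustrates this on plain lists. It assumes the mix is gamma times the softmax-weighted sum of the layer outputs, with gamma a global scale, and it omits the optional layer normalization. It is an illustration, not the ScalarMix source.

```python
import math
from typing import List


def scalar_mix(layers: List[List[float]], raw_weights: List[float],
               gamma: float = 1.0) -> List[float]:
    """Weighted sum of per-layer vectors using softmax-normalized weights.

    `raw_weights` are the unnormalized (pre-softmax) values; `gamma` is a
    global scale. Layer normalization is omitted in this sketch.
    """
    top = max(raw_weights)
    exps = [math.exp(w - top) for w in raw_weights]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return [gamma * sum(p * layer[i] for p, layer in zip(probs, layers))
            for i in range(len(layers[0]))]


layers = [[1.0, 1.0], [2.0, 3.0], [5.0, 8.0]]  # e.g. token layer + two LSTM layers
print(scalar_mix(layers, [-9e10, 1, -9e10]))  # [2.0, 3.0]: only the middle layer survives
```

With equal raw weights such as [0, 0, 0], the result is the plain average of the three layers.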

forward(self, inputs: torch.Tensor, word_inputs: torch.Tensor = None) → torch.Tensor
Parameters
inputs: torch.Tensor

Shape (batch_size, timesteps, 50) of character ids representing the current batch.

word_inputs: torch.Tensor, optional.

If you passed a cached vocab, you can also pass a tensor of shape (batch_size, timesteps) representing word ids that have been pre-cached.

Returns
The ELMo representations for the input sequence, with shape (batch_size, timesteps, embedding_dim).

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'ElmoTokenEmbedder'

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.

get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.elmo_token_embedder_multilang.ElmoTokenEmbedderMultiLang(options_files: Dict[str, str], weight_files: Dict[str, str], do_layer_norm: bool = False, dropout: float = 0.5, requires_grad: bool = False, projection_dim: int = None, vocab_to_cache: List[str] = None, scalar_mix_parameters: List[float] = None, aligning_files: Dict[str, str] = None)

A multilingual ELMo embedder - extending ElmoTokenEmbedder for multiple languages. Each language has different weights for the ELMo model and an alignment matrix.

Parameters
options_files: Dict[str, str], required.

A dictionary mapping language identifiers to ELMo JSON options files.

weight_files: Dict[str, str], required.

A dictionary mapping language identifiers to ELMo hdf5 weight files.

do_layer_norm: bool, optional.

Should we apply layer normalization (passed to ScalarMix)?

dropout: float, optional, (default = 0.5).

The dropout value to be applied to the ELMo representations.

requires_grad: bool, optional

If True, compute gradients of ELMo parameters for fine-tuning.

projection_dim: int, optional

If given, we will project the ELMo embedding down to this dimension. We recommend that you try using ELMo with a lot of dropout and no projection first, but we have found a few cases where projection helps (particularly where there is very limited training data).

vocab_to_cache: List[str], optional, (default = None).

A list of words to pre-compute and cache character convolutions for. If you use this option, the embedder expects that you pass word indices of shape (batch_size, timesteps) to forward, instead of character indices. If you use this option and pass a word which wasn’t pre-cached, this will break.

scalar_mix_parameters: List[float], optional, (default=None).

If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.

aligning_files: Dict[str, str], optional, (default={}).

A dictionary mapping language identifiers to .pth files containing alignment matrices.

forward(self, inputs: torch.Tensor, lang: str, word_inputs: torch.Tensor = None) → torch.Tensor
Parameters
inputs: torch.Tensor

Shape (batch_size, timesteps, 50) of character ids representing the current batch.

lang: str, required.

The language of the ELMo embedder to use.

word_inputs: torch.Tensor, optional.

If you passed a cached vocab, you can also pass a tensor of shape (batch_size, timesteps) representing word ids that have been pre-cached.

Returns
The ELMo representations for the given language for the input sequence, with shape (batch_size, timesteps, embedding_dim).

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'ElmoTokenEmbedderMultiLang'

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.

get_output_dim(self)

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.openai_transformer_embedder.OpenaiTransformerEmbedder(transformer: allennlp.modules.openai_transformer.OpenaiTransformer, top_layer_only: bool = False)

Takes a byte-pair representation of a batch of sentences (as produced by the OpenaiTransformerBytePairIndexer) and generates a ScalarMix of the corresponding contextual embeddings.

Parameters
transformer: OpenaiTransformer, required.

The OpenaiTransformer module used for the embeddings.

top_layer_only: bool, optional (default = False)

If True, then only return the top layer instead of applying the scalar mix.

forward(self, inputs: torch.Tensor, offsets: torch.Tensor = None) → torch.Tensor
Parameters
inputs: torch.Tensor, required

A (batch_size, num_timesteps) tensor representing the byte-pair encodings for the current batch.

offsets: torch.Tensor, optional

A (batch_size, max_sequence_length) tensor representing the word offsets for the current batch.

Returns
[torch.Tensor]

An embedding representation of the input sequence having shape (batch_size, sequence_length, embedding_dim)

get_output_dim(self)

The last dimension of the output, not the shape.

A TokenEmbedder which uses one of the BERT models (https://github.com/google-research/bert) to produce embeddings.

At its core it uses Hugging Face’s PyTorch implementation (https://github.com/huggingface/pytorch-pretrained-BERT), so thanks to them!

class allennlp.modules.token_embedders.bert_token_embedder.BertEmbedder(bert_model: pytorch_pretrained_bert.modeling.BertModel, top_layer_only: bool = False, max_pieces: int = 512, num_start_tokens: int = 1, num_end_tokens: int = 1, scalar_mix_parameters: List[float] = None)

A TokenEmbedder that produces BERT embeddings for your tokens. Should be paired with a BertIndexer, which produces wordpiece ids.

You most likely want to use PretrainedBertEmbedder for one of the named pretrained models, not this base class.

Parameters
bert_model: BertModel

The BERT model being wrapped.

top_layer_only: bool, optional (default = False)

If True, then only return the top layer instead of applying the scalar mix.

max_pieces: int, optional (default: 512)

The BERT embedder uses positional embeddings and so has a corresponding maximum length for its input ids. Assuming the inputs are windowed and padded appropriately by this length, the embedder will split them into a large batch, feed them into BERT, and recombine the output as if it were a longer sequence.

num_start_tokens: int, optional (default: 1)

The number of starting special tokens input to BERT (usually 1, i.e., [CLS]).

num_end_tokens: int, optional (default: 1)

The number of ending special tokens input to BERT (usually 1, i.e., [SEP]).

scalar_mix_parameters: List[float], optional, (default = None)

If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.
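The max_pieces description above can be illustrated with a simplified split-and-recombine scheme. A long wordpiece sequence is cut into windows of max_pieces, and the last window is padded. The windows go through the model as one batch, and the per-position outputs are concatenated and trimmed back to the original length. This hypothetical sketch ignores the special start and end tokens that the real embedder handles per window (num_start_tokens, num_end_tokens). Here the ids themselves stand in for per-position BERT outputs.

```python
from typing import List


def split_into_windows(ids: List[int], max_pieces: int, pad_id: int = 0) -> List[List[int]]:
    """Split one long wordpiece-id sequence into fixed-size, padded windows."""
    windows = [ids[i:i + max_pieces] for i in range(0, len(ids), max_pieces)]
    if windows and len(windows[-1]) < max_pieces:
        windows[-1] = windows[-1] + [pad_id] * (max_pieces - len(windows[-1]))
    return windows  # each window becomes one row of the batch fed to BERT


def recombine(window_outputs: List[List[int]], original_length: int) -> List[int]:
    """Concatenate per-window outputs and drop the padding added above."""
    flat = [x for window in window_outputs for x in window]
    return flat[:original_length]


ids = list(range(1, 8))  # 7 wordpiece ids
windows = split_into_windows(ids, max_pieces=3)
print(windows)  # [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
print(recombine(windows, len(ids)))  # [1, 2, 3, 4, 5, 6, 7]
```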

forward(self, input_ids: torch.LongTensor, offsets: torch.LongTensor = None, token_type_ids: torch.LongTensor = None) → torch.Tensor
Parameters
input_ids: torch.LongTensor

The (batch_size, …, max_sequence_length) tensor of wordpiece ids.

offsets: torch.LongTensor, optional

The BERT embeddings are one per wordpiece. However, you will likely want one per original token. In that case, offsets represents the index of the desired wordpiece for each original token. Depending on how your token indexer is configured, this could be the position of the last wordpiece or of the first wordpiece of each token.

For example, if you had the sentence “Definitely not”, and if the corresponding wordpieces were [“Def”, “##in”, “##ite”, “##ly”, “not”], then the input_ids would be 5 wordpiece ids, and the “last wordpiece” offsets would be [3, 4]. If offsets are provided, the returned tensor will contain only the wordpiece embeddings at those positions, and (in particular) will contain one embedding per token. If offsets are not provided, the entire tensor of wordpiece embeddings will be returned.

token_type_ids: torch.LongTensor, optional

If an input consists of two sentences (as in the BERT paper), tokens from the first sentence should have type 0 and tokens from the second sentence should have type 1. If you don’t provide this (the default BertIndexer doesn’t) then it’s assumed to be all 0s.
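The offsets example and the token_type_ids convention can both be reproduced with a few lines of list manipulation. This hypothetical sketch computes first- and last-wordpiece offsets for the "Definitely not" example, and it gathers one wordpiece per original token. It also builds token type ids for a "[CLS] A [SEP] B [SEP]" pair. Like the example above, the offsets ignore any [CLS] prefix. If the indexer prepends one, every offset shifts by one.

```python
from typing import List, Tuple


def wordpiece_offsets(token_pieces: List[List[str]]) -> Tuple[List[int], List[int]]:
    """Return (first_offsets, last_offsets): wordpiece indices per original token."""
    first, last, position = [], [], 0
    for pieces in token_pieces:
        first.append(position)
        position += len(pieces)
        last.append(position - 1)
    return first, last


def two_sentence_type_ids(len_a: int, len_b: int) -> List[int]:
    """Type ids for "[CLS] A [SEP] B [SEP]": 0 for [CLS], A and the first [SEP]; 1 for the rest."""
    return [0] * (len_a + 2) + [1] * (len_b + 1)


pieces = [["Def", "##in", "##ite", "##ly"], ["not"]]
first, last = wordpiece_offsets(pieces)
print(first, last)  # [0, 4] [3, 4]
flat = [p for token in pieces for p in token]
print([flat[i] for i in last])  # ['##ly', 'not']: one entry per original token
print(two_sentence_type_ids(2, 3))  # [0, 0, 0, 0, 1, 1, 1, 1]
```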

get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.bert_token_embedder.PretrainedBertEmbedder(pretrained_model: str, requires_grad: bool = False, top_layer_only: bool = False, scalar_mix_parameters: List[float] = None)

Parameters
pretrained_model: str

Either the name of the pretrained model to use (e.g. ‘bert-base-uncased’), or the path to the .tar.gz file with the model weights.

If the name is a key in the list of pretrained models at https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L41, the corresponding path will be used; otherwise, it will be interpreted as a path or URL.

requires_grad: bool, optional (default = False)

If True, compute gradients of BERT parameters for fine-tuning.

top_layer_only: bool, optional (default = False)

If True, then only return the top layer instead of applying the scalar mix.

scalar_mix_parameters: List[float], optional, (default = None)

If not None, use these scalar mix parameters to weight the representations produced by different layers. These mixing weights are not updated during training.

class allennlp.modules.token_embedders.bert_token_embedder.PretrainedBertModel

Bases: object

In some instances you may want to load the same BERT model twice (e.g. to use as a token embedder and also as a pooling layer). This factory provides a cache so that you don’t actually have to load the model twice.

classmethod load(model_name: str, cache_model: bool = True) → pytorch_pretrained_bert.modeling.BertModel
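The caching behaviour described for this factory amounts to a dictionary keyed by model name. The first request loads the model, and later requests return the same object. The sketch below is hypothetical: ModelCache and fake_loader are illustrative names, and fake_loader stands in for actually loading BERT weights. It mirrors the documented cache_model flag, which controls whether a freshly loaded model is stored.

```python
from typing import Callable, Dict


class ModelCache:
    """Hypothetical sketch of a load-once cache keyed by model name."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader
        self._cache: Dict[str, object] = {}

    def load(self, model_name: str, cache_model: bool = True) -> object:
        if model_name in self._cache:
            return self._cache[model_name]  # reuse the already-loaded model
        model = self._loader(model_name)
        if cache_model:
            self._cache[model_name] = model
        return model


calls = []


def fake_loader(name: str) -> object:
    calls.append(name)  # record how many real loads happen
    return object()


cache = ModelCache(fake_loader)
a = cache.load("bert-base-uncased")
b = cache.load("bert-base-uncased")
print(a is b, len(calls))  # True 1
```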

class allennlp.modules.token_embedders.language_model_token_embedder.LanguageModelTokenEmbedder(archive_file: str, dropout: float = None, bos_eos_tokens: Tuple[str, str] = ('<S>', '</S>'), remove_bos_eos: bool = True, requires_grad: bool = False)

Compute a single layer of representations from a (optionally bidirectional) language model. This is done by computing a learned scalar average of the layers from the LM. Typically the LM’s weights will be fixed, but they can be fine-tuned by setting requires_grad.

Parameters
archive_file: str, required

An archive file, typically model.tar.gz, from a LanguageModel. The contextualizer used by the LM must satisfy two requirements:

1. It must have a num_layers field.

2. It must take a boolean return_all_layers parameter in its constructor.

See BidirectionalLanguageModelTransformer for their definitions.

dropout: float, optional.

The dropout value to be applied to the representations.

bos_eos_tokens: Tuple[str, str], optional (default=(“<S>”, “</S>”))

These will be indexed and placed around the indexed tokens. Necessary if the language model was trained with them, but they were injected outside of an indexer.

remove_bos_eos: bool, optional (default: True)

Typically the provided token indexes will be augmented with begin-sentence and end-sentence tokens. (Alternatively, you can pass bos_eos_tokens.) If this flag is True, the corresponding embeddings will be removed from the return values.

Warning: This only removes a single start and single end token!

requires_grad: bool, optional (default: False)

If True, compute gradients of the bidirectional language model parameters for fine-tuning.
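The bos_eos_tokens and remove_bos_eos options work as a pair. Boundary tokens are wrapped around the sequence before it reaches the LM, and exactly one leading and one trailing position are dropped from the output. The sketch below illustrates this for a single unpadded sequence; it is hypothetical and not AllenNLP's code. Strings stand in for per-position embeddings. In a padded batch, the end token sits at a different position in each sequence, so a batched version would need the padding mask to locate it.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def add_bos_eos(tokens: Sequence[str], bos_eos: Tuple[str, str] = ("<S>", "</S>")) -> List[str]:
    """Place the begin/end-of-sentence tokens around a token sequence."""
    bos, eos = bos_eos
    return [bos, *tokens, eos]


def remove_bos_eos(per_token_outputs: Sequence[T]) -> List[T]:
    """Drop exactly one leading and one trailing position (cf. the warning above)."""
    return list(per_token_outputs[1:-1])


wrapped = add_bos_eos(["the", "cat"])
print(wrapped)  # ['<S>', 'the', 'cat', '</S>']
outputs = [f"vec({t})" for t in wrapped]  # stand-in for per-position LM embeddings
print(remove_bos_eos(outputs))  # ['vec(the)', 'vec(cat)']
```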

forward(self, inputs: torch.Tensor) → Dict[str, torch.Tensor]
Parameters
inputs: torch.Tensor

Shape (batch_size, timesteps, ...) of token ids representing the current batch. These must have been produced using the same indexer the LM was trained on.

Returns
The bidirectional language model representations for the input sequence, with shape (batch_size, timesteps, embedding_dim).

get_output_dim(self) → int

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.bag_of_word_counts_token_embedder.BagOfWordCountsTokenEmbedder(vocab: allennlp.data.vocabulary.Vocabulary, vocab_namespace: str, projection_dim: int = None, ignore_oov: bool = False)

Represents a sequence of tokens as a bag of (discrete) word ids, as it was done in the pre-neural days.

Each sequence gets a vector of length vocabulary size, where the i’th entry in the vector corresponds to the number of times the i’th token in the vocabulary appears in the sequence.

By default, we ignore padding tokens.

Parameters
vocab: Vocabulary
vocab_namespace: str

The namespace of the vocabulary to embed.

projection_dim: int, optional (default = None)

If specified, the resulting bag-of-words representation is projected to this dimension.

ignore_oov: bool, optional (default = False)

If True, we ignore the OOV token.
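The counting rule (entry i holds the number of occurrences of vocabulary id i, with padding always skipped and OOV optionally skipped) can be sketched for a single sequence. This is a hypothetical sketch. It assumes padding index 0 and OOV index 1, which is a common Vocabulary layout, so check your own vocabulary before relying on it.

```python
from typing import List


def bag_of_word_counts(ids: List[int], vocab_size: int, padding_index: int = 0,
                       oov_index: int = 1, ignore_oov: bool = False) -> List[int]:
    """Count vector of length vocab_size; entry i = occurrences of id i.

    Padding is always ignored; the OOV id is ignored only when ignore_oov is
    set. The padding/OOV index values are assumptions for this sketch.
    """
    counts = [0] * vocab_size
    for token_id in ids:
        if token_id == padding_index:
            continue
        if ignore_oov and token_id == oov_index:
            continue
        counts[token_id] += 1
    return counts


sequence = [2, 3, 2, 1, 0, 0]  # two padding positions at the end
print(bag_of_word_counts(sequence, vocab_size=5))  # [0, 1, 2, 1, 0]
print(bag_of_word_counts(sequence, vocab_size=5, ignore_oov=True))  # [0, 0, 2, 1, 0]
```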

forward(self, inputs: torch.Tensor) → torch.Tensor
Parameters
inputs: torch.Tensor

Shape (batch_size, timesteps, sequence_length) of word ids representing the current batch.

Returns
The bag-of-words representations for the input sequence, with shape (batch_size, vocab_size).

classmethod from_params(vocab: allennlp.data.vocabulary.Vocabulary, params: allennlp.common.params.Params) → 'BagOfWordCountsTokenEmbedder'

We look for a vocab_namespace key in the parameter dictionary to know which vocabulary to use.

get_output_dim(self)

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.pass_through_token_embedder.PassThroughTokenEmbedder(hidden_dim: int)

Assumes that the input is already vectorized in some way, and just returns it.

Parameters
hidden_dim: int, required.

forward(self, inputs: torch.Tensor) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_output_dim(self)

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.

class allennlp.modules.token_embedders.pretrained_transformer_embedder.PretrainedTransformerEmbedder(model_name: str)

Uses a pretrained model from pytorch-transformers as a TokenEmbedder.

forward(self, token_ids: torch.LongTensor) → torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_output_dim(self)

Returns the final output dimension that this TokenEmbedder uses to represent each token. This is not the shape of the returned tensor, but the last element of that shape.