SingleIdTokenIndexer(self, namespace:str='tokens', lowercase_tokens:bool=False, start_tokens:List[str]=None, end_tokens:List[str]=None, token_min_padding_length:int=0) -> None

This :class:TokenIndexer represents tokens as single integers.


  • namespace : str, optional (default='tokens') We will use this namespace in the :class:Vocabulary to map strings to indices.
  • lowercase_tokens : bool, optional (default=False) If True, we will call token.lower() before getting an index for the token from the vocabulary.
  • start_tokens : List[str], optional (default=None) These are prepended to the tokens provided to tokens_to_indices.
  • end_tokens : List[str], optional (default=None) These are appended to the tokens provided to tokens_to_indices.
  • token_min_padding_length : int, optional (default=0) See :class:TokenIndexer.
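
The effect of lowercase_tokens, start_tokens and end_tokens can be sketched on plain strings. This is a minimal illustration, not the library's implementation: prepare_token_texts is a hypothetical helper, tokens are represented as bare strings rather than token objects, and applying the lowercasing to the added boundary tokens is an assumption.

```python
from typing import List, Optional


def prepare_token_texts(
    tokens: List[str],
    lowercase_tokens: bool = False,
    start_tokens: Optional[List[str]] = None,
    end_tokens: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: apply the constructor options to plain token strings."""
    # Prepend start_tokens and append end_tokens around the caller's tokens.
    texts = list(start_tokens or []) + list(tokens) + list(end_tokens or [])
    # Assumption: lowercasing is applied to the added boundary tokens as well.
    if lowercase_tokens:
        texts = [text.lower() for text in texts]
    return texts


prepared = prepare_token_texts(
    ["The", "Cat"],
    lowercase_tokens=True,
    start_tokens=["<s>"],
    end_tokens=["</s>"],
)
print(prepared)
```

With the defaults (no boundary tokens, no lowercasing) the helper returns the input strings unchanged.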


SingleIdTokenIndexer.count_vocab_items(self, token:Token, counter:Dict[str, Dict[str, int]])

The :class:Vocabulary needs to assign indices to whatever strings we see in the training data (possibly doing some frequency filtering and using an OOV, or out of vocabulary, token). This method takes a token and a dictionary of counts and increments counts for whatever vocabulary items are present in the token. If this is a single token ID representation, the vocabulary item is likely the token itself. If this is a token characters representation, the vocabulary items are all of the characters in the token.
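
For the single-ID case, the counting step described above might look roughly like the following. This is a sketch under stated assumptions: count_single_id_token is a hypothetical stand-in for the method, the token is passed as a plain string, and the counter is a nested defaultdict keyed first by namespace and then by token text.

```python
from collections import defaultdict
from typing import Dict


def count_single_id_token(
    token_text: str,
    counter: Dict[str, Dict[str, int]],
    namespace: str = "tokens",
    lowercase_tokens: bool = False,
) -> None:
    """Hypothetical stand-in: count one token under the given namespace."""
    # For a single-ID representation the vocabulary item is the token text itself.
    text = token_text.lower() if lowercase_tokens else token_text
    counter[namespace][text] += 1


# Nested defaultdict so unseen namespaces and tokens start at zero.
counter: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
for word in ["the", "cat", "saw", "the", "dog"]:
    count_single_id_token(word, counter)

print(dict(counter["tokens"]))
```

A vocabulary builder could then apply frequency filtering to these counts before assigning indices.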


SingleIdTokenIndexer.tokens_to_indices(self, tokens:List[Token], vocabulary:Vocabulary) -> Dict[str, List[int]]

Takes a list of tokens and converts them to an IndexedTokenList. This could be just an ID for each token from the vocabulary. Or it could split each token into characters and return one ID per character. Or (for instance, in the case of byte-pair encoding) there might not be a clean mapping from individual tokens to indices, and the IndexedTokenList could be a complex data structure.
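
The simplest case, one ID per token, can be illustrated with a toy vocabulary. The names below are assumptions made for the sketch: single_id_tokens_to_indices is a hypothetical function, the vocabulary is a plain dict rather than a Vocabulary object, the OOV and padding strings are placeholders, and the output key "tokens" is chosen for illustration only.

```python
from typing import Dict, List

OOV_TOKEN = "@@UNKNOWN@@"  # placeholder out-of-vocabulary marker (assumption)


def single_id_tokens_to_indices(
    token_texts: List[str],
    token_to_index: Dict[str, int],
    key: str = "tokens",
) -> Dict[str, List[int]]:
    """Hypothetical sketch: map each token string to one integer ID."""
    oov_index = token_to_index[OOV_TOKEN]
    # Tokens missing from the vocabulary fall back to the OOV index.
    return {key: [token_to_index.get(text, oov_index) for text in token_texts]}


# Toy vocabulary standing in for a real Vocabulary namespace.
toy_vocab = {"@@PADDING@@": 0, OOV_TOKEN: 1, "the": 2, "cat": 3}
indexed = single_id_tokens_to_indices(["the", "cat", "sat"], toy_vocab)
print(indexed)
```

Here "sat" is not in the toy vocabulary, so it receives the OOV index.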


SingleIdTokenIndexer.get_empty_token_list(self) -> Dict[str, List[int]]

Returns an already indexed version of an empty token list. This is typically just an empty list for whatever keys are used in the indexer.
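
As a rough illustration, an indexer that emits a single key would return that key mapped to an empty list. The function name and the key "tokens" are assumptions for this sketch, not confirmed library details.

```python
from typing import Dict, List


def empty_single_id_token_list(key: str = "tokens") -> Dict[str, List[int]]:
    """Hypothetical sketch: the indexed form of an empty token list."""
    # Build a fresh list on every call so callers never share mutable state.
    return {key: []}


empty = empty_single_id_token_list()
print(empty)
```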