allennlp.data.tokenizers.pretrained_transformer_pre_tokenizer#

BertPreTokenizer#

BertPreTokenizer(self, do_lower_case: bool = True, never_split: Optional[List[str]] = None) -> None

The BasicTokenizer from the BERT implementation. It splits a sentence into words; the BertTokenIndexer then converts each word into wordpieces.
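A minimal usage sketch, assuming the import path shown in this page's module name; the printed output is illustrative of BERT's basic tokenization, not a guaranteed result:

```python
from allennlp.data.tokenizers.pretrained_transformer_pre_tokenizer import (
    BertPreTokenizer,
)

pre_tokenizer = BertPreTokenizer(do_lower_case=True)

# Word-level splitting only: text is lower-cased and punctuation is
# separated, but nothing is broken into wordpieces at this stage.
tokens = pre_tokenizer.tokenize("Hello, I can't wait!")
print([t.text for t in tokens])
# Illustrative output: ['hello', ',', 'i', 'can', "'", 't', 'wait', '!']
```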

default_never_split#

The default list of tokens that are never split: BERT's special tokens [UNK], [SEP], [PAD], [CLS], and [MASK].
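The never_split constructor argument supplies this list at construction time; listed tokens are passed through whole (neither lower-cased nor split on punctuation), following the behavior of BERT's BasicTokenizer. A hedged sketch with an illustrative output:

```python
from allennlp.data.tokenizers.pretrained_transformer_pre_tokenizer import (
    BertPreTokenizer,
)

pre_tokenizer = BertPreTokenizer(never_split=["[MASK]"])

# '[MASK]' survives intact; without never_split, punctuation splitting
# would break it into '[', 'mask', ']'.
tokens = pre_tokenizer.tokenize("The capital of France is [MASK].")
print([t.text for t in tokens])
# Illustrative output: ['the', 'capital', 'of', 'france', 'is', '[MASK]', '.']
```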

tokenize#

BertPreTokenizer.tokenize(self, text: str) -> List[allennlp.data.tokenizers.token.Token]

Does the actual splitting of text into word-level tokens.

Returns

tokens: List[Token]
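For completeness, a small sketch of consuming the return value: each element is an allennlp Token (the class referenced in the signature above), whose text attribute holds the surface string.

```python
from allennlp.data.tokenizers.pretrained_transformer_pre_tokenizer import (
    BertPreTokenizer,
)
from allennlp.data.tokenizers.token import Token

tokens = BertPreTokenizer().tokenize("A tiny example.")

# Elements are Token objects, not plain strings.
assert all(isinstance(t, Token) for t in tokens)
print([t.text for t in tokens])  # illustrative: ['a', 'tiny', 'example', '.']
```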