class str = '###', token_delimiter: str = None, token_indexers: typing.Dict[str,] = None, lazy: bool = False) → None[source]


Reads instances from a pretokenised file where each line is in the following format:


and converts it into a Dataset suitable for sequence tagging. You can also specify alternative delimiters in the constructor.

word_tag_delimiter: ``str``, optional (default=``”###”``)

The text that separates each WORD from its TAG.

token_delimiter: ``str``, optional (default=``None``)

The text that separates each WORD-TAG pair from the next pair. If None then the line will just be split on whitespace.

token_indexers : Dict[str, TokenIndexer], optional (default=``{“tokens”: SingleIdTokenIndexer()}``)

We use this to define the input representation for the text. See TokenIndexer. Note that the output tags will always correspond to single token IDs based on how they are pre-tokenised in the data file.

text_to_instance(tokens: typing.List[], tags: typing.List[str] = None) →[source]

We take pre-tokenized input here, because we don’t have a tokenizer in this class.