allennlp.data.dataset_readers.universal_dependencies

class allennlp.data.dataset_readers.universal_dependencies.UniversalDependenciesDatasetReader(token_indexers: typing.Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer] = None, use_language_specific_pos: bool = False, lazy: bool = False) → None

Bases: allennlp.data.dataset_readers.dataset_reader.DatasetReader

Reads a file in the CoNLL-U Universal Dependencies format.

Parameters:
token_indexers : Dict[str, TokenIndexer], optional (default=``{"tokens": SingleIdTokenIndexer()}``)

The token indexers to be applied to the words TextField.

use_language_specific_pos : bool, optional (default = False)

If True, use the language-specific POS tags provided in the CoNLL-U file; otherwise use the universal (UD) POS tags.
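
As a rough illustration of what `use_language_specific_pos` selects, the sketch below picks between the two POS columns of a CoNLL-U token. It uses plain dictionaries keyed by the `upostag` and `xpostag` field names that appear in the `lazy_parse` signature. The helper `select_pos_tags` and the sample tokens are hypothetical and are not part of AllenNLP.

```python
from typing import Dict, List


def select_pos_tags(tokens: List[Dict[str, str]],
                    use_language_specific_pos: bool = False) -> List[str]:
    """Hypothetical helper: pick one POS column per token."""
    key = "xpostag" if use_language_specific_pos else "upostag"
    return [token[key] for token in tokens]


# Two tokens of an English sentence with both tag columns filled in.
tokens = [
    {"form": "Dogs", "upostag": "NOUN", "xpostag": "NNS"},
    {"form": "bark", "upostag": "VERB", "xpostag": "VBP"},
]

universal_tags = select_pos_tags(tokens)
language_tags = select_pos_tags(tokens, use_language_specific_pos=True)
```

With the flag left at its default, the universal column is used; setting it to True switches every token to the language-specific column.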

text_to_instance(words: typing.List[str], upos_tags: typing.List[str], dependencies: typing.List[typing.Tuple[str, int]] = None) → allennlp.data.instance.Instance
Parameters:
words : List[str], required.

The words in the sentence to be encoded.

upos_tags : List[str], required.

The universal dependencies POS tags for each word.

dependencies : List[Tuple[str, int]], optional (default = None)

A list of (head tag, head index) tuples, one per word. Head indices are 1-indexed, so an index of 0 means that the word is the root of the dependency tree.

Returns:
An instance containing words, upos tags, dependency head tags and head
indices as fields.
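
The head-index convention for `dependencies` can be sketched without AllenNLP. The example below builds a list in the documented (head tag, head index) shape and resolves each head index to a word. The sentence, the relation labels, and the helper `head_word` are illustrative assumptions.

```python
from typing import List, Tuple

words = ["Dogs", "bark", "loudly"]

# One (head tag, head index) tuple per word. Head indices are 1-indexed
# positions in `words`; 0 marks the root of the tree.
dependencies: List[Tuple[str, int]] = [
    ("nsubj", 2),   # "Dogs" attaches to word 2, "bark"
    ("root", 0),    # "bark" is the root
    ("advmod", 2),  # "loudly" attaches to word 2, "bark"
]


def head_word(words: List[str], head_index: int) -> str:
    """Hypothetical helper: map a 1-indexed head to its word."""
    return "ROOT" if head_index == 0 else words[head_index - 1]


heads = [head_word(words, index) for _, index in dependencies]
```

Because 0 is reserved for the root, converting a head index to a Python list position requires subtracting 1.
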
allennlp.data.dataset_readers.universal_dependencies.lazy_parse(text: str, fields: typing.Tuple = ('id', 'form', 'lemma', 'upostag', 'xpostag', 'feats', 'head', 'deprel', 'deps', 'misc'))
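
This page gives only the signature of `lazy_parse`. As a hedged sketch of what a lazy CoNLL-U parser with these field names might do, the generator below yields one sentence at a time as a list of dictionaries. The name `lazy_parse_sketch`, the blank-line sentence splitting, and the skipping of `#` comment lines are assumptions based on the CoNLL-U format, not the library's implementation.

```python
from typing import Dict, Iterator, List, Tuple

DEFAULT_FIELDS = ("id", "form", "lemma", "upostag", "xpostag",
                  "feats", "head", "deprel", "deps", "misc")


def lazy_parse_sketch(text: str,
                      fields: Tuple[str, ...] = DEFAULT_FIELDS
                      ) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of {field: value} dicts."""
    # Assumption: sentences are separated by a blank line.
    for block in text.split("\n\n"):
        rows = []
        for line in block.split("\n"):
            line = line.strip()
            # Skip blank lines and "#" comment lines.
            if not line or line.startswith("#"):
                continue
            # Columns are tab-separated; values are kept as strings.
            rows.append(dict(zip(fields, line.split("\t"))))
        if rows:
            yield rows


sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
)

sentences = list(lazy_parse_sketch(sample))
```

Being a generator, the sketch parses a sentence only when it is requested, so a large file need not be held in parsed form all at once.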