class Instance(fields: typing.Dict[str, Field]) → None

Bases: object

An Instance is a collection of Field objects, specifying the inputs and outputs to some model. We don’t distinguish between inputs and outputs here, though: all operations are done on all fields, and when we return arrays, we return them as dictionaries keyed by field name. A model can then decide which fields it wants to use as inputs and which as outputs.

The Fields in an Instance can start out either indexed or un-indexed. During the data processing pipeline, all fields will end up as IndexedFields, and will then be converted into padded arrays by a DataGenerator.


fields : Dict[str, Field]

The Field objects that will be used to produce data arrays for this instance.

as_tensor_dict(padding_lengths: typing.Dict[str, typing.Dict[str, int]] = None, cuda_device: int = -1, for_training: bool = True) → typing.Dict[str, DataArray]

Pads each Field in this instance to the lengths given in padding_lengths (which is keyed by field name, then by padding key, the same as the return value of get_padding_lengths()), returning a dictionary, keyed by field name, of the torch tensors produced for each field.

If padding_lengths is omitted, we will call self.get_padding_lengths() to get the sizes of the tensors to create.
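The padding behaviour described above can be sketched without the library. This is a minimal illustration, not the library's implementation: ToyIndexedField, as_padded_dict, the padding key "num_tokens", and the use of 0 as the padding value are all assumptions made for the example, and plain Python lists stand in for torch tensors.

```python
from typing import Dict, List, Optional


class ToyIndexedField:
    """Hypothetical stand-in for an indexed Field; not the library's class."""

    def __init__(self, token_ids: List[int]) -> None:
        self.token_ids = token_ids

    def get_padding_lengths(self) -> Dict[str, int]:
        # One padding key per dimension that may need padding (name assumed).
        return {"num_tokens": len(self.token_ids)}

    def pad(self, padding_lengths: Dict[str, int]) -> List[int]:
        # Truncate or right-pad with 0 to reach the requested length.
        target = padding_lengths["num_tokens"]
        padded = self.token_ids[:target]
        return padded + [0] * (target - len(padded))


def as_padded_dict(
    fields: Dict[str, ToyIndexedField],
    padding_lengths: Optional[Dict[str, Dict[str, int]]] = None,
) -> Dict[str, List[int]]:
    # When no lengths are given, fall back to each field's own lengths.
    if padding_lengths is None:
        padding_lengths = {
            name: field.get_padding_lengths() for name, field in fields.items()
        }
    # The result is keyed by field name, like the input lengths.
    return {name: field.pad(padding_lengths[name]) for name, field in fields.items()}


fields = {
    "premise": ToyIndexedField([4, 7, 9]),
    "hypothesis": ToyIndexedField([4, 7]),
}
padded = as_padded_dict(
    fields,
    {"premise": {"num_tokens": 5}, "hypothesis": {"num_tokens": 5}},
)
unpadded = as_padded_dict(fields)
```

With explicit lengths every field is padded to five entries; with padding_lengths omitted each field keeps its own length, which mirrors the fallback to get_padding_lengths().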

count_vocab_items(counter: typing.Dict[str, typing.Dict[str, int]])

Increments counts in the given counter for all of the vocabulary items in all of the Fields in this Instance.
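As a rough sketch of how such a nested counter gets filled in, the example below uses a hypothetical ToyTextField and a module-level count_vocab_items helper in place of the real classes. It assumes the outer key of the counter is a vocabulary namespace and the inner key is the item being counted; the namespace name "tokens" is chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class ToyTextField:
    """Hypothetical stand-in for a Field; not the library's class."""

    def __init__(self, tokens: List[str], namespace: str = "tokens") -> None:
        self.tokens = tokens
        self.namespace = namespace

    def count_vocab_items(self, counter: Dict[str, Dict[str, int]]) -> None:
        # Each field increments the counts for its own vocabulary items.
        for token in self.tokens:
            counter[self.namespace][token] += 1


def count_vocab_items(
    fields: Dict[str, ToyTextField], counter: Dict[str, Dict[str, int]]
) -> None:
    # The instance-level operation simply delegates to every field.
    for field in fields.values():
        field.count_vocab_items(counter)


# A nested defaultdict lets fields increment keys that do not exist yet.
counter: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
fields = {
    "premise": ToyTextField(["a", "dog", "barks"]),
    "hypothesis": ToyTextField(["a", "dog"]),
}
count_vocab_items(fields, counter)
```

The counter is mutated in place, so the same object can be passed to many instances in turn to accumulate counts over a whole dataset.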

get_padding_lengths() → typing.Dict[str, typing.Dict[str, int]]

Returns a dictionary of padding lengths, keyed by field name. Each Field returns a mapping from padding keys to actual lengths, and we just key that dictionary by field name.
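The shape of the returned dictionary is easiest to see in a small sketch. ToyTokenField, the free function get_padding_lengths, and the padding key "num_tokens" below are hypothetical names used for illustration, not the library's own code.

```python
from typing import Dict, List


class ToyTokenField:
    """Hypothetical stand-in for a Field; not the library's class."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def get_padding_lengths(self) -> Dict[str, int]:
        # Maps each padding key to the actual length in this field.
        return {"num_tokens": len(self.tokens)}


def get_padding_lengths(
    fields: Dict[str, ToyTokenField]
) -> Dict[str, Dict[str, int]]:
    # Key each field's own mapping by the field's name.
    return {name: field.get_padding_lengths() for name, field in fields.items()}


fields = {
    "premise": ToyTokenField(["a", "dog", "barks"]),
    "hypothesis": ToyTokenField(["a", "dog"]),
}
lengths = get_padding_lengths(fields)
```

Here lengths maps field name to padding key to length, which is the same nesting that as_tensor_dict accepts as padding_lengths.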


Converts all UnindexedFields in this Instance to IndexedFields, given the Vocabulary. This mutates the current object; it does not return a new Instance.
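The in-place nature of indexing can be illustrated with a toy example. Everything here is an assumption for the sketch: ToyTokenField and index_all are hypothetical names, a plain dict stands in for the Vocabulary, and "<unk>" is an invented fallback entry for unseen tokens.

```python
from typing import Dict, List, Optional


class ToyTokenField:
    """Hypothetical stand-in for a Field; not the library's class."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        # None marks the field as un-indexed.
        self.token_ids: Optional[List[int]] = None

    def index(self, vocab: Dict[str, int]) -> None:
        # Replace each token with its id, falling back to the unknown id.
        self.token_ids = [vocab.get(token, vocab["<unk>"]) for token in self.tokens]


def index_all(fields: Dict[str, ToyTokenField], vocab: Dict[str, int]) -> None:
    # Mutates every field in place and returns nothing.
    for field in fields.values():
        field.index(vocab)


vocab = {"<unk>": 1, "a": 2, "dog": 3}
fields = {"premise": ToyTokenField(["a", "dog", "barks"])}
same_field = fields["premise"]  # keep a reference to show identity is preserved
result = index_all(fields, vocab)
```

After the call, same_field carries the ids even though nothing was reassigned, and result is None. Any other code holding a reference to the same fields therefore sees the indexed version too.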