DatasetReader(lazy: bool = False)
A DatasetReader knows how to turn a file containing a dataset into a collection of Instances. To implement your own, just override the _read(file_path) method to return an Iterable of the instances. This could be a list containing the instances or a lazy generator that returns them one at a time.
All parameters necessary to _read the data apart from the filepath should be passed to the constructor of the DatasetReader.
- lazy : bool, optional (default=False)
  If this is true, read() will return an object whose __iter__ method reloads the dataset each time it’s called. Otherwise, read() returns a list.
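As a rough illustration of the subclassing pattern described above, here is a minimal sketch. It does not import AllenNLP: Instance and DatasetReader below are simplified stand-ins written for this example, and TsvClassificationReader, its lowercase option, the field names, and the demo_read helper are all hypothetical.

```python
import os
import tempfile
from typing import Dict, Iterable, Iterator, List


class Instance:
    """Hypothetical stand-in for allennlp.data.instance.Instance."""

    def __init__(self, fields: Dict[str, object]) -> None:
        self.fields = fields


class DatasetReader:
    """Simplified stand-in for the real base class (eager mode only)."""

    def __init__(self, lazy: bool = False) -> None:
        self.lazy = lazy

    def read(self, file_path: str) -> Iterable[Instance]:
        # Eager behaviour: materialize whatever _read() yields into a list.
        return list(self._read(file_path))

    def _read(self, file_path: str) -> Iterable[Instance]:
        raise NotImplementedError


class TsvClassificationReader(DatasetReader):
    """Hypothetical reader for lines of the form `label<TAB>text`."""

    def __init__(self, lowercase: bool = True, lazy: bool = False) -> None:
        super().__init__(lazy)
        # Everything except the file path is configured in the constructor.
        self.lowercase = lowercase

    def _read(self, file_path: str) -> Iterator[Instance]:
        # A generator: instances are produced one at a time.
        with open(file_path, encoding="utf-8") as data_file:
            for line in data_file:
                line = line.rstrip("\n")
                if not line:
                    continue
                label, text = line.split("\t", 1)
                if self.lowercase:
                    text = text.lower()
                yield Instance({"tokens": text.split(), "label": label})


def demo_read() -> List[Instance]:
    """Write a tiny dataset to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "train.tsv")
        with open(path, "w", encoding="utf-8") as out:
            out.write("pos\tGreat movie\nneg\tToo long\n")
        return list(TsvClassificationReader().read(path))
```

Only _read() is overridden here; the file path is the one argument it receives, and all other options live on the reader object.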
read(file_path: str) → Iterable[allennlp.data.instance.Instance]
Returns an Iterable containing all the instances in the specified dataset.
If self.lazy is False, this calls self._read(), ensures that the result is a list, then returns the resulting list.
If self.lazy is True, this returns an object whose __iter__ method calls self._read() each iteration. In this case your implementation of _read() must also be lazy (that is, not load all instances into memory at once), otherwise you will get an error.
In either case, the returned Iterable can be iterated over multiple times. It’s unlikely you want to override this function, but if you do, your result should likewise be repeatedly iterable.
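The difference between the two modes can be sketched as follows. This is an illustration of the behaviour described above, not the library's implementation: LazyInstances and CountingReader are hypothetical names, plain strings stand in for Instances, and the read_calls counter exists only to make the re-reading visible.

```python
from typing import Callable, Iterable, Iterator


class LazyInstances:
    """Hypothetical wrapper: every __iter__ call re-runs the reading function."""

    def __init__(self, instance_generator: Callable[[], Iterable[str]]) -> None:
        self.instance_generator = instance_generator

    def __iter__(self) -> Iterator[str]:
        return iter(self.instance_generator())


class CountingReader:
    """Stand-in reader; strings play the role of Instances."""

    def __init__(self, lazy: bool = False) -> None:
        self.lazy = lazy
        self.read_calls = 0  # how many times _read() has started producing data

    def _read(self, file_path: str) -> Iterator[str]:
        self.read_calls += 1
        for index in range(3):
            yield f"{file_path}#{index}"

    def read(self, file_path: str) -> Iterable[str]:
        if self.lazy:
            # Nothing is read yet; each iteration starts a fresh _read().
            return LazyInstances(lambda: self._read(file_path))
        # Eager mode: read once and keep the result in memory as a list.
        return list(self._read(file_path))
```

In this sketch both return values can be looped over repeatedly. The eager list is built from a single pass over the data, while the lazy wrapper starts a new pass on every loop, which is why a lazy _read() should not hold all instances in memory.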
text_to_instance(*inputs) → allennlp.data.instance.Instance
Does whatever tokenization or processing is necessary to go from textual input to an Instance. The primary intended use for this is with a Predictor, which gets text input as a JSON object and needs to process it to be input to a model.
The intent here is to share code between _read() and what happens at model serving time, or any other time you want to make a prediction from new data. We need to process the data in the same way it was done at training time. Allowing the DatasetReader to process new text lets us accomplish this, as we can just call DatasetReader.text_to_instance when serving predictions.
The input type here is rather vaguely specified, unfortunately. The Predictor will have to make some assumptions about the kind of DatasetReader that it’s using, in order to pass it the right information.
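The code sharing described above might look roughly like the following sketch. All names here are hypothetical and nothing is imported from AllenNLP: Instance is a stand-in class, SentenceReader is an example reader, and SentencePredictor with its json_to_instance method only imitates the role a Predictor plays. The "sentence" JSON key is an assumption of exactly the kind the last paragraph mentions.

```python
from typing import Dict, Iterator, List


class Instance:
    """Hypothetical stand-in for allennlp.data.instance.Instance."""

    def __init__(self, fields: Dict[str, List[str]]) -> None:
        self.fields = fields


class SentenceReader:
    """Hypothetical reader for a file with one sentence per line."""

    def text_to_instance(self, sentence: str) -> Instance:
        # The single place where raw text is turned into an Instance.
        return Instance({"tokens": sentence.lower().split()})

    def _read(self, file_path: str) -> Iterator[Instance]:
        with open(file_path, encoding="utf-8") as data_file:
            for line in data_file:
                if line.strip():
                    # Training-time path: reuse text_to_instance.
                    yield self.text_to_instance(line)


class SentencePredictor:
    """Hypothetical predictor-like wrapper around a reader."""

    def __init__(self, reader: SentenceReader) -> None:
        self.reader = reader

    def json_to_instance(self, json_dict: Dict[str, str]) -> Instance:
        # Assumption: the JSON input carries the text under "sentence".
        return self.reader.text_to_instance(json_dict["sentence"])
```

Because both paths go through text_to_instance, a sentence seen at serving time is processed the same way as a line read from the training file.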