TextClassificationJsonReader(self, token_indexers:Dict[str,]=None,, segment_sentences:bool=False, max_sequence_length:int=None, skip_label_indexing:bool=False, **kwargs) -> None

Reads tokens and their labels from a labeled text classification dataset. Expects a "text" field and a "label" field in JSON format.

The output of read is a list of Instance s with the fields: tokens : TextField and label : LabelField


  • token_indexers : Dict[str, TokenIndexer], optional
  • __optional (default={"tokens"__: SingleIdTokenIndexer()}) We use this to define the input representation for the text.
  • See :class:TokenIndexer.
  • tokenizer : Tokenizer, optional (default = {"tokens": SpacyTokenizer()}) Tokenizer to use to split the input text into words or other kinds of tokens.
  • segment_sentences : bool, optional (default = False) If True, we will first segment the text into sentences using SpaCy and then tokenize words. Necessary for some models that require pre-segmentation of sentences, like the Hierarchical
  • Attention Network (
  • max_sequence_length : int, optional (default = None) If specified, will truncate tokens to specified maximum length.
  • skip_label_indexing : bool, optional (default = False) Whether or not to skip label indexing. You might want to skip label indexing if your labels are numbers, so the dataset reader doesn't re-number them starting from 0.


TextClassificationJsonReader.text_to_instance(self, text:str, label:Union[str, int]=None) ->


  • text : str, required. The text to classify
  • label : str, optional, (default = None). The label for this text.


AnInstancecontaining the following fields: tokens: TextField The tokens in the sentence or phrase. label: LabelField The label label of the sentence or phrase.