allennlp.commands.elmo

The elmo subcommand allows you to make bulk ELMo predictions.

Given a pre-processed input text file, this command outputs the internal layers used to compute ELMo representations to a single (potentially large) file.

The input file is pre-tokenized, whitespace-separated text, one sentence per line. The output is an HDF5 file (<http://docs.h5py.org/en/latest/>) in which, with the --all flag, each sentence is stored as an array of shape (3, num_tokens, 1024) holding the biLM representations.
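The input format described above can be produced with a few lines of Python. The following is a minimal sketch, not part of AllenNLP: the helper name write_elmo_input, the example sentences, and the validation rules are assumptions made for illustration. It rejects empty sentences and tokens containing whitespace as a precaution, since either would change how a line splits back into tokens.

```python
from pathlib import Path


def write_elmo_input(sentences, path):
    """Write pre-tokenized sentences, one per line, tokens separated by spaces.

    `sentences` is an iterable of token lists. This helper is hypothetical;
    it only mirrors the input format described in the text above.
    Returns the number of lines written.
    """
    lines = []
    for number, tokens in enumerate(sentences):
        if not tokens:
            raise ValueError(f"sentence {number} has no tokens")
        for token in tokens:
            # A token with embedded whitespace would be split into two tokens.
            if not token or any(ch.isspace() for ch in token):
                raise ValueError(f"sentence {number} has an invalid token: {token!r}")
        lines.append(" ".join(tokens))
    # One sentence per line, each line terminated by a newline.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Hypothetical example data.
example_sentences = [
    ["The", "cat", "sat", "."],
    ["ELMo", "embeddings", "are", "contextual", "."],
]
```

Calling write_elmo_input(example_sentences, "sentences.txt") would write two lines, and the resulting file could then be passed as input_file, for example: allennlp elmo sentences.txt elmo_layers.hdf5 --all (both file names are placeholders).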

For more information, see “Deep contextualized word representations”, Peters et al., 2018: https://arxiv.org/abs/1802.05365

$ allennlp elmo --help
usage: allennlp elmo [-h] (--all | --top | --average)
                     [--vocab-path VOCAB_PATH] [--options-file OPTIONS_FILE]
                     [--weight-file WEIGHT_FILE] [--batch-size BATCH_SIZE]
                     [--cuda-device CUDA_DEVICE] [--forget-sentences]
                     [--use-sentence-keys] [--include-package INCLUDE_PACKAGE]
                     input_file output_file

Create word vectors using ELMo.

positional arguments:
  input_file            The path to the input file.
  output_file           The path to the output file.

optional arguments:
  -h, --help            show this help message and exit
  --all                 Output all three ELMo vectors.
  --top                 Output the top ELMo vector.
  --average             Output the average of the ELMo vectors.
  --vocab-path VOCAB_PATH
                        A path to a vocabulary file to generate.
  --options-file OPTIONS_FILE
                        The path to the ELMo options file.
  --weight-file WEIGHT_FILE
                        The path to the ELMo weight file.
  --batch-size BATCH_SIZE
                        The batch size to use.
  --cuda-device CUDA_DEVICE
                        The cuda_device to run on.
  --forget-sentences    If this flag is specified, and --use-sentence-keys is
                        not, remove the string serialized JSON dictionary that
                        associates sentences with their line number (its HDF5
                        key) that is normally placed in the
                        "sentence_to_index" HDF5 key.
  --use-sentence-keys   Normally a sentence's line number is used as the HDF5
                        key for its embedding. If this flag is specified, the
                        sentence itself will be used as the key.
  --include-package INCLUDE_PACKAGE
                        additional packages to include
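
The help text above says that, unless --forget-sentences or --use-sentence-keys is given, the output file carries a JSON dictionary under the "sentence_to_index" key that maps each sentence to its HDF5 key. The sketch below shows one way such a file might be queried by sentence. The function names are hypothetical, and two details are assumptions rather than documented behaviour: that the JSON string is the first element of a one-element string dataset, and that the sentence text matches the input line exactly. The functions accept an open h5py.File or any mapping with the same layout.

```python
import json


def load_sentence_index(store):
    """Return the sentence -> HDF5 key dictionary from "sentence_to_index".

    `store` is an open h5py.File, or any mapping laid out the same way.
    """
    if "sentence_to_index" not in store:
        raise KeyError(
            "no 'sentence_to_index' entry; the file may have been written "
            "with --forget-sentences or --use-sentence-keys"
        )
    # Assumption: the JSON string is element 0 of a one-element dataset.
    raw = store["sentence_to_index"][0]
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def embedding_for(store, sentence):
    """Look up the stored entry for `sentence` via the sentence index."""
    index = load_sentence_index(store)
    key = str(index[sentence])  # HDF5 keys are strings
    return store[key]


# Usage with a real output file (requires h5py; the file name is hypothetical):
#   import h5py
#   with h5py.File("elmo_layers.hdf5", "r") as fin:
#       vectors = embedding_for(fin, "The cat sat .")[...]
#       print(vectors.shape)  # with --all: (3, num_tokens, 1024)
```

With --use-sentence-keys the index is not written, and the sentence itself would be used directly as the key instead.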