Assorted utilities for working with neural networks in AllenNLP.

allennlp.nn.util.add_positional_features(tensor: torch.Tensor, min_timescale: float = 1.0, max_timescale: float = 10000.0)[source]

Implements the frequency-based positional encoding described in Attention is all you Need .

Adds sinusoids of different frequencies to a Tensor. A sinusoid of a different frequency and phase is added to each dimension of the input Tensor. This allows the attention heads to use absolute and relative positions.

The number of timescales is equal to hidden_dim / 2 within the range (min_timescale, max_timescale). For each timescale, the two sinusoidal signals sin(timestep / timescale) and cos(timestep / timescale) are generated and concatenated along the hidden_dim dimension.

tensor : torch.Tensor

a Tensor with shape (batch_size, timesteps, hidden_dim).

min_timescale : float, optional (default = 1.0)

The smallest timescale to use.

max_timescale : float, optional (default = 1.0e4)

The largest timescale to use.

The input tensor augmented with the sinusoidal frequencies.
allennlp.nn.util.add_sentence_boundary_token_ids(tensor: torch.Tensor, mask: torch.Tensor, sentence_begin_token: typing.Any, sentence_end_token: typing.Any) → typing.Tuple[torch.Tensor, torch.Tensor][source]

Add begin/end of sentence tokens to the batch of sentences. Given a batch of sentences with size (batch_size, timesteps) or (batch_size, timesteps, dim) this returns a tensor of shape (batch_size, timesteps + 2) or (batch_size, timesteps + 2, dim) respectively.

Returns both the new tensor and updated mask.

tensor : torch.Tensor

A tensor of shape (batch_size, timesteps) or (batch_size, timesteps, dim)

mask : torch.Tensor

A tensor of shape (batch_size, timesteps)

sentence_begin_token: Any (anything that can be broadcast in torch for assignment)

For 2D input, a scalar with the <S> id. For 3D input, a tensor with length dim.

sentence_end_token: Any (anything that can be broadcast in torch for assignment)

For 2D input, a scalar with the </S> id. For 3D input, a tensor with length dim.

tensor_with_boundary_tokens : torch.Tensor

The tensor with the appended and prepended boundary tokens. If the input was 2D, it has shape (batch_size, timesteps + 2) and if the input was 3D, it has shape (batch_size, timesteps + 2, dim).

new_mask : torch.Tensor

The new mask for the tensor, taking into account the appended tokens marking the beginning and end of the sentence.

allennlp.nn.util.batch_tensor_dicts(tensor_dicts: typing.List[typing.Dict[str, torch.Tensor]], remove_trailing_dimension: bool = False) → typing.Dict[str, torch.Tensor][source]

Takes a list of tensor dictionaries, where each dictionary is assumed to have matching keys, and returns a single dictionary with all tensors with the same key batched together.

tensor_dicts : List[Dict[str, torch.Tensor]]

The list of tensor dictionaries to batch.

remove_trailing_dimension : bool

If True, we will check for a trailing dimension of size 1 on the tensors that are being batched, and remove it if we find it.

allennlp.nn.util.batched_index_select(target: torch.Tensor, indices: torch.LongTensor, flattened_indices: typing.Union[torch.LongTensor, NoneType] = None) → torch.Tensor[source]

The given indices of size (batch_size, d_1, ..., d_n) indexes into the sequence dimension (dimension 2) of the target, which has size (batch_size, sequence_length, embedding_size).

This function returns selected values in the target with respect to the provided indices, which have size (batch_size, d_1, ..., d_n, embedding_size). This can use the optionally precomputed flattened_indices() with size (batch_size * d_1 * ... * d_n) if given.

An example use case of this function is looking up the start and end indices of spans in a sequence tensor. This is used in the CoreferenceResolver. Model to select contextual word representations corresponding to the start and end indices of mentions. The key reason this can’t be done with basic torch functions is that we want to be able to use look-up tensors with an arbitrary number of dimensions (for example, in the coref model, we don’t know a-priori how many spans we are looking up).

target : torch.Tensor, required.

A 3 dimensional tensor of shape (batch_size, sequence_length, embedding_size). This is the tensor to be indexed.

indices : torch.LongTensor

A tensor of shape (batch_size, ...), where each element is an index into the sequence_length dimension of the target tensor.

flattened_indices : Optional[torch.Tensor], optional (default = None)

An optional tensor representing the result of calling :func:~`flatten_and_batch_shift_indices` on indices. This is helpful in the case that the indices can be flattened once and cached for many batch lookups.

selected_targets : torch.Tensor

A tensor with shape [indices.size(), target.size(-1)] representing the embedded indices extracted from the batch flattened target tensor.

allennlp.nn.util.bucket_values(distances: torch.Tensor, num_identity_buckets: int = 4, num_total_buckets: int = 10) → torch.Tensor[source]

Places the given values (designed for distances) into num_total_buckets``semi-logscale buckets, with ``num_identity_buckets of these capturing single values.

The default settings will bucket values into the following buckets: [0, 1, 2, 3, 4, 5-7, 8-15, 16-31, 32-63, 64+].

distances : torch.Tensor, required.

A Tensor of any size, to be bucketed.

num_identity_buckets: int, optional (default = 4).

The number of identity buckets (those only holding a single value).

num_total_buckets : int, (default = 10)

The total number of buckets to bucket values into.

A tensor of the same shape as the input, containing the indices of the buckets
the values were placed in.
allennlp.nn.util.combine_tensors(combination: str, tensors: typing.List[torch.Tensor]) → torch.Tensor[source]

Combines a list of tensors using element-wise operations and concatenation, specified by a combination string. The string refers to (1-indexed) positions in the input tensor list, and looks like "1,2,1+2,3-1".

We allow the following kinds of combinations: x, x*y, x+y, x-y, and x/y, where x and y are positive integers less than or equal to len(tensors). Each of the binary operations is performed elementwise. You can give as many combinations as you want in the combination string. For example, for the input string "1,2,1*2", the result would be [1;2;1*2], as you would expect, where [;] is concatenation along the last dimension.

If you have a fixed, known way to combine tensors that you use in a model, you should probably just use something like[x_tensor, y_tensor, x_tensor * y_tensor]). This function adds some complexity that is only necessary if you want the specific combination used to be configurable.

If you want to do any element-wise operations, the tensors involved in each element-wise operation must have the same shape.

This function also accepts x and y in place of 1 and 2 in the combination string.

allennlp.nn.util.device_mapping(cuda_device: int)[source]

In order to torch.load() a GPU-trained model onto a CPU (or specific GPU), you have to supply a map_location function. Call this with the desired cuda_device to get the function that torch.load() needs.

allennlp.nn.util.flatten_and_batch_shift_indices(indices: torch.Tensor, sequence_length: int) → torch.Tensor[source]

This is a subroutine for batched_index_select(). The given indices of size (batch_size, d_1, ..., d_n) indexes into dimension 2 of a target tensor, which has size (batch_size, sequence_length, embedding_size). This function returns a vector that correctly indexes into the flattened target. The sequence length of the target must be provided to compute the appropriate offsets.

indices = torch.ones([2,3], dtype=torch.long)
# Sequence length of the target tensor.
sequence_length = 10
shifted_indices = flatten_and_batch_shift_indices(indices, sequence_length)
# Indices into the second element in the batch are correctly shifted
# to take into account that the target tensor will be flattened before
# the indices are applied.
assert shifted_indices == [1, 1, 1, 11, 11, 11]
indices : torch.LongTensor, required.
sequence_length : int, required.

The length of the sequence the indices index into. This must be the second dimension of the tensor.

offset_indices : torch.LongTensor
allennlp.nn.util.flattened_index_select(target: torch.Tensor, indices: torch.LongTensor) → torch.Tensor[source]

The given indices of size (set_size, subset_size) specifies subsets of the target that each of the set_size rows should select. The target has size (batch_size, sequence_length, embedding_size), and the resulting selected tensor has size (batch_size, set_size, subset_size, embedding_size).

target : torch.Tensor, required.

A Tensor of shape (batch_size, sequence_length, embedding_size).

indices : torch.LongTensor, required.

A LongTensor of shape (set_size, subset_size). All indices must be < sequence_length as this tensor is an index into the sequence_length dimension of the target.

selected : torch.Tensor, required.

A Tensor of shape (batch_size, set_size, subset_size, embedding_size).

allennlp.nn.util.get_combined_dim(combination: str, tensor_dims: typing.List[int]) → int[source]

For use with combine_tensors(). This function computes the resultant dimension when calling combine_tensors(combination, tensors), when the tensor dimension is known. This is necessary for knowing the sizes of weight matrices when building models that use combine_tensors.

combination : str

A comma-separated list of combination pieces, like "1,2,1*2", specified identically to combination in combine_tensors().

tensor_dims : List[int]

A list of tensor dimensions, where each dimension is from the last axis of the tensors that will be input to combine_tensors().

allennlp.nn.util.get_device_of(tensor: torch.Tensor) → int[source]

Returns the device of the tensor.

allennlp.nn.util.get_dropout_mask(dropout_probability: float, tensor_for_masking: torch.Tensor)[source]

Computes and returns an element-wise dropout mask for a given tensor, where each element in the mask is dropped out with probability dropout_probability. Note that the mask is NOT applied to the tensor - the tensor is passed to retain the correct CUDA tensor type for the mask.

dropout_probability : float, required.

Probability of dropping a dimension of the input.

tensor_for_masking : torch.Tensor, required.
A torch.FloatTensor consisting of the binary mask scaled by 1/ (1 - dropout_probability).
This scaling ensures expected values and variances of the output of applying this mask

and the original tensor are the same.

allennlp.nn.util.get_final_encoder_states(encoder_outputs: torch.Tensor, mask: torch.Tensor, bidirectional: bool = False) → torch.Tensor[source]

Given the output from a Seq2SeqEncoder, with shape (batch_size, sequence_length, encoding_dim), this method returns the final hidden state for each element of the batch, giving a tensor of shape (batch_size, encoding_dim). This is not as simple as encoder_outputs[:, -1], because the sequences could have different lengths. We use the mask (which has shape (batch_size, sequence_length)) to find the final state for each batch instance.

Additionally, if bidirectional is True, we will split the final dimension of the encoder_outputs into two and assume that the first half is for the forward direction of the encoder and the second half is for the backward direction. We will concatenate the last state for each encoder dimension, giving encoder_outputs[:, -1, :encoding_dim/2] concated with encoder_outputs[:, 0, encoding_dim/2:].

allennlp.nn.util.get_lengths_from_binary_sequence_mask(mask: torch.Tensor)[source]

Compute sequence lengths for each batch element in a tensor using a binary mask.

mask : torch.Tensor, required.

A 2D binary mask of shape (batch_size, sequence_length) to calculate the per-batch sequence lengths from.

A torch.LongTensor of shape (batch_size,) representing the lengths
of the sequences in the batch.
allennlp.nn.util.get_mask_from_sequence_lengths(sequence_lengths: torch.Tensor, max_length: int) → torch.Tensor[source]

Given a variable of shape (batch_size,) that represents the sequence lengths of each batch element, this function returns a (batch_size, max_length) mask variable. For example, if our input was [2, 2, 3], with a max_length of 4, we’d return [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]].

We require max_length here instead of just computing it from the input sequence_lengths because it lets us avoid finding the max, then copying that value from the GPU to the CPU so that we can use it to construct a new tensor.

allennlp.nn.util.get_range_vector(size: int, device: int) → torch.Tensor[source]

Returns a range vector with the desired size, starting at 0. The CUDA implementation is meant to avoid copy data from CPU to GPU.

allennlp.nn.util.get_text_field_mask(text_field_tensors: typing.Dict[str, torch.Tensor], num_wrapping_dims: int = 0) → torch.LongTensor[source]

Takes the dictionary of tensors produced by a TextField and returns a mask with 0 where the tokens are padding, and 1 otherwise. We also handle TextFields wrapped by an arbitrary number of ListFields, where the number of wrapping ListFields is given by num_wrapping_dims.

If num_wrapping_dims == 0, the returned mask has shape (batch_size, num_tokens). If num_wrapping_dims > 0 then the returned mask has num_wrapping_dims extra dimensions, so the shape will be (batch_size, ..., num_tokens).

There could be several entries in the tensor dictionary with different shapes (e.g., one for word ids, one for character ids). In order to get a token mask, we use the tensor in the dictionary with the lowest number of dimensions. After subtracting num_wrapping_dims, if this tensor has two dimensions we assume it has shape (batch_size, ..., num_tokens), and use it for the mask. If instead it has three dimensions, we assume it has shape (batch_size, ..., num_tokens, num_features), and sum over the last dimension to produce the mask. Most frequently this will be a character id tensor, but it could also be a featurized representation of each token, etc.

TODO(joelgrus): can we change this? NOTE: Our functions for generating masks create torch.LongTensors, because using torch.ByteTensors makes it easy to run into overflow errors when doing mask manipulation, such as summing to get the lengths of sequences - see below. >>> mask = torch.ones([260]).byte() >>> mask.sum() # equals 260. >>> var_mask = torch.autograd.V(mask) >>> var_mask.sum() # equals 4, due to 8 bit precision - the sum overflows.

allennlp.nn.util.last_dim_log_softmax(tensor: torch.Tensor, mask: typing.Union[torch.Tensor, NoneType] = None) → torch.Tensor[source]

Takes a tensor with 3 or more dimensions and does a masked log softmax over the last dimension. We assume the tensor has shape (batch_size, ..., sequence_length) and that the mask (if given) has shape (batch_size, sequence_length).

allennlp.nn.util.last_dim_softmax(tensor: torch.Tensor, mask: typing.Union[torch.Tensor, NoneType] = None) → torch.Tensor[source]

Takes a tensor with 3 or more dimensions and does a masked softmax over the last dimension. We assume the tensor has shape (batch_size, ..., sequence_length) and that the mask (if given) has shape (batch_size, sequence_length).

allennlp.nn.util.logsumexp(tensor: torch.Tensor, dim: int = -1, keepdim: bool = False) → torch.Tensor[source]

A numerically stable computation of logsumexp. This is mathematically equivalent to tensor.exp().sum(dim, keep=keepdim).log(). This function is typically used for summing log probabilities.

tensor : torch.FloatTensor, required.

A tensor of arbitrary size.

dim : int, optional (default = -1)

The dimension of the tensor to apply the logsumexp to.

keepdim: bool, optional (default = False)

Whether to retain a dimension of size one at the dimension we reduce over.

allennlp.nn.util.masked_log_softmax(vector, mask)[source]

torch.nn.functional.log_softmax(vector) does not work if some elements of vector should be masked. This performs a log_softmax on just the non-masked portions of vector. Passing None in for the mask is also acceptable; you’ll just get a regular log_softmax.

We assume that both vector and mask (if given) have shape (batch_size, vector_dim).

In the case that the input vector is completely masked, the return value of this function is arbitrary, but not nan. You should be masking the result of whatever computation comes out of this in that case, anyway, so the specific values returned shouldn’t matter. Also, the way that we deal with this case relies on having single-precision floats; mixing half-precision floats with fully-masked vectors will likely give you nans.

If your logits are all extremely negative (i.e., the max value in your logit vector is -50 or lower), the way we handle masking here could mess you up. But if you’ve got logit values that extreme, you’ve got bigger problems than this.

allennlp.nn.util.masked_softmax(vector, mask)[source]

torch.nn.functional.softmax(vector) does not work if some elements of vector should be masked. This performs a softmax on just the non-masked portions of vector. Passing None in for the mask is also acceptable; you’ll just get a regular softmax.

We assume that both vector and mask (if given) have shape (batch_size, vector_dim).

In the case that the input vector is completely masked, this function returns an array of 0.0. This behavior may cause NaN if this is used as the last layer of a model that uses categorical cross-entropy loss.

allennlp.nn.util.remove_sentence_boundaries(tensor: torch.Tensor, mask: torch.Tensor) → typing.Tuple[torch.Tensor, torch.Tensor][source]

Remove begin/end of sentence embeddings from the batch of sentences. Given a batch of sentences with size (batch_size, timesteps, dim) this returns a tensor of shape (batch_size, timesteps - 2, dim) after removing the beginning and end sentence markers. The sentences are assumed to be padded on the right, with the beginning of each sentence assumed to occur at index 0 (i.e., mask[:, 0] is assumed to be 1).

Returns both the new tensor and updated mask.

This function is the inverse of add_sentence_boundary_token_ids.

tensor : torch.Tensor

A tensor of shape (batch_size, timesteps, dim)

mask : torch.Tensor

A tensor of shape (batch_size, timesteps)

tensor_without_boundary_tokens : torch.Tensor

The tensor after removing the boundary tokens of shape (batch_size, timesteps - 2, dim)

new_mask : torch.Tensor

The new mask for the tensor of shape (batch_size, timesteps - 2).

allennlp.nn.util.replace_masked_values(tensor: torch.Tensor, mask: torch.Tensor, replace_with: float) → torch.Tensor[source]

Replaces all masked values in tensor with replace_with. mask must be broadcastable to the same shape as tensor. We require that tensor.dim() == mask.dim(), as otherwise we won’t know which dimensions of the mask to unsqueeze.

allennlp.nn.util.sequence_cross_entropy_with_logits(logits: torch.FloatTensor, targets: torch.LongTensor, weights: torch.FloatTensor, batch_average: bool = True, label_smoothing: float = None) → torch.FloatTensor[source]

Computes the cross entropy loss of a sequence, weighted with respect to some user provided weights. Note that the weighting here is not the same as in the torch.nn.CrossEntropyLoss() criterion, which is weighting classes; here we are weighting the loss contribution from particular elements in the sequence. This allows loss computations for models which use padding.

logits : torch.FloatTensor, required.

A torch.FloatTensor of size (batch_size, sequence_length, num_classes) which contains the unnormalized probability for each class.

targets : torch.LongTensor, required.

A torch.LongTensor of size (batch, sequence_length) which contains the index of the true class for each corresponding step.

weights : torch.FloatTensor, required.

A torch.FloatTensor of size (batch, sequence_length)

batch_average : bool, optional, (default = True).

A bool indicating whether the loss should be averaged across the batch, or returned as a vector of losses per batch element.

label_smoothing : float, optional (default = None)

Whether or not to apply label smoothing to the cross-entropy loss. For example, with a label smoothing value of 0.2, a 4 class classifcation target would look like [0.05, 0.05, 0.85, 0.05] if the 3rd class was the correct label.

A torch.FloatTensor representing the cross entropy loss.
If ``batch_average == True``, the returned loss is a scalar.
If ``batch_average == False``, the returned loss is a vector of shape (batch_size,).
allennlp.nn.util.sort_batch_by_length(tensor: torch.Tensor, sequence_lengths: torch.Tensor)[source]

Sort a batch first tensor by some specified lengths.

tensor : torch.FloatTensor, required.

A batch first Pytorch tensor.

sequence_lengths : torch.LongTensor, required.

A tensor representing the lengths of some dimension of the tensor which we want to sort by.

sorted_tensor : torch.FloatTensor

The original tensor sorted along the batch dimension with respect to sequence_lengths.

sorted_sequence_lengths : torch.LongTensor

The original sequence_lengths sorted by decreasing size.

restoration_indices : torch.LongTensor

Indices into the sorted_tensor such that sorted_tensor.index_select(0, restoration_indices) == original_tensor

permuation_index : torch.LongTensor

The indices used to sort the tensor. This is useful if you want to sort many tensors using the same ordering.

allennlp.nn.util.viterbi_decode(tag_sequence: torch.Tensor, transition_matrix: torch.Tensor, tag_observations: typing.Union[typing.List[int], NoneType] = None)[source]

Perform Viterbi decoding in log space over a sequence given a transition matrix specifying pairwise (transition) potentials between tags and a matrix of shape (sequence_length, num_tags) specifying unary potentials for possible tags per timestep.

tag_sequence : torch.Tensor, required.

A tensor of shape (sequence_length, num_tags) representing scores for a set of tags over a given sequence.

transition_matrix : torch.Tensor, required.

A tensor of shape (num_tags, num_tags) representing the binary potentials for transitioning between a given pair of tags.

tag_observations : Optional[List[int]], optional, (default = None)

A list of length sequence_length containing the class ids of observed elements in the sequence, with unobserved elements being set to -1. Note that it is possible to provide evidence which results in degenerate labellings if the sequences of tags you provide as evidence cannot transition between each other, or those transitions are extremely unlikely. In this situation we log a warning, but the responsibility for providing self-consistent evidence ultimately lies with the user.

viterbi_path : List[int]

The tag indices of the maximum likelihood tag sequence.

viterbi_score : torch.Tensor

The score of the viterbi path.

allennlp.nn.util.weighted_sum(matrix: torch.Tensor, attention: torch.Tensor) → torch.Tensor[source]

Takes a matrix of vectors and a set of weights over the rows in the matrix (which we call an “attention” vector), and returns a weighted sum of the rows in the matrix. This is the typical computation performed after an attention mechanism.

Note that while we call this a “matrix” of vectors and an attention “vector”, we also handle higher-order tensors. We always sum over the second-to-last dimension of the “matrix”, and we assume that all dimensions in the “matrix” prior to the last dimension are matched in the “vector”. Non-matched dimensions in the “vector” must be directly after the batch dimension.

For example, say I have a “matrix” with dimensions (batch_size, num_queries, num_words, embedding_dim). The attention “vector” then must have at least those dimensions, and could have more. Both:

  • (batch_size, num_queries, num_words) (distribution over words for each query)
  • (batch_size, num_documents, num_queries, num_words) (distribution over words in a query for each document)

are valid input “vectors”, producing tensors of shape: (batch_size, num_queries, embedding_dim) and (batch_size, num_documents, num_queries, embedding_dim) respectively.