allennlp.nn.util
Assorted utilities for working with neural networks in AllenNLP.

allennlp.nn.util.arrays_to_variables(data_structure: typing.Dict[str, typing.Union[dict, numpy.ndarray]], cuda_device: int = -1, add_batch_dimension: bool = False, for_training: bool = True)
Convert an (optionally) nested dictionary of arrays to PyTorch Variables, suitable for use in a computation graph.
Parameters: data_structure : Dict[str, Union[dict, numpy.ndarray]], required.
The nested dictionary of arrays to convert to PyTorch Variables.
cuda_device : int, optional (default = -1)
If cuda_device >= 0, GPUs are available and PyTorch was compiled with CUDA support, the tensor will be copied to the cuda_device specified.
add_batch_dimension : bool, optional (default = False).
Optionally add a batch dimension to tensors converted to Variables using this function. This is useful during inference for passing tensors representing a single example to a PyTorch model which would otherwise not have a batch dimension.
for_training : bool, optional (default = True)
If False, we will pass the volatile=True flag when constructing variables, which disables gradient computations in the graph. This makes inference more efficient (particularly in memory usage), but is incompatible with training models.
Returns: The original data structure or tensor converted to a PyTorch Variable.

allennlp.nn.util.combine_tensors(combination: str, tensors: typing.List[torch.FloatTensor]) → torch.FloatTensor
Combines a list of tensors using elementwise operations and concatenation, specified by a combination string. The string refers to (1-indexed) positions in the input tensor list, and looks like "1,2,1+2,3-1".
We allow the following kinds of combinations: x, x*y, x+y, x-y, and x/y, where x and y are positive integers less than or equal to len(tensors). Each of the binary operations is performed elementwise. You can give as many combinations as you want in the combination string. For example, for the input string "1,2,1*2", the result would be [1;2;1*2], as you would expect, where [;] is concatenation along the last dimension.
If you have a fixed, known way to combine tensors that you use in a model, you should probably just use something like torch.cat([x_tensor, y_tensor, x_tensor * y_tensor]). This function adds some complexity that is only necessary if you want the specific combination used to be configurable.
If you want to do any elementwise operations, the tensors involved in each elementwise operation must have the same shape.
This function also accepts x and y in place of 1 and 2 in the combination string.
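As a rough illustration of how a combination string can be interpreted, here is a dependency-free sketch that works on plain Python lists standing in for the last dimension of each tensor. The names combine_vectors, _resolve and _OPERATIONS are hypothetical and this is not the library's implementation.

```python
import operator

# Hypothetical sketch: each "tensor" is a flat list of floats (its last dimension).
_OPERATIONS = {"*": operator.mul, "/": operator.truediv, "+": operator.add, "-": operator.sub}


def _resolve(index_text, vectors):
    # "x" and "y" are accepted as aliases for "1" and "2"; positions are 1-indexed.
    index_text = {"x": "1", "y": "2"}.get(index_text, index_text)
    return vectors[int(index_text) - 1]


def combine_vectors(combination, vectors):
    combined = []
    for piece in combination.split(","):
        piece = piece.strip()
        symbol = next((s for s in _OPERATIONS if s in piece), None)
        if symbol is None:
            # A bare index: concatenate that vector unchanged.
            combined.extend(_resolve(piece, vectors))
            continue
        left, right = piece.split(symbol)
        first, second = _resolve(left, vectors), _resolve(right, vectors)
        if len(first) != len(second):
            raise ValueError("elementwise operands must have the same shape")
        combined.extend(_OPERATIONS[symbol](a, b) for a, b in zip(first, second))
    return combined
```

With this sketch, combine_vectors("1,2,1*2", [[1.0, 2.0], [3.0, 4.0]]) concatenates the first vector, the second vector and their elementwise product.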

allennlp.nn.util.device_mapping(cuda_device: int)
In order to torch.load() a GPU-trained model onto a CPU (or specific GPU), you have to supply a map_location function. Call this with the desired cuda_device to get the function that torch.load() needs.
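A map_location callable receives a storage and its original location and returns the storage to use. The sketch below shows one plausible shape for such a factory; it assumes that a negative cuda_device means "stay on CPU", which is not stated in this entry, and it is not necessarily how the library builds the function.

```python
def device_mapping(cuda_device):
    # Assumption: cuda_device < 0 means the storage should stay on the CPU.
    def inner_device_mapping(storage, location):
        if cuda_device >= 0:
            # Move the deserialized storage onto the requested GPU.
            return storage.cuda(cuda_device)
        return storage

    return inner_device_mapping

# Intended use (not executed here, "model.th" is a placeholder path):
#   state = torch.load("model.th", map_location=device_mapping(-1))
```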

allennlp.nn.util.get_combined_dim(combination: str, tensor_dims: typing.List[int]) → int
For use with combine_tensors(). This function computes the resultant dimension when calling combine_tensors(combination, tensors), when the tensor dimension is known. This is necessary for knowing the sizes of weight matrices when building models that use combine_tensors.
Parameters: combination : str
A comma-separated list of combination pieces, like "1,2,1*2", specified identically to combination in combine_tensors().
tensor_dims : List[int]
A list of tensor dimensions, where each dimension is from the last axis of the tensors that will be input to combine_tensors().
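Because elementwise pieces keep the operand size and bare indices contribute their own size, the combined dimension is a sum over the pieces. A minimal sketch of that arithmetic, using a hypothetical function name combined_dim rather than the library's code:

```python
import re


def combined_dim(combination, tensor_dims):
    aliases = {"x": "1", "y": "2"}
    total = 0
    for piece in combination.split(","):
        # Split a piece such as "1*2" into its operand indices.
        operands = re.split(r"[*/+-]", piece.strip())
        dims = [tensor_dims[int(aliases.get(name, name)) - 1] for name in operands]
        if len(set(dims)) != 1:
            raise ValueError("elementwise operands must have the same dimension")
        # An elementwise result has the same last dimension as its operands.
        total += dims[0]
    return total
```

For "1,2,1*2" with two tensors of last dimension 5, each of the three pieces contributes 5, so a weight matrix consuming the result would need an input size of 15.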

allennlp.nn.util.get_dropout_mask(dropout_probability: float, tensor_for_masking: torch.autograd.variable.Variable)
Computes and returns an elementwise dropout mask for a given tensor, where each element in the mask is dropped out with probability dropout_probability. Note that the mask is NOT applied to the tensor - the tensor is passed to retain the correct CUDA tensor type for the mask.
Parameters: dropout_probability : float, required.
Probability of dropping a dimension of the input.
tensor_for_masking : torch.Variable, required.
Returns: A torch.FloatTensor consisting of the binary mask scaled by 1 / (1 - dropout_probability).
This scaling ensures that the expected value of the output of applying this mask matches that of the original tensor.
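The scaling can be seen in a small pure-Python sketch: kept positions hold 1 / (1 - dropout_probability) and dropped positions hold 0. The function name dropout_mask and the rng argument are hypothetical, and a flat list stands in for the tensor.

```python
import random


def dropout_mask(dropout_probability, size, rng=None):
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - dropout_probability)
    # Each position is kept with probability (1 - dropout_probability).
    return [keep_scale if rng.random() > dropout_probability else 0.0 for _ in range(size)]
```

Multiplying an input elementwise by such a mask zeroes the dropped positions and rescales the kept ones, so the same mask can be reused across timesteps if desired.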

allennlp.nn.util.get_lengths_from_binary_sequence_mask(mask: torch.FloatTensor)
Compute sequence lengths for each batch element in a tensor using a binary mask.
Parameters: mask : torch.Tensor, required.
A 2D binary mask of shape (batch_size, sequence_length) to calculate the per-batch sequence lengths from.
Returns: A torch.LongTensor of shape (batch_size,) representing the lengths
of the sequences in the batch.
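Conceptually, the length of each sequence is the sum of its mask row. A minimal sketch on nested lists, with a hypothetical function name:

```python
def lengths_from_mask(mask):
    # mask is a list of rows, each row holding 1 for real tokens and 0 for padding.
    return [int(sum(row)) for row in mask]
```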

allennlp.nn.util.get_text_field_mask(text_field_tensors: typing.Dict[str, torch.FloatTensor]) → torch.LongTensor
Takes the dictionary of tensors produced by a TextField and returns a mask of shape (batch_size, num_tokens). This mask will be 0 where the tokens are padding, and 1 otherwise.
There could be several entries in the tensor dictionary with different shapes (e.g., one for word ids, one for character ids). In order to get a token mask, we assume that the tensor in the dictionary with the lowest number of dimensions has plain token ids. This allows us to also handle cases where the input is actually a ListField[TextField].
NOTE: Our functions for generating masks create torch.LongTensors, because using torch.ByteTensors inside Variables makes it easy to run into overflow errors when doing mask manipulation, such as summing to get the lengths of sequences - see below.
>>> mask = torch.ones([260]).byte()
>>> mask.sum() # equals 260.
>>> var_mask = torch.autograd.Variable(mask)
>>> var_mask.sum() # equals 4, due to 8 bit precision - the sum overflows.
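For the simple case of a (batch_size, num_tokens) grid of token ids, the mask can be sketched as below. The function name token_mask is hypothetical, and the sketch assumes that padding tokens use id 0, which this entry does not state explicitly.

```python
def token_mask(token_ids, padding_id=0):
    # Assumption: padding positions hold padding_id (0 by default).
    return [[0 if token == padding_id else 1 for token in row] for row in token_ids]
```

Plain Python integers are used here, so summing a row of this mask cannot overflow the way an 8-bit sum does.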

allennlp.nn.util.last_dim_softmax(tensor: torch.FloatTensor, mask: typing.Union[torch.FloatTensor, NoneType] = None) → torch.FloatTensor
Takes a tensor with 3 or more dimensions and does a masked softmax over the last dimension. We assume the tensor has shape (batch_size, ..., sequence_length) and that the mask (if given) has shape (batch_size, sequence_length). We first unsqueeze and expand the mask so that it has the same shape as the tensor, then flatten them both to be 2D, pass them through masked_softmax(), then put the tensor back in its original shape.

allennlp.nn.util.logsumexp(tensor: torch.FloatTensor, dim: int = -1, keepdim: bool = False) → torch.FloatTensor
A numerically stable computation of logsumexp. This is mathematically equivalent to tensor.exp().sum(dim, keepdim=keepdim).log(). This function is typically used for summing log probabilities.
Parameters: tensor : torch.FloatTensor, required.
A tensor of arbitrary size.
dim : int, optional (default = -1)
The dimension of the tensor to apply the logsumexp to.
keepdim : bool, optional (default = False)
Whether to retain a dimension of size one at the dimension we reduce over.
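The usual way to make this computation stable is to subtract the maximum before exponentiating and add it back afterwards. A one-dimensional sketch with a hypothetical function name, using only the standard library:

```python
import math


def stable_logsumexp(values):
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    maximum = max(values)
    return maximum + math.log(sum(math.exp(value - maximum) for value in values))
```

The naive form math.log(sum(math.exp(v) for v in values)) would raise an OverflowError for inputs such as [1000.0, 1000.0], whereas the shifted form handles them.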

allennlp.nn.util.masked_log_softmax(vector, mask)
torch.nn.functional.log_softmax(vector) does not work if some elements of vector should be masked. This performs a log_softmax on just the non-masked portions of vector. Passing None in for the mask is also acceptable; you’ll just get a regular log_softmax.
We assume that both vector and mask (if given) have shape (batch_size, vector_dim).
In the case that the input vector is completely masked, this function returns an array of 0.0. You should be masking the result of whatever computation comes out of this in that case, anyway, so it shouldn’t matter.

allennlp.nn.util.masked_softmax(vector, mask)
torch.nn.functional.softmax(vector) does not work if some elements of vector should be masked. This performs a softmax on just the non-masked portions of vector. Passing None in for the mask is also acceptable; you’ll just get a regular softmax.
We assume that both vector and mask (if given) have shape (batch_size, vector_dim).
In the case that the input vector is completely masked, this function returns an array of 0.0. This behavior may cause NaN if this is used as the last layer of a model that uses categorical cross-entropy loss.
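The behaviour described above can be sketched for a single row as follows. This is one way to obtain a softmax restricted to unmasked positions, with an all-zero result for a fully masked row; the function name is hypothetical and the library may compute the same result differently.

```python
import math


def masked_softmax_row(vector, mask):
    if mask is None:
        mask = [1] * len(vector)
    unmasked = [value for value, keep in zip(vector, mask) if keep]
    if not unmasked:
        # Completely masked input: return zeros rather than dividing by zero.
        return [0.0] * len(vector)
    maximum = max(unmasked)  # shift for numerical stability
    exps = [math.exp(value - maximum) if keep else 0.0 for value, keep in zip(vector, mask)]
    total = sum(exps)
    return [value / total for value in exps]
```

Masked positions receive probability 0 and the remaining positions sum to 1.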

allennlp.nn.util.ones_like(tensor: torch.FloatTensor) → torch.FloatTensor
Use clone() + fill_() to make sure that a ones tensor ends up on the right device at runtime.

allennlp.nn.util.replace_masked_values(tensor: torch.autograd.variable.Variable, mask: torch.autograd.variable.Variable, replace_with: float) → torch.autograd.variable.Variable
Replaces all masked values in tensor with replace_with. mask must be broadcastable to the same shape as tensor. We require that tensor.dim() == mask.dim(), as otherwise we won’t know which dimensions of the mask to unsqueeze.
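A one-dimensional sketch of the idea, assuming (as elsewhere on this page) that a mask value of 0 marks a masked position. The function name replace_masked is hypothetical and no broadcasting is attempted here.

```python
def replace_masked(values, mask, replace_with):
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length in this sketch")
    # Keep values where the mask is 1; substitute replace_with where it is 0.
    return [value if keep else replace_with for value, keep in zip(values, mask)]
```

A typical use is to replace masked entries with a very negative number before taking a maximum, so that padding never wins.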

allennlp.nn.util.sequence_cross_entropy_with_logits(logits: torch.FloatTensor, targets: torch.LongTensor, weights: torch.FloatTensor, batch_average: bool = True) → torch.FloatTensor
Computes the cross entropy loss of a sequence, weighted with respect to some user provided weights. Note that the weighting here is not the same as in the torch.nn.CrossEntropyLoss() criterion, which is weighting classes; here we are weighting the loss contribution from particular elements in the sequence. This allows loss computations for models which use padding.
Parameters: logits : torch.FloatTensor, required.
A torch.FloatTensor of size (batch_size, sequence_length, num_classes) which contains the unnormalized probability for each class.
targets : torch.LongTensor, required.
A torch.LongTensor of size (batch_size, sequence_length) which contains the index of the true class for each corresponding step.
weights : torch.FloatTensor, required.
A torch.FloatTensor of size (batch_size, sequence_length) giving the weight of each sequence element's contribution to the loss.
batch_average : bool, optional, (default = True).
A bool indicating whether the loss should be averaged across the batch, or returned as a vector of losses per batch element.
Returns: A torch.FloatTensor representing the cross entropy loss.
If batch_average == True, the returned loss is a scalar.
If batch_average == False, the returned loss is a vector of shape (batch_size,).
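The sketch below illustrates per-element weighting on nested lists: each timestep's negative log-likelihood is multiplied by its weight, so padded steps with weight 0 contribute nothing. Dividing each sequence's total by the sum of its weights is an assumption made for this illustration, as is the function name; the entry above does not spell out the normalization.

```python
import math


def sequence_cross_entropy(logits, targets, weights, batch_average=True):
    per_sequence = []
    for seq_logits, seq_targets, seq_weights in zip(logits, targets, weights):
        weighted_total = 0.0
        for step_logits, target, weight in zip(seq_logits, seq_targets, seq_weights):
            # Stable log-normalizer, then the negative log-likelihood of the target.
            maximum = max(step_logits)
            log_normalizer = maximum + math.log(sum(math.exp(x - maximum) for x in step_logits))
            weighted_total += weight * (log_normalizer - step_logits[target])
        # Assumed normalization: divide by total weight (epsilon guards empty sequences).
        per_sequence.append(weighted_total / (sum(seq_weights) + 1e-13))
    if batch_average:
        return sum(per_sequence) / len(per_sequence)
    return per_sequence
```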

allennlp.nn.util.sort_batch_by_length(tensor: torch.autograd.variable.Variable, sequence_lengths: torch.autograd.variable.Variable)
Sort a batch-first tensor by some specified lengths.
Parameters: tensor : Variable(torch.FloatTensor), required.
A batch-first PyTorch tensor.
sequence_lengths : Variable(torch.LongTensor), required.
A tensor representing the lengths of some dimension of the tensor which we want to sort by.
Returns: sorted_tensor : Variable(torch.FloatTensor)
The original tensor sorted along the batch dimension with respect to sequence_lengths.
sorted_sequence_lengths : Variable(torch.LongTensor)
The original sequence_lengths sorted by decreasing size.
restoration_indices : Variable(torch.LongTensor)
Indices into the sorted_tensor such that
sorted_tensor.index_select(0, restoration_indices) == original_tensor
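The relationship between the sorting permutation and the restoration indices can be sketched with plain lists. The function name sort_by_length is hypothetical; the restoration indices are simply the inverse of the sorting permutation.

```python
def sort_by_length(batch, lengths):
    # Positions of the original elements in decreasing-length order.
    permutation = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    sorted_batch = [batch[i] for i in permutation]
    sorted_lengths = [lengths[i] for i in permutation]
    # Invert the permutation: restoration[original] = position in the sorted batch.
    restoration = [0] * len(permutation)
    for sorted_position, original_position in enumerate(permutation):
        restoration[original_position] = sorted_position
    return sorted_batch, sorted_lengths, restoration
```

Indexing the sorted batch with the restoration indices recovers the original order, which is useful after running a packed-sequence encoder on the sorted batch.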

allennlp.nn.util.viterbi_decode(tag_sequence: torch.FloatTensor, transition_matrix: torch.FloatTensor, tag_observations: typing.Union[typing.List[int], NoneType] = None)
Perform Viterbi decoding in log space over a sequence given a transition matrix specifying pairwise (transition) potentials between tags and a matrix of shape (sequence_length, num_tags) specifying unary potentials for possible tags per timestep.
Parameters: tag_sequence : torch.Tensor, required.
A tensor of shape (sequence_length, num_tags) representing scores for a set of tags over a given sequence.
transition_matrix : torch.Tensor, required.
A tensor of shape (num_tags, num_tags) representing the binary potentials for transitioning between a given pair of tags.
tag_observations : Optional[List[int]], optional, (default = None)
A list of length sequence_length containing the class ids of observed elements in the sequence, with unobserved elements being set to -1. Note that it is possible to provide evidence which results in degenerate labellings if the sequences of tags you provide as evidence cannot transition between each other, or those transitions are extremely unlikely. In this situation we log a warning, but the responsibility for providing self-consistent evidence ultimately lies with the user.
Returns: viterbi_path : List[int]
The tag indices of the maximum likelihood tag sequence.
viterbi_score : float
The score of the viterbi path.
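For readers unfamiliar with the algorithm, here is a compact log-space Viterbi sketch on nested lists. It assumes transitions[i][j] is the score of moving from tag i to tag j, does not handle tag_observations, and uses a hypothetical function name; it is an illustration rather than the library's implementation.

```python
def viterbi(tag_scores, transitions):
    num_tags = len(transitions)
    path_scores = [list(tag_scores[0])]
    backpointers = []
    for timestep in range(1, len(tag_scores)):
        previous = path_scores[-1]
        scores, pointers = [], []
        for tag in range(num_tags):
            # Best previous tag for arriving at `tag` (scores add in log space).
            best = max(range(num_tags), key=lambda prev: previous[prev] + transitions[prev][tag])
            pointers.append(best)
            scores.append(previous[best] + transitions[best][tag] + tag_scores[timestep][tag])
        path_scores.append(scores)
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    best_last = max(range(num_tags), key=lambda tag: path_scores[-1][tag])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, path_scores[-1][best_last]
```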

allennlp.nn.util.weighted_sum(matrix: torch.FloatTensor, attention: torch.FloatTensor) → torch.FloatTensor
Takes a matrix of vectors and a set of weights over the rows in the matrix (which we call an “attention” vector), and returns a weighted sum of the rows in the matrix. This is the typical computation performed after an attention mechanism.
Note that while we call this a “matrix” of vectors and an attention “vector”, we also handle higher-order tensors. We always sum over the second-to-last dimension of the “matrix”, and we assume that all dimensions in the “matrix” prior to the last dimension are matched in the “vector”. Non-matched dimensions in the “vector” must be directly after the batch dimension.
For example, say I have a “matrix” with dimensions (batch_size, num_queries, num_words, embedding_dim). The attention “vector” then must have at least those dimensions, and could have more. Both:
(batch_size, num_queries, num_words) (distribution over words for each query)
(batch_size, num_documents, num_queries, num_words) (distribution over words in a query for each document)
are valid input “vectors”, producing tensors of shape: (batch_size, num_queries, embedding_dim) and (batch_size, num_documents, num_queries, embedding_dim) respectively.
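In the simplest case, with a single (num_words, embedding_dim) matrix and a (num_words,) attention vector, the computation reduces to the sketch below. The function name is hypothetical and batching and higher-order inputs are left out.

```python
def weighted_sum_rows(matrix, attention):
    embedding_dim = len(matrix[0])
    # Sum over rows (the second-to-last dimension), weighting each row by its attention.
    return [
        sum(weight * row[dimension] for weight, row in zip(attention, matrix))
        for dimension in range(embedding_dim)
    ]
```

The result has length embedding_dim, matching the way the row dimension disappears from the output shapes listed above.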