allennlp.modules.matrix_attention

class allennlp.modules.matrix_attention.matrix_attention.MatrixAttention[source]

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

MatrixAttention takes two matrices as input and returns a matrix of attentions.

We compute the similarity between each row in matrix_1 and each row in matrix_2, and return unnormalized similarity scores. Because these scores are unnormalized, we don’t take a mask as input; it’s up to the caller to deal with masking properly when this output is used.

Input:
  • matrix_1: (batch_size, num_rows_1, embedding_dim_1)
  • matrix_2: (batch_size, num_rows_2, embedding_dim_2)
Output:
  • (batch_size, num_rows_1, num_rows_2)
forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
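
For illustration, a minimal usage sketch of the shape contract above, using DotProductMatrixAttention (documented below) as the concrete subclass; the tensor sizes are arbitrary examples:

    import torch
    from allennlp.modules.matrix_attention.dot_product_matrix_attention import DotProductMatrixAttention

    matrix_1 = torch.randn(2, 5, 4)   # (batch_size, num_rows_1, embedding_dim_1)
    matrix_2 = torch.randn(2, 7, 4)   # (batch_size, num_rows_2, embedding_dim_2)

    attention = DotProductMatrixAttention()
    similarities = attention(matrix_1, matrix_2)
    print(similarities.shape)         # torch.Size([2, 5, 7])

    # The scores are unnormalized; masking and normalization are the caller's job,
    # e.g. with allennlp.nn.util.masked_softmax over the last dimension.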
class allennlp.modules.matrix_attention.bilinear_matrix_attention.BilinearMatrixAttention(matrix_1_dim: int, matrix_2_dim: int, activation: allennlp.nn.activations.Activation = None, use_input_biases: bool = False) → None[source]

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between two matrices using a bilinear attention function. This function has a matrix of weights W and a bias b, and the similarity between the two matrices X and Y is computed as X W Y^T + b.

Parameters:
matrix_1_dim : int

The dimension of the matrix X, described above. This is X.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.

matrix_2_dim : int

The dimension of the matrix Y, described above. This is Y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.

activation : Activation, optional (default=linear (i.e. no activation))

An activation function applied after the X W Y^T + b calculation. Default is no activation.

use_input_biases : bool, optional (default = False)

If True, we add biases to the inputs such that the final computation is equivalent to the original bilinear matrix multiplication plus a projection of both inputs.

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
reset_parameters()[source]
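
A minimal sketch of constructing and applying this module; the batch size, row counts, and dimensions (matrix_1_dim=3, matrix_2_dim=6) are arbitrary examples:

    import torch
    from allennlp.modules.matrix_attention.bilinear_matrix_attention import BilinearMatrixAttention

    x = torch.randn(2, 5, 3)   # X: (batch_size, num_rows_1, matrix_1_dim)
    y = torch.randn(2, 7, 6)   # Y: (batch_size, num_rows_2, matrix_2_dim)

    attention = BilinearMatrixAttention(matrix_1_dim=3, matrix_2_dim=6)
    scores = attention(x, y)   # X W Y^T + b
    print(scores.shape)        # torch.Size([2, 5, 7])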
class allennlp.modules.matrix_attention.cosine_matrix_attention.CosineMatrixAttention[source]

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using cosine similarity.

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
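
As a rough sketch (example sizes are arbitrary), the output should agree, up to the small epsilon the module uses for numerical stability, with cosine similarity computed by hand:

    import torch
    from allennlp.modules.matrix_attention.cosine_matrix_attention import CosineMatrixAttention

    x = torch.randn(2, 5, 4)
    y = torch.randn(2, 7, 4)

    scores = CosineMatrixAttention()(x, y)   # (2, 5, 7), entries in [-1, 1]

    # Reference computation: normalize the rows, then take dot products.
    x_hat = x / x.norm(dim=-1, keepdim=True)
    y_hat = y / y.norm(dim=-1, keepdim=True)
    reference = x_hat.bmm(y_hat.transpose(1, 2))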
class allennlp.modules.matrix_attention.dot_product_matrix_attention.DotProductMatrixAttention[source]

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using a dot product.

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
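
In other words, the result amounts to a batched matrix multiplication against the transpose of matrix_2; a quick sketch with arbitrary example sizes:

    import torch
    from allennlp.modules.matrix_attention.dot_product_matrix_attention import DotProductMatrixAttention

    x = torch.randn(2, 5, 4)
    y = torch.randn(2, 7, 4)

    scores = DotProductMatrixAttention()(x, y)   # (2, 5, 7)
    reference = x.bmm(y.transpose(1, 2))         # same values, computed directly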
class allennlp.modules.matrix_attention.linear_matrix_attention.LinearMatrixAttention(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x, y', activation: allennlp.nn.activations.Activation = None) → None[source]

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

This MatrixAttention takes two matrices as input and returns a matrix of attentions by performing a dot product between a vector of weights and some combination of the two input matrices, followed by an (optional) activation function. The combination used is configurable.

If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.

Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity is computed as x^T diag(w) y + b (with w the diagonal of W), you can accomplish that with this class by using "x*y" for combination.

Parameters:
tensor_1_dim : int

The dimension of the first tensor, x, described above. This is x.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.

tensor_2_dim : int

The dimension of the second tensor, y, described above. This is y.size()[-1] - the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.

combination : str, optional (default="x,y")

Described above.

activation : Activation, optional (default=linear (i.e. no activation))

An activation function applied after the w^T * [x;y] + b calculation. Default is no activation.

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
reset_parameters()[source]
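
A minimal sketch with arbitrary example sizes; note that combinations involving elementwise operations (such as x*y) require tensor_1_dim and tensor_2_dim to match:

    import torch
    from allennlp.modules.matrix_attention.linear_matrix_attention import LinearMatrixAttention

    x = torch.randn(2, 5, 4)
    y = torch.randn(2, 7, 4)

    # Weight vector length is 4 + 4 + 4 = 12 for the "x,y,x*y" combination.
    attention = LinearMatrixAttention(tensor_1_dim=4, tensor_2_dim=4, combination="x,y,x*y")
    scores = attention(x, y)   # w^T [x; y; x*y] + b for every row pair
    print(scores.shape)        # torch.Size([2, 5, 7])

    # Using combination="x*y" alone gives the diagonal-bilinear similarity noted above.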
class allennlp.modules.matrix_attention.legacy_matrix_attention.LegacyMatrixAttention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None) → None[source]

Bases: allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

The legacy implementation of MatrixAttention.

It should be considered deprecated as it uses much more memory than the newer specialized MatrixAttention modules.

Parameters:
similarity_function : SimilarityFunction, optional (default=DotProductSimilarity)

The similarity function to use when computing the attention.

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor[source]
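
For reference, a sketch of the legacy module alongside the specialized module that covers its default (DotProductSimilarity) case; the example sizes are arbitrary:

    import torch
    from allennlp.modules.matrix_attention.legacy_matrix_attention import LegacyMatrixAttention
    from allennlp.modules.matrix_attention.dot_product_matrix_attention import DotProductMatrixAttention

    x = torch.randn(2, 5, 4)
    y = torch.randn(2, 7, 4)

    legacy_scores = LegacyMatrixAttention()(x, y)    # default similarity: dot product
    new_scores = DotProductMatrixAttention()(x, y)   # same scores, far less memory

    # The legacy module tiles both inputs out to (batch, num_rows_1, num_rows_2, dim)
    # before applying the similarity function, which is where the extra memory goes.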