# allennlp.modules.matrix_attention

class allennlp.modules.matrix_attention.matrix_attention.MatrixAttention

Bases: torch.nn.modules.module.Module, allennlp.common.registrable.Registrable

MatrixAttention takes two matrices as input and returns a matrix of attentions.

We compute the similarity between each row of matrix_1 and each row of matrix_2 and return unnormalized similarity scores. Because these scores are unnormalized, we don’t take a mask as input; the caller is responsible for applying any mask when this output is used.

Input:
• matrix_1: (batch_size, num_rows_1, embedding_dim_1)
• matrix_2: (batch_size, num_rows_2, embedding_dim_2)
Output:
• (batch_size, num_rows_1, num_rows_2)
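Because masking is left to the caller, a typical next step is to mask and normalize the returned scores. The following is a minimal sketch in plain PyTorch, not the library's own code: the helper name `masked_attention_weights` and the boolean `mask_2` tensor are assumptions made for illustration, and the scores are produced with a simple batched dot product standing in for any MatrixAttention subclass.

```python
import torch


def masked_attention_weights(scores: torch.Tensor, mask_2: torch.Tensor) -> torch.Tensor:
    """Normalize unnormalized scores over the rows of matrix_2, ignoring padding.

    scores: (batch_size, num_rows_1, num_rows_2)
    mask_2: (batch_size, num_rows_2), True for real rows and False for padding.
    """
    # Broadcast the mask over num_rows_1 and set padded columns to -inf,
    # so that softmax gives them exactly zero weight.
    masked = scores.masked_fill(~mask_2.unsqueeze(1), float("-inf"))
    return torch.softmax(masked, dim=-1)


torch.manual_seed(0)
matrix_1 = torch.randn(2, 3, 4)  # (batch_size, num_rows_1, embedding_dim)
matrix_2 = torch.randn(2, 5, 4)  # (batch_size, num_rows_2, embedding_dim)

# Stand-in for a MatrixAttention forward pass: unnormalized dot-product scores.
scores = torch.bmm(matrix_1, matrix_2.transpose(1, 2))  # (2, 3, 5)

# In the first batch element, the last two rows of matrix_2 are padding.
mask_2 = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=torch.bool)
weights = masked_attention_weights(scores, mask_2)
```

Note that this sketch assumes every batch element has at least one unmasked row in matrix_2; a fully masked row would produce NaN after the softmax and needs separate handling.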
forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.matrix_attention.matrix_attention.MatrixAttention
class allennlp.modules.matrix_attention.bilinear_matrix_attention.BilinearMatrixAttention(matrix_1_dim: int, matrix_2_dim: int, activation: allennlp.nn.activations.Activation) → None

Computes attention between two matrices using a bilinear attention function. This function has a matrix of weights W and a bias b, and the similarity between the two matrices X and Y is computed as X W Y^T + b.

Parameters:
• matrix_1_dim : int. The dimension of the matrix X, described above. This is X.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
• matrix_2_dim : int. The dimension of the matrix Y, described above. This is Y.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build the weight matrix correctly.
• activation : Activation, optional (default=linear (i.e. no activation)). An activation function applied after the X W Y^T + b calculation. Default is no activation.
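To make the X W Y^T + b formula concrete, here is a small sketch in plain PyTorch. It is an illustration of the math under stated assumptions, not the module's implementation: the function name `bilinear_scores` and the randomly initialized `weight` and `bias` tensors are hypothetical, and no activation is applied.

```python
import torch


def bilinear_scores(x: torch.Tensor, y: torch.Tensor,
                    weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Compute X W Y^T + b for each batch element.

    x: (batch_size, num_rows_1, matrix_1_dim)
    y: (batch_size, num_rows_2, matrix_2_dim)
    weight: (matrix_1_dim, matrix_2_dim)
    bias: scalar tensor
    """
    intermediate = torch.matmul(x, weight)  # (batch_size, num_rows_1, matrix_2_dim)
    return torch.matmul(intermediate, y.transpose(1, 2)) + bias


torch.manual_seed(0)
x = torch.randn(2, 3, 4)      # matrix_1_dim = 4
y = torch.randn(2, 5, 6)      # matrix_2_dim = 6
weight = torch.randn(4, 6)    # hypothetical stand-in for the learned W
bias = torch.tensor(0.5)      # hypothetical stand-in for the learned b

scores = bilinear_scores(x, y, weight, bias)  # (2, 3, 5)

# Entry (i, j) is the bilinear form between row i of X and row j of Y.
single_pair = x[0, 1] @ weight @ y[0, 2] + bias
```

The two input matrices may have different embedding dimensions here, which is why both matrix_1_dim and matrix_2_dim are needed to size W.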
forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params)
reset_parameters()
class allennlp.modules.matrix_attention.cosine_matrix_attention.CosineMatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using cosine similarity.
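One way to picture this computation is to L2-normalize each row and then take batched dot products. The sketch below is written in plain PyTorch under that assumption; the function name `cosine_scores` and the `eps` guard against zero-length rows are illustrative choices, not taken from the library.

```python
import torch
import torch.nn.functional as F


def cosine_scores(matrix_1: torch.Tensor, matrix_2: torch.Tensor,
                  eps: float = 1e-13) -> torch.Tensor:
    """Cosine similarity between every row of matrix_1 and every row of matrix_2.

    matrix_1: (batch_size, num_rows_1, embedding_dim)
    matrix_2: (batch_size, num_rows_2, embedding_dim)
    returns:  (batch_size, num_rows_1, num_rows_2)
    """
    # Scale each row to unit length; eps avoids division by zero.
    a = matrix_1 / (matrix_1.norm(p=2, dim=-1, keepdim=True) + eps)
    b = matrix_2 / (matrix_2.norm(p=2, dim=-1, keepdim=True) + eps)
    # Dot products of unit vectors are cosine similarities.
    return torch.bmm(a, b.transpose(1, 2))


torch.manual_seed(0)
matrix_1 = torch.randn(2, 3, 4)
matrix_2 = torch.randn(2, 5, 4)
scores = cosine_scores(matrix_1, matrix_2)  # (2, 3, 5)

# Cross-check one entry against PyTorch's built-in cosine similarity.
reference = F.cosine_similarity(matrix_1[0, 1], matrix_2[0, 2], dim=0)
```

Unlike the dot product, these scores are bounded in [-1, 1] and do not depend on the magnitude of the input rows.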

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params)
class allennlp.modules.matrix_attention.dot_product_matrix_attention.DotProductMatrixAttention

Computes attention between every entry in matrix_1 and every entry in matrix_2 using a dot product.
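The all-pairs dot product can be expressed as a single batched matrix multiplication. The sketch below uses plain PyTorch; the function name `dot_product_scores` is hypothetical and the code illustrates the math rather than reproducing the module.

```python
import torch


def dot_product_scores(matrix_1: torch.Tensor, matrix_2: torch.Tensor) -> torch.Tensor:
    """Dot product between every row of matrix_1 and every row of matrix_2.

    Both inputs must share the same embedding dimension.
    matrix_1: (batch_size, num_rows_1, embedding_dim)
    matrix_2: (batch_size, num_rows_2, embedding_dim)
    returns:  (batch_size, num_rows_1, num_rows_2)
    """
    return torch.bmm(matrix_1, matrix_2.transpose(1, 2))


torch.manual_seed(0)
matrix_1 = torch.randn(2, 3, 4)
matrix_2 = torch.randn(2, 5, 4)
scores = dot_product_scores(matrix_1, matrix_2)  # (2, 3, 5)

# Entry (b, i, j) equals the dot product of row i of matrix_1[b]
# with row j of matrix_2[b].
single_pair = torch.dot(matrix_1[1, 0], matrix_2[1, 4])
```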

forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params)
class allennlp.modules.matrix_attention.linear_matrix_attention.LinearMatrixAttention(tensor_1_dim: int, tensor_2_dim: int, combination: str = 'x, y', activation: allennlp.nn.activations.Activation = <function <lambda>.<locals>.<lambda>>) → None

This MatrixAttention takes two matrices as input and returns a matrix of attentions by performing a dot product between a vector of weights and some combination of the two input matrices, followed by an (optional) activation function. The combination used is configurable.

If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give x,y,x*y as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.
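As an illustration of the x,y,x*y example, the sketch below evaluates w^T [x; y; x*y] + b for every pair of rows in plain PyTorch. It is a direct, unoptimized rendering of the formula, not the module's implementation: the function name `linear_scores` and the random `w` and `b` are assumptions, and both inputs are given the same embedding dimension so that x*y is defined.

```python
import torch


def linear_scores(x: torch.Tensor, y: torch.Tensor,
                  w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Compute w^T [x; y; x*y] + b for every (row of x, row of y) pair.

    x: (batch_size, num_rows_1, dim)
    y: (batch_size, num_rows_2, dim)
    w: (3 * dim,)
    b: scalar tensor
    """
    n1, n2 = x.size(1), y.size(1)
    # Tile both inputs so that position (i, j) holds row i of x and row j of y.
    x_tiled = x.unsqueeze(2).expand(-1, -1, n2, -1)  # (batch, n1, n2, dim)
    y_tiled = y.unsqueeze(1).expand(-1, n1, -1, -1)  # (batch, n1, n2, dim)
    # [x; y; x*y] is concatenation along the feature dimension.
    combined = torch.cat([x_tiled, y_tiled, x_tiled * y_tiled], dim=-1)
    return torch.matmul(combined, w) + b  # (batch, n1, n2)


torch.manual_seed(0)
x = torch.randn(2, 3, 4)
y = torch.randn(2, 5, 4)
w = torch.randn(12)       # hypothetical weights, 3 combinations * dim 4
b = torch.tensor(0.1)     # hypothetical bias

scores = linear_scores(x, y, w, b)  # (2, 3, 5)

# The same value computed for a single pair of rows.
pair = torch.dot(w, torch.cat([x[0, 1], y[0, 2], x[0, 1] * y[0, 2]])) + b
```

Each extra combination adds one more block to the concatenated vector, so the weight vector grows with the number of combinations listed.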

Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using “x*y” for combination.
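The equivalence holds because w^T (x*y) = Σ_k w_k x_k y_k = x^T diag(w) y. A short numerical check in plain PyTorch, with hypothetical random `w` and `b` standing in for learned parameters:

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, 4)
y = torch.randn(2, 5, 4)
w = torch.randn(4)        # hypothetical weight vector, the diagonal of W
b = torch.tensor(0.1)     # hypothetical bias

# Linear form with the "x*y" combination: w^T (x * y) + b for every pair.
# Scaling the columns of x by w first gives the same sum over features.
linear_form = torch.bmm(x * w, y.transpose(1, 2)) + b

# Bilinear form with an explicit diagonal weight matrix W = diag(w).
bilinear_form = torch.matmul(torch.matmul(x, torch.diag(w)), y.transpose(1, 2)) + b

same = torch.allclose(linear_form, bilinear_form, atol=1e-5)
```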

Parameters:
• tensor_1_dim : int. The dimension of the first tensor, x, described above. This is x.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
• tensor_2_dim : int. The dimension of the second tensor, y, described above. This is y.size()[-1], the length of the vector that will go into the similarity computation. We need this so we can build weight vectors correctly.
• combination : str, optional (default=“x,y”). Described above.
• activation : Activation, optional (default=linear (i.e. no activation)). An activation function applied after the w^T * [x;y] + b calculation. Default is no activation.
forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.matrix_attention.linear_matrix_attention.LinearMatrixAttention
reset_parameters()
class allennlp.modules.matrix_attention.legacy_matrix_attention.LegacyMatrixAttention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None) → None

The legacy implementation of MatrixAttention.

It should be considered deprecated as it uses much more memory than the newer specialized MatrixAttention modules.
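A plausible way to see where the extra memory goes: applying a generic pairwise similarity function to every pair of rows requires an intermediate with one feature vector per pair, whereas a specialized module can go straight to the score matrix. The sketch below contrasts the two for a dot product in plain PyTorch. It is an illustration of that reasoning, not a copy of the library's code, and the function name `tiled_dot_product` is hypothetical.

```python
import torch


def tiled_dot_product(matrix_1: torch.Tensor, matrix_2: torch.Tensor):
    """Dot-product scores computed through a per-pair intermediate tensor.

    Returns the scores and the number of elements in the intermediate.
    """
    n1, n2 = matrix_1.size(1), matrix_2.size(1)
    tiled_1 = matrix_1.unsqueeze(2).expand(-1, -1, n2, -1)  # (batch, n1, n2, dim)
    tiled_2 = matrix_2.unsqueeze(1).expand(-1, n1, -1, -1)  # (batch, n1, n2, dim)
    # The elementwise product materializes a full (batch, n1, n2, dim) tensor.
    product = tiled_1 * tiled_2
    return product.sum(dim=-1), product.numel()


torch.manual_seed(0)
matrix_1 = torch.randn(2, 3, 4)
matrix_2 = torch.randn(2, 5, 4)

tiled_scores, intermediate_elements = tiled_dot_product(matrix_1, matrix_2)

# The specialized route produces only the (batch, n1, n2) result.
direct_scores = torch.bmm(matrix_1, matrix_2.transpose(1, 2))
```

In this toy case the intermediate holds 2 * 3 * 5 * 4 = 120 elements against 30 in the final scores; the ratio equals the embedding dimension, so the gap widens with larger embeddings.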

Parameters:
• similarity_function : SimilarityFunction, optional (default=DotProductSimilarity). The similarity function to use when computing the attention.
forward(matrix_1: torch.Tensor, matrix_2: torch.Tensor) → torch.Tensor
classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.matrix_attention.matrix_attention.MatrixAttention