An attention module that computes the similarity between an input vector and the rows of a matrix.

class allennlp.modules.attention.Attention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None, normalize: bool = True) → None

Bases: torch.nn.modules.module.Module

This Module takes two inputs: a (batched) vector and a matrix, plus an optional mask on the rows of the matrix. We compute the similarity between the vector and each row in the matrix, and then (optionally) perform a softmax over rows using those computed similarities.

By default similarity is computed with a dot product, but you can alternatively use a parameterized similarity function if you wish.


Inputs:

  • vector: shape (batch_size, embedding_dim)
  • matrix: shape (batch_size, num_rows, embedding_dim)
  • matrix_mask: shape (batch_size, num_rows), specifying which rows are just padding.


Output:

  • attention: shape (batch_size, num_rows).
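To make the shapes above concrete, here is a minimal plain-Python sketch of the default dot-product similarity step. It is an illustration only, not the library's implementation: the function name dot_product_scores and the nested-list representation of tensors are assumptions made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot_product_scores(vectors: List[Vector], matrices: List[Matrix]) -> List[List[float]]:
    """Return unnormalized scores of shape (batch_size, num_rows).

    Hypothetical helper for illustration; nested lists stand in for tensors.
    """
    scores = []
    for vector, matrix in zip(vectors, matrices):
        # One score per row: the dot product of that row with the input vector.
        scores.append([sum(v * r for v, r in zip(vector, row)) for row in matrix])
    return scores


# batch_size = 1, num_rows = 3, embedding_dim = 2
vectors = [[1.0, 2.0]]
matrices = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
scores = dot_product_scores(vectors, matrices)
print(scores)  # [[1.0, 2.0, 3.0]]
```

Each batch entry yields one score per matrix row, which is why the output drops the embedding_dim axis.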

similarity_function : SimilarityFunction, optional (default: DotProductSimilarity)

The similarity function to use when computing the attention.

normalize : bool, optional (default: True)

If True, the computed similarities are normalized with a softmax, so the returned attention is a probability distribution over the rows of the matrix. If False, the raw similarity scores are returned instead.
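The effect of normalize can be sketched for a single batch entry as follows. This is a hedged illustration rather than the library's code: the helper name attention_weights is hypothetical, the mask is assumed to use 1 for a real row and 0 for padding, and giving padded rows zero weight is one common way to apply the mask, not necessarily the exact formula used internally.

```python
import math
from typing import List, Optional


def attention_weights(
    scores: List[float],
    mask: Optional[List[float]] = None,
    normalize: bool = True,
) -> List[float]:
    """Turn similarity scores for one batch entry into attention values.

    Hypothetical helper; assumes mask entries are 1 for real rows, 0 for padding.
    """
    if not normalize:
        # normalize=False: return the raw similarity scores unchanged.
        return list(scores)
    if mask is None:
        mask = [1.0] * len(scores)
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        # Every row is padding, so there is nothing to attend to.
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


weights = attention_weights([1.0, 2.0, 3.0], mask=[1.0, 1.0, 0.0])
raw = attention_weights([1.0, 2.0, 3.0], normalize=False)
```

In this sketch the third row is masked out, so its weight is zero and the remaining two weights sum to one; with normalize=False the scores pass through untouched.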

forward(vector: torch.FloatTensor, matrix: torch.FloatTensor, matrix_mask: torch.FloatTensor = None) → torch.FloatTensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.attention.Attention