An attention module that computes the similarity between an input vector and the rows of a matrix.

class allennlp.modules.attention.Attention(similarity_function: allennlp.modules.similarity_functions.similarity_function.SimilarityFunction = None, normalize: bool = True) → None

Bases: torch.nn.modules.module.Module

This Module takes two inputs: a (batched) vector and a matrix, plus an optional mask on the rows of the matrix. We compute the similarity between the vector and each row in the matrix, and then (optionally) perform a softmax over rows using those computed similarities.
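The core computation can be sketched for a single (unbatched) example, using plain Python lists in place of tensors. This is a minimal illustration that assumes dot-product similarity; `attend` is a hypothetical helper, not part of AllenNLP's API, and its `normalize` flag only mirrors the constructor argument of the same name.

```python
import math
from typing import List


def attend(vector: List[float], matrix: List[List[float]], normalize: bool = True) -> List[float]:
    # Hypothetical helper: dot-product similarity between the vector and each row.
    scores = [sum(v * r for v, r in zip(vector, row)) for row in matrix]
    if not normalize:
        # Without normalization, the raw similarity scores are the output.
        return scores
    # Softmax over the rows; subtracting the max keeps exp() from overflowing.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vector = [1.0, 0.0]
matrix = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
raw = attend(vector, matrix, normalize=False)  # [2.0, 0.0, 1.0]
weights = attend(vector, matrix)               # sums to 1; row 0 gets the most weight
```

The row most similar to the vector receives the largest attention weight, and the softmax preserves the ordering of the raw scores.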

By default, similarity is computed with a dot product, but you can pass a parameterized similarity function instead.
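Swapping the similarity function only changes how each row is scored against the vector. The sketch below assumes a bilinear form x^T W y as the parameterized alternative; `similarities`, `dot_product` and `make_bilinear` are hypothetical names for illustration, and AllenNLP's own `SimilarityFunction` subclasses are not reproduced here. In a real model W would be learned; here it is fixed.

```python
from typing import Callable, List

Vector = List[float]
Similarity = Callable[[Vector, Vector], float]


def dot_product(x: Vector, y: Vector) -> float:
    # The default choice: a parameter-free dot product.
    return sum(a * b for a, b in zip(x, y))


def make_bilinear(weight: List[List[float]]) -> Similarity:
    # Hypothetical parameterized similarity x^T W y with a fixed weight matrix W.
    def bilinear(x: Vector, y: Vector) -> float:
        return sum(x[i] * weight[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return bilinear


def similarities(vector: Vector, matrix: List[Vector],
                 similarity: Similarity = dot_product) -> List[float]:
    # Score every row of the matrix against the vector with the chosen function.
    return [similarity(vector, row) for row in matrix]


vector = [1.0, 1.0]
matrix = [[1.0, 0.0], [0.0, 1.0]]
plain = similarities(vector, matrix)                        # [1.0, 1.0]
weighted = similarities(vector, matrix,
                        make_bilinear([[1.0, 0.0], [0.0, 3.0]]))  # [1.0, 3.0]
```

With the dot product both rows tie; the bilinear weights let the second embedding dimension count three times as much, breaking the tie.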


Inputs:

  • vector: shape (batch_size, embedding_dim)
  • matrix: shape (batch_size, num_rows, embedding_dim)
  • matrix_mask: shape (batch_size, num_rows), specifying which rows are just padding.

Output:

  • attention: shape (batch_size, num_rows).
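The batched shapes and the role of the mask can be sketched with nested lists standing in for tensors. This assumes a mask of 1.0 for real rows and 0.0 for padding rows, and that every example has at least one real row; `masked_attention` is a hypothetical helper, not AllenNLP's implementation. Padding rows are left out of the softmax, so they receive zero weight.

```python
import math
from typing import List, Optional


def masked_attention(
    vector: List[List[float]],                        # (batch_size, embedding_dim)
    matrix: List[List[List[float]]],                  # (batch_size, num_rows, embedding_dim)
    matrix_mask: Optional[List[List[float]]] = None,  # (batch_size, num_rows)
) -> List[List[float]]:                               # (batch_size, num_rows)
    attention = []
    for b, (vec, rows) in enumerate(zip(vector, matrix)):
        # Assumed convention: 1.0 marks a real row, 0.0 marks padding.
        mask = matrix_mask[b] if matrix_mask is not None else [1.0] * len(rows)
        # Dot-product similarity of this example's vector with each of its rows.
        scores = [sum(v * r for v, r in zip(vec, row)) for row in rows]
        # Max over real rows only (assumes at least one real row per example).
        m = max(s for s, keep in zip(scores, mask) if keep)
        # Exponentiate real rows; padding rows contribute 0 to the softmax.
        exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask)]
        total = sum(exps)
        attention.append([e / total for e in exps])
    return attention


vector = [[1.0, 0.0], [0.0, 1.0]]
matrix = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
          [[0.0, 2.0], [5.0, 5.0], [9.0, 9.0]]]
mask = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
out = masked_attention(vector, matrix, mask)  # 2 lists of 3 weights each
```

Without the mask, the padding row in the second example (score 9.0) would take almost all of the attention; with it, that row gets exactly zero weight and the real rows still sum to 1.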

Parameters:

similarity_function : SimilarityFunction, optional (default=DotProductSimilarity)

The similarity function to use when computing the attention.

normalize : bool, optional (default=True)

If True, the computed similarities are normalized with a softmax over the rows, so the returned attention is a probability distribution. If False, the raw similarity scores are returned unchanged.

forward(vector: torch.FloatTensor, matrix: torch.FloatTensor, matrix_mask: torch.FloatTensor = None) → torch.FloatTensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note:

Although the recipe for the forward pass needs to be defined within this function, you should call the Module instance afterwards instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_params(params: allennlp.common.params.Params) → allennlp.modules.attention.Attention