AllenNLP just uses PyTorch optimizers, with a thin wrapper that allows them to be registered and instantiated via from_params.

The available optimizers are

class, lr=0.001, betas=(0.9, 0.999), eps=1e-08)

Bases: torch.optim.optimizer.Optimizer

NOTE: This class has been copied verbatim from the separate Dense and Sparse versions of Adam in PyTorch.

Implements the Adam algorithm with dense and sparse gradients. The algorithm was proposed in "Adam: A Method for Stochastic Optimization".

params : iterable

An iterable of parameters to optimize, or of dicts defining parameter groups.

lr : float, optional (default: 1e-3)

The learning rate.

betas : Tuple[float, float], optional (default: (0.9, 0.999))

Coefficients used for computing running averages of the gradient and of its square.

eps : float, optional (default: 1e-8)

A term added to the denominator to improve numerical stability.
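To make the roles of lr, betas, and eps concrete, here is a minimal pure-Python sketch of one dense Adam update as described in the Adam paper. It is an illustration only, not the code of this class: the function name adam_step and its list-based state are hypothetical, and the sparse-gradient path is not shown.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    exp_avg_sq: List[float],
    step: int,
    lr: float = 1e-3,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
) -> None:
    """Apply one dense Adam update in place (hypothetical helper; step starts at 1)."""
    beta1, beta2 = betas
    # Bias corrections compensate for the zero-initialised running averages.
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    for i, g in enumerate(grads):
        # Running average of the gradient (betas[0]) and of its square (betas[1]).
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        m_hat = exp_avg[i] / bias_correction1
        v_hat = exp_avg_sq[i] / bias_correction2
        # eps is added to the denominator for numerical stability.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: one step on f(x) = x^2 starting from x = 1.0, where the gradient is 2x.
params = [1.0]
exp_avg, exp_avg_sq = [0.0], [0.0]
adam_step(params, [2.0 * params[0]], exp_avg, exp_avg_sq, step=1, lr=0.1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's magnitude.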


Performs a single optimization step.

closure : callable, optional

A closure that reevaluates the model and returns the loss.
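The closure convention can be sketched without PyTorch. The toy class below is a hypothetical stand-in (plain gradient descent, not Adam, and not part of AllenNLP or PyTorch); it only mirrors the pattern in which step calls the closure first, then updates the parameters, and returns the loss the closure produced.

```python
from typing import Callable, Dict, List, Optional


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; uses plain gradient descent."""

    def __init__(self, params: List[Dict[str, float]], lr: float = 0.1) -> None:
        self.params = params
        self.lr = lr

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        loss = None
        if closure is not None:
            # The closure re-evaluates the model, refreshes gradients,
            # and returns the loss.
            loss = closure()
        for p in self.params:
            p["value"] -= self.lr * p["grad"]
        return loss


# A single "parameter" with loss = value^2, so the gradient is 2 * value.
weight = {"value": 3.0, "grad": 0.0}
optimizer = ToyOptimizer([weight], lr=0.1)


def closure() -> float:
    weight["grad"] = 2.0 * weight["value"]
    return weight["value"] ** 2


loss = optimizer.step(closure)
```

When no closure is passed, step returns None and uses whatever gradients are already stored on the parameters.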


Bases: allennlp.common.registrable.Registrable

This class just allows us to implement Registrable for PyTorch optimizers.

default_implementation = 'adam'
classmethod from_params(model_parameters: typing.List, params: allennlp.common.params.Params)
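
The register-then-instantiate pattern behind from_params can be sketched roughly as follows. Everything here is hypothetical and simplified: ToyRegistrable, ToyAdam, and ToySGD are illustrative names, a plain dict stands in for Params, and the "type" key used to pick an implementation is an assumption of this sketch rather than a statement about the library's internals.

```python
from typing import Any, Callable, Dict, List, Type


class ToyRegistrable:
    """Hypothetical, simplified registry; not the AllenNLP implementation."""

    _registry: Dict[str, Type] = {}
    default_implementation = "adam"

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        def decorator(subclass: Type) -> Type:
            cls._registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def from_params(cls, model_parameters: List[Any], params: Dict[str, Any]) -> Any:
        params = dict(params)  # copy so the caller's dict is left untouched
        # Assumed convention: "type" selects the implementation, falling back
        # to default_implementation; remaining keys become keyword arguments.
        name = params.pop("type", cls.default_implementation)
        return cls._registry[name](model_parameters, **params)


@ToyRegistrable.register("adam")
class ToyAdam:
    def __init__(self, model_parameters, lr=0.001, betas=(0.9, 0.999), eps=1e-8):
        self.model_parameters = model_parameters
        self.lr, self.betas, self.eps = lr, betas, eps


@ToyRegistrable.register("sgd")
class ToySGD:
    def __init__(self, model_parameters, lr=0.01):
        self.model_parameters = model_parameters
        self.lr = lr


default_opt = ToyRegistrable.from_params([], {"lr": 0.01})
sgd_opt = ToyRegistrable.from_params([], {"type": "sgd", "lr": 0.5})
```

In this sketch, omitting the type yields the implementation named by default_implementation, which matches the 'adam' default shown above.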