allennlp.training.learning_rate_schedulers

AllenNLP uses most PyTorch learning rate schedulers, with a thin wrapper that allows registering them and instantiating them with ``from_params``.

In addition to the wrapped PyTorch schedulers, AllenNLP provides a Noam schedule and cosine with restarts, which are registered as “noam” and “cosine”, respectively.
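As a rough sketch (the toy model and optimizer below are stand-ins, not part of the library), a scheduler can be built from a ``Params`` object through the ``from_params`` classmethod documented further down::

    import torch
    from allennlp.common.params import Params
    from allennlp.training.learning_rate_schedulers import LearningRateScheduler

    model = torch.nn.Linear(512, 512)                  # stand-in model
    optimizer = torch.optim.Adam(model.parameters(), lr=1.0)

    # "noam" is one of the registered names mentioned above; model_size and
    # warmup_steps are its parameters (see NoamLR below).
    scheduler = LearningRateScheduler.from_params(
        optimizer,
        Params({"type": "noam", "model_size": 512, "warmup_steps": 4000}),
    )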

class allennlp.training.learning_rate_schedulers.CosineWithRestarts(optimizer: torch.optim.optimizer.Optimizer, t_initial: int, t_mul: float = 1.0, eta_min: float = 0.0, eta_mul: float = 1.0, last_epoch: int = -1) → None[source]

Bases: torch.optim.lr_scheduler._LRScheduler

Cosine annealing with restarts.

This is described in the paper https://arxiv.org/abs/1608.03983.
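For reference, within a single cycle the paper above anneals the learning rate from a maximum :math:`\eta_{max}` down to :math:`\eta_{min}` as

.. math::

   \eta_t = \eta_{min} + \tfrac{1}{2}\left(\eta_{max} - \eta_{min}\right)
            \left(1 + \cos\left(\pi \, \frac{T_{cur}}{T_i}\right)\right)

where :math:`T_{cur}` is the number of iterations since the last restart and :math:`T_i` is the length of the current cycle, controlled here by ``t_initial`` and ``t_mul`` below; :math:`\eta_{max}` starts at the optimizer's base learning rate and is scaled by ``eta_mul`` after each restart.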

Parameters:
optimizer : torch.optim.Optimizer
t_initial : int

The number of iterations within the first cycle.

t_mul : float, optional (default=1)

Determines the number of iterations in the i-th decay cycle, which is the length of the previous cycle multiplied by t_mul.

eta_min : float, optional (default=0)

The minimum learning rate.

eta_mul : float, optional (default=1)

Determines the initial learning rate for the i-th decay cycle, which is the initial learning rate of the previous cycle multiplied by eta_mul.

last_epoch : int, optional (default=-1)

The index of the last epoch. This is used when restarting.

get_lr()[source]

Get updated learning rate.
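A rough usage sketch (the optimizer, parameter, and iteration counts below are stand-ins): with ``t_mul=2.0`` successive cycles last 1000, 2000, and 4000 iterations, and with ``eta_mul=0.5`` each restart begins at half the previous peak learning rate::

    import torch
    from allennlp.training.learning_rate_schedulers import CosineWithRestarts

    optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
    scheduler = CosineWithRestarts(
        optimizer, t_initial=1000, t_mul=2.0, eta_min=1e-5, eta_mul=0.5
    )

    for iteration in range(7000):
        # ... forward/backward and optimizer.step() would go here ...
        scheduler.step()  # once per iteration, since t_initial counts iterations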

class allennlp.training.learning_rate_schedulers.LearningRateScheduler(lr_scheduler) → None[source]

Bases: allennlp.common.registrable.Registrable

This class just allows us to implement Registrable for PyTorch LRSchedulers.

classmethod from_params(optimizer: torch.optim.optimizer.Optimizer, params: allennlp.common.params.Params)[source]

step(metric: float, epoch: typing.Union[int, NoneType] = None)[source]

step_batch(batch_num_total: typing.Union[int, NoneType])[source]

class allennlp.training.learning_rate_schedulers.LearningRateWithMetricsWrapper(lr_scheduler: torch.optim.lr_scheduler.ReduceLROnPlateau) → None[source]

Bases: allennlp.training.learning_rate_schedulers.LearningRateScheduler

A wrapper around learning rate schedulers that require metrics. At the moment the only such scheduler is ``ReduceLROnPlateau``.

step(metric: float, epoch: typing.Union[int, NoneType] = None)[source]
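A minimal sketch (the optimizer, parameter, and metric values below are placeholders) of driving ``ReduceLROnPlateau`` through this wrapper::

    import torch
    from allennlp.training.learning_rate_schedulers import LearningRateWithMetricsWrapper

    optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
    scheduler = LearningRateWithMetricsWrapper(
        torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=2)
    )

    for epoch in range(10):
        # ... one epoch of training would go here ...
        validation_loss = 1.0 / (epoch + 1)     # placeholder metric
        scheduler.step(validation_loss, epoch)  # the metric drives ReduceLROnPlateau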
class allennlp.training.learning_rate_schedulers.LearningRateWithoutMetricsWrapper(lr_scheduler: torch.optim.lr_scheduler._LRScheduler) → None[source]

Bases: allennlp.training.learning_rate_schedulers.LearningRateScheduler

A wrapper around learning rate schedulers that do not require metrics.

step(metric: float, epoch: typing.Union[int, NoneType] = None)[source]
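For epoch-level PyTorch schedulers such as ``StepLR``, a rough sketch looks like the following (stand-in model and optimizer; this assumes, per the docstring above, that the metric argument is simply ignored by this wrapper)::

    import torch
    from allennlp.training.learning_rate_schedulers import LearningRateWithoutMetricsWrapper

    model = torch.nn.Linear(16, 16)                    # stand-in model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = LearningRateWithoutMetricsWrapper(
        torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)
    )

    batch_num_total = 0
    for epoch in range(5):
        for _ in range(100):                           # stand-in training batches
            # ... forward/backward and optimizer.step() would go here ...
            batch_num_total += 1
            scheduler.step_batch(batch_num_total)      # per-batch hook; StepLR itself only changes per epoch
        scheduler.step(None, epoch)                    # no metric needed here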
class allennlp.training.learning_rate_schedulers.NoamLR(optimizer: torch.optim.optimizer.Optimizer, model_size: int, warmup_steps: int, factor: float = 1.0, last_epoch: int = -1) → None[source]

Bases: torch.optim.lr_scheduler._LRScheduler

Implements the Noam Learning rate schedule. This corresponds to increasing the learning rate linearly for the first warmup_steps training steps, and decreasing it thereafter proportionally to the inverse square root of the step number, scaled by the inverse square root of the dimensionality of the model. Time will tell if this is just madness or it’s actually important.
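In formula form, the description above corresponds to the schedule from the Transformer paper (https://arxiv.org/abs/1706.03762), with ``factor`` as an overall multiplier:

.. math::

   lr = \text{factor} \cdot \text{model\_size}^{-0.5} \cdot
        \min\left(\text{step}^{-0.5},\ \text{step} \cdot \text{warmup\_steps}^{-1.5}\right)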

Parameters:
model_size : int, required.

The hidden size parameter which dominates the number of parameters in your model.

warmup_steps: ``int``, required.

The number of steps to linearly increase the learning rate.

factor : float, optional (default = 1.0).

The overall scale factor for the learning rate decay.

get_lr()[source]

step(epoch=None)[source]

step_batch(epoch=None)[source]

class allennlp.training.learning_rate_schedulers.SlantedTriangular(optimizer: torch.optim.optimizer.Optimizer, num_epochs: int, num_steps_per_epoch: int, cut_frac: float = 0.1, ratio: int = 32, last_epoch: int = -1, gradual_unfreezing: bool = False, discriminative_fine_tuning: bool = False, decay_factor: float = 0.38) → None[source]

Bases: torch.optim.lr_scheduler._LRScheduler

Implements the Slanted Triangular Learning Rate schedule with optional gradual unfreezing. The schedule corresponds to first linearly increasing the learning rate and then annealing it based on a fixed ratio.

If we gradually unfreeze, then in the first epoch of training, only the top layer is trained; in the second epoch, the top two layers are trained, etc. During this gradual unfreezing phase, the learning rate is increased and annealed over each epoch; once unfreezing has finished, the learning rate is increased and annealed over the remaining training iterations.

Note that with this schedule, early stopping should typically be avoided.
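For reference, the ULMFiT paper (https://arxiv.org/abs/1801.06146) defines the slanted triangular learning rate at step :math:`t` out of :math:`T` total training steps as

.. math::

   cut = \lfloor T \cdot \text{cut\_frac} \rfloor, \qquad
   p = \begin{cases}
       t / cut & \text{if } t < cut \\
       1 - \dfrac{t - cut}{cut \cdot (1/\text{cut\_frac} - 1)} & \text{otherwise}
   \end{cases}, \qquad
   \eta_t = \eta_{max} \cdot \frac{1 + p \, (\text{ratio} - 1)}{\text{ratio}}

where :math:`\eta_{max}` is the base learning rate of the optimizer.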

Parameters:
num_epochs : int, required.

The total number of epochs for which the model should be trained.

num_steps_per_epoch: ``int``, required.

The number of steps (updates, batches) per training epoch.

cut_frac: ``float``, optional (default = 0.1).

The fraction of the steps to increase the learning rate.

ratio: ``int``, optional (default = 32).

The ratio of the largest (base) learning rate to the smallest learning rate.

gradual_unfreezing: ``bool``, optional (default = False).

Whether gradual unfreezing should be used.

discriminative_fine_tuning: ``bool``, optional (default = False).

Whether discriminative fine-tuning (different learning rates per layer) is used.

decay_factor: ``float``, optional (default = 0.38).

The decay factor by which the learning rate is reduced with discriminative fine-tuning when going a layer deeper.

get_lr()[source]

step(epoch=None)[source]

step_batch(epoch=None)[source]
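A rough, self-contained usage sketch (stand-in model, optimizer, and loop sizes; the schedule is driven per batch through ``step_batch``)::

    import torch
    from allennlp.training.learning_rate_schedulers import SlantedTriangular

    model = torch.nn.Linear(8, 8)                      # stand-in model
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = SlantedTriangular(
        optimizer, num_epochs=5, num_steps_per_epoch=100, cut_frac=0.1, ratio=32
    )

    batch_num_total = 0
    for epoch in range(5):
        for _ in range(100):                           # stand-in training batches
            # ... forward/backward and optimizer.step() would go here ...
            batch_num_total += 1
            scheduler.step_batch(batch_num_total)      # the triangular schedule advances per batch
        scheduler.step(epoch)                          # per-epoch hook (gradual unfreezing, when enabled)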