allennlp.training.learning_rate_schedulers

AllenNLP uses most PyTorch learning rate schedulers, with a thin wrapper that allows registering them and instantiating them via from_params.

The available learning rate schedulers from PyTorch are registered as "step", "multi_step", "exponential", and "reduce_on_plateau".

In addition, AllenNLP also provides cosine with restarts, a Noam schedule, and a slanted triangular schedule, which are registered as “cosine”, “noam”, and “slanted_triangular”, respectively.

class allennlp.training.learning_rate_schedulers.learning_rate_scheduler.LearningRateScheduler(optimizer: torch.optim.optimizer.Optimizer, last_epoch: int = -1)[source]

Bases: allennlp.training.scheduler.Scheduler, allennlp.common.registrable.Registrable

classmethod from_params(optimizer: torch.optim.optimizer.Optimizer, params: allennlp.common.params.Params)[source]

This is the automatic implementation of from_params. Any class that subclasses FromParams (or Registrable, which itself subclasses FromParams) gets this implementation for free. If you want your class to be instantiated from params in the “obvious” way – pop off parameters and hand them to your constructor with the same names – this provides that functionality.

If you need more complex logic in your from_params method, you’ll have to implement your own method that overrides this one.
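The "pop off parameters and hand them to your constructor" behavior can be sketched in plain Python. The SimpleScheduler class and dict-based params below are hypothetical, just to illustrate the pattern; the real AllenNLP implementation also handles type annotations, defaults, and nested objects.

```python
# Hypothetical sketch of the automatic from_params pattern: pop each
# constructor argument out of the params by name and pass it through.
# This is NOT the real AllenNLP implementation.
import inspect

class SimpleScheduler:
    def __init__(self, warmup_steps: int, factor: float = 1.0):
        self.warmup_steps = warmup_steps
        self.factor = factor

    @classmethod
    def from_params(cls, params: dict):
        kwargs = {}
        for name in inspect.signature(cls.__init__).parameters:
            if name == "self":
                continue
            if name in params:
                # Pop, so leftover keys can be flagged as config typos.
                kwargs[name] = params.pop(name)
        return cls(**kwargs)

scheduler = SimpleScheduler.from_params({"warmup_steps": 4000})
```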

get_values(self) → None[source]

class allennlp.training.learning_rate_schedulers.cosine.CosineWithRestarts(optimizer: torch.optim.optimizer.Optimizer, t_initial: int, t_mul: float = 1.0, eta_min: float = 0.0, eta_mul: float = 1.0, last_epoch: int = -1)[source]

Bases: allennlp.training.learning_rate_schedulers.learning_rate_scheduler.LearningRateScheduler

Cosine annealing with restarts.

This is described in the paper https://arxiv.org/abs/1608.03983. Note that early stopping should typically be avoided when using this schedule.

Parameters
optimizer : torch.optim.Optimizer
t_initial : int

The number of iterations within the first cycle.

t_mul : float, optional (default=1)

Determines the number of iterations in the i-th decay cycle, which is the length of the last cycle multiplied by t_mul.

eta_min : float, optional (default=0)

The minimum learning rate.

eta_mul : float, optional (default=1)

Determines the initial learning rate for the i-th decay cycle, which is the initial learning rate of the last cycle multiplied by eta_mul.

last_epoch : int, optional (default=-1)

The index of the last epoch. This is used when restarting.

get_values(self)[source]

Get updated learning rate.
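Within a single cycle, the annealing follows the standard cosine formula from the paper. A minimal pure-Python sketch of one cycle (the cosine_lr helper is illustrative and omits the restart bookkeeping, where t_i is scaled by t_mul and the peak rate by eta_mul after each restart):

```python
import math

def cosine_lr(step: int, t_i: int, eta_max: float, eta_min: float = 0.0) -> float:
    """Cosine-annealed learning rate within one cycle of length t_i.

    Returns eta_max at step 0 and eta_min at step t_i; a restart
    resets step to 0.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * step / t_i))
```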

class allennlp.training.learning_rate_schedulers.noam.NoamLR(optimizer: torch.optim.optimizer.Optimizer, model_size: int, warmup_steps: int, factor: float = 1.0, last_epoch: int = -1)[source]

Bases: allennlp.training.learning_rate_schedulers.learning_rate_scheduler.LearningRateScheduler

Implements the Noam Learning rate schedule. This corresponds to increasing the learning rate linearly for the first warmup_steps training steps, and decreasing it thereafter proportionally to the inverse square root of the step number, scaled by the inverse square root of the dimensionality of the model. Time will tell if this is just madness or it’s actually important.
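The schedule described above amounts to lr = factor * model_size^-0.5 * min(step^-0.5, step * warmup_steps^-1.5). A self-contained sketch (the noam_lr helper is illustrative, not this class's actual code):

```python
def noam_lr(step: int, model_size: int, warmup_steps: int, factor: float = 1.0) -> float:
    """Noam schedule: linear warmup for warmup_steps, then inverse-sqrt decay.

    The two branches of min() cross exactly at step == warmup_steps,
    where the learning rate peaks.
    """
    step = max(step, 1)  # avoid division by zero on the very first step
    return factor * model_size ** (-0.5) * min(step ** (-0.5),
                                               step * warmup_steps ** (-1.5))
```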

Parameters
model_size : int, required.

The hidden size parameter which dominates the number of parameters in your model.

warmup_steps : int, required.

The number of steps to linearly increase the learning rate.

factor : float, optional (default = 1.0).

The overall scale factor for the learning rate decay.

get_values(self)[source]
step(self, metric: float = None, epoch: int = None) → None[source]
step_batch(self, batch_num_total: int = None) → None[source]

By default, a scheduler is assumed to only update every epoch, not every batch, so this does nothing unless it is overridden.
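The division of labor between step_batch and step can be sketched with a hypothetical training loop: step_batch fires after every batch (a no-op for epoch-based schedulers), while step fires once per epoch with the validation metric. The CountingScheduler and the loop bounds below are placeholders, not AllenNLP's actual trainer code.

```python
# Hypothetical stand-in scheduler that just records how often each hook fires.
class CountingScheduler:
    def __init__(self):
        self.batch_calls = 0
        self.epoch_calls = 0
    def step_batch(self, batch_num_total=None):
        self.batch_calls += 1
    def step(self, metric=None, epoch=None):
        self.epoch_calls += 1

scheduler = CountingScheduler()
batch_num_total = 0
for epoch in range(2):
    for batch in range(5):                  # placeholder for the real data loader
        batch_num_total += 1
        scheduler.step_batch(batch_num_total)   # every batch
    scheduler.step(metric=0.0, epoch=epoch)     # once per epoch
```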

class allennlp.training.learning_rate_schedulers.slanted_triangular.SlantedTriangular(optimizer: torch.optim.optimizer.Optimizer, num_epochs: int, num_steps_per_epoch: int, cut_frac: float = 0.1, ratio: int = 32, last_epoch: int = -1, gradual_unfreezing: bool = False, discriminative_fine_tuning: bool = False, decay_factor: float = 0.38)[source]

Bases: allennlp.training.learning_rate_schedulers.learning_rate_scheduler.LearningRateScheduler

Implements the Slanted Triangular Learning Rate schedule with optional gradual unfreezing. The schedule corresponds to first linearly increasing the learning rate and then annealing it based on a fixed ratio.

If we gradually unfreeze, then in the first epoch of training, only the top layer is trained; in the second epoch, the top two layers are trained, etc. During gradual unfreezing, the learning rate is increased and annealed over one epoch. After gradual unfreezing finishes, the learning rate is increased and annealed over the remaining training iterations.

Note that with this schedule, early stopping should typically be avoided.
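The triangular shape itself can be sketched in pure Python. The slanted_triangular_lr helper below follows the ULMFiT formula lr = lr_max * (1 + p * (ratio - 1)) / ratio and is an illustration, not this class's actual implementation (it ignores gradual unfreezing and discriminative fine-tuning):

```python
def slanted_triangular_lr(step: int, total_steps: int, lr_max: float,
                          cut_frac: float = 0.1, ratio: int = 32) -> float:
    """Slanted triangular LR: linear increase for cut_frac of the steps,
    then linear decay back down to lr_max / ratio."""
    cut = int(total_steps * cut_frac)  # step at which the peak is reached
    if step < cut:
        p = step / cut
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio
```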

Parameters
num_epochs : int, required.

The total number of epochs for which the model should be trained.

num_steps_per_epoch : int, required.

The number of steps (updates, batches) per training epoch.

cut_frac : float, optional (default = 0.1).

The fraction of the steps to increase the learning rate.

ratio : int, optional (default = 32).

The ratio of the largest to the smallest learning rate.

gradual_unfreezing : bool, optional (default = False).

Whether gradual unfreezing should be used.

discriminative_fine_tuning : bool, optional (default = False).

Whether discriminative fine-tuning (different learning rates per layer) is used.

decay_factor : float, optional (default = 0.38).

The decay factor by which the learning rate is reduced with discriminative fine-tuning when going a layer deeper.

get_values(self)[source]
step(self, metric: float = None, epoch: int = None) → None[source]
step_batch(self, batch_num_total: int = None)[source]

By default, a scheduler is assumed to only update every epoch, not every batch, so this does nothing unless it is overridden.