An initializer is just a PyTorch function. Here we implement a proxy class that allows us to register them and supply any additional function arguments (for example, the mean and std of a normal initializer) as named arguments to the constructor.

The available initialization functions are

class allennlp.nn.initializers.Initializer

Bases: allennlp.common.registrable.Registrable

An initializer is really just a bare PyTorch function. This class is a proxy that allows us to implement Registrable for those functions.

default_implementation = 'normal'
classmethod from_params(params: allennlp.common.params.Params) → allennlp.nn.initializers.Initializer
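The proxy idea can be sketched in a few lines: a registered name resolves to a bare torch.nn.init function, and any auxiliary named arguments (such as mean and std) are bound ahead of time so the result takes only the tensor. `INITIALIZER_REGISTRY` and `make_initializer` are hypothetical names for illustration, not AllenNLP's API. Falling back to "normal" when no "type" is given is an assumption modelled on default_implementation.

```python
import functools
from typing import Any, Callable, Dict

import torch

# Hypothetical registry mapping names to bare torch.nn.init functions.
# It mirrors the idea of the Initializer proxy; it is not AllenNLP's code.
INITIALIZER_REGISTRY: Dict[str, Callable[..., torch.Tensor]] = {
    "normal": torch.nn.init.normal_,
    "uniform": torch.nn.init.uniform_,
    "constant": torch.nn.init.constant_,
    "xavier_uniform": torch.nn.init.xavier_uniform_,
    "orthogonal": torch.nn.init.orthogonal_,
}

# Assumption: used when a dict spec has no "type" key.
DEFAULT_IMPLEMENTATION = "normal"


def make_initializer(spec: Any) -> Callable[[torch.Tensor], torch.Tensor]:
    """Build a one-argument initializer from a name or a {"type": ..., **kwargs} dict."""
    if isinstance(spec, str):
        name, kwargs = spec, {}
    else:
        kwargs = dict(spec)  # copy so the caller's dict is not mutated
        name = kwargs.pop("type", DEFAULT_IMPLEMENTATION)
    init_fn = INITIALIZER_REGISTRY[name]
    # Bind the auxiliary named arguments so callers pass only the tensor.
    return functools.partial(init_fn, **kwargs)


init = make_initializer({"type": "normal", "mean": 0.01, "std": 0.1})
weight = torch.empty(3, 4)
init(weight)  # fills weight in place from N(0.01, 0.1^2)
```

Binding the arguments with functools.partial keeps every resulting initializer a plain callable taking one tensor, which is what lets differently parameterized initializers be stored and applied uniformly.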
class allennlp.nn.initializers.InitializerApplicator(initializers: typing.List[typing.Tuple[str, allennlp.nn.initializers.Initializer]] = None, prevent_regexes: typing.List[str] = None) → None

Bases: object

Applies initializers to the parameters of a Module based on regex matches against parameter names. Any parameter that matches no regex is left untouched and keeps whatever default initialization the module’s code applied.
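The matching behaviour can be sketched as a loop over named_parameters(). This is a minimal sketch, not the library's implementation: `apply_initializers` is a hypothetical helper, and it assumes that matching uses re.search, that prevent regexes are checked first, and that the first matching initializer wins.

```python
import re
from typing import Callable, List, Optional, Tuple

import torch

Initializer = Callable[[torch.Tensor], torch.Tensor]


def apply_initializers(
    module: torch.nn.Module,
    initializers: List[Tuple[str, Initializer]],
    prevent_regexes: Optional[List[str]] = None,
) -> List[str]:
    """Initialize parameters whose names match a regex; return the names that were initialized."""
    prevent = [re.compile(p) for p in (prevent_regexes or [])]
    initialized = []
    with torch.no_grad():
        for name, parameter in module.named_parameters():
            # Parameters matching a prevent regex keep their default initialization.
            if any(p.search(name) for p in prevent):
                continue
            for regex, initializer in initializers:
                if re.search(regex, name):
                    initializer(parameter)
                    initialized.append(name)
                    break  # assumption: the first matching regex wins
    return initialized


model = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Linear(3, 2))
done = apply_initializers(
    model,
    [("weight", lambda t: torch.nn.init.constant_(t, 1.0))],
    prevent_regexes=[r"^1\."],
)
# done == ["0.weight"]: "1.weight" is prevented, and the biases match no regex.
```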

classmethod from_params(params: typing.Iterable[typing.Tuple[str, allennlp.common.params.Params]] = ()) → allennlp.nn.initializers.InitializerApplicator

Converts a Params object into an InitializerApplicator. The JSON should be formatted as follows:

    [
        ["parameter_regex_match1",
            {
                "type": "normal",
                "mean": 0.01,
                "std": 0.1
            }
        ],
        ["parameter_regex_match2", "uniform"],
        ["prevent_init_regex", "prevent"]
    ]

where the first item in each pair is the regex matched against parameter names, and the second item is the set of parameters passed to Initializer.from_params(). These values can be either strings, which name an initializer, or dictionaries, which must contain a “type” key naming an initializer and may also contain auxiliary named parameters that are fed to the initializer itself. For the valid auxiliary parameters, refer to the torch.nn.init documentation. The one special type is “prevent”, which has no corresponding initializer: any parameter matching its regex is excluded from initialization, overriding any other regex it matches.

Returns: an InitializerApplicator containing the specified initializers.
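Turning the config format into usable pieces can be sketched as follows: each pair becomes either a (regex, initializer) entry or, for “prevent”, an entry in a list of prevent regexes. `parse_initializer_config` and `_INIT_FUNCTIONS` are hypothetical names, and the small name-to-function table is an assumption that covers only the initializers used in the example.

```python
import functools
import json
from typing import Any, Callable, List, Tuple

import torch

# Hypothetical name-to-function table; a real registry would cover more initializers.
_INIT_FUNCTIONS = {
    "normal": torch.nn.init.normal_,
    "uniform": torch.nn.init.uniform_,
    "constant": torch.nn.init.constant_,
}


def parse_initializer_config(pairs: List[Tuple[str, Any]]):
    """Split [regex, spec] pairs into (regex, initializer) pairs and prevent regexes."""
    initializers: List[Tuple[str, Callable[[torch.Tensor], torch.Tensor]]] = []
    prevent_regexes: List[str] = []
    for regex, spec in pairs:
        # A bare string names the initializer; a dict carries "type" plus named arguments.
        if isinstance(spec, str):
            name, kwargs = spec, {}
        else:
            kwargs = dict(spec)
            name = kwargs.pop("type")
        if name == "prevent":
            # "prevent" has no initializer: matching parameters are excluded instead.
            prevent_regexes.append(regex)
        else:
            initializers.append((regex, functools.partial(_INIT_FUNCTIONS[name], **kwargs)))
    return initializers, prevent_regexes


config = json.loads("""
[
    ["parameter_regex_match1", {"type": "normal", "mean": 0.01, "std": 0.1}],
    ["parameter_regex_match2", "uniform"],
    ["prevent_init_regex", "prevent"]
]
""")
initializers, prevent_regexes = parse_initializer_config(config)
```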
allennlp.nn.initializers.block_orthogonal(tensor: torch.Tensor, split_sizes: typing.List[int], gain: float = 1.0) → None

An initializer that initializes model parameters in “blocks”. This is helpful for recurrent models whose gates are linear projections: concatenating the projections into one matrix makes them efficient to compute, but they remain separate parameters that should be initialized independently.

tensor : torch.Tensor, required.

A tensor to initialize.

split_sizes : List[int], required.

A list of length tensor.dim() specifying the size of the blocks along each dimension. For example, [10, 20] splits the tensor into chunks of size 10 along the first dimension and 20 along the second.

gain : float, optional (default = 1.0)

The gain (scaling) applied to the orthogonal initialization.
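The block scheme can be sketched with torch.nn.init.orthogonal_ applied to one freshly allocated block at a time. This is a sketch of the technique, not AllenNLP's code: `block_orthogonal_sketch` is a hypothetical name, it assumes every split size evenly divides its dimension, and it targets 2-D tensors, since orthogonal_ needs at least two dimensions.

```python
import itertools
from typing import List

import torch


def block_orthogonal_sketch(tensor: torch.Tensor, split_sizes: List[int], gain: float = 1.0) -> None:
    """Orthogonally initialize each block of `tensor` independently, in place."""
    sizes = list(tensor.size())
    # Assumption: each split size must evenly divide the matching dimension.
    if len(split_sizes) != len(sizes) or any(s % b != 0 for s, b in zip(sizes, split_sizes)):
        raise ValueError("each split size must evenly divide the matching tensor dimension")
    # Start offsets of every block along each dimension, e.g. 0, 10, 20 for size 30, block 10.
    starts_per_dim = [range(0, s, b) for s, b in zip(sizes, split_sizes)]
    with torch.no_grad():
        for starts in itertools.product(*starts_per_dim):
            index = tuple(slice(start, start + b) for start, b in zip(starts, split_sizes))
            # Initialize a fresh contiguous block, then copy it into place.
            block = torch.empty(split_sizes, dtype=tensor.dtype, device=tensor.device)
            torch.nn.init.orthogonal_(block, gain=gain)
            tensor[index] = block


# An LSTM-style weight packing 4 gates of hidden size 3 over 2 inputs.
weight = torch.empty(4 * 3, 2)
block_orthogonal_sketch(weight, [3, 2])
```

Each gate's 3 x 2 slice gets its own orthogonal matrix, whereas a single orthogonal_ call on the whole 12 x 2 matrix would couple the gates.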

allennlp.nn.initializers.lstm_hidden_bias(tensor: torch.Tensor) → None

Initializes the biases of the forget gate to 1 and those of all other gates to 0, following Jozefowicz et al., “An Empirical Exploration of Recurrent Network Architectures”.
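For a PyTorch LSTM, the gate biases are packed into one vector of length 4 * hidden_size in the order input, forget, cell, output, so the forget gate is the second quarter. A minimal sketch, assuming that packed layout; `lstm_hidden_bias_sketch` is a hypothetical name, not the library function.

```python
import torch


def lstm_hidden_bias_sketch(tensor: torch.Tensor) -> None:
    """Set the forget-gate slice of a packed LSTM bias to 1 and every other gate to 0, in place."""
    # Assumes PyTorch's packed gate order (input, forget, cell, output), each hidden_size long.
    hidden_size = tensor.shape[0] // 4
    with torch.no_grad():
        tensor.zero_()
        tensor[hidden_size:2 * hidden_size] = 1.0


lstm = torch.nn.LSTM(input_size=8, hidden_size=5)
lstm_hidden_bias_sketch(lstm.bias_hh_l0)
```

Note that torch.nn.LSTM adds two bias vectors (bias_ih_l0 and bias_hh_l0); this sketch only touches the hidden-to-hidden one, so the input bias keeps its default values.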

allennlp.nn.initializers.uniform_unit_scaling(tensor: torch.Tensor, nonlinearity: str = 'linear')

An initialiser which preserves output variance for approximately Gaussian-distributed inputs. This boils down to initialising layers from a uniform distribution over the range (-sqrt(3 / dim[0]) * scale, sqrt(3 / dim[0]) * scale), where dim[0] is the input dimension of the parameter and scale is a constant factor that depends on the non-linearity used.

See “Random Walk Initialization for Training Very Deep Feedforward Networks” for more information.
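Why this range preserves variance can be checked in a few lines, assuming the inputs x_j are independent with zero mean and unit variance and the weights are drawn independently of them; here n stands for dim[0] and s for scale.

```latex
\begin{aligned}
y_i &= \sum_{j=1}^{n} W_{ij}\, x_j, \qquad W_{ij} \sim \mathcal{U}(-a, a), \qquad a = s\sqrt{3/n} \\
\operatorname{Var}(W_{ij}) &= \frac{a^2}{3} = \frac{s^2}{n} \\
\operatorname{Var}(y_i) &= \sum_{j=1}^{n} \operatorname{Var}(W_{ij})\,\operatorname{Var}(x_j) = n \cdot \frac{s^2}{n} \cdot 1 = s^2
\end{aligned}
```

With s = 1 (the linear case) the output variance equals the input variance; other non-linearities use a different s to compensate for how they shrink or stretch the signal.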

tensor : torch.Tensor, required.

The tensor to initialise.

nonlinearity : str, optional (default = “linear”)

The non-linearity which is performed after the projection that this tensor is involved in. This must be the name of a function contained in the torch.nn.functional package.

Returns: the initialised tensor.
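A minimal sketch of the rule described above. `uniform_unit_scaling_sketch` is a hypothetical name, and taking the non-linearity-dependent scale from torch.nn.init.calculate_gain is an assumption (calculate_gain accepts only a fixed set of names such as "linear", "relu" and "tanh"). Following the text, dim[0] is treated as the input dimension, which fits a weight laid out as (input_dim, output_dim).

```python
import math

import torch


def uniform_unit_scaling_sketch(tensor: torch.Tensor, nonlinearity: str = "linear") -> torch.Tensor:
    """Fill `tensor` from U(-bound, bound) with bound = sqrt(3 / dim[0]) * scale, in place."""
    fan_in = tensor.size(0)  # the text treats dim[0] as the input dimension
    # Assumption: the non-linearity-dependent scale comes from calculate_gain.
    scale = torch.nn.init.calculate_gain(nonlinearity)
    bound = math.sqrt(3.0 / fan_in) * scale
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)


w = torch.empty(12, 5)
uniform_unit_scaling_sketch(w, "relu")  # bound = sqrt(3 / 12) * sqrt(2), about 0.707
```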