allennlp.modules.openai_transformer

An implementation of the OpenAI Transformer Language Model.

Mostly just a slightly modified version of https://github.com/huggingface/pytorch-openai-transformer-lm, so thanks to them!

Some of these modules duplicate code elsewhere in AllenNLP, but the serialized weights depend on the exact parameter setup here, so it’s easiest to just reimplement them.

class allennlp.modules.openai_transformer.Attention(nx: int, n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False) → None[source]

Bases: torch.nn.modules.module.Module

forward(x: torch.Tensor) → torch.Tensor[source]
merge_heads(x: torch.Tensor)[source]
split_heads(x: torch.Tensor, k: bool = False)[source]
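
A minimal usage sketch (not from the original docs): the same-shape output and the behavior of ``split_heads``/``merge_heads`` follow the upstream huggingface implementation and should be treated as assumptions here::

    import torch
    from allennlp.modules.openai_transformer import Attention, TransformerConfig

    # Field order: embedding_dim, num_heads, the three dropout probabilities, then the activation.
    config = TransformerConfig(768, 12, 0.1, 0.1, 0.1, 'gelu')

    # nx is the model dimension; n_ctx bounds the sequence length the causal mask covers.
    attention = Attention(nx=768, n_ctx=512, config=config, scale=True)

    x = torch.randn(2, 16, 768)   # (batch, sequence, embedding_dim)
    print(attention(x).shape)     # expected to match the input shape

    # split_heads/merge_heads reshape between (batch, sequence, dim) and the per-head
    # layout used inside the attention computation; the exact layout is an implementation detail.
    heads = attention.split_heads(x)
    print(heads.shape)
    print(attention.merge_heads(heads).shape)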
class allennlp.modules.openai_transformer.Block(n_ctx: int, config: allennlp.modules.openai_transformer.TransformerConfig, scale: bool = False) → None[source]

Bases: torch.nn.modules.module.Module

forward(x: torch.Tensor) → torch.Tensor[source]
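
A sketch of running a single block; ``n_ctx=512`` and ``scale=True`` mirror the pretrained defaults, and the same-shape output is an assumption based on the upstream implementation (a block composes ``Attention``, ``LayerNorm``, and ``MLP`` with residual connections)::

    import torch
    from allennlp.modules.openai_transformer import Block, TransformerConfig

    config = TransformerConfig(768, 12, 0.1, 0.1, 0.1, 'gelu')
    block = Block(n_ctx=512, config=config, scale=True)

    x = torch.randn(2, 16, 768)   # (batch, sequence, embedding_dim)
    h = block(x)                  # expected: (batch, sequence, embedding_dim)
    print(h.shape)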
class allennlp.modules.openai_transformer.Conv1D(nf: int, rf: int, nx: int) → None[source]

Bases: torch.nn.modules.module.Module

forward(x: torch.Tensor) → torch.Tensor[source]
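
In the upstream implementation, ``Conv1D`` with ``rf = 1`` is a position-wise linear projection from ``nx`` input features to ``nf`` output features (this is how the block's ``c_attn``/``c_proj`` layers are built); a sketch under that assumption, with randomly initialized weights::

    import torch
    from allennlp.modules.openai_transformer import Conv1D

    # Project 768-dimensional vectors to 3 * 768 features, the way the attention's
    # c_attn projection produces concatenated query/key/value.
    c_attn = Conv1D(nf=3 * 768, rf=1, nx=768)

    x = torch.randn(2, 16, 768)   # (batch, sequence, nx)
    print(c_attn(x).shape)        # expected: (2, 16, 2304)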
class allennlp.modules.openai_transformer.LayerNorm(n_state, e=1e-05)[source]

Bases: torch.nn.modules.module.Module

Construct a layernorm module in the OpenAI style (epsilon inside the square root).

forward(x)[source]
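
Concretely, the "OpenAI style" normalization is ``(x - mean) / sqrt(variance + e)``, scaled by a learned gain and shifted by a learned bias. A sketch comparing the module against that formula (assuming, as upstream, that the gain initializes to ones and the bias to zeros)::

    import torch
    from allennlp.modules.openai_transformer import LayerNorm

    layer_norm = LayerNorm(n_state=768, e=1e-5)

    x = torch.randn(2, 16, 768)
    mean = x.mean(-1, keepdim=True)
    variance = ((x - mean) ** 2).mean(-1, keepdim=True)
    reference = (x - mean) / torch.sqrt(variance + 1e-5)   # epsilon added inside the square root

    # With a freshly initialized module the two should agree up to floating point error.
    print((layer_norm(x) - reference).abs().max())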
class allennlp.modules.openai_transformer.MLP(n_state: int, config: allennlp.modules.openai_transformer.TransformerConfig) → None[source]

Bases: torch.nn.modules.module.Module

forward(x: torch.Tensor) → torch.Tensor[source]
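
In the upstream implementation ``n_state`` is the hidden width of the feed-forward layer (4 × ``embedding_dim`` for the pretrained model), and the output keeps the input's shape; a sketch under those assumptions::

    import torch
    from allennlp.modules.openai_transformer import MLP, TransformerConfig

    config = TransformerConfig(768, 12, 0.1, 0.1, 0.1, 'gelu')
    mlp = MLP(n_state=4 * 768, config=config)   # expand to 3072 features, project back to 768

    x = torch.randn(2, 16, 768)
    print(mlp(x).shape)                         # expected: (2, 16, 768)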
class allennlp.modules.openai_transformer.OpenaiTransformer(vocab_size: int = 40478, n_ctx: int = 512, embedding_dim: int = 768, num_heads: int = 12, num_layers: int = 12, embedding_dropout_probability: float = 0.1, attention_dropout_probability: float = 0.1, residual_dropout_probability: float = 0.1, activation_function: str = 'gelu', model_path: str = None, requires_grad: bool = False, n_special: int = -1) → None[source]

Bases: torch.nn.modules.module.Module, allennlp.common.from_params.FromParams

OpenAI transformer, as per https://blog.openai.com/language-unsupervised/. Default parameters are the ones for their pretrained model; a usage sketch follows the method listing below.

Parameters:
vocab_size: ``int`` (optional, default: 40478)

The size of the vocabulary (the number of byte-pair embeddings), excluding the n_special embeddings (if any) and the positional embeddings.

n_ctx: ``int`` (optional, default: 512)

The number of positional encodings to use for evaluation.

embedding_dim: ``int`` (optional, default: 768)

The dimension of the output embeddings.

num_heads: ``int`` (optional, default: 12)

How many “heads” the attention has.

num_layers: ``int`` (optional, default: 12)

How many layers of “blocks” the transformer has.

embedding_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for the embedding.

attention_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for attention.

residual_dropout_probability: ``float`` (optional, default: 0.1)

Dropout for the residual connections.

activation_function: ``str`` (optional, default: ``'gelu'``)

Activation function for the multi-layer perceptron.

model_path: ``str`` (optional, default: ``None``)

A tar.gz file containing serialized model weights. If supplied, the weights will be loaded from that file.

requires_grad: ``bool`` (optional, default: ``False``)

If true, the transformer will be fine-tunable (its parameters will require gradients).

n_special: ``int`` (optional, default: ``-1``)

The number of special tokens added to the byte-pair vocabulary (via ``OpenaiTransformerBytePairIndexer``).

dump_weights(output_dir: str, num_pieces: int = 10) → None[source]
forward(x: torch.Tensor) → typing.List[torch.Tensor][source]
load_weights(transformer_model_path: str, n_ctx: int = -1, n_special: int = -1, n_transfer: int = 12, n_embd: int = 768, names: typing.List[str] = ['model/we:0', 'model/h0/attn/c_attn/w:0', 'model/h0/attn/c_attn/b:0', 'model/h0/attn/c_proj/w:0', 'model/h0/attn/c_proj/b:0', 'model/h0/ln_1/g:0', 'model/h0/ln_1/b:0', 'model/h0/mlp/c_fc/w:0', 'model/h0/mlp/c_fc/b:0', 'model/h0/mlp/c_proj/w:0', 'model/h0/mlp/c_proj/b:0', 'model/h0/ln_2/g:0', 'model/h0/ln_2/b:0', 'model/h1/attn/c_attn/w:0', 'model/h1/attn/c_attn/b:0', 'model/h1/attn/c_proj/w:0', 'model/h1/attn/c_proj/b:0', 'model/h1/ln_1/g:0', 'model/h1/ln_1/b:0', 'model/h1/mlp/c_fc/w:0', 'model/h1/mlp/c_fc/b:0', 'model/h1/mlp/c_proj/w:0', 'model/h1/mlp/c_proj/b:0', 'model/h1/ln_2/g:0', 'model/h1/ln_2/b:0', 'model/h2/attn/c_attn/w:0', 'model/h2/attn/c_attn/b:0', 'model/h2/attn/c_proj/w:0', 'model/h2/attn/c_proj/b:0', 'model/h2/ln_1/g:0', 'model/h2/ln_1/b:0', 'model/h2/mlp/c_fc/w:0', 'model/h2/mlp/c_fc/b:0', 'model/h2/mlp/c_proj/w:0', 'model/h2/mlp/c_proj/b:0', 'model/h2/ln_2/g:0', 'model/h2/ln_2/b:0', 'model/h3/attn/c_attn/w:0', 'model/h3/attn/c_attn/b:0', 'model/h3/attn/c_proj/w:0', 'model/h3/attn/c_proj/b:0', 'model/h3/ln_1/g:0', 'model/h3/ln_1/b:0', 'model/h3/mlp/c_fc/w:0', 'model/h3/mlp/c_fc/b:0', 'model/h3/mlp/c_proj/w:0', 'model/h3/mlp/c_proj/b:0', 'model/h3/ln_2/g:0', 'model/h3/ln_2/b:0', 'model/h4/attn/c_attn/w:0', 'model/h4/attn/c_attn/b:0', 'model/h4/attn/c_proj/w:0', 'model/h4/attn/c_proj/b:0', 'model/h4/ln_1/g:0', 'model/h4/ln_1/b:0', 'model/h4/mlp/c_fc/w:0', 'model/h4/mlp/c_fc/b:0', 'model/h4/mlp/c_proj/w:0', 'model/h4/mlp/c_proj/b:0', 'model/h4/ln_2/g:0', 'model/h4/ln_2/b:0', 'model/h5/attn/c_attn/w:0', 'model/h5/attn/c_attn/b:0', 'model/h5/attn/c_proj/w:0', 'model/h5/attn/c_proj/b:0', 'model/h5/ln_1/g:0', 'model/h5/ln_1/b:0', 'model/h5/mlp/c_fc/w:0', 'model/h5/mlp/c_fc/b:0', 'model/h5/mlp/c_proj/w:0', 'model/h5/mlp/c_proj/b:0', 'model/h5/ln_2/g:0', 'model/h5/ln_2/b:0', 'model/h6/attn/c_attn/w:0', 'model/h6/attn/c_attn/b:0', 'model/h6/attn/c_proj/w:0', 'model/h6/attn/c_proj/b:0', 'model/h6/ln_1/g:0', 'model/h6/ln_1/b:0', 'model/h6/mlp/c_fc/w:0', 'model/h6/mlp/c_fc/b:0', 'model/h6/mlp/c_proj/w:0', 'model/h6/mlp/c_proj/b:0', 'model/h6/ln_2/g:0', 'model/h6/ln_2/b:0', 'model/h7/attn/c_attn/w:0', 'model/h7/attn/c_attn/b:0', 'model/h7/attn/c_proj/w:0', 'model/h7/attn/c_proj/b:0', 'model/h7/ln_1/g:0', 'model/h7/ln_1/b:0', 'model/h7/mlp/c_fc/w:0', 'model/h7/mlp/c_fc/b:0', 'model/h7/mlp/c_proj/w:0', 'model/h7/mlp/c_proj/b:0', 'model/h7/ln_2/g:0', 'model/h7/ln_2/b:0', 'model/h8/attn/c_attn/w:0', 'model/h8/attn/c_attn/b:0', 'model/h8/attn/c_proj/w:0', 'model/h8/attn/c_proj/b:0', 'model/h8/ln_1/g:0', 'model/h8/ln_1/b:0', 'model/h8/mlp/c_fc/w:0', 'model/h8/mlp/c_fc/b:0', 'model/h8/mlp/c_proj/w:0', 'model/h8/mlp/c_proj/b:0', 'model/h8/ln_2/g:0', 'model/h8/ln_2/b:0', 'model/h9/attn/c_attn/w:0', 'model/h9/attn/c_attn/b:0', 'model/h9/attn/c_proj/w:0', 'model/h9/attn/c_proj/b:0', 'model/h9/ln_1/g:0', 'model/h9/ln_1/b:0', 'model/h9/mlp/c_fc/w:0', 'model/h9/mlp/c_fc/b:0', 'model/h9/mlp/c_proj/w:0', 'model/h9/mlp/c_proj/b:0', 'model/h9/ln_2/g:0', 'model/h9/ln_2/b:0', 'model/h10/attn/c_attn/w:0', 'model/h10/attn/c_attn/b:0', 'model/h10/attn/c_proj/w:0', 'model/h10/attn/c_proj/b:0', 'model/h10/ln_1/g:0', 'model/h10/ln_1/b:0', 'model/h10/mlp/c_fc/w:0', 'model/h10/mlp/c_fc/b:0', 'model/h10/mlp/c_proj/w:0', 'model/h10/mlp/c_proj/b:0', 'model/h10/ln_2/g:0', 'model/h10/ln_2/b:0', 'model/h11/attn/c_attn/w:0', 'model/h11/attn/c_attn/b:0', 
'model/h11/attn/c_proj/w:0', 'model/h11/attn/c_proj/b:0', 'model/h11/ln_1/g:0', 'model/h11/ln_1/b:0', 'model/h11/mlp/c_fc/w:0', 'model/h11/mlp/c_fc/b:0', 'model/h11/mlp/c_proj/w:0', 'model/h11/mlp/c_proj/b:0', 'model/h11/ln_2/g:0', 'model/h11/ln_2/b:0', 'model/clf/w:0', 'model/clf/b:0']) → None[source]
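
A minimal end-to-end sketch. The input convention below, a ``(batch, sequence, 2)`` tensor holding byte-pair ids in the first slot of the last dimension and position ids (offset past the byte-pair vocabulary) in the second, follows the upstream huggingface model and the way ``OpenaiTransformerEmbedder`` calls this module; treat it as an assumption rather than a guarantee. Without ``model_path`` the weights are random::

    import torch
    from allennlp.modules.openai_transformer import OpenaiTransformer

    # Default hyperparameters; no model_path, so the weights are randomly initialized.
    transformer = OpenaiTransformer()

    batch_size, num_timesteps = 2, 16
    byte_pair_ids = torch.randint(0, 40478, (batch_size, num_timesteps))
    # Position ids are offset past the byte-pair vocabulary (an assumption, see above).
    position_ids = torch.arange(40478, 40478 + num_timesteps).expand(batch_size, num_timesteps)
    inputs = torch.stack([byte_pair_ids, position_ids], dim=-1)   # (batch, sequence, 2)

    # One activation tensor per layer, each (batch, sequence, embedding_dim).
    layer_activations = transformer(inputs)
    print(len(layer_activations), layer_activations[-1].shape)

To use the pretrained weights, pass ``model_path`` pointing at the serialized tar.gz described in the parameter documentation above; this is how the model is typically configured when used through ``OpenaiTransformerEmbedder``.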
class allennlp.modules.openai_transformer.TransformerConfig[source]

Bases: tuple

The transformer has to pass a bunch of params to its submodules; this named tuple bundles them together to make things easier. A construction sketch follows the field listing below.

activation_function

Alias for field number 5

attention_dropout_probability

Alias for field number 3

embedding_dim

Alias for field number 0

embedding_dropout_probability

Alias for field number 2

num_heads

Alias for field number 1

residual_dropout_probability

Alias for field number 4
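
Because this is a named tuple, it can be constructed positionally in the field order given above (embedding_dim, num_heads, embedding_dropout_probability, attention_dropout_probability, residual_dropout_probability, activation_function) or by keyword::

    from allennlp.modules.openai_transformer import TransformerConfig

    config = TransformerConfig(
        embedding_dim=768,
        num_heads=12,
        embedding_dropout_probability=0.1,
        attention_dropout_probability=0.1,
        residual_dropout_probability=0.1,
        activation_function='gelu',
    )
    print(config.num_heads, config.activation_function)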

allennlp.modules.openai_transformer.gelu(x: torch.Tensor) → torch.Tensor[source]
allennlp.modules.openai_transformer.swish(x: torch.Tensor) → torch.Tensor[source]
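
Both are element-wise activations. ``swish(x)`` is ``x * sigmoid(x)``; ``gelu`` is, in the upstream implementation, the tanh approximation ``0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))`` rather than the exact Gaussian CDF form. Treating those formulas as assumptions, a quick check::

    import math
    import torch
    from allennlp.modules.openai_transformer import gelu, swish

    x = torch.linspace(-3, 3, steps=7)

    reference_gelu = 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))
    reference_swish = x * torch.sigmoid(x)

    # Print the largest deviation from the reference formulas.
    print((gelu(x) - reference_gelu).abs().max())
    print((swish(x) - reference_swish).abs().max())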