allennlp.interpret.attackers

class allennlp.interpret.attackers.attacker.Attacker(predictor: allennlp.predictors.predictor.Predictor)

Bases: allennlp.common.registrable.Registrable

An Attacker will modify an input (e.g., add or delete tokens) to try to change an AllenNLP Predictor’s output in a desired manner (e.g., make it incorrect).

attack_from_json(self, inputs: Dict[str, Any], input_field_to_attack: str, grad_input_field: str, ignore_tokens: List[str]) → Dict[str, Any]

This function finds a modification to the input text that would change the model’s prediction in some desired manner (e.g., an adversarial attack).

Parameters
inputs : JsonDict

The input you want to attack (the same as the argument to a Predictor, e.g., predict_json()).

input_field_to_attack : str

The key in the inputs JsonDict you want to attack, e.g., tokens.

grad_input_field : str

The field in the gradients dictionary that contains the input gradients. For example, grad_input_1 will be the field for single-input tasks. See get_gradients() in Predictor for more information on field names.

ignore_tokens : List[str]

Tokens that the attacker should leave unchanged.

Returns
reduced_input : JsonDict

Contains the final, sanitized input after adversarial modification.
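Concretely, for a single-input classifier, the arguments above might look like the following (the field names are illustrative; they depend on the model’s dataset reader):

    inputs = {"sentence": "a very entertaining movie"}  # same JSON as predict_json()
    input_field_to_attack = "tokens"    # the text field built from the input
    grad_input_field = "grad_input_1"   # single-input tasks expose one gradient field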

initialize(self)

Initializes any components of the Attacker that are expensive to compute, so that they are not created in __init__(). The default implementation is a no-op (pass).
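A minimal sketch of the intended pattern, using a hypothetical subclass whose setup is expensive (MyAttacker and its synonym table are illustrations, not part of the library):

    from allennlp.interpret.attackers.attacker import Attacker

    class MyAttacker(Attacker):  # hypothetical subclass
        def __init__(self, predictor):
            super().__init__(predictor)
            self._synonyms = None  # expensive resource, deliberately not built yet

        def initialize(self):
            # Build the expensive resource once, before the first attack,
            # instead of paying the cost in __init__().
            if self._synonyms is None:
                self._synonyms = {"movie": ["film", "picture"]}  # stand-in for real work

        def attack_from_json(self, inputs, input_field_to_attack="tokens",
                             grad_input_field="grad_input_1", ignore_tokens=None):
            self.initialize()
            ...  # modify inputs[input_field_to_attack] using self._synonyms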

class allennlp.interpret.attackers.hotflip.Hotflip(predictor: allennlp.predictors.predictor.Predictor)

Bases: allennlp.interpret.attackers.attacker.Attacker

Runs the HotFlip-style attack at the word level (https://arxiv.org/abs/1712.06751). We use the first-order Taylor approximation described in https://arxiv.org/abs/1903.06620, implemented in the function _first_order_taylor(). Constructing this object is expensive due to the construction of the embedding matrix.

attack_from_json(self, inputs: Dict[str, Any] = None, input_field_to_attack: str = 'tokens', grad_input_field: str = 'grad_input_1', ignore_tokens: List[str] = None) → Dict[str, Any]

Replaces one token at a time in the input until the model’s prediction changes. input_field_to_attack names the input field, e.g., tokens; grad_input_field names the key into the grads dictionary, e.g., grad_input_1.

The method computes the gradient with respect to the token embeddings, finds the token whose gradient has the largest L2 norm, and replaces it with another token based on the first-order Taylor approximation of the loss. This process repeats until the prediction changes; once a token has been replaced, it is not flipped again.
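A hedged sketch of that approximation (names and shapes are illustrative, not the internals of _first_order_taylor()): the loss change from swapping the current embedding e_a for a candidate e_v is estimated as (e_v − e_a)ᵀ grad, and the candidate maximizing it is chosen.

    import torch

    def first_order_taylor_sketch(grad: torch.Tensor,
                                  embedding_matrix: torch.Tensor,
                                  token_id: int) -> int:
        """Pick the vocabulary id whose embedding maximizes the estimated loss increase.

        grad:             gradient of the loss w.r.t. the token's embedding, shape (dim,)
        embedding_matrix: all word embeddings, shape (vocab_size, dim)
        token_id:         id of the token currently in the input
        """
        current = embedding_matrix[token_id]           # e_a, shape (dim,)
        scores = (embedding_matrix - current) @ grad   # (e_v - e_a)^T grad, shape (vocab_size,)
        scores[token_id] = float("-inf")               # never "flip" to the same token
        return int(scores.argmax())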

initialize(self)

Call this function before running attack_from_json(). We put the call to _construct_embedding_matrix() in this function to avoid doing a large amount of computation when __init__() is called.
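For example (the archive path is a placeholder, and the sentence-classification input is an assumption for illustration):

    from allennlp.predictors.predictor import Predictor
    from allennlp.interpret.attackers.hotflip import Hotflip

    predictor = Predictor.from_path("/path/to/model.tar.gz")  # placeholder archive
    hotflip = Hotflip(predictor)
    hotflip.initialize()  # builds the embedding matrix once, up front

    attack = hotflip.attack_from_json({"sentence": "a very entertaining movie"})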

class allennlp.interpret.attackers.input_reduction.InputReduction(predictor: allennlp.predictors.predictor.Predictor)

Bases: allennlp.interpret.attackers.attacker.Attacker

Runs the input reduction method from Pathologies of Neural Models Make Interpretations Difficult (https://arxiv.org/abs/1804.07781), which removes as many words as possible from the input without changing the model’s prediction.

The functions on this class handle a special case for NER by looking for a field called “tags”. This check is brittle: the code could break if the name of this field changes, or if a non-NER model has a field called “tags”.
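A hedged sketch of the idea (greedy and single-candidate; the library’s implementation is more involved, including the NER handling above): repeatedly drop the least important token while the prediction stays the same.

    from typing import Callable, List

    def input_reduction_sketch(tokens: List[str],
                               predict: Callable[[List[str]], str],
                               importance: Callable[[List[str]], List[float]]) -> List[str]:
        """Greedily remove the least-important token while the prediction holds.

        predict:    tokens -> predicted label    (hypothetical callback into the model)
        importance: tokens -> per-token gradient magnitudes (hypothetical callback)
        """
        original = predict(tokens)
        while len(tokens) > 1:
            scores = importance(tokens)
            idx = min(range(len(tokens)), key=lambda i: scores[i])
            candidate = tokens[:idx] + tokens[idx + 1:]
            if predict(candidate) != original:
                break  # one more removal would change the prediction; stop here
            tokens = candidate
        return tokens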

attack_from_json(self, inputs: Dict[str, Any] = None, input_field_to_attack: str = 'tokens', grad_input_field: str = 'grad_input_1', ignore_tokens: List[str] = None)

This function finds a modification to the input text that would change the model’s prediction in some desired manner (e.g., an adversarial attack).

Parameters
inputs : JsonDict

The input you want to attack (the same as the argument to a Predictor, e.g., predict_json()).

input_field_to_attack : str

The key in the inputs JsonDict you want to attack, e.g., tokens.

grad_input_field : str

The field in the gradients dictionary that contains the input gradients. For example, grad_input_1 will be the field for single-input tasks. See get_gradients() in Predictor for more information on field names.

ignore_tokens : List[str]

Tokens that the attacker should leave unchanged.

Returns
reduced_input : JsonDict

Contains the final, sanitized input after adversarial modification.
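For example (placeholder archive path; the exact keys of the returned JsonDict are not guaranteed here, so the sketch simply prints it):

    from allennlp.predictors.predictor import Predictor
    from allennlp.interpret.attackers.input_reduction import InputReduction

    predictor = Predictor.from_path("/path/to/model.tar.gz")  # placeholder archive
    reducer = InputReduction(predictor)

    reduced = reducer.attack_from_json({"sentence": "a very entertaining movie"})
    print(reduced)  # shows which tokens survived the reduction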

allennlp.interpret.attackers.utils.get_fields_to_compare(inputs: Dict[str, Any], instance: allennlp.data.instance.Instance, input_field_to_attack: str) → Dict[str, Any]

Gets the fields that should be checked for equality after an attack is performed.

Parameters
inputs : JsonDict

The input you want to attack, similar to the argument to a Predictor, e.g., predict_json().

instance : Instance

A labeled instance that is output from json_to_labeled_instances().

input_field_to_attack : str

The key in the inputs JsonDict you want to attack, e.g., tokens.

Returns
fields : JsonDict

The fields that must be compared for equality.
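A hedged sketch of how an attacker might use this to verify that nothing outside the attacked field changed (placeholder archive path; the equality check itself is illustrative):

    from allennlp.predictors.predictor import Predictor
    from allennlp.interpret.attackers.utils import get_fields_to_compare

    predictor = Predictor.from_path("/path/to/model.tar.gz")  # placeholder archive
    inputs = {"sentence": "a very entertaining movie"}
    instance = predictor.json_to_labeled_instances(inputs)[0]

    # Every field other than the attacked one must survive the attack unchanged.
    fields_to_compare = get_fields_to_compare(inputs, instance, "tokens")
    # ... run the attack, producing `attacked_instance`, then check e.g.:
    # for name, field in fields_to_compare.items():
    #     assert attacked_instance.fields[name] == field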