# allennlp.semparse.worlds

exception allennlp.semparse.worlds.world.ExecutionError(message)[source]

Bases: Exception

This exception gets raised when you’re trying to execute a logical form that your executor does not understand. This may be because your logical form contains a function with an invalid name, or a set of arguments whose types do not match those that the function expects.

exception allennlp.semparse.worlds.world.ParsingError(message)[source]

Bases: Exception

This exception gets raised when there is a parsing error during logical form processing. This might happen, for instance, because you’re not handling the full set of possible logical forms; having this error provides a consistent way to catch such failures and to log how frequently they occur.

class allennlp.semparse.worlds.world.World(constant_type_prefixes: typing.Dict[str, nltk.sem.logic.BasicType] = None, global_type_signatures: typing.Dict[str, nltk.sem.logic.Type] = None, global_name_mapping: typing.Dict[str, str] = None, num_nested_lambdas: int = 0) → None[source]

Bases: object

Base class for defining a world in a new domain. This class defines a method to translate a logical form as per a naming convention that works with NLTK’s LogicParser. The sub-classes can decide on the convention by overriding the _map_name method that does token level mapping. This class also defines methods for transforming logical form strings into parsed Expressions, and Expressions into action sequences.

Parameters:

- constant_type_prefixes : Dict[str, BasicType], optional
  If you have an unbounded number of constants in your domain, you are required to add prefixes to their names to denote their types. This is the mapping from prefixes to types.
- global_type_signatures : Dict[str, Type], optional
  A mapping from translated names to their types.
- global_name_mapping : Dict[str, str], optional
  A name mapping from the original names in the domain to the translated names.
- num_nested_lambdas : int, optional
  Does the language used in this World permit lambda expressions? And if so, how many nested lambdas do we need to worry about? This is important when considering the space of all possible actions, which we need to enumerate a priori for the parser.
all_possible_actions() → typing.List[str][source]
get_action_sequence(expression: nltk.sem.logic.Expression) → typing.List[str][source]

Returns the sequence of actions (as strings) that resulted in the given expression.

get_basic_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of basic types (types of entities) in the world.

get_logical_form(action_sequence: typing.List[str], add_var_function: bool = True) → str[source]

Takes an action sequence and constructs a logical form from it. This is useful if you want to get a logical form from a decoded sequence of actions generated by a transition-based semantic parser.

Parameters:

- action_sequence : List[str]
  The sequence of actions as strings (e.g. ['{START_SYMBOL} -> t', 't -> ', ...]).
- add_var_function : bool, optional
  var is a special function that some languages use within lambda functions to indicate the use of a variable (e.g. (lambda x (fb:row.row.year (var x)))). Due to the way constrained decoding is currently implemented, it is easier for the decoder not to produce these functions. In that case, setting this flag adds the function to the logical form even though it is not present in the action sequence.
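The reconstruction that get_logical_form performs can be illustrated with a small self-contained sketch. This is not the AllenNLP implementation; the toy grammar and action strings below are hypothetical, but they follow the same "nonterminal -> right-hand side" format, expanded depth first:

```python
def actions_to_logical_form(actions):
    """Toy reconstruction of a lisp-like logical form from production rules.

    Each action looks like 'LHS -> RHS' or 'LHS -> [RHS1, RHS2]'.  Actions
    arrive in preorder, so the next action whose LHS matches the current
    symbol is the one that expands it; symbols never expanded are terminals.
    """
    actions = list(actions)
    root_symbol = actions.pop(0).split(" -> ")[1]

    def expand(symbol):
        if actions and actions[0].split(" -> ")[0] == symbol:
            rhs = actions.pop(0).split(" -> ")[1]
            children = rhs.strip("[]").split(", ")
            subtrees = [expand(child) for child in children]
            if len(subtrees) == 1:
                return subtrees[0]
            return "(" + " ".join(subtrees) + ")"
        return symbol  # terminal: no production expands it

    return expand(root_symbol)
```

For example, the (hypothetical) sequence `["@start@ -> r", "r -> [<c,r>, c]", "<c,r> -> filter", "c -> all_cells"]` reconstructs to `(filter all_cells)`.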
get_multi_match_mapping() → typing.Dict[nltk.sem.logic.Type, typing.List[nltk.sem.logic.Type]][source]

Returns a mapping from each MultiMatchNamedBasicType to all the NamedBasicTypes that it matches.

get_name_mapping() → typing.Dict[str, str][source]
get_paths_to_root(action: str, max_path_length: int = 20, beam_size: int = 30, max_num_paths: int = 10) → typing.List[typing.List[str]][source]

For a given action, returns at most max_num_paths paths to the root (production with START_SYMBOL) that are not longer than max_path_length.
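The idea behind get_paths_to_root can be sketched as a bounded breadth-first search over the grammar. This is a simplified illustration, not the AllenNLP code; the `parents` mapping and the grammar entries in the example are hypothetical:

```python
from collections import deque

def paths_to_root(action, parents, root="@start@", max_path_length=20,
                  beam_size=30, max_num_paths=10):
    """Breadth-first search from `action` up to the root production.

    `parents` maps each nonterminal to the actions whose right-hand side
    contains it.  Paths longer than max_path_length are abandoned, and the
    frontier is trimmed to beam_size, mirroring the bounds in the docstring.
    """
    paths = []
    queue = deque([[action]])
    while queue and len(paths) < max_num_paths:
        path = queue.popleft()
        lhs = path[-1].split(" -> ")[0]
        if lhs == root:
            paths.append(path)
            continue
        if len(path) >= max_path_length:
            continue
        for parent_action in parents.get(lhs, []):
            queue.append(path + [parent_action])
        while len(queue) > beam_size:  # crude beam trimming: drop newest
            queue.pop()
    return paths
```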

get_type_signatures() → typing.Dict[str, str][source]
get_valid_actions() → typing.Dict[str, typing.List[str]][source]
get_valid_starting_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of all types t, such that actions {START_SYMBOL} -> t are valid. In other words, these are all the possible types of complete logical forms in this world.

is_terminal(symbol: str) → bool[source]

This function will be called on nodes of a logical form tree, which are either non-terminal symbols that can be expanded or terminal symbols that must be leaf nodes. Returns True if the given symbol is a terminal symbol.
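One simple way to realise such a check, sketched here without the name-convention details the real method relies on: a symbol is terminal exactly when no production in the grammar expands it. The grammar dictionary in the example is hypothetical.

```python
def is_terminal(symbol, valid_actions):
    """Toy terminal check: `valid_actions` maps each nonterminal to its
    productions, so any symbol absent from it must be a leaf."""
    return symbol not in valid_actions
```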

parse_logical_form(logical_form: str, remove_var_function: bool = True) → nltk.sem.logic.Expression[source]

Takes a logical form as a string, maps its tokens using the mapping and returns a parsed expression.

Parameters:

- logical_form : str
  Logical form to parse.
- remove_var_function : bool, optional
  var is a special function that some languages use within lambda functions to indicate the use of a variable. If your language uses it and you do not want to include it in the parsed expression, set this flag. You may want to do this if you are generating an action sequence from the parsed expression, because it is easier to let the decoder not produce this function, due to the way constrained decoding is currently implemented.
allennlp.semparse.worlds.world.nltk_tree_to_logical_form(tree: nltk.tree.Tree) → str[source]

Given an nltk.Tree representing the syntax tree that generates a logical form, this method produces the actual (lisp-like) logical form, with all of the non-terminal symbols converted into the correct number of parentheses.
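A toy analogue of this conversion, working on nested Python lists instead of nltk.Tree (the real function also has special handling for non-terminal labels, which this sketch omits):

```python
def tree_to_logical_form(tree):
    """Serialise a nested-list syntax tree into a lisp-like logical form:
    leaves are emitted as-is, internal nodes become parenthesised groups."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(tree_to_logical_form(child) for child in tree) + ")"
```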

We store all the information related to a world (i.e. the context in which logical forms will be executed) here. For WikiTableQuestions, this includes a representation of a table, a mapping from SEMPRE variables in all logical forms to NLTK variables, and the types of all predicates and entities.

class allennlp.semparse.worlds.wikitables_world.WikiTablesWorld(table_graph: allennlp.semparse.contexts.table_question_knowledge_graph.TableQuestionKnowledgeGraph) → None[source]

World representation for the WikiTableQuestions domain.

Parameters:

- table_graph : TableQuestionKnowledgeGraph
  Context associated with this world.
curried_functions = {<n,<n,<#1,<<#2,#1>,#1>>>>: 4, <#1,<#1,#1>>: 2, <n,<n,<n,d>>>: 3, <n,<n,n>>: 2}
get_agenda()[source]
get_basic_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of basic types (types of entities) in the world.

get_valid_actions() → typing.Dict[str, typing.List[str]][source]
get_valid_starting_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of all types t, such that actions {START_SYMBOL} -> t are valid. In other words, these are all the possible types of complete logical forms in this world.

is_table_entity(entity_name: str) → bool[source]

Returns True if the given entity is one of the entities in the table.

We store the information related to context sensitive execution of logical forms here. We assume that the logical forms are written in the variable-free language described in the paper ‘Memory Augmented Policy Optimization for Program Synthesis with Generalization’ by Liang et al. The language is the main difference between this class and WikiTablesWorld. Also, this class defines an executor for the variable-free logical forms.

class allennlp.semparse.worlds.wikitables_variable_free_world.WikiTablesVariableFreeWorld(table_context: allennlp.semparse.contexts.table_question_context.TableQuestionContext) → None[source]

World representation for the WikiTableQuestions domain, with the variable-free language used in the paper from Liang et al. (2018).

Parameters:

- table_context : TableQuestionContext
  Context associated with this world.
curried_functions = {<r,<g,s>>: 2, <r,<g,r>>: 2, <r,<c,r>>: 2, <r,<f,n>>: 2, <r,<f,<n,r>>>: 3, <r,<m,<d,r>>>: 3, <r,<t,<s,r>>>: 3, <r,<r,<f,n>>>: 3}
evaluate_logical_form(logical_form: str, target_list: typing.List[str]) → bool[source]

Takes a logical form and a list of target values (as strings, from the original lisp representation of the instances) and returns True iff the logical form executes to those values.

execute(logical_form: str) → typing.Union[typing.List[str], int][source]
get_agenda()[source]
get_basic_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of basic types (types of entities) in the world.

get_valid_starting_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of all types t, such that actions {START_SYMBOL} -> t are valid. In other words, these are all the possible types of complete logical forms in this world.

static is_instance_specific_entity(entity_name: str) → bool[source]

Instance-specific entities are column names, strings, and numbers. Returns True if the given entity is one of those.

This module defines classes Object and Box (the two entities in the NLVR domain) and an NlvrWorld, which mainly contains an execution method and related helper methods.

class allennlp.semparse.worlds.nlvr_world.NlvrWorld(world_representation: typing.List[typing.List[typing.Dict[str, typing.Any]]]) → None[source]

Class defining the world representation of NLVR. Defines an execution logic for logical forms in NLVR. We just take the structured_rep from the JSON file to initialize this.

Parameters:

- world_representation : JsonDict
  structured_rep from the JSON file.
curried_functions = {<b,<c,b>>: 2, <b,<s,b>>: 2, <b,<e,b>>: 2, <o,<c,t>>: 2, <o,<s,t>>: 2, <b,<e,t>>: 2, <o,<e,t>>: 2}
execute(logical_form: str) → bool[source]
get_agenda_for_sentence(sentence: str, add_paths_to_agenda: bool = False) → typing.List[str][source]

Given a sentence, returns a list of actions the sentence triggers, as an agenda. The agenda can be used by a parser to guide the decoder. This is a simplistic mapping at this point, and can be expanded.

Parameters:

- sentence : str
  The sentence for which an agenda will be produced.
- add_paths_to_agenda : bool, optional (default = False)
  If set, the agenda will also include the nonterminal productions that lead from the root node to the terminals.
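Trigger-based agenda construction can be sketched as a phrase lookup. This is only an illustration of the idea, not the NlvrWorld code; the trigger phrases and action strings below are hypothetical:

```python
# Hypothetical trigger phrases mapped to (hypothetical) grammar actions.
TRIGGERS = {
    "yellow": "c -> color_yellow",
    "square": "s -> shape_square",
    "touching the wall": "<o,o> -> touch_wall",
}

def get_agenda(sentence):
    """Return the actions triggered by phrases occurring in the sentence."""
    sentence = sentence.lower()
    return [action for phrase, action in TRIGGERS.items() if phrase in sentence]
```

A parser can then bias decoding towards producing the returned actions.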
get_basic_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of basic types (types of entities) in the world.

get_valid_starting_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of all types t, such that actions {START_SYMBOL} -> t are valid. In other words, these are all the possible types of complete logical forms in this world.

class allennlp.semparse.worlds.nlvr_box.Box(objects_list: typing.List[typing.Dict[str, typing.Any]], box_id: int) → None[source]

Bases: object

This class represents each box containing objects in NLVR.

Parameters:

- objects_list : List[JsonDict]
  List of objects in the box, as given by the JSON file.
- box_id : int
  An integer identifying the box index (0, 1 or 2).
class allennlp.semparse.worlds.nlvr_object.Object(attributes: typing.Dict[str, typing.Any], box_id: str) → None[source]

Bases: object

Objects are the geometric shapes in the NLVR domain. They have values for the attributes shape, color, x_loc, y_loc and size. We take a dict read from the JSON file and store it here, and define a get method for getting the attribute values. We need this class to be hashable because we need to make sets of Objects during execution, which get passed around between functions.

Parameters:

- attributes : JsonDict
  The dict for each object from the JSON file.
class allennlp.semparse.worlds.atis_world.AtisWorld(utterances: typing.List[str], tokenizer: allennlp.data.tokenizers.tokenizer.Tokenizer = None) → None[source]

Bases: object

World representation for the ATIS SQL domain. This class has a SqlTableContext which holds the base grammar; it then augments this grammar by constraining each column to the values that are allowed in it.

Parameters:

- utterances : List[str]
  A list of utterances in the interaction; the last element in this list is the current utterance that we are interested in.
- tokenizer : Tokenizer, optional (default = WordTokenizer())
  We use this tokenizer to tokenize the utterances.
add_dates_to_number_linking_scores(number_linking_scores: typing.Dict[str, typing.Tuple[[str, str], typing.List[int]]], current_tokenized_utterance: typing.List[allennlp.data.tokenizers.token.Token]) → None[source]
add_to_number_linking_scores(all_numbers: typing.Set[str], number_linking_scores: typing.Dict[str, typing.Tuple[[str, str], typing.List[int]]], get_number_linking_dict: typing.Callable[[str, typing.List[allennlp.data.tokenizers.token.Token]], typing.Dict[str, typing.List[int]]], current_tokenized_utterance: typing.List[allennlp.data.tokenizers.token.Token], nonterminal: str) → None[source]

This is a helper method for adding different types of numbers (e.g. starting time ranges) as entities. We first go through all utterances in the interaction, find the numbers of a certain type, and add them to the set all_numbers, which is initialized with default values. We want to add all numbers that occur in the interaction, not just in the current turn, because the query could contain numbers that were triggered before the current turn. For each entity, we then check whether it is triggered by tokens in the current utterance and construct the linking score.

all_possible_actions() → typing.List[str][source]

Return a sorted list of strings representing all possible actions of the form: nonterminal -> [right_hand_side]

database_file = 'https://s3-us-west-2.amazonaws.com/allennlp/datasets/atis/atis.db'
get_action_sequence(query: str) → typing.List[str][source]
get_valid_actions() → typing.Dict[str, typing.List[str]][source]
sql_table_context = None
allennlp.semparse.worlds.atis_world.get_strings_from_utterance(tokenized_utterance: typing.List[allennlp.data.tokenizers.token.Token]) → typing.Dict[str, typing.List[int]][source]

Based on the current utterance, return a dictionary mapping strings in the database to the lists of token indices to which they are linked.
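A minimal sketch of this kind of entity linking, using exact token matching only (the real method also uses trigger dictionaries and multi-token spans; the database strings in the example are hypothetical):

```python
def strings_to_token_indices(tokens, database_strings):
    """Map each database string to the indices of utterance tokens that
    mention it, via case-insensitive exact matching."""
    linking = {}
    for index, token in enumerate(tokens):
        for db_string in database_strings:
            if token.lower() == db_string.lower():
                linking.setdefault(db_string, []).append(index)
    return linking
```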

class allennlp.semparse.worlds.text2sql_world.Text2SqlWorld(schema_path: str, cursor: sqlite3.Cursor = None, use_prelinked_entities: bool = True, variable_free: bool = True, use_untyped_entities: bool = False) → None[source]

Bases: object

World representation for any of the Text2Sql datasets.

Parameters:

- schema_path : str
  A path to a schema file which we read into a dictionary representing the SQL tables in the dataset; the keys are the names of the tables, which map to lists of the tables' column names.
- cursor : Cursor, optional (default = None)
  An optional cursor for a database, which is used to add database values to the grammar.
- use_prelinked_entities : bool, optional (default = True)
  Whether or not to use the pre-linked entities from the text2sql data. We take this parameter here because it affects whether we need to add table values to the grammar.
- variable_free : bool, optional (default = True)
  Denotes whether the data being parsed by the grammar is variable-free. If it is, the grammar is modified to be less expressive by removing elements which are not necessary for variable-free data.
- use_untyped_entities : bool, optional (default = False)
  Whether or not to try to infer the types of prelinked variables. If not, they are added to the grammar as untyped values instead.
get_action_sequence_and_all_actions(query: typing.List[str] = None, prelinked_entities: typing.Dict[str, typing.Dict[str, str]] = None) → typing.Tuple[typing.List[str], typing.List[str]][source]
is_global_rule(production_rule: str) → bool[source]

This module defines QuarelWorld, with a simple domain theory for reasoning about qualitative relations.

class allennlp.semparse.worlds.quarel_world.QuarelWorld(table_graph: allennlp.semparse.contexts.knowledge_graph.KnowledgeGraph, syntax: str, qr_coeff_sets: typing.List[typing.Dict[str, int]] = None) → None[source]

Class defining the QuaRel domain theory world.

execute(lf_raw: str) → int[source]

Very basic model for executing friction logical forms. For now, returns the answer index (or -1 if no answer can be concluded).

get_basic_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of basic types (types of entities) in the world.

get_valid_starting_types() → typing.Set[nltk.sem.logic.Type][source]

Returns the set of all types t, such that actions {START_SYMBOL} -> t are valid. In other words, these are all the possible types of complete logical forms in this world.

is_table_entity(entity_name: str) → bool[source]

Returns True if the given entity is one of the entities in the table.

qr_coeff_sets_default = [{'friction': 1, 'speed': -1, 'smoothness': -1, 'distance': -1, 'heat': 1}, {'speed': 1, 'time': -1}, {'speed': 1, 'distance': 1}, {'time': 1, 'distance': 1}, {'weight': 1, 'acceleration': -1}, {'strength': 1, 'distance': 1}, {'strength': 1, 'thickness': 1}, {'mass': 1, 'gravity': 1}, {'flexibility': 1, 'breakability': -1}, {'distance': 1, 'loudness': -1, 'brightness': -1, 'apparentSize': -1}, {'exerciseIntensity': 1, 'amountSweat': 1}]
qr_size = {'higher': 1, 'high': 1, 'low': -1, 'lower': -1}
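The coefficient sets above encode qualitative relationships: attributes whose coefficients share a sign within a set move together, while opposite signs move inversely. A minimal sketch of how such sets can be used (the inference function is illustrative, not the QuarelWorld executor; the two coefficient sets are taken from qr_coeff_sets_default above):

```python
# Two coefficient sets from qr_coeff_sets_default.
QR_COEFF_SETS = [
    {"friction": 1, "speed": -1, "smoothness": -1, "distance": -1, "heat": 1},
    {"speed": 1, "time": -1},
]

def infer_direction(attribute, direction, target, qr_coeff_sets):
    """If `attribute` moves in `direction` (+1 higher, -1 lower), which way
    does `target` move?  Returns +1, -1, or 0 if the two attributes do not
    co-occur in any coefficient set."""
    for coeffs in qr_coeff_sets:
        if attribute in coeffs and target in coeffs:
            # Same-sign coefficients move together; opposite signs invert.
            return direction * coeffs[attribute] * coeffs[target]
    return 0
```

For instance, higher friction (+1) implies lower speed, because friction and speed carry opposite signs in the first set.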