In the example above, the overall task of putting a clean mug on a desk in an unfamiliar household is too complex for the model, so the iterative executor fails. A plan-and-execute-style approach does break the task into three sub-tasks up front, but it does not account for how hard the sub-task of finding a mug is. ADaPT instead uses its recursive structure to adapt dynamically to execution failures (as assessed by the LLM): when finding a mug fails, it calls the planner to decompose that sub-task further.

Methodology

Broadly, we use separate planner and executor LLM modules within ADaPT. The executor performs low-level "atomic" skills specific to the environment, interacts with the environment iteratively, and uses the LLM to self-assess whether the task was completed successfully. The planner, in turn, breaks complex tasks down into smaller sub-tasks and generates a logical operator that specifies how the sub-tasks combine to accomplish the task. We tie these modules together in ADaPT through a controller, which is a predetermined, recursive algorithm.
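To make the executor's role concrete, here is a minimal Python sketch of an act-then-self-assess loop. It assumes the LLM is any function from a prompt string to a completion string and the environment is any function from an action string to an observation string; the names `run_executor`, `scripted_llm`, `max_steps`, and the `"done"` stop signal are hypothetical illustrations, not ADaPT's actual prompts or API.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not ADaPT's actual API):
# an LLM maps a prompt to a completion; an environment maps an action to an observation.
LLM = Callable[[str], str]
Env = Callable[[str], str]


def run_executor(
    task: str, llm: LLM, env: Env, max_steps: int = 10
) -> Tuple[bool, List[str]]:
    """Act in the environment step by step, then ask the LLM whether the task succeeded."""
    trajectory: List[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        # The LLM proposes the next atomic action from the trajectory so far.
        action = llm("\n".join(trajectory) + "\nNext action:").strip()
        if action == "done":  # hypothetical stop signal emitted by the LLM
            break
        observation = env(action)
        trajectory.append(f"> {action}\n{observation}")
    # Self-assessment: the same LLM judges success from the full trajectory.
    verdict = llm("\n".join(trajectory) + "\nDid you complete the task? Answer yes or no:")
    return verdict.strip().lower().startswith("yes"), trajectory


def scripted_llm(responses: List[str]) -> LLM:
    """A stand-in 'LLM' that replays fixed responses, for illustration only."""
    replies = iter(responses)
    return lambda prompt: next(replies)


def toy_env(action: str) -> str:
    """A toy environment that simply echoes the action back as an observation."""
    return f"You {action}."


if __name__ == "__main__":
    llm = scripted_llm(["go to countertop 1", "take mug 1 from countertop 1", "done", "yes"])
    success, trajectory = run_executor("find a mug", llm, toy_env)
    print(success)          # True
    print(len(trajectory))  # 3
```

Returning the trajectory alongside the verdict keeps the sketch inspectable; in a real system the self-assessment prompt and stop condition would be environment-specific.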

The figure above shows the control flow of ADaPT. A complex task such as "put a clean mug on the desk" is first assigned to the executor. If the executor does not succeed, ADaPT calls the planner to decompose the task into sub-tasks along with a logical operator ("And" or "Or") indicating how to compose them. Each sub-task (or step) is then assigned recursively to ADaPT, and the outcomes are combined using the logical operator: under "And" every sub-task must succeed, while under "Or" a single successful sub-task suffices. The overall task therefore succeeds when its recursively decomposed sub-tasks succeed as their operator requires.
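The recursive control flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the executor and planner are treated as plain functions, and the names `Plan`, `adapt`, and the `max_depth` cap (added here to guarantee termination) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal


@dataclass
class Plan:
    """Planner output: sub-tasks plus the logical operator that combines them."""
    subtasks: List[str]
    operator: Literal["AND", "OR"]


Executor = Callable[[str], bool]  # runs a task; returns the LLM's self-assessed success
Planner = Callable[[str], Plan]   # decomposes a task into a Plan


def adapt(task: str, executor: Executor, planner: Planner,
          depth: int = 0, max_depth: int = 3) -> bool:
    """Recursive controller: decompose a task only when the executor fails on it."""
    if executor(task):          # 1. try the task directly
        return True
    if depth >= max_depth:      # 2. assumed depth cap: give up instead of recursing forever
        return False
    plan = planner(task)        # 3. ask the planner for sub-tasks and an operator
    if not plan.subtasks:       # an empty plan cannot make progress, so count it as failure
        return False
    # Lazily evaluate sub-tasks in order; all()/any() stop early once the outcome is known.
    results = (adapt(t, executor, planner, depth + 1, max_depth) for t in plan.subtasks)
    return all(results) if plan.operator == "AND" else any(results)


if __name__ == "__main__":
    # Hypothetical toy components for the mug example.
    can_do = {"clean the mug", "put the mug on the desk", "find a mug in cabinet 1"}
    plans = {
        "put a clean mug on the desk": Plan(
            ["find a mug", "clean the mug", "put the mug on the desk"], "AND"),
        "find a mug": Plan(
            ["find a mug on countertop 1", "find a mug in cabinet 1"], "OR"),
    }

    def toy_executor(task: str) -> bool:
        return task in can_do

    def toy_planner(task: str) -> Plan:
        return plans.get(task, Plan([], "AND"))

    print(adapt("put a clean mug on the desk", toy_executor, toy_planner))  # True
```

In this toy run the executor fails on the full task and on "find a mug", so the controller recurses into the "Or" plan for finding a mug and succeeds via the cabinet; with `max_depth=1` the same run fails because "find a mug" can no longer be decomposed.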

ADaPT yields the highest overall success rates

On three interactive decision-making tasks, ALFWorld (shown above on the left), WebShop (shown above on the right), and TextCraft (below), ADaPT improves the performance of GPT-3.5 over previous approaches such as ReAct and Plan-and-Execute by up to 28.3, 27, and 33 absolute percentage points, respectively. Compared to Reflexion, an adaptive approach that addresses failures in the full task trajectory, ADaPT yields higher success rates by 14.1 and 9 absolute percentage points on ALFWorld and WebShop, respectively.

ADaPT dynamically accommodates executor capabilities

Using the ALFWorld dataset, we demonstrate that ADaPT dynamically adjusts to account for executor capabilities. On the left, we illustrate how the same planner module improves the performance of two different executor LLMs. On the right, we show that, even for the same underlying LLM, ADaPT enhances performance across different prompt settings, ranging from specialized task-specific executors that observe relevant gold trajectories to atomic executors capable only of performing atomic skills in the environment.

ADaPT dynamically accommodates task complexity

Lastly, ADaPT adapts to task complexity in WebShop as we increase the number of products displayed on the search page. On the TextCraft dataset, we observe that the decomposition depth employed by ADaPT aligns with the inherent recipe depth (or complexity) of the underlying task.

ADaPT: As-Needed Decomposition and Planning with Language Models