RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

April 14, 2024 | 911 words | 5min read

Paper Title: RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Link to Paper: https://arxiv.org/abs/2307.04738
Date: 10. July 2023
Paper Type: NLP, LLM, Robots
Short Abstract:
The goal of this paper is to improve multi-robot collaboration trough harnessing the power of LLM. For that they equip the robots with a LLM to discuss their task and form strategies. The LLMs form strategies through the generation of sub-tasks, which are then transformed to space waypoints. The space waypoints are used by motion planner to generate trajectories for the robot arms.

1. Introduction

Multi-robot system, such as multiple robots working at a assembly line, are interesting for their promise of enhancing productivity. But they have multiple challenges to overcome:

High understanding of the task, in order to be able to split the task up.
Understanding the capabilities of the robots, such as reach or payload.
Low-level motion planning, finding collision free paths.
Generality, most of the times multi-robot systems need task-specific engineering.

The zero-shot method of the author called RoCo, consist of three components:

Dialogue-style task-coordination: They let the robots ’talk’ among themselves, by assigning to each robot a LLM. This should help for task reasoning.
Feedback-improved Sub-task Plan Generated by LLMs: The dialogue of the LLMs ends with a sub-task plan for each agent, they use validation methods to check for the validity of the sub task and give the LLM feedback.
LLM-informed Motion Planning in Joint Space: From the validated sub-task, they use RRT-sampler to plan motion trajectories.

Furthermore they introduce RoCoBench, a benchmark which test the robots on 6 multi-robot manipulation tasks.

2. Preliminaries

Task Assumptions:

Cooperative multi task agent enviroment.
$N$ Robots.
$T$ finite time horizon.
$O$ Observation space.
An agent $n$ has observation space $\Omega^n \subset O$.
Description function $f$ that translates task semantics and observations at time step $t$ to natural language prompts: $l_{t}^{n}=f^{n}(o_t), o_t \in \Omega^{n}$.

Multi-arm Path Planning:

$X \in \mathbb{R}^d$ is the joint configuration space of all $N$ Robot arms.
$X_{ob}$ the obstacle region in the configuration space.
$X_{free} = X \backslash X_{ob}$ collision free space.
The motion planner finds a optimal $\sigma^{*} : [0, 1] \to X$ with $\sigma^{*}(0) = X_{init}$ and $\sigma^{*}(1) = X_{goal}$.
$\sigma^{*}$ is then used by the robot arms.

3. Multi-Robot Collaboration with LLMs

3.1 Multi-Agent Dialog via LLMs

Before, each environment interaction, the robot arms will do an round of dialog where each robot has a LLM assigned to it, which receives information and responds to it.

Each agent gets the same LLM prompt structure, but with different content:

Task Context: Describes the objectives of the task.
Round History: Past Dialogue and executed actions.
Agent capabilities: The Agent skills.
Communication Instructions: How to responds to the other agents.
Current Observation: What the agent is currently ‘seeing’.
Plan Feedback:(optional) Reasons why a sub-task plan failed.

Each agent is asked to end the response with either deciding to 1) continue the discussion or 2) summarize everyone actions and make a final proposal. The second option is only allowed if every agent responded at least once.

3.2 LLM-Generated Sub-task Plan

After the discussion ends, the agent needs to summarize the results and make a ‘sub-task plan’, where each agent gets a sub-task(e.g. pick up a object) and a 3D waypoint. Before execution the sub-task plans are validated, if a check fails the feedback is appended to the agent prompt an another round of discussion starts. Following validation have to be passed:

Task parsing plan follows the desired format.
Task Constrains check if the plan complies with the robots capabilities.
IK checks whether a robot arm position is feasible via iinverse kinemtics.
Collision Checking check if the robot arm position will cause a collision.
Valid Waypoints if a task requires path planning, each intermediate waypoints must pass IK and Collision Check.

A task is considered to be failed, if a maximum amount of discussion was reached.

3.3 LLM-informed Motion Planning

Once all validation have been passed , they are combined with IK to produce a final goal configuration for all robot arms. The goal is than passed to an RRT-based multi arm motion planner that plans for all arms and outputs motion trajectories for each arm.

4. Benchmarks

The RoCoBench benchmark consist of 6 multi-robot collaboration in a tabletop setting. Each task has three key properties:

Task decomposition whether a task can be decomposed into sub-tasks.
Observation space: how much of the task is shared.
Workspace overlap: proximity between robots.

5. Experiments

They evaluate following methods:

RoCo: Previous described Method
In ‘Central Plan’ the LLM-agent is given full knowledge of the enviroment.
In ‘Dialog w/o History’ dialogue from past round is removed.
In ‘Dialog w/o Feedback’ failed plans are discarded and agent try again without feedback.

LLM-proposed 3D waypoints show no clear benefit for picking sub-taks, but accelerate planning.

RoCo is strongly adaptable in:

Object Initialization: Randomizing objects.
Task Goal: Changing the goal.
Robot Capability: Changing of capabilities.

They validate RoCo in a real World Setup, where the robot arm needs to collaborate with a human to complete the task. For perception they use a pre-trained object detection model, OWL-ViT, to generate scene descriptions.

6. Multi-Agent Reasoning Dataset

In addition to their man result, they introduce a text-based benchmark called, RoCoBench-Text, to evaluate an LLM’s agent reasoning ability.

7. Conclusion

Limitation:

RoCo assumes object detection is accurate.
Motion trajectories can lead to errors.
Querying LLMs is expensive.

RoCo is a new framework for collaboration of multiple robots with each other to solve tasks.

reply via email