RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Published: April 14, 2024
Paper Title: RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Link to Paper: https://arxiv.org/abs/2307.04738
Date: 10. July 2023
Paper Type: NLP, LLM, Robots
Short Abstract:
The goal of this paper is to improve multi-robot collaboration trough harnessing the power of LLM. For that they equip the robots with a LLM to discuss their task and form strategies. The LLMs form strategies through the generation of sub-tasks, which are then transformed to space waypoints. The space waypoints are used by motion planner to generate trajectories for the robot arms.
1. Introduction
Multi-robot system, such as multiple robots working at a assembly line, are interesting for their promise of enhancing productivity. But they have multiple challenges to overcome:
- High understanding of the task, in order to be able to split the task up.
- Understanding the capabilities of the robots, such as reach or payload.
- Low-level motion planning, finding collision free paths.
- Generality, most of the times multi-robot systems need task-specific engineering.
The zero-shot method of the author called RoCo, consist of three components:
- Dialogue-style task-coordination: They let the robots ’talk’ among themselves, by assigning to each robot a LLM. This should help for task reasoning.
- Feedback-improved Sub-task Plan Generated by LLMs: The dialogue of the LLMs ends with a sub-task plan for each agent, they use validation methods to check for the validity of the sub task and give the LLM feedback.
- LLM-informed Motion Planning in Joint Space: From the validated sub-task, they use RRT-sampler to plan motion trajectories.
Furthermore they introduce RoCoBench, a benchmark which test the robots on 6 multi-robot manipulation tasks.
2. Preliminaries
Task Assumptions:
- Cooperative multi task agent enviroment.
- \(N\) Robots.
- \(T\) finite time horizon.
- \(O\) Observation space.
- An agent \(n\) has observation space \(\Omega^n \subset O\).
- Description function \(f\) that translates task semantics and observations at time step \(t\) to natural language prompts: \(l_{t}^{n}=f^{n}(o_t), o_t \in \Omega^{n}\).
Multi-arm Path Planning:
- \(X \in \mathbb{R}^d\) is the joint configuration space of all \(N\) Robot arms.
- \(X_{ob}\) the obstacle region in the configuration space.
- \(X_{free} = X \backslash X_{ob}\) collision free space.
- The motion planner finds a optimal $\sigma^{*} : [0, 1] \to X$ with \(\sigma^{*}(0) = X_{init}\) and \(\sigma^{*}(1) = X_{goal}\).
- \(\sigma^{*}\) is then used by the robot arms.
3. Multi-Robot Collaboration with LLMs
3.1 Multi-Agent Dialog via LLMs
Before, each environment interaction, the robot arms will do an round of dialog where each robot has a LLM assigned to it, which receives information and responds to it.
Each agent gets the same LLM prompt structure, but with different content:
- Task Context: Describes the objectives of the task.
- Round History: Past Dialogue and executed actions.
- Agent capabilities: The Agent skills.
- Communication Instructions: How to responds to the other agents.
- Current Observation: What the agent is currently ‘seeing’.
- Plan Feedback:(optional) Reasons why a sub-task plan failed.
Each agent is asked to end the response with either deciding to 1) continue the discussion or 2) summarize everyone actions and make a final proposal. The second option is only allowed if every agent responded at least once.
3.2 LLM-Generated Sub-task Plan
After the discussion ends, the agent needs to summarize the results and make a ‘sub-task plan’, where each agent gets a sub-task(e.g. pick up a object) and a 3D waypoint. Before execution the sub-task plans are validated, if a check fails the feedback is appended to the agent prompt an another round of discussion starts. Following validation have to be passed:
- Task parsing plan follows the desired format.
- Task Constrains check if the plan complies with the robots capabilities.
- IK checks whether a robot arm position is feasible via iinverse kinemtics.
- Collision Checking check if the robot arm position will cause a collision.
- Valid Waypoints if a task requires path planning, each intermediate waypoints must pass IK and Collision Check.
A task is considered to be failed, if a maximum amount of discussion was reached.
3.3 LLM-informed Motion Planning
Once all validation have been passed , they are combined with IK to produce a final goal configuration for all robot arms. The goal is than passed to an RRT-based multi arm motion planner that plans for all arms and outputs motion trajectories for each arm.
4. Benchmarks
The RoCoBench benchmark consist of 6 multi-robot collaboration in a tabletop setting. Each task has three key properties:
- Task decomposition whether a task can be decomposed into sub-tasks.
- Observation space: how much of the task is shared.
- Workspace overlap: proximity between robots.
5. Experiments
They evaluate following methods:
- RoCo: Previous described Method
- In ‘Central Plan’ the LLM-agent is given full knowledge of the enviroment.
- In ‘Dialog w/o History’ dialogue from past round is removed.
- In ‘Dialog w/o Feedback’ failed plans are discarded and agent try again without feedback.
LLM-proposed 3D waypoints show no clear benefit for picking sub-taks, but accelerate planning.
RoCo is strongly adaptable in:
- Object Initialization: Randomizing objects.
- Task Goal: Changing the goal.
- Robot Capability: Changing of capabilities.
They validate RoCo in a real World Setup, where the robot arm needs to collaborate with a human to complete the task. For perception they use a pre-trained object detection model, OWL-ViT, to generate scene descriptions.
6. Multi-Agent Reasoning Dataset
In addition to their man result, they introduce a text-based benchmark called, RoCoBench-Text, to evaluate an LLM’s agent reasoning ability.
7. Conclusion
Limitation:
- RoCo assumes object detection is accurate.
- Motion trajectories can lead to errors.
- Querying LLMs is expensive.
RoCo is a new framework for collaboration of multiple robots with each other to solve tasks.