Notes from the Wired

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

April 14, 2024 | 911 words | 5min read

Paper Title: RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Link to Paper: https://arxiv.org/abs/2307.04738
Date: 10. July 2023
Paper Type: NLP, LLM, Robots
Short Abstract:
The goal of this paper is to improve multi-robot collaboration by harnessing the power of LLMs. To that end, the authors equip each robot with an LLM so that the robots can discuss their task and form strategies. The LLMs form strategies by generating sub-task plans, which are then translated into task-space waypoints. A motion planner uses these waypoints to generate trajectories for the robot arms.

1. Introduction

Multi-robot systems, such as multiple robots working on an assembly line, are interesting for their promise of enhanced productivity. But they have multiple challenges to overcome:

The authors' zero-shot method, called RoCo, consists of three components:

Furthermore, they introduce RoCoBench, a benchmark that tests the robots on 6 multi-robot manipulation tasks.

2. Preliminaries

Task Assumptions:

Multi-arm Path Planning:

3. Multi-Robot Collaboration with LLMs

3.1 Multi-Agent Dialog via LLMs

Before each environment interaction, the robot arms hold a round of dialog. Each robot is assigned its own LLM, which receives information and responds to it.

Each agent gets the same LLM prompt structure, but with different content:

  1. Task Context: Describes the objectives of the task.
  2. Round History: Past Dialogue and executed actions.
  3. Agent capabilities: The Agent skills.
  4. Communication Instructions: How to respond to the other agents.
  5. Current Observation: What the agent is currently ‘seeing’.
  6. Plan Feedback (optional): Reasons why a sub-task plan failed.
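
The six-part prompt structure above can be sketched as a small data class. This is only an illustration of the structure: the class name `AgentPrompt`, its field names, and the bracketed section format are assumptions made for this sketch, not the prompt format used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentPrompt:
    """Hypothetical container for the six prompt parts listed above."""
    task_context: str
    round_history: str
    capabilities: str
    communication_instructions: str
    current_observation: str
    plan_feedback: Optional[str] = None  # only set after a failed sub-task plan

    def render(self) -> str:
        # Same section order for every agent; only the content differs.
        sections = [
            ("Task Context", self.task_context),
            ("Round History", self.round_history),
            ("Agent Capabilities", self.capabilities),
            ("Communication Instructions", self.communication_instructions),
            ("Current Observation", self.current_observation),
        ]
        if self.plan_feedback:
            sections.append(("Plan Feedback", self.plan_feedback))
        return "\n\n".join(f"[{title}]\n{body}" for title, body in sections)
```

Keeping the feedback section optional means the first prompt of a round and a retry prompt share the same code path.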

Each agent is asked to end its response by deciding either to 1) continue the discussion or 2) summarize everyone's actions and make a final proposal. The second option is only allowed once every agent has responded at least once.
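
A minimal sketch of this turn-taking rule, assuming a round-robin speaking order and stub agents in place of real LLM calls. The function name `run_dialog_round`, the `"continue"`/`"summarize"` labels, and the `max_turns` cap are hypothetical choices for this illustration; treating a premature "summarize" as "continue" is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A reply is (message, decision), where decision is "continue" or "summarize".
Reply = Tuple[str, str]
Agent = Callable[[List[str]], Reply]  # stand-in for an LLM call on the history


def run_dialog_round(
    agents: Dict[str, Agent], max_turns: int = 10
) -> Tuple[List[str], Optional[str]]:
    """Return the dialog history and the final proposal (None if none was made)."""
    history: List[str] = []
    spoken = set()
    names = list(agents)
    for turn in range(max_turns):
        name = names[turn % len(names)]  # assumed round-robin order
        message, decision = agents[name](history)
        history.append(f"{name}: {message}")
        spoken.add(name)
        # A final proposal only counts once every agent has spoken.
        if decision == "summarize" and spoken == set(names):
            return history, message
    return history, None
```

With two stub agents where the first always tries to summarize, the proposal is only accepted on that agent's second turn, after the other agent has responded.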

3.2 LLM-Generated Sub-task Plan

After the discussion ends, the agent needs to summarize the results into a ‘sub-task plan’, in which each agent gets a sub-task (e.g. pick up an object) and a 3D waypoint. Before execution, the sub-task plan is validated; if a check fails, the feedback is appended to the agent prompt and another round of discussion starts. The plan has to pass the following validations:

  1. Task Parsing: checks that the plan follows the desired format.
  2. Task Constraints: checks that the plan complies with the robots' capabilities.
  3. IK: checks whether a robot arm position is feasible via inverse kinematics.
  4. Collision Checking: checks whether the robot arm position will cause a collision.
  5. Valid Waypoints: if a task requires path planning, each intermediate waypoint must pass the IK and collision checks.

A task is considered failed once the maximum number of discussion rounds has been reached.
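
The validate-and-retry loop described in this section can be sketched as follows. It is a simplified illustration under stated assumptions: plans are plain dictionaries, each check is a function returning an error message or `None`, and `propose` stands in for a full round of LLM discussion. The names `validate`, `plan_with_feedback`, and `max_attempts` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Plan = dict
# A check returns None on success or a short error message on failure.
Check = Callable[[Plan], Optional[str]]


def validate(plan: Plan, checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Run the checks in order and return the first failure as feedback text."""
    for name, check in checks:
        error = check(plan)
        if error is not None:
            return f"{name} failed: {error}"
    return None


def plan_with_feedback(
    propose: Callable[[List[str]], Plan],
    checks: List[Tuple[str, Check]],
    max_attempts: int = 3,
) -> Optional[Plan]:
    """Re-propose until a plan passes all checks or attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        plan = propose(feedback)  # stand-in for another discussion round
        error = validate(plan, checks)
        if error is None:
            return plan
        feedback.append(error)  # appended to the next prompt
    return None  # the task counts as failed
```

Running the checks in a fixed order and stopping at the first failure means later checks can rely on earlier ones, for example a reachability check can assume the plan already parsed correctly.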

3.3 LLM-informed Motion Planning

Once all validations have been passed, the sub-task plan is combined with IK to produce a final goal configuration for all robot arms. The goal is then passed to an RRT-based multi-arm motion planner that plans for all arms and outputs a motion trajectory for each arm.
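
To give a feel for what an RRT planner does, here is a toy single-tree RRT for a point in the 2D unit square. It is not the paper's planner, which works on the joint configuration of all arms; the function name `rrt`, the parameters, and the `is_free` collision callback are assumptions for this sketch, and only new nodes (not the edges between them) are collision-checked.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def rrt(
    start: Point,
    goal: Point,
    is_free: Callable[[Point], bool],
    step: float = 0.05,
    goal_bias: float = 0.1,
    max_iters: int = 2000,
    seed: int = 0,
) -> Optional[List[Point]]:
    """Grow a tree from start; return a path to goal, or None if none is found."""
    rng = random.Random(seed)
    nodes: List[Point] = [start]
    parent = {0: None}  # node index -> parent index
    for _ in range(max_iters):
        # Sample the goal occasionally, otherwise a random point in the unit square.
        target = goal if rng.random() < goal_bias else (rng.random(), rng.random())
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Move at most `step` from the nearest node toward the target.
        scale = min(step, d) / d
        new = (near[0] + (target[0] - near[0]) * scale,
               near[1] + (target[1] - near[1]) * scale)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= step:
            # Walk back through the parents to recover the path.
            path = [goal]
            i = len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

A call such as `rrt((0.1, 0.1), (0.9, 0.9), is_free)` with a circular obstacle in the middle returns a list of points from start to goal. In this picture, an LLM-proposed waypoint would play the role of an intermediate target that the planner connects to before heading for the goal.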

4. Benchmarks

The RoCoBench benchmark consists of 6 multi-robot collaboration tasks in a tabletop setting. Each task has three key properties:

  1. Task decomposition: whether a task can be decomposed into sub-tasks.
  2. Observation space: how much of the task is shared.
  3. Workspace overlap: proximity between robots.

5. Experiments

They evaluate the following methods:

LLM-proposed 3D waypoints show no clear benefit for picking sub-tasks, but they do accelerate planning.

RoCo is strongly adaptable in:

They validate RoCo in a real-world setup, where a robot arm needs to collaborate with a human to complete the task. For perception, they use a pre-trained object detection model, OWL-ViT, to generate scene descriptions.

6. Multi-Agent Reasoning Dataset

In addition to their main results, they introduce a text-based benchmark called RoCoBench-Text to evaluate an LLM’s agent reasoning ability.

7. Conclusion

Limitation:

RoCo is a new framework that lets multiple robots collaborate with each other to solve tasks.