Planning & Manipulation Overview

Bridge the gap between high-level reasoning and physical actuation. These recipes demonstrate planning with vision-language models (VLMs) and end-to-end robotic manipulation with vision-language-action (VLA) models.


Multimodal Planning

Use a VLM to decompose complex instructions into executable low-level actions.
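A minimal sketch of the decomposition step. It assumes the VLM is prompted to return a JSON list of steps and that the robot exposes a small fixed set of primitives. `query_vlm`, `ALLOWED_ACTIONS`, and the primitive names are hypothetical and are not the recipe's API. The stub returns a canned plan so the example runs without a model.

```python
import json
from dataclasses import dataclass

# Hypothetical set of low-level primitives the robot controller accepts.
ALLOWED_ACTIONS = {"move_to", "grasp", "release"}


@dataclass
class Action:
    name: str
    target: str


def query_vlm(image: bytes, prompt: str) -> str:
    """Hypothetical stand-in for a real VLM call; returns a canned JSON plan."""
    return json.dumps([
        {"name": "move_to", "target": "red cup"},
        {"name": "grasp", "target": "red cup"},
        {"name": "move_to", "target": "tray"},
        {"name": "release", "target": "tray"},
    ])


def plan(image: bytes, instruction: str) -> list[Action]:
    """Ask the VLM for a step list and validate it against ALLOWED_ACTIONS."""
    prompt = (
        "Decompose the instruction into a JSON list of steps. Each step has a "
        f"'name' from {sorted(ALLOWED_ACTIONS)} and a 'target' object.\n"
        f"Instruction: {instruction}"
    )
    steps = json.loads(query_vlm(image, prompt))
    actions = []
    for step in steps:
        # Reject anything the controller cannot execute instead of guessing.
        if step.get("name") not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action from VLM: {step!r}")
        actions.append(Action(step["name"], step["target"]))
    return actions


if __name__ == "__main__":
    for action in plan(b"", "Put the red cup on the tray"):
        print(action)
```

Validating each proposed step against a closed action vocabulary keeps a hallucinated or malformed step from ever reaching the controller.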

VLA Manipulation

Map visual inputs directly to joint commands using Vision-Language-Action models.
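The sketch below shows only the interface shape of such a mapping. It assumes the VLA policy is any callable from an observation (camera frame, instruction, current joint angles) to absolute target joint positions. Real VLA models may instead output deltas, action chunks, or discretized action tokens. `Observation`, `control_step`, and `dummy_policy` are hypothetical names, and the placeholder policy stands in for an actual model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    image: bytes                  # camera frame; encoding is left unspecified
    instruction: str              # natural-language task description
    joint_positions: list[float]  # current joint angles in radians


# Assumption: the VLA policy maps one observation to absolute target joint
# positions. Real models may emit deltas, action chunks, or action tokens.
Policy = Callable[[Observation], list[float]]


def clip_to_limits(cmd, lower, upper):
    """Clamp each joint target into [lower, upper] before sending it."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(cmd, lower, upper)]


def control_step(policy: Policy, obs: Observation, lower, upper) -> list[float]:
    """Run one policy inference and return a joint command within limits."""
    target = policy(obs)
    if len(target) != len(obs.joint_positions):
        raise ValueError("policy output does not match the robot's joint count")
    return clip_to_limits(target, lower, upper)


def dummy_policy(obs: Observation) -> list[float]:
    """Hypothetical placeholder policy: nudge every joint by +0.1 rad."""
    return [q + 0.1 for q in obs.joint_positions]


if __name__ == "__main__":
    obs = Observation(b"", "pick up the cup", [0.0, 1.5, -0.2])
    print(control_step(dummy_policy, obs, [-1.0] * 3, [1.55] * 3))
```

Clamping to joint limits outside the model is a cheap safety layer, since a learned policy gives no hard guarantee that its outputs stay in range.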

Event-Driven VLA

Closed-loop manipulation: a VLM referee watches the scene and stops the actions once it visually confirms that the task is complete.
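A minimal sketch of the stop condition, assuming the referee is asked a yes/no completion question about a fresh frame every few action steps. `run_closed_loop`, `referee_check`, and the `ask_vlm` callable are hypothetical names, not the recipe's API. A `threading.Event` carries the stop signal so the referee could run on its own thread, but here it is called inline so the behaviour is deterministic.

```python
import threading
from typing import Any, Callable


def referee_check(frame: Any, task: str,
                  ask_vlm: Callable[[Any, str], str],
                  stop_event: threading.Event) -> None:
    """Ask the VLM referee whether the task is done; set the event if so."""
    answer = ask_vlm(frame, f"Has this task been completed: '{task}'? Answer yes or no.")
    if answer.strip().lower().startswith("yes"):
        stop_event.set()


def run_closed_loop(task: str,
                    get_frame: Callable[[], Any],
                    act: Callable[[Any], None],
                    ask_vlm: Callable[[Any, str], str],
                    check_every: int = 5,
                    max_steps: int = 200) -> int:
    """Step the action policy until the referee signals completion.

    Returns the number of action steps executed. max_steps bounds the loop
    in case the referee never confirms completion.
    """
    stop_event = threading.Event()
    steps = 0
    while steps < max_steps and not stop_event.is_set():
        act(get_frame())            # one VLA action on the latest frame
        steps += 1
        if steps % check_every == 0:
            # Judge completion on a fresh frame taken after the action.
            referee_check(get_frame(), task, ask_vlm, stop_event)
    return steps


if __name__ == "__main__":
    # Simulated scene: an integer "progress" stands in for the camera frame.
    state = {"progress": 0}

    def get_frame():
        return state["progress"]

    def act(frame):
        state["progress"] += 1

    def fake_vlm(frame, prompt):
        return "yes" if frame >= 12 else "no"

    print(run_closed_loop("stack the blocks", get_frame, act, fake_vlm))
```

Checking only every `check_every` steps trades stopping latency for fewer referee calls, which matters when VLM inference is much slower than the action loop.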
