Planning & Manipulation Overview
Bridge the gap between high-level reasoning and physical actuation. These recipes demonstrate planning with a Vision-Language Model (VLM) and end-to-end robotic manipulation with a Vision-Language-Action (VLA) model.
Multimodal Planning
Use a VLM to decompose complex instructions into executable low-level actions.
VLA Manipulation
Map visual inputs directly to joint commands using Vision-Language-Action models.
Event-Driven VLA
Closed-loop manipulation: a VLM acting as a referee halts the action loop once it visually confirms that the task is complete.
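
The event-driven pattern can be illustrated with a minimal, framework-agnostic sketch. Everything here is an assumption made for illustration: `run_closed_loop`, `ToyArm`, and the callables passed in are hypothetical stand-ins, not the API used by these recipes. In a real system the `policy` callable would wrap a VLA model and `task_is_done` would wrap a VLM query on the latest camera frame.

```python
from typing import Callable, List

# Hypothetical type aliases for illustration only.
Observation = dict
Action = List[int]


def run_closed_loop(
    get_observation: Callable[[], Observation],
    policy: Callable[[Observation, str], Action],
    send_action: Callable[[Action], None],
    task_is_done: Callable[[Observation, str], bool],
    instruction: str,
    max_steps: int = 100,
) -> int:
    """Send policy actions until the referee reports completion.

    Returns the number of actions sent. The loop also stops after
    max_steps so that a referee that never fires cannot run forever.
    """
    steps = 0
    while steps < max_steps:
        obs = get_observation()
        # The referee checks the observation before each new action.
        if task_is_done(obs, instruction):
            break
        send_action(policy(obs, instruction))
        steps += 1
    return steps


class ToyArm:
    """Toy one-dimensional 'robot' standing in for real hardware."""

    def __init__(self) -> None:
        self.position = 0

    def observe(self) -> Observation:
        return {"position": self.position}

    def apply(self, action: Action) -> None:
        self.position += action[0]


arm = ToyArm()
steps = run_closed_loop(
    get_observation=arm.observe,
    policy=lambda obs, text: [1],                      # stand-in for a VLA model
    send_action=arm.apply,
    task_is_done=lambda obs, text: obs["position"] >= 5,  # stand-in for a VLM referee
    instruction="move to the target",
)
print(steps, arm.position)
```

Passing the policy and the referee as separate callables keeps the stop condition independent of the action model, so either one can be swapped without touching the loop.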