Planning & Manipulation Overview

Bridge the gap between high-level reasoning and physical actuation. These recipes cover Cortex – the EMOS planner-executor that decomposes goals into ordered tool calls, dispatches them, and replans on failure, all on top of capability components you wrote in isolation – alongside VLM-based planning and VLA-based end-to-end manipulation for direct motor control.

Cortex: The Agentic Harness

A component that sits on top of the rest of your recipe and turns it into a self-directing agent. Cortex auto-discovers every component’s capabilities and uses them to achieve a high-level goal.
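The decompose, dispatch, and replan-on-failure cycle can be sketched in plain Python. This is a minimal illustration, not the Cortex API: `run_goal`, the `Step` tuple, the `planner` callable, and the `max_replans` parameter are all hypothetical names, and the planner here stands in for whatever model produces the ordered tool calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; these are not EMOS/Cortex classes.
Step = Tuple[str, dict]  # (tool name, keyword arguments)
Planner = Callable[[str, List[str], List[str]], List[Step]]


def run_goal(
    goal: str,
    tools: Dict[str, Callable[..., object]],
    planner: Planner,
    max_replans: int = 2,
) -> Tuple[bool, List[str]]:
    """Plan, dispatch each step in order, and replan when a step fails."""
    failures: List[str] = []  # fed back to the planner on every replan
    log: List[str] = []
    for _ in range(max_replans + 1):
        # The planner sees the goal, the discovered tool names, and past failures.
        plan = planner(goal, sorted(tools), failures)
        for name, kwargs in plan:
            try:
                tools[name](**kwargs)
                log.append(f"ok:{name}")
            except Exception as exc:  # unknown tool or tool error: stop this plan
                failures.append(f"{name}: {exc}")
                log.append(f"fail:{name}")
                break
        else:
            return True, log  # every step of the current plan succeeded
    return False, log
```

The planner receives the failure history so it can choose a different plan instead of retrying the same failing call; capping the number of replans keeps a goal that cannot be met from looping forever.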


Memory and Cortex

Cortex paired with graph-backed spatio-temporal memory. It recalls past observations, reasons about internal state via interoception, and wraps action tasks in episodes that consolidate into long-term memory.


Cortex Driving the Full Stack

The full stack. Cortex orchestrates Vision, VLM, Memory, the Kompass navigation stack, and TTS end-to-end. Compound goals fulfilled by a single agent – no behavior trees, no state machines.


Multimodal Planning

Navigation guided by sight, not maps. A planning VLM grounds free-form descriptions like “the yellow chair” in the live camera frame and projects them into goal points the navigation stack acts on.
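One common way to turn a grounded image region into a 3D goal is pinhole back-projection of the region's center pixel. The sketch below assumes a depth reading at that pixel and known camera intrinsics (`fx`, `fy`, `cx`, `cy`); both helper functions are hypothetical, and the recipe itself may project goals differently.

```python
from typing import Tuple


def bbox_center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Center pixel of an (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def pixel_to_camera_point(
    u: float, v: float, depth_m: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with depth in meters into the camera frame.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


# Example: a box the VLM might return for "the yellow chair" (made-up values).
u, v = bbox_center((400, 200, 440, 280))
goal_in_camera = pixel_to_camera_point(u, v, 2.0, fx=500, fy=500, cx=320, cy=240)
```

The result is expressed in the camera's optical frame, so it still has to be transformed into the robot or map frame before a navigation stack can use it as a goal.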


VLA Manipulation

End-to-end neural manipulation. Run a VLA foundation model (SmolVLA, Pi0, or any other policy from the Hugging Face LeRobot ecosystem) and go straight from camera frames and a text instruction to joint commands.


Event-Driven VLA

Closed-loop manipulation from an open-loop policy. A VLM watches the camera during execution and stops the VLA the moment it sees the task complete – or sees it going wrong.
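The idea of closing the loop around an open-loop policy can be sketched as a monitor that periodically judges the latest camera frame and halts execution. This is a simplified polling version with hypothetical names (`run_with_monitor`, `policy_step`, `get_frame`, `judge`, `check_every`); the recipe itself is event-driven, and the `judge` callable stands in for the VLM.

```python
from typing import Callable, Tuple


def run_with_monitor(
    policy_step: Callable[[], None],
    get_frame: Callable[[], object],
    judge: Callable[[object], str],
    max_steps: int = 200,
    check_every: int = 10,
) -> Tuple[str, int]:
    """Run the policy step by step; stop when the monitor says done or failed.

    `judge` is assumed to return "running", "done", or "failed" for a frame.
    Returns the final verdict and the number of policy steps executed.
    """
    for step in range(1, max_steps + 1):
        policy_step()  # one open-loop action from the VLA
        if step % check_every == 0:  # the monitor runs less often than the policy
            verdict = judge(get_frame())
            if verdict in ("done", "failed"):
                return verdict, step
    return "timeout", max_steps
```

Checking only every few steps reflects that a VLM query is usually much slower than a policy step; the step budget gives the loop an exit when the monitor never reaches a verdict.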
