Cortex: The Agentic Harness¶
A single component, dropped on top of the rest of your recipe, that turns it into a self-directed agent.
Most EMOS recipes you’ve seen so far are programmed: events trigger components, components publish to topics, fallbacks recover from failure – and you, the recipe author, hand-wired every link. The Cortex component is a different shape. Drop a Cortex into your recipe and it discovers every other component you added, registers every method they expose as a callable tool, and – given a high-level goal like “track the person on the left and tell me what they’re holding” – decomposes it into an ordered plan, dispatches each step, watches the feedback, and replans on failure. No orchestration glue from you.
If Claude Code is an agentic harness for software engineering, Cortex is an agentic harness for embodied intelligence. The capability components – Vision, VLM, TTS, navigation, memory – are the robot’s limbs and senses. Cortex is the part that decides what to do next using these capabilities.
See also
For the conceptual model and the full list of capabilities Cortex auto-discovers, see Cortex. For Cortex paired with a graph-backed spatio-temporal memory, see Memory and Cortex. For Cortex orchestrating the navigation stack on top of all of that, see Cortex Driving the Full Stack.
The shape of the abstraction¶
A capability component such as Vision exposes its primary work as topics (/detections, /trackings) but it also exposes private methods decorated with @component_action:
class Vision(Component):
    @component_action(description={...})
    def track(self, label: str): ...

    @component_action(description={...})
    def take_picture(self, save_path: str = "..."): ...
These actions are normally invisible – they require explicit wiring through the events/actions system to be useful. Cortex changes that. When you drop a Cortex component into the launcher, on activation it walks every managed component and discovers:
| What gets discovered | What Cortex does with it |
|---|---|
| `@component_action` methods | Auto-registered as LLM tools, namespaced as `<component>.<method>` (e.g. `vision.track`). |
| Fallback actions | Same as above – exposed as callable recovery tools the planner can fall back to. |
| Additional ROS services | Registered as callable tools as well. |
| Additional ROS action servers | Registered as callable tools as well. |
| The component's main action server | Registered the same way – so a Planner or Controller running in the same launcher can itself be dispatched via `send_goal_to_<server>`. |
| Component config parameters | Reachable via built-in introspection tools. |
| Component structure | Reachable via the built-in `inspect_component` tool. |
Every one of those tools is automatic. You write the components; Cortex makes them addressable.
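To make the discovery step concrete, here is an illustrative sketch – not the EMOS implementation – of what "walk every managed component and register decorated methods as namespaced tools" amounts to. The `component_action` stand-in and `discover_tools` helper are invented for this example:

```python
# Illustrative sketch only -- not the EMOS implementation. It mimics the shape
# of the discovery step: walk a component's methods, keep the ones flagged by
# a decorator, and namespace each as "<component>.<method>".
import inspect

def component_action(description=None):
    """Toy stand-in for the real @component_action decorator."""
    def wrap(fn):
        fn._is_component_action = True
        fn._description = description or {}
        return fn
    return wrap

class Vision:
    @component_action(description={"summary": "Track a labelled object."})
    def track(self, label: str):
        return f"tracking {label}"

    def _internal_helper(self):
        pass  # undecorated, so never discovered

def discover_tools(name, component):
    """Return {namespaced tool name: bound method} for decorated methods."""
    tools = {}
    for attr, fn in inspect.getmembers(component, callable):
        if getattr(fn, "_is_component_action", False):
            tools[f"{name}.{attr}"] = fn
    return tools

print(sorted(discover_tools("vision", Vision())))  # -> ['vision.track']
```

The decorator only marks the method; the harness does the walking, which is why adding a capability to a component is enough to make it addressable.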
What we’re building¶
A robot that, when you tell it “describe what you see and then start tracking the person”, will:
1. **Plan** three steps – call `vlm.describe`, feed its answer to `tts.say`, then call `vision.track`.
2. **Execute** them in order – describing the scene, speaking the description through `tts.say`, then asking `Vision` to start tracking the requested label.
3. **Report** each step's result back into the planning loop and close out the episode.
The recipe is short. There is no event wiring. There are no fallback policies. There are no topic-routed connections between the VLM, the TTS, and Cortex – speech happens because Cortex calls tts.say() as a tool, not because some output topic is silently subscribed by TTS. We don’t write a single prompt either – Cortex’s built-in prompts plus the auto-discovered tool descriptions are the prompt.
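The "feed its answer to `tts.say`" part is the interesting bit: one step's result becomes another step's argument. The following is a hypothetical sketch of that mechanism – the plan structure, placeholder convention, and stub tools are all invented here, not Cortex's actual internal representation:

```python
# Hypothetical sketch of the three-step plan as data. The point is the
# binding: step 2's `text` argument is resolved from step 1's result.
plan = [
    {"tool": "vlm.describe", "args": {"query": "what do you see?"}},
    {"tool": "tts.say",      "args": {"text": "step1"}},   # bound at run time
    {"tool": "vision.track", "args": {"label": "person"}},
]

# Stub tools standing in for the auto-discovered component actions.
tools = {
    "vlm.describe": lambda query: "a person holding a mug",
    "tts.say":      lambda text: f"spoke: {text}",
    "vision.track": lambda label: f"tracking {label}",
}

def run(plan, tools):
    """Execute steps in order, substituting earlier results for placeholders."""
    results = {}
    for i, step in enumerate(plan, start=1):
        args = {k: (results.get(v, v) if isinstance(v, str) else v)
                for k, v in step["args"].items()}
        results[f"step{i}"] = tools[step["tool"]](**args)
    return results

print(run(plan, tools)["step2"])  # -> spoke: a person holding a mug
```

In the real recipe the stubs are the discovered component actions, and the results fold back into the planner's trace instead of a plain dict.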
Step 1: Bring up your capability components¶
We need eyes, a voice, and visual reasoning. Standard EMOS capability components:
from agents.clients import OllamaClient, RoboMLRESPClient
from agents.components import TextToSpeech, VLM, Vision
from agents.config import TextToSpeechConfig, VisionConfig
from agents.models import OllamaModel, VisionModel
from agents.ros import Topic
# Vision — detection + tracking
detection_model = VisionModel(name="rtdetr", checkpoint="PekingU/rtdetr_r50vd_coco_o365")
detection_client = RoboMLRESPClient(detection_model)
image_in = Topic(name="/image_raw", msg_type="Image")
detections = Topic(name="detections", msg_type="Detections")
trackings = Topic(name="trackings", msg_type="Trackings")
vision = Vision(
    inputs=[image_in],
    outputs=[detections, trackings],
    model_client=detection_client,
    config=VisionConfig(threshold=0.5),
    trigger=0.5,
    component_name="vision",
)
# VLM — visual question answering. Cortex invokes it via ``vlm.describe``,
# and the action's return value comes back as the tool result.
vlm_model = OllamaModel(name="qwen_vl", checkpoint="qwen2.5vl:latest")
vlm_client = OllamaClient(vlm_model)
vlm_query = Topic(name="vlm_query", msg_type="String")
vlm_response = Topic(name="vlm_response", msg_type="String")
vlm = VLM(
    inputs=[vlm_query, image_in],
    outputs=[vlm_response],
    model_client=vlm_client,
    trigger=vlm_query,
    component_name="vlm",
)
# TTS — speech happens via Cortex calling ``tts.say(text=...)``.
tts_input = Topic(name="tts_input", msg_type="String")
tts = TextToSpeech(
    inputs=[tts_input],
    config=TextToSpeechConfig(enable_local_model=True, play_on_device=True),
    trigger=tts_input,
    component_name="tts",
)
Nothing here is Cortex-specific. Each component has its own @component_action methods declared upstream – Vision.track, Vision.take_picture, VLM.describe, TTS.say – and Cortex will discover all of them on activation. Cortex sequences the components by calling their actions in turn.
Step 2: Drop in Cortex¶
from agents.components import Cortex
from agents.config import CortexConfig
from agents.ros import Action
# A planner LLM. Choose a chat-grade model — the smaller, the faster the loop.
planner_model = OllamaModel(name="qwen", checkpoint="qwen3.5:latest")
planner_client = OllamaClient(planner_model)
# Cortex publishes its text-only replies (cases where the planner decides no
# tool calls are needed) to this topic for downstream consumers (e.g. the Web
# UI). When the planner *does* want the robot to speak, it calls ``tts.say``
# as a tool -- it does not rely on this topic being subscribed by TTS.
cortex_output = Topic(name="cortex_output", msg_type="String")
cortex = Cortex(
    output=cortex_output,
    model_client=planner_client,
    config=CortexConfig(max_planning_steps=5, max_execution_steps=10),
    component_name="cortex",
)
That’s all. No actions=[…] list – the capability components contribute their own actions. No prompt – the built-in prompt plus the discovered tool descriptions are the prompt. No fallback wiring – Cortex confirms each step before executing it and replans on failure.
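The two `CortexConfig` limits bound the loop described above: how many times the planner may (re)plan, and how many tool calls it may spend in total. A toy sketch of that bounded plan/execute/replan shape – `plan_fn`, `execute_fn`, and the status strings are invented stand-ins, and the real loop inside Cortex is richer (confirmation, feedback traces):

```python
# Toy sketch of a bounded plan/execute/replan loop, illustrating what the
# max_planning_steps / max_execution_steps limits suggest. Not Cortex's code.
def agent_loop(goal, plan_fn, execute_fn, max_planning_steps, max_execution_steps):
    executed = 0
    for _ in range(max_planning_steps):
        for step in plan_fn(goal):
            if executed >= max_execution_steps:
                return "ABORTED"          # execution budget exhausted
            executed += 1
            if not execute_fn(step):
                break                     # a step failed -> replan from the top
        else:
            return "SUCCEEDED"            # every step in this plan succeeded
    return "ABORTED"                      # planning budget exhausted

# A flaky step that fails once, then succeeds on the replanned attempt.
attempts = {"n": 0}
def flaky(step):
    attempts["n"] += 1
    return attempts["n"] > 1

print(agent_loop("demo", lambda g: ["only-step"], flaky, 5, 10))  # -> SUCCEEDED
```

Smaller budgets fail faster; larger ones give the planner more room to recover from flaky tools.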
Step 3: Adding your own custom action¶
A capability you want exposed to the planner that doesn’t naturally live on a managed component? Pass it as a custom Action. Cortex registers it alongside everything else.
led_on = False
def toggle_led():
    """Toggle an LED on the robot."""
    global led_on
    led_on = not led_on
    print(f"LED toggled {'ON' if led_on else 'OFF'}")
cortex = Cortex(
    actions=[
        Action(method=toggle_led, description="Toggle the robot's LED on or off."),
    ],
    output=cortex_output,
    model_client=planner_client,
    config=CortexConfig(max_planning_steps=5, max_execution_steps=10),
    component_name="cortex",
)
The description is mandatory – it’s what the planner sees when deciding whether to call this tool.
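Custom actions can also return a string, which – like a component action's return value – presumably comes back to the planner as the tool result. Here is a second, invented example with a typed argument; `set_volume` is not part of the recipe, and it assumes the planner can pass arguments to custom `Action`s the way it does to component actions like `tts.say(text=...)`:

```python
# Invented custom action with a typed argument (a sketch, not recipe code).
# The return string would come back to the planner as the tool result.
volume = 50

def set_volume(level: int) -> str:
    """Set the speaker volume to a value in 0-100, clamping out-of-range input."""
    global volume
    volume = max(0, min(100, int(level)))
    return f"volume set to {volume}"

print(set_volume(150))  # -> volume set to 100
```

Register it exactly like `toggle_led`, with `Action(method=set_volume, description=...)`.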
Step 4: Launch¶
from agents.ros import Launcher
launcher = Launcher()
launcher.enable_ui(
    inputs=[cortex.ui_main_action_input],
    outputs=[cortex_output],
)
launcher.add_pkg(
    components=[vision, vlm, tts, cortex],
    multiprocessing=True,
    package_name="automatika_embodied_agents",
)
launcher.on_process_fail() # process-level safety net
launcher.bringup()
launcher.enable_ui registers a goal-input field for Cortex’s main action and a streaming output panel for cortex_output. The whole agent runs in a single launcher process tree.
Talking to the agent¶
Open the Web UI at http://localhost:5001 and send tasks in plain English:
| Goal | What Cortex plans |
|---|---|
| “describe what you see” | Two steps: `vlm.describe`, then `tts.say` with the description bound as its `text` argument. |
| “start tracking the person” | One step: `vision.track` with `label="person"`. |
| “take a picture, describe it, then track whatever’s in front of you” | Three steps, sequenced. The third step’s argument is bound from the second step’s output – Cortex resolves the binding at execution time. |
| “toggle the LED” | One step: the custom `toggle_led` tool. |
| “are you ok?” | No actions needed. The planner returns text only; the reply lands on the `cortex_output` topic. |
Or send a goal from another terminal directly to Cortex’s action server:
ros2 action send_goal /cortex_<process_id>/cortex_input_command \
automatika_embodied_agents/action/VisionLanguageAction \
"{task: 'describe what you see and track the person'}"
Watch the launcher’s main logging card to see the planning trace, the goals Cortex dispatches, and feedback streamed back from each component.
What just happened¶
When you sent the goal “describe what you see and then start tracking the person”, Cortex:
1. **Built a plan** via the planning loop. The first iteration optionally called `inspect_component("vision")` to confirm the tool surface, then committed three execution tool calls.
2. **Confirmed and called each step in turn.** The first (`vlm.describe`) returned a text description; the second (`tts.say`) was called with that description bound as its `text` argument and the speaker spoke it; the third (`vision.track`) asked the Vision component to start continuous tracking on the named label and returned a confirmation string. Tracking results then streamed on the component’s `trackings` topic for any downstream consumer to use.
3. **Closed the episode.** With every step’s tool result folded back into the trace, the plan returned `SUCCEEDED`.
Compare that to the equivalent recipe written without Cortex: bespoke event wiring for the trigger, hand-tuned prompts on each component, manual sequencing of the speech and tracking calls. Cortex collapses all of that into the one component you just dropped in.
Tip
For the long-running case – where Cortex should dispatch a Kompass action server like the Controller’s track_vision_target (or the Planner’s navigate_to_goal) and watch its feedback stream until the goal completes – add the Controller (or Planner) component to the launcher. Cortex auto-registers each one’s main action server as send_goal_to_<server> and switches into asynchronous monitoring mode. See Cortex Driving the Full Stack.
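The asynchronous monitoring mode mentioned in the tip can be sketched as: dispatch the long-running goal, then fold its feedback stream into the loop until a terminal status arrives. This is a toy illustration – the dict messages stand in for a ROS action server's feedback/result stream, and the status strings are assumptions:

```python
# Toy sketch of asynchronous goal monitoring: consume a feedback stream until
# a terminal status arrives. Not the Cortex implementation.
def monitor(feedback_stream):
    """Collect feedback until a terminal status; return (status, feedback)."""
    collected = []
    for msg in feedback_stream:
        if msg["status"] in ("SUCCEEDED", "ABORTED"):
            return msg["status"], collected
        collected.append(msg)
    return "UNKNOWN", collected  # stream ended without a terminal status

stream = iter([
    {"status": "ACTIVE", "distance_remaining": 2.4},
    {"status": "ACTIVE", "distance_remaining": 0.7},
    {"status": "SUCCEEDED"},
])
print(monitor(stream)[0])  # -> SUCCEEDED
```

The intermediate feedback is what lets the planner notice a stuck goal and replan instead of blocking until a timeout.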
Where next¶
Cortex Driving the Full Stack – the showcase tutorial. Cortex orchestrates a navigation stack, vision, memory, and speech to handle compound natural-language goals like “go to the kitchen and tell me what’s on the counter”.
Memory and Cortex – add a graph-backed spatio-temporal memory so Cortex can reason over past observations and the robot’s own internal state.
Cortex concept page – the full reference for the planning loop, the confirmation step, RAG, async goal monitoring, and the Cortex-as-Monitor architecture.
Tip
Promote this recipe to production. While you’re shaping it, the script runs straight with python recipe.py. Once it’s solid, drop it at ~/emos/recipes/<your_name>/recipe.py and run emos run <your_name> – you’ll get sensor pre-flight checks, persistent logs, and a card on the dashboard so an operator can launch it from a browser. See Running Recipes for the full development-vs-production comparison and install-mode pitfalls (especially in Container mode).