GoTo Navigation¶
In the previous recipe we built a graph-backed spatio-temporal memory using the Memory component. Memory tags every observation with the robot’s pose, so for any concept the agent has encountered (an object class, a room label) we can ask Memory: where did we see this? It already exposes that lookup as a component action called locate, returning a centroid plus a radius for the most likely region.
In this recipe we wire locate into a Go-to-X component so that a command like “Go to the kitchen” turns into a PoseStamped goal point that the navigation stack can consume. We do this by registering Memory’s locate tool on an LLM – the LLM decides when to call the tool, Memory answers, and a small preprocessor converts the textual answer into a numpy coordinate that gets published as the goal.
See also
For the conceptual reference of Memory’s full retrieval surface, see the Memory page. For a generic introduction to LLM tool calling that is not tied to navigation, see the next recipe, Tool Calling.
Prerequisites
Memory needs the eMEM package: pip install emem.
What we’re building¶
Three components in a single launcher:
Component |
Role |
|---|---|
Vision |
Object detector publishing |
Memory |
Graph-backed spatio-temporal memory ingesting detections; tags each observation with the robot’s pose from |
goto LLM |
A plain |
Memory’s locate returns a textual answer like "... Location: (10.3, 9.8, 0.0) ..." (centroid plus a description). We wire a small regex preprocessor onto the goto LLM’s output topic that picks the centroid out and converts it to an np.ndarray, which the framework publishes as the PoseStamped goal point.
Step 1: Vision and Memory¶
from agents.clients import OllamaClient, RoboMLRESPClient
from agents.components import Memory, Vision
from agents.config import MemoryConfig, VisionConfig
from agents.models import OllamaModel, VisionModel
from agents.ros import Launcher, MemLayer, Topic
image0 = Topic(name="image_raw", msg_type="Image")
detections_topic = Topic(name="detections", msg_type="Detections")
position = Topic(name="odom", msg_type="Odometry")
vision = Vision(
inputs=[image0],
outputs=[detections_topic],
trigger=image0,
config=VisionConfig(threshold=0.5),
model_client=RoboMLRESPClient(
VisionModel(name="rtdetr", checkpoint="PekingU/rtdetr_r50vd_coco_o365")
),
component_name="vision",
)
embedding_client = OllamaClient(
OllamaModel(name="embeddings", checkpoint="nomic-embed-text-v2-moe:latest")
)
memory = Memory(
layers=[MemLayer(subscribes_to=detections_topic)],
position=position,
embedding_client=embedding_client,
config=MemoryConfig(db_path="/tmp/go_to_x.db"),
trigger=10.0,
component_name="memory",
)
Each detection becomes an ObservationNode in Memory tagged with the robot’s pose at the moment the detection was made. After a few minutes of accumulation, eMEM’s entity layer auto-merges nearby semantically-similar detections into persistent entities, and locate("chair") returns the centroid of the merged “chair” entity.
Step 2: The Go-to-X LLM¶
from agents.components import LLM
from agents.config import LLMConfig
qwen = OllamaModel(name="qwen", checkpoint="qwen3.5:latest")
qwen_client = OllamaClient(qwen)
goto_in = Topic(name="goto_in", msg_type="String")
goal_point = Topic(name="goal_point", msg_type="PoseStamped")
goto = LLM(
inputs=[goto_in],
outputs=[goal_point],
model_client=qwen_client,
trigger=goto_in,
config=LLMConfig(),
component_name="go_to_x",
)
goto.set_component_prompt(
template=(
"The user asks you to go to a place. Use the available tools to "
"look up the place's location in memory. Pass the place name to "
"the locate tool as the ``concept`` argument. User said: {{goto_in}}"
)
)
Step 3: Register Memory’s locate tool on the LLM¶
memory.register_tools_on(goto, tools=["locate"], send_tool_response_to_model=False)
register_tools_on exposes Memory’s component actions to the goto LLM as callable tools. We register a single tool, locate, since the LLM only needs the lookup capability for this recipe. The flag send_tool_response_to_model=False is what makes the recipe end-to-end: instead of feeding locate’s answer back into the LLM for a follow-up generation, the answer becomes the output of the LLM component. After preprocessing, that output is what gets published on goal_point.
Step 4: Parse Memory’s textual answer into coordinates¶
locate returns text formatted like:
Location: (10.3, 9.8, 0.0)
Radius: 1.5m
Based on: 5 memories (3x detections, 2x scene)
[detections] kitchen counter with cups and a kettle...
[scene] open kitchen area near the pantry...
[detections] kitchen island with stools...
We register a small preprocessor on the goal_point output topic that pulls the centroid out of that text and converts it to an np.ndarray. The framework then publishes the array as a PoseStamped.
import re
from typing import Optional
import numpy as np
_LOCATION_RE = re.compile(r"Location:\s*\(([^)]+)\)")
def locate_text_to_goal_point(output: str) -> Optional[np.ndarray]:
"""Pull the centroid coordinates out of Memory.locate's text output."""
match = _LOCATION_RE.search(output)
if not match:
return # no match → nothing to publish
try:
coords = np.fromstring(match.group(1), sep=",", dtype=np.float64)
except ValueError:
return
if coords.shape[0] == 2:
coords = np.append(coords, 0.0)
if coords.shape[0] != 3:
return
return coords
goto.add_publisher_preprocessor(goal_point, locate_text_to_goal_point)
If the LLM (correctly) called locate, the regex matches, the coordinates parse cleanly, and the goal point is published. If the LLM hallucinated or the place is unknown to Memory, the preprocessor returns None and nothing is published – the navigation stack sees no spurious goal.
Step 5: Launch¶
launcher = Launcher()
launcher.add_pkg(components=[vision, memory, goto])
launcher.bringup()
Full recipe code¶
1import re
2from typing import Optional
3
4import numpy as np
5
6from agents.clients import OllamaClient, RoboMLRESPClient
7from agents.components import LLM, Memory, Vision
8from agents.config import LLMConfig, MemoryConfig, VisionConfig
9from agents.models import OllamaModel, VisionModel
10from agents.ros import Launcher, MemLayer, Topic
11
12
13# -- Perception side: vision + memory --
14image0 = Topic(name="image_raw", msg_type="Image")
15detections_topic = Topic(name="detections", msg_type="Detections")
16position = Topic(name="odom", msg_type="Odometry")
17
18vision = Vision(
19 inputs=[image0],
20 outputs=[detections_topic],
21 trigger=image0,
22 config=VisionConfig(threshold=0.5),
23 model_client=RoboMLRESPClient(
24 VisionModel(name="rtdetr", checkpoint="PekingU/rtdetr_r50vd_coco_o365")
25 ),
26 component_name="vision",
27)
28
29embedding_client = OllamaClient(
30 OllamaModel(name="embeddings", checkpoint="nomic-embed-text-v2-moe:latest")
31)
32
33memory = Memory(
34 layers=[MemLayer(subscribes_to=detections_topic)],
35 position=position,
36 embedding_client=embedding_client,
37 config=MemoryConfig(db_path="/tmp/go_to_x.db"),
38 trigger=10.0,
39 component_name="memory",
40)
41
42
43# -- Go-to-X LLM --
44qwen = OllamaModel(name="qwen", checkpoint="qwen3.5:latest")
45qwen_client = OllamaClient(qwen)
46
47goto_in = Topic(name="goto_in", msg_type="String")
48goal_point = Topic(name="goal_point", msg_type="PoseStamped")
49
50goto = LLM(
51 inputs=[goto_in],
52 outputs=[goal_point],
53 model_client=qwen_client,
54 trigger=goto_in,
55 config=LLMConfig(),
56 component_name="go_to_x",
57)
58
59goto.set_component_prompt(
60 template=(
61 "The user asks you to go to a place. Use the available tools to "
62 "look up the place's location in memory. Pass the place name to "
63 "the locate tool as the ``concept`` argument. User said: {{goto_in}}"
64 )
65)
66
67memory.register_tools_on(goto, tools=["locate"], send_tool_response_to_model=False)
68
69
70_LOCATION_RE = re.compile(r"Location:\s*\(([^)]+)\)")
71
72
73def locate_text_to_goal_point(output: str) -> Optional[np.ndarray]:
74 """Pull the centroid coordinates out of Memory.locate's text output."""
75 match = _LOCATION_RE.search(output)
76 if not match:
77 return
78 try:
79 coords = np.fromstring(match.group(1), sep=",", dtype=np.float64)
80 except ValueError:
81 return
82 if coords.shape[0] == 2:
83 coords = np.append(coords, 0.0)
84 if coords.shape[0] != 3:
85 return
86 return coords
87
88
89goto.add_publisher_preprocessor(goal_point, locate_text_to_goal_point)
90
91
92# -- Launch (single process so the LLM can call Memory in-process) --
93launcher = Launcher()
94launcher.add_pkg(components=[vision, memory, goto])
95launcher.bringup()
Where next¶
Tool Calling — generalises this pattern. Instead of registering Memory’s pre-defined tools, write your own Python function as a custom tool and register it with
goto.register_tool(...).Complete Agent — drops this Go-to-X pattern into a full multi-modal agent (speech I/O + vision + memory + Q&A + routing) defined in one Python script.
Cortex Driving the Full Stack — the agentic-harness version: drop a Cortex component on top of Memory and the navigation stack and the robot handles compound natural-language goals like “go to the kitchen and tell me what’s on the counter” with no orchestration code from you.
Tip
Promote this recipe to production. While you’re shaping it, the script runs straight with python recipe.py. Once it’s solid, drop it at ~/emos/recipes/<your_name>/recipe.py and run emos run <your_name> – you’ll get sensor pre-flight checks, persistent logs, and a card on the dashboard so an operator can launch it from a browser. See Running Recipes for the full development-vs-production comparison and install-mode pitfalls (especially in Container mode).