GoTo Navigation

In the previous recipe we built a graph-backed spatio-temporal memory using the Memory component. Memory tags every observation with the robot’s pose, so for any concept the agent has encountered (an object class, a room label) we can ask Memory: where did we see this? It already exposes that lookup as a component action called locate, returning a centroid plus a radius for the most likely region.

In this recipe we wire locate into a Go-to-X component so that a command like “Go to the kitchen” turns into a PoseStamped goal point that the navigation stack can consume. We do this by registering Memory’s locate tool on an LLM – the LLM decides when to call the tool, Memory answers, and a small preprocessor converts the textual answer into a numpy coordinate that gets published as the goal.

See also

For the conceptual reference of Memory’s full retrieval surface, see the Memory page. For a generic introduction to LLM tool calling that is not tied to navigation, see the next recipe, Tool Calling.

Prerequisites

Memory needs the eMEM package: pip install emem.


What we’re building

Three components in a single launcher:

Component

Role

Vision

Object detector publishing Detections per frame. Feeds Memory.

Memory

Graph-backed spatio-temporal memory ingesting detections; tags each observation with the robot’s pose from /odom.

goto LLM

A plain LLM component that takes a free-form “go to X” query, calls Memory’s locate tool to look up the place, and publishes the resulting coordinates on goal_point.

Memory’s locate returns a textual answer like "... Location: (10.3, 9.8, 0.0) ..." (centroid plus a description). We wire a small regex preprocessor onto the goto LLM’s output topic that picks the centroid out and converts it to an np.ndarray, which the framework publishes as the PoseStamped goal point.


Step 1: Vision and Memory

from agents.clients import OllamaClient, RoboMLRESPClient
from agents.components import Memory, Vision
from agents.config import MemoryConfig, VisionConfig
from agents.models import OllamaModel, VisionModel
from agents.ros import Launcher, MemLayer, Topic

image0 = Topic(name="image_raw", msg_type="Image")
detections_topic = Topic(name="detections", msg_type="Detections")
position = Topic(name="odom", msg_type="Odometry")

vision = Vision(
    inputs=[image0],
    outputs=[detections_topic],
    trigger=image0,
    config=VisionConfig(threshold=0.5),
    model_client=RoboMLRESPClient(
        VisionModel(name="rtdetr", checkpoint="PekingU/rtdetr_r50vd_coco_o365")
    ),
    component_name="vision",
)

embedding_client = OllamaClient(
    OllamaModel(name="embeddings", checkpoint="nomic-embed-text-v2-moe:latest")
)

memory = Memory(
    layers=[MemLayer(subscribes_to=detections_topic)],
    position=position,
    embedding_client=embedding_client,
    config=MemoryConfig(db_path="/tmp/go_to_x.db"),
    trigger=10.0,
    component_name="memory",
)

Each detection becomes an ObservationNode in Memory tagged with the robot’s pose at the moment the detection was made. After a few minutes of accumulation, eMEM’s entity layer auto-merges nearby semantically-similar detections into persistent entities, and locate("chair") returns the centroid of the merged “chair” entity.


Step 2: The Go-to-X LLM

from agents.components import LLM
from agents.config import LLMConfig

qwen = OllamaModel(name="qwen", checkpoint="qwen3.5:latest")
qwen_client = OllamaClient(qwen)

goto_in = Topic(name="goto_in", msg_type="String")
goal_point = Topic(name="goal_point", msg_type="PoseStamped")

goto = LLM(
    inputs=[goto_in],
    outputs=[goal_point],
    model_client=qwen_client,
    trigger=goto_in,
    config=LLMConfig(),
    component_name="go_to_x",
)

goto.set_component_prompt(
    template=(
        "The user asks you to go to a place. Use the available tools to "
        "look up the place's location in memory. Pass the place name to "
        "the locate tool as the ``concept`` argument. User said: {{goto_in}}"
    )
)

Step 3: Register Memory’s locate tool on the LLM

memory.register_tools_on(goto, tools=["locate"], send_tool_response_to_model=False)

register_tools_on exposes Memory’s component actions to the goto LLM as callable tools. We register a single tool, locate, since the LLM only needs the lookup capability for this recipe. The flag send_tool_response_to_model=False is what makes the recipe end-to-end: instead of feeding locate’s answer back into the LLM for a follow-up generation, the answer becomes the output of the LLM component. After preprocessing, that output is what gets published on goal_point.


Step 4: Parse Memory’s textual answer into coordinates

locate returns text formatted like:

Location: (10.3, 9.8, 0.0)
Radius: 1.5m
Based on: 5 memories (3x detections, 2x scene)
  [detections] kitchen counter with cups and a kettle...
  [scene] open kitchen area near the pantry...
  [detections] kitchen island with stools...

We register a small preprocessor on the goal_point output topic that pulls the centroid out of that text and converts it to an np.ndarray. The framework then publishes the array as a PoseStamped.

import re
from typing import Optional

import numpy as np

_LOCATION_RE = re.compile(r"Location:\s*\(([^)]+)\)")


def locate_text_to_goal_point(output: str) -> Optional[np.ndarray]:
    """Pull the centroid coordinates out of Memory.locate's text output."""
    match = _LOCATION_RE.search(output)
    if not match:
        return  # no match → nothing to publish
    try:
        coords = np.fromstring(match.group(1), sep=",", dtype=np.float64)
    except ValueError:
        return
    if coords.shape[0] == 2:
        coords = np.append(coords, 0.0)
    if coords.shape[0] != 3:
        return
    return coords


goto.add_publisher_preprocessor(goal_point, locate_text_to_goal_point)

If the LLM (correctly) called locate, the regex matches, the coordinates parse cleanly, and the goal point is published. If the LLM hallucinated or the place is unknown to Memory, the preprocessor returns None and nothing is published – the navigation stack sees no spurious goal.


Step 5: Launch

launcher = Launcher()
launcher.add_pkg(components=[vision, memory, goto])
launcher.bringup()

Full recipe code

Go-to-X with Memory tool calling
 1import re
 2from typing import Optional
 3
 4import numpy as np
 5
 6from agents.clients import OllamaClient, RoboMLRESPClient
 7from agents.components import LLM, Memory, Vision
 8from agents.config import LLMConfig, MemoryConfig, VisionConfig
 9from agents.models import OllamaModel, VisionModel
10from agents.ros import Launcher, MemLayer, Topic
11
12
13# -- Perception side: vision + memory --
14image0 = Topic(name="image_raw", msg_type="Image")
15detections_topic = Topic(name="detections", msg_type="Detections")
16position = Topic(name="odom", msg_type="Odometry")
17
18vision = Vision(
19    inputs=[image0],
20    outputs=[detections_topic],
21    trigger=image0,
22    config=VisionConfig(threshold=0.5),
23    model_client=RoboMLRESPClient(
24        VisionModel(name="rtdetr", checkpoint="PekingU/rtdetr_r50vd_coco_o365")
25    ),
26    component_name="vision",
27)
28
29embedding_client = OllamaClient(
30    OllamaModel(name="embeddings", checkpoint="nomic-embed-text-v2-moe:latest")
31)
32
33memory = Memory(
34    layers=[MemLayer(subscribes_to=detections_topic)],
35    position=position,
36    embedding_client=embedding_client,
37    config=MemoryConfig(db_path="/tmp/go_to_x.db"),
38    trigger=10.0,
39    component_name="memory",
40)
41
42
43# -- Go-to-X LLM --
44qwen = OllamaModel(name="qwen", checkpoint="qwen3.5:latest")
45qwen_client = OllamaClient(qwen)
46
47goto_in = Topic(name="goto_in", msg_type="String")
48goal_point = Topic(name="goal_point", msg_type="PoseStamped")
49
50goto = LLM(
51    inputs=[goto_in],
52    outputs=[goal_point],
53    model_client=qwen_client,
54    trigger=goto_in,
55    config=LLMConfig(),
56    component_name="go_to_x",
57)
58
59goto.set_component_prompt(
60    template=(
61        "The user asks you to go to a place. Use the available tools to "
62        "look up the place's location in memory. Pass the place name to "
63        "the locate tool as the ``concept`` argument. User said: {{goto_in}}"
64    )
65)
66
67memory.register_tools_on(goto, tools=["locate"], send_tool_response_to_model=False)
68
69
70_LOCATION_RE = re.compile(r"Location:\s*\(([^)]+)\)")
71
72
73def locate_text_to_goal_point(output: str) -> Optional[np.ndarray]:
74    """Pull the centroid coordinates out of Memory.locate's text output."""
75    match = _LOCATION_RE.search(output)
76    if not match:
77        return
78    try:
79        coords = np.fromstring(match.group(1), sep=",", dtype=np.float64)
80    except ValueError:
81        return
82    if coords.shape[0] == 2:
83        coords = np.append(coords, 0.0)
84    if coords.shape[0] != 3:
85        return
86    return coords
87
88
89goto.add_publisher_preprocessor(goal_point, locate_text_to_goal_point)
90
91
92# -- Launch (single process so the LLM can call Memory in-process) --
93launcher = Launcher()
94launcher.add_pkg(components=[vision, memory, goto])
95launcher.bringup()

Where next

  • Tool Calling — generalises this pattern. Instead of registering Memory’s pre-defined tools, write your own Python function as a custom tool and register it with goto.register_tool(...).

  • Complete Agent — drops this Go-to-X pattern into a full multi-modal agent (speech I/O + vision + memory + Q&A + routing) defined in one Python script.

  • Cortex Driving the Full Stack — the agentic-harness version: drop a Cortex component on top of Memory and the navigation stack and the robot handles compound natural-language goals like “go to the kitchen and tell me what’s on the counter” with no orchestration code from you.


Tip

Promote this recipe to production. While you’re shaping it, the script runs straight with python recipe.py. Once it’s solid, drop it at ~/emos/recipes/<your_name>/recipe.py and run emos run <your_name> – you’ll get sensor pre-flight checks, persistent logs, and a card on the dashboard so an operator can launch it from a browser. See Running Recipes for the full development-vs-production comparison and install-mode pitfalls (especially in Container mode).