Local Models¶
EMOS can run all AI components entirely on-device using built-in local models. No Ollama, no RoboML, no cloud API — just set enable_local_model=True in the component config and you’re running inference locally.
This is useful for:

- Offline robots that operate without network access
- Edge deployment where latency to a remote server is unacceptable
- Quick prototyping when you don’t want to set up a model serving platform
Note
Models are auto-downloaded from HuggingFace on first use. Subsequent runs load from cache.
Dependencies¶
Local models require one additional pip package depending on the component type:
LLM / VLM:

```shell
pip install llama-cpp-python
```

STT / TTS:

```shell
pip install sherpa-onnx
```
These are pre-installed in EMOS Docker containers.
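Before bringup, it can be handy to confirm which of these backends is importable in the current environment. The helper below is a plain-Python check, not an EMOS utility:

```python
import importlib.util


def check_local_deps() -> dict[str, bool]:
    """Report whether the local-model backends are importable."""
    # Import names of the pip packages llama-cpp-python and sherpa-onnx
    packages = ("llama_cpp", "sherpa_onnx")
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


print(check_local_deps())
```

A `False` entry means the corresponding pip package still needs to be installed.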
Local LLM¶
The simplest possible EMOS agent — an LLM running entirely on-device with no model client:
```python
from agents.components import LLM
from agents.config import LLMConfig
from agents.ros import Topic, Launcher

config = LLMConfig(
    enable_local_model=True,
    device_local_model="cpu",  # or "cuda" for GPU
    ncpu_local_model=4,
)

query = Topic(name="user_query", msg_type="String")
response = Topic(name="response", msg_type="String")

llm = LLM(
    inputs=[query],
    outputs=[response],
    config=config,
    trigger=query,
    component_name="local_brain",
)

launcher = Launcher()
launcher.add_pkg(components=[llm])
launcher.bringup()
```
The default model is Qwen3 0.6B (GGUF format). To use a different model, set local_model_path to any HuggingFace GGUF repo ID or a local file path:
```python
config = LLMConfig(
    enable_local_model=True,
    local_model_path="Qwen/Qwen3-1.7B-GGUF",  # larger model
)
```
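`local_model_path` also accepts a path to a GGUF file already on disk, which avoids any network access at startup. A minimal sketch; the file path below is illustrative, not a file EMOS ships:

```python
from agents.config import LLMConfig

# Point local_model_path at a GGUF file on disk (illustrative path)
config = LLMConfig(
    enable_local_model=True,
    local_model_path="/opt/models/qwen3-0.6b-q8_0.gguf",
)
```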
Local VLM¶
A vision-language model that processes both text and images on-device:
```python
from agents.components import VLM
from agents.config import VLMConfig
from agents.ros import Topic, Launcher

config = VLMConfig(enable_local_model=True)

text_in = Topic(name="text_query", msg_type="String")
image_in = Topic(name="image_raw", msg_type="Image")
text_out = Topic(name="response", msg_type="String")

vlm = VLM(
    inputs=[text_in, image_in],
    outputs=[text_out],
    config=config,
    trigger=text_in,
    component_name="local_vision",
)

launcher = Launcher()
launcher.add_pkg(components=[vlm])
launcher.bringup()
```
The default model is Moondream2 (GGUF format).
Warning
Streaming output (stream=True) is not supported with local VLM models. The component will return the complete response once inference finishes.
Local Speech-to-Text¶
Convert spoken audio to text using an on-device Whisper model:
```python
from agents.components import SpeechToText
from agents.config import SpeechToTextConfig
from agents.ros import Topic, Launcher

config = SpeechToTextConfig(
    enable_local_model=True,
    enable_vad=True,  # voice activity detection
)

audio_in = Topic(name="audio0", msg_type="Audio")
text_out = Topic(name="transcription", msg_type="String")

stt = SpeechToText(
    inputs=[audio_in],
    outputs=[text_out],
    config=config,
    trigger=audio_in,
    component_name="local_stt",
)

launcher = Launcher()
launcher.add_pkg(components=[stt])
launcher.bringup()
```
The default model is Whisper tiny.en (via sherpa-onnx). For other languages or larger models, see the sherpa-onnx pretrained models and set local_model_path accordingly.
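For example, swapping in a larger Whisper model is a one-line config change. A sketch, assuming `local_model_path` accepts a sherpa-onnx model identifier; the identifier below is illustrative, so take the real name from the sherpa-onnx pretrained models list:

```python
from agents.config import SpeechToTextConfig

config = SpeechToTextConfig(
    enable_local_model=True,
    enable_vad=True,
    # Illustrative identifier; substitute a model name or path from the
    # sherpa-onnx pretrained models list that your EMOS version accepts
    local_model_path="sherpa-onnx-whisper-small",
)
```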
Warning
Streaming output (stream=True) is not supported with local STT models. Use a WebSocket client (e.g. RoboMLWSClient) if you need streaming transcription.
Local Text-to-Speech¶
Synthesize speech on-device and play it through the robot’s speakers:
```python
from agents.components import TextToSpeech
from agents.config import TextToSpeechConfig
from agents.ros import Topic, Launcher

config = TextToSpeechConfig(
    enable_local_model=True,
    play_on_device=True,  # play audio on the robot's speakers
)

text_in = Topic(name="text_input", msg_type="String")

tts = TextToSpeech(
    inputs=[text_in],
    outputs=[],
    config=config,
    trigger=text_in,
    component_name="local_tts",
)

launcher = Launcher()
launcher.add_pkg(components=[tts])
launcher.bringup()
```
The default model is Kokoro EN (via sherpa-onnx).
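If another node should consume the synthesized audio instead of the robot's speakers, the component can be given an `Audio` output topic rather than `play_on_device`. This is a sketch built on an assumption (that `TextToSpeech` publishes synthesized audio to an output topic when one is provided), not a confirmed EMOS behavior:

```python
from agents.components import TextToSpeech
from agents.config import TextToSpeechConfig
from agents.ros import Topic

# Publish synthesized audio to a topic instead of playing it locally
# (assumes TextToSpeech publishes to an Audio output topic)
config = TextToSpeechConfig(enable_local_model=True)

text_in = Topic(name="text_input", msg_type="String")
audio_out = Topic(name="tts_audio", msg_type="Audio")

tts = TextToSpeech(
    inputs=[text_in],
    outputs=[audio_out],
    config=config,
    trigger=text_in,
    component_name="local_tts",
)
```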
Warning
Streaming output (stream=True) is not supported with local TTS models.
Complete Example: Local Conversational Agent¶
Here is a full conversational agent — speech-to-text, vision-language model, and text-to-speech — running entirely on local models. No Ollama, no RoboML, no cloud API.
```python
from agents.components import VLM, SpeechToText, TextToSpeech
from agents.config import SpeechToTextConfig, VLMConfig, TextToSpeechConfig
from agents.ros import Topic, Launcher

# --- Speech-to-Text (Whisper tiny.en via sherpa-onnx) ---
audio_in = Topic(name="audio0", msg_type="Audio")
text_query = Topic(name="text0", msg_type="String")

stt_config = SpeechToTextConfig(
    enable_local_model=True,
    enable_vad=True,
)

speech_to_text = SpeechToText(
    inputs=[audio_in],
    outputs=[text_query],
    config=stt_config,
    trigger=audio_in,
    component_name="speech_to_text",
)

# --- VLM (Moondream2 via llama-cpp-python) ---
image_in = Topic(name="image_raw", msg_type="Image")
text_answer = Topic(name="text1", msg_type="String")

vlm_config = VLMConfig(enable_local_model=True)

vlm = VLM(
    inputs=[text_query, image_in],
    outputs=[text_answer],
    config=vlm_config,
    trigger=text_query,
    component_name="vision_brain",
)

# --- Text-to-Speech (Kokoro via sherpa-onnx) ---
tts_config = TextToSpeechConfig(
    enable_local_model=True,
    play_on_device=True,
)

text_to_speech = TextToSpeech(
    inputs=[text_answer],
    outputs=[],
    config=tts_config,
    trigger=text_answer,
    component_name="text_to_speech",
)

# --- Launch ---
launcher = Launcher()
launcher.add_pkg(components=[speech_to_text, vlm, text_to_speech])
launcher.bringup()
```
This recipe creates the same pipeline as the Conversational Agent tutorial, but runs fully offline. The trade-off is that on-device models are smaller and less capable than hosted alternatives — they work well for simple interactions but may struggle with complex reasoning.
See also
- Built-in Local Models for the full table of default models and configuration options.
- Fallback Recipes for using local models as automatic fallbacks when a remote server fails.
- Conversational Agent for the server-based version using Ollama and RoboML.