# Local Models

EMOS can run all AI components entirely on-device using built-in local models. No Ollama, no RoboML, no cloud API. Just set `enable_local_model=True` in the component config and you're running inference locally.

This is useful for:

- **Offline robots** that operate without network access
- **Edge deployment** where latency to a remote server is unacceptable
- **Quick prototyping** when you don't want to set up a model serving platform

```{note}
Models are auto-downloaded from HuggingFace on first use. Subsequent runs load from cache.
```

## Dependencies

Local models require one additional pip package depending on the component type:

- **LLM / VLM**: `pip install llama-cpp-python`
- **STT / TTS**: `pip install sherpa-onnx`

These are pre-installed in EMOS Docker containers.

## Local LLM

The simplest possible EMOS agent is an LLM running entirely on-device with no model client:

```python
from agents.components import LLM
from agents.config import LLMConfig
from agents.ros import Topic, Launcher

config = LLMConfig(
    enable_local_model=True,
    device_local_model="cpu",  # or "cuda" for GPU
    ncpu_local_model=4,
)

query = Topic(name="user_query", msg_type="String")
response = Topic(name="response", msg_type="String")

llm = LLM(
    inputs=[query],
    outputs=[response],
    config=config,
    trigger=query,
    component_name="local_brain",
)

launcher = Launcher()
launcher.add_pkg(components=[llm])
launcher.bringup()
```

The default model is **Qwen3 0.6B** (GGUF format).
To use a different model, set `local_model_path` to any HuggingFace GGUF repo ID or a local file path:

```python
config = LLMConfig(
    enable_local_model=True,
    local_model_path="Qwen/Qwen3-1.7B-GGUF",  # larger model
)
```

## Local VLM

A vision-language model that processes both text and images on-device:

```python
from agents.components import VLM
from agents.config import VLMConfig
from agents.ros import Topic, Launcher

config = VLMConfig(enable_local_model=True)

text_in = Topic(name="text_query", msg_type="String")
image_in = Topic(name="image_raw", msg_type="Image")
text_out = Topic(name="response", msg_type="String")

vlm = VLM(
    inputs=[text_in, image_in],
    outputs=[text_out],
    config=config,
    trigger=text_in,
    component_name="local_vision",
)

launcher = Launcher()
launcher.add_pkg(components=[vlm])
launcher.bringup()
```

The default model is **Moondream2** (GGUF format).

```{warning}
Streaming output (`stream=True`) is not supported with local VLM models. The component will return the complete response once inference finishes.
```

## Local Speech-to-Text

Convert spoken audio to text using an on-device Whisper model:

```python
from agents.components import SpeechToText
from agents.config import SpeechToTextConfig
from agents.ros import Topic, Launcher

config = SpeechToTextConfig(
    enable_local_model=True,
    enable_vad=True,  # voice activity detection
)

audio_in = Topic(name="audio0", msg_type="Audio")
text_out = Topic(name="transcription", msg_type="String")

stt = SpeechToText(
    inputs=[audio_in],
    outputs=[text_out],
    config=config,
    trigger=audio_in,
    component_name="local_stt",
)

launcher = Launcher()
launcher.add_pkg(components=[stt])
launcher.bringup()
```

The default model is **Whisper tiny.en** (via sherpa-onnx). For other languages or larger models, see the [sherpa-onnx pretrained models](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html) and set `local_model_path` accordingly.

```{warning}
Streaming output (`stream=True`) is not supported with local STT models. Use a WebSocket client (e.g. RoboMLWSClient) if you need streaming transcription.
```

## Local Text-to-Speech

Synthesize speech on-device and play it through the robot's speakers:

```python
from agents.components import TextToSpeech
from agents.config import TextToSpeechConfig
from agents.ros import Topic, Launcher

config = TextToSpeechConfig(
    enable_local_model=True,
    play_on_device=True,  # play audio on the robot's speakers
)

text_in = Topic(name="text_input", msg_type="String")

tts = TextToSpeech(
    inputs=[text_in],
    outputs=[],
    config=config,
    trigger=text_in,
    component_name="local_tts",
)

launcher = Launcher()
launcher.add_pkg(components=[tts])
launcher.bringup()
```

The default model is **Kokoro EN** (via sherpa-onnx).

```{warning}
Streaming output (`stream=True`) is not supported with local TTS models.
```

## Complete Example: Local Conversational Agent

Here is a full conversational agent (speech-to-text, vision-language model, and text-to-speech) running entirely on local models. No Ollama, no RoboML, no cloud API.

```{code-block} python
:caption: Fully Local Conversational Agent
:linenos:

from agents.components import VLM, SpeechToText, TextToSpeech
from agents.config import SpeechToTextConfig, VLMConfig, TextToSpeechConfig
from agents.ros import Topic, Launcher

# --- Speech-to-Text (Whisper tiny.en via sherpa-onnx) ---
audio_in = Topic(name="audio0", msg_type="Audio")
text_query = Topic(name="text0", msg_type="String")

stt_config = SpeechToTextConfig(
    enable_local_model=True,
    enable_vad=True,
)

speech_to_text = SpeechToText(
    inputs=[audio_in],
    outputs=[text_query],
    config=stt_config,
    trigger=audio_in,
    component_name="speech_to_text",
)

# --- VLM (Moondream2 via llama-cpp-python) ---
image_in = Topic(name="image_raw", msg_type="Image")
text_answer = Topic(name="text1", msg_type="String")

vlm_config = VLMConfig(enable_local_model=True)

vlm = VLM(
    inputs=[text_query, image_in],
    outputs=[text_answer],
    config=vlm_config,
    trigger=text_query,
    component_name="vision_brain",
)

# --- Text-to-Speech (Kokoro via sherpa-onnx) ---
tts_config = TextToSpeechConfig(
    enable_local_model=True,
    play_on_device=True,
)

text_to_speech = TextToSpeech(
    inputs=[text_answer],
    outputs=[],
    config=tts_config,
    trigger=text_answer,
    component_name="text_to_speech",
)

# --- Launch ---
launcher = Launcher()
launcher.add_pkg(components=[speech_to_text, vlm, text_to_speech])
launcher.bringup()
```

This recipe creates the same pipeline as the [Conversational Agent](conversational-agent.md) tutorial, but runs fully offline once the models have been downloaded and cached. The trade-off is that on-device models are smaller and less capable than hosted alternatives: they work well for simple interactions but may struggle with complex reasoning.

```{seealso}
- [Built-in Local Models](../../intelligence/models.md#built-in-local-models) for the full table of default models and configuration options.
- [Fallback Recipes](../events-and-resilience/fallback-recipes.md) for using local models as automatic fallbacks when a remote server fails.
- [Conversational Agent](conversational-agent.md) for the server-based version using Ollama and RoboML.
```
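
Before launching any of the recipes above outside the EMOS Docker containers, it can help to confirm that the backend packages named in the Dependencies section are importable. The following is a minimal sketch and not part of EMOS: `REQUIRED_MODULES` and `missing_local_backends` are hypothetical names introduced here, and the only assumption about the libraries is their import names (`llama_cpp` for `llama-cpp-python`, `sherpa_onnx` for `sherpa-onnx`).

```python
import importlib.util

# Hypothetical mapping (not an EMOS API): component type -> import name of
# the pip package it needs for local inference.
# llama-cpp-python is imported as `llama_cpp`, sherpa-onnx as `sherpa_onnx`.
REQUIRED_MODULES = {
    "LLM": "llama_cpp",
    "VLM": "llama_cpp",
    "SpeechToText": "sherpa_onnx",
    "TextToSpeech": "sherpa_onnx",
}


def missing_local_backends(component_types):
    """Return {component type: module name} for backends that cannot be imported."""
    missing = {}
    for component_type in component_types:
        module_name = REQUIRED_MODULES[component_type]
        # find_spec returns None when a top-level module is not installed;
        # it does not import the module itself.
        if importlib.util.find_spec(module_name) is None:
            missing[component_type] = module_name
    return missing


if __name__ == "__main__":
    # Component types used by the fully local conversational agent.
    missing = missing_local_backends(["SpeechToText", "VLM", "TextToSpeech"])
    for component_type, module_name in missing.items():
        print(f"{component_type}: Python module '{module_name}' is not installed")
```

Running this script before `launcher.bringup()` reports a missing package up front. It only checks that the modules can be found; it does not verify that a model has already been downloaded to the cache.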