Self-Healing with Fallbacks¶
In the real world, connections drop, APIs time out, solvers fail to converge, and serial cables vibrate loose. A “Production Ready” agent cannot simply freeze when something goes wrong.
EMOS provides a unified fallback API that works identically across the intelligence layer (model clients) and the navigation layer (algorithms and hardware). In this recipe, we demonstrate both.
Intelligence Layer: Model Fallback¶
We build an agent that uses a high-intelligence model (hosted remotely) as its primary brain, but automatically switches to a smaller, local model if the primary one fails.
The Strategy: Plan A and Plan B¶
Plan A (Primary): Use a powerful model hosted via RoboML (or a cloud provider) for high-quality reasoning.
Plan B (Backup): Keep a smaller model available locally as a safety net.
The Trigger: If the Primary model fails to respond (latency, disconnection, or server error), automatically swap to the Backup.
EMOS offers two approaches for implementing this strategy — a zero-config local fallback and a manual model client swap.
Approach 1: Built-in Local Fallback (Zero-Config)¶
The simplest way to add resilience is fallback_to_local(). This one-liner tells the component to switch to a built-in local model on failure — no Ollama server, no additional clients, no extra configuration.
Built-in local models are available for LLM, VLM, SpeechToText, and TextToSpeech components. Default models are lightweight and designed for on-device inference:
LLM — Qwen3 0.6B (via llama-cpp-python)
VLM — Moondream2 (via llama-cpp-python)
STT — Whisper tiny (via sherpa-onnx)
TTS — Kokoro (via sherpa-onnx)
Note
Install the required dependency for your component: pip install llama-cpp-python for LLM/VLM, or pip install sherpa-onnx for STT/TTS. These are pre-installed in EMOS Docker containers.
from agents.components import LLM
from agents.models import TransformersLLM
from agents.clients import RoboMLHTTPClient
from agents.ros import Launcher, Topic, Action
# Primary: A powerful model hosted remotely (e.g., via RoboML)
# NOTE: This is illustrative for the sake of executing on the local machine.
# For a more realistic scenario, replace this with a GenericHTTPClient
# pointing to a cloud model.
primary_model = TransformersLLM(
    name="qwen_heavy", checkpoint="Qwen/Qwen2.5-1.5B-Instruct"
)
primary_client = RoboMLHTTPClient(model=primary_model)
# Define Topics
user_query = Topic(name="user_query", msg_type="String")
llm_response = Topic(name="llm_response", msg_type="String")
# Configure the LLM Component with the primary client
llm_component = LLM(
    inputs=[user_query],
    outputs=[llm_response],
    model_client=primary_client,
    trigger=user_query,
    component_name="brain",
)
# One-liner fallback: switch to built-in local model on failure.
# No additional model clients, no Ollama server — just a single Action.
switch_to_local = Action(method=llm_component.fallback_to_local)
llm_component.on_component_fail(action=switch_to_local, max_retries=3)
llm_component.on_algorithm_fail(action=switch_to_local, max_retries=3)
# Launch
launcher = Launcher()
launcher.add_pkg(
    components=[llm_component],
    multiprocessing=True,
    package_name="automatika_embodied_agents",
)
launcher.bringup()
To test this, shut down your RoboML server (or disconnect the internet) while the agent is running, and watch it seamlessly switch to the local model.
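If you prefer to drive the test programmatically, here is a minimal sketch of a test node using rclpy. It assumes the agent's String topics map to std_msgs/msg/String and that a ROS 2 environment is sourced; the node name and query text are arbitrary:

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class QueryPublisher(Node):
    """Sends one test query and prints whatever the agent replies."""

    def __init__(self):
        super().__init__("query_publisher")
        self.pub = self.create_publisher(String, "user_query", 10)
        self.sub = self.create_subscription(
            String, "llm_response", self.on_response, 10
        )
        # Wait a moment for discovery, then publish a single query.
        self.timer = self.create_timer(1.0, self.send_query)

    def send_query(self):
        self.timer.cancel()  # publish only once
        msg = String()
        msg.data = "What is the tallest mountain on Earth?"
        self.pub.publish(msg)

    def on_response(self, msg):
        self.get_logger().info(f"Agent replied: {msg.data}")


rclpy.init()
rclpy.spin(QueryPublisher())  # Ctrl-C to exit

Run the node, then kill the primary model server mid-session: the reply should still arrive, now generated by the built-in local model.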
See also
For the full list of built-in local models and configuration options, see Built-in Local Models.
Approach 2: Model Client Hot-Swap¶
When you want explicit control over which backup model and client to use, the change_model_client + additional_model_clients pattern lets you define exactly what runs as the fallback.
1. Defining the Models¶
First, we need to define our two distinct model clients.
from agents.components import LLM
from agents.models import OllamaModel, TransformersLLM
from agents.clients import OllamaClient, RoboMLHTTPClient
from agents.config import LLMConfig
from agents.ros import Launcher, Topic, Action
# --- Plan A: The Powerhouse ---
# A powerful model hosted remotely (e.g., via RoboML).
# NOTE: This is illustrative for executing on a local machine.
# For a production scenario, you might use a GenericHTTPClient pointing to
# GPT-5, Gemini, HuggingFace Inference, etc.
primary_model = TransformersLLM(
    name="qwen_heavy",
    checkpoint="Qwen/Qwen2.5-1.5B-Instruct"
)
primary_client = RoboMLHTTPClient(model=primary_model)
# --- Plan B: The Safety Net ---
# A smaller model running locally (via Ollama) that works offline.
backup_model = OllamaModel(name="llama_local", checkpoint="llama3.2:3b")
backup_client = OllamaClient(model=backup_model)
2. Configuring the Component¶
Next, we set up the standard LLM component. We initialize it using the primary_client.
However, the magic happens in the additional_model_clients attribute. This dictionary allows the component to hold references to other valid clients that are waiting in the wings.
# Define Topics
user_query = Topic(name="user_query", msg_type="String")
llm_response = Topic(name="llm_response", msg_type="String")
# Configure the LLM Component with the PRIMARY client initially
llm_component = LLM(
    inputs=[user_query],
    outputs=[llm_response],
    model_client=primary_client,
    trigger=user_query,
    component_name="brain",
    config=LLMConfig(stream=True),
)
# Register the Backup Client
# We store the backup client in the component's internal registry.
# We will use the key 'local_backup_client' to refer to this later.
llm_component.additional_model_clients = {"local_backup_client": backup_client}
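Because additional_model_clients is an ordinary dictionary, you can register more than one safety net under different keys. A hedged sketch, where the second model and its key are purely illustrative:

# Hypothetical second safety net: an even smaller local model.
tiny_model = OllamaModel(name="tiny_local", checkpoint="llama3.2:1b")
tiny_client = OllamaClient(model=tiny_model)

llm_component.additional_model_clients = {
    "local_backup_client": backup_client,
    "tiny_backup_client": tiny_client,
}

Each key can later be handed to change_model_client, which is exactly what we do next.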
3. Creating the Fallback Action¶
Now we need an Action. In EMOS, components have built-in methods to reconfigure themselves. The LLM component (like all other components that take a model client) has a method called change_model_client.
We wrap this method in an Action so it can be triggered by an event.
Note
All components implement a set of default actions as well as component-specific actions. In this case we are using a component-specific action.
See also
To see a list of default actions available to all components, check out the Actions documentation.
# Define the Fallback Action
# This action calls the component's internal method `change_model_client`.
# We pass the key ('local_backup_client') defined in the previous step.
switch_to_backup = Action(
    method=llm_component.change_model_client,
    args=("local_backup_client",)
)
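For contrast, a default action needs no component-specific method. Assuming restart is among the defaults listed in the Actions documentation, wrapping it looks exactly the same (this is a sketch, not part of the recipe):

# Hedged sketch: wrapping a (presumed) default component action.
restart_brain = Action(method=llm_component.restart)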
4. Wiring Failure to Action¶
Finally, we tell the component when to execute this action. We don’t need to write complex try/except blocks in our business logic. Instead, we attach the action to the component’s lifecycle hooks:
on_component_fail: Triggered if the component crashes or fails to initialize (e.g., the remote server is down when the robot starts).
on_algorithm_fail: Triggered if the component is running but the inference fails (e.g., the WiFi drops mid-conversation).
# Bind Failures to the Action
# If the component fails (startup) or the algorithm crashes (runtime),
# it will attempt to switch clients.
llm_component.on_component_fail(action=switch_to_backup, max_retries=3)
llm_component.on_algorithm_fail(action=switch_to_backup, max_retries=3)
Note
Why max_retries? Sometimes a fallback can temporarily fail as well. The system will attempt to restart the component or algorithm up to 3 times while applying the action (switching the client) to resolve the error. This is an optional parameter.
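Because max_retries is optional, the minimal binding omits it, in which case the framework's default retry behavior applies:

# Minimal form: rely on the default retry behavior.
llm_component.on_component_fail(action=switch_to_backup)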
The Complete Client Hot-Swap Recipe¶
Here is the full code. To test this, try shutting down your RoboML server (or disconnecting the internet) while the agent is running, and watch it seamlessly switch to the local Ollama model.
from agents.components import LLM
from agents.models import OllamaModel, TransformersLLM
from agents.clients import OllamaClient, RoboMLHTTPClient
from agents.config import LLMConfig
from agents.ros import Launcher, Topic, Action
# 1. Define the Models and Clients
# Primary: A powerful model hosted remotely
primary_model = TransformersLLM(
    name="qwen_heavy", checkpoint="Qwen/Qwen2.5-1.5B-Instruct"
)
primary_client = RoboMLHTTPClient(model=primary_model)
# Backup: A smaller model running locally
backup_model = OllamaModel(name="llama_local", checkpoint="llama3.2:3b")
backup_client = OllamaClient(model=backup_model)
# 2. Define Topics
user_query = Topic(name="user_query", msg_type="String")
llm_response = Topic(name="llm_response", msg_type="String")
# 3. Configure the LLM Component
llm_component = LLM(
    inputs=[user_query],
    outputs=[llm_response],
    model_client=primary_client,
    trigger=user_query,
    component_name="brain",
    config=LLMConfig(stream=True),
)
# 4. Register the Backup Client
llm_component.additional_model_clients = {"local_backup_client": backup_client}
# 5. Define the Fallback Action
switch_to_backup = Action(
    method=llm_component.change_model_client,
    args=("local_backup_client",)
)
# 6. Bind Failures to the Action
llm_component.on_component_fail(action=switch_to_backup, max_retries=3)
llm_component.on_algorithm_fail(action=switch_to_backup, max_retries=3)
# 7. Launch
launcher = Launcher()
launcher.add_pkg(
    components=[llm_component],
    multiprocessing=True,
    package_name="automatika_embodied_agents",
)
launcher.bringup()
The Same API, Both Layers¶
The key insight is that the same three hooks work everywhere in EMOS:
| Hook | Triggers When | Intelligence Example | Navigation Example |
|---|---|---|---|
| on_component_fail | Component crashes or fails to initialize | Remote model server is down | Serial port unavailable |
| on_algorithm_fail | Inference or computation fails at runtime | WiFi drops mid-conversation | DWA solver can’t converge |
| on_system_fail | External dependency is lost | API key revoked | Motor controller resets |
Each hook accepts an Action (or list of actions) and an optional max_retries parameter. This consistency means you can apply the same resilience patterns regardless of which layer you’re working in.
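As a closing sketch, the list form lets you stack several reactions on one hook. Here we combine the client swap from the recipe above with a restart action, assuming restart is among the default component actions; the in-order application of the list is also an assumption:

# Hedged sketch: hooks accept a list of Actions as well as a single one.
llm_component.on_algorithm_fail(
    action=[switch_to_backup, Action(method=llm_component.restart)],
    max_retries=3,
)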