# Event-Driven VLA

In the previous [VLA Manipulation](vla-manipulation.md) recipe, we saw how VLAs can be used in EMOS to perform physical tasks. However, the real utility of VLAs is unlocked when they are part of a bigger cognitive system. With its event-driven agent graphs, EMOS allows us to do exactly that.

Most VLA policies are "open-loop" with respect to task completion -- they run for a fixed number of steps and then stop, regardless of whether they succeeded or failed. In this tutorial, we will build a **Closed-Loop Agent** while using an open-loop policy. Even if the model correctly signals its own termination (i.e. an absorbing-state policy), our design acts as a safety valve. We will combine:

- {material-regular}`smart_toy;1.2em;sd-text-primary` **The Player (VLA):** Attempts to pick up an object.
- {material-regular}`visibility;1.2em;sd-text-primary` **The Referee (VLM):** Watches the camera stream and judges if the task is complete.

We will use the **Event System** to trigger a stop command on the VLA the moment the VLM confirms success.

## The Player: Setting up the VLA

First, we set up our VLA component exactly as we did in the previous recipe. We will use the same **SmolVLA** policy trained for picking oranges.

```python
from agents.components import VLA
from agents.config import VLAConfig
from agents.clients import LeRobotClient
from agents.models import LeRobotPolicy
from agents.ros import Topic

# Define Topics
state = Topic(name="/isaac_joint_states", msg_type="JointState")
camera1 = Topic(name="/front_camera/image_raw", msg_type="Image")
camera2 = Topic(name="/wrist_camera/image_raw", msg_type="Image")
joints_action = Topic(name="/isaac_joint_command", msg_type="JointState")

# Setup Policy
policy = LeRobotPolicy(
    name="my_policy",
    policy_type="smolvla",
    checkpoint="aleph-ra/smolvla_finetune_pick_orange_20000",
    dataset_info_file="https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange/resolve/main/meta/info.json",
)
client = LeRobotClient(model=policy)

# Configure VLA (Mapping omitted for brevity, see previous tutorial)
# ... (assume joints_map and camera_map are defined)
config = VLAConfig(
    observation_sending_rate=5,
    action_sending_rate=5,
    joint_names_map=joints_map,
    camera_inputs_map=camera_map,
    robot_urdf_file="./so101_new_calib.urdf"
)

player = VLA(
    inputs=[state, camera1, camera2],
    outputs=[joints_action],
    model_client=client,
    config=config,
    component_name="vla_player",
)
```
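If you don't have the previous recipe at hand, the sketch below shows the *shape* of the two omitted mappings. The keys and values here are purely hypothetical placeholders; the real joint names come from your robot's URDF and the real camera keys from the policy's dataset metadata, so copy the actual mappings from the [VLA Manipulation](vla-manipulation.md) recipe rather than from this illustration.

```python
# HYPOTHETICAL sketch only -- copy the real mappings from the previous recipe.
# Maps the joint names expected by the policy to the joint names published
# by the robot (an identity mapping here, purely for illustration).
joints_map = {
    "shoulder_pan": "shoulder_pan",
    "shoulder_lift": "shoulder_lift",
    "elbow_flex": "elbow_flex",
    "wrist_flex": "wrist_flex",
    "wrist_roll": "wrist_roll",
    "gripper": "gripper",
}

# Maps the policy's camera input keys to our image topics.
camera_map = {
    "observation.images.front": camera1,
    "observation.images.wrist": camera2,
}
```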
## The Referee: Setting up the VLM

Now we introduce the "Referee". We will use a Vision Language Model (like Qwen-VL) to monitor the scene. We want this component to periodically look at the `camera1` feed and answer a specific question: _"Are all the oranges in the bowl?"_

We use a `FixedInput` to ensure the VLM is asked the exact same question every time.

```python
from agents.components import VLM
from agents.clients import OllamaClient
from agents.models import OllamaModel
from agents.ros import FixedInput

# Define the topic where the VLM publishes its judgment
referee_verdict = Topic(name="/referee/verdict", msg_type="String")

# Setup the Model
qwen_vl = OllamaModel(name="qwen_vl", checkpoint="qwen2.5vl:7b")
qwen_client = OllamaClient(model=qwen_vl)

# Define the constant question
question = FixedInput(
    name="prompt",
    msg_type="String",
    fixed="Look at the image. Are all the oranges in the bowl? Answer only with YES or NO."
)

# Initialize the VLM
# Note: We trigger periodically (regulated by loop_rate)
referee = VLM(
    inputs=[question, camera1],
    outputs=[referee_verdict],
    model_client=qwen_client,
    trigger=10.0,
    component_name="vlm_referee"
)
```

```{note}
To prevent the VLM from consuming too much compute, we have configured a `float` trigger, which means our `VLM` component will be triggered not by a topic, but periodically, with a `loop_rate` of once every 10 seconds.
```

```{tip}
To make sure the VLM output is formatted as we require (YES or NO), check out how to use pre-processors in the [Semantic Map](../foundation/semantic-map.md) recipe. For now, we will assume that the event should fire whenever YES appears in the output string.
```
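To make that assumption concrete, here is the check we are relying on, written as a plain Python function. This is only an illustration of the logic; in the actual graph the matching is expressed declaratively through the Event defined below, not through user code.

```python
# Illustration of the success check (the Event below performs this
# matching declaratively; this function exists only for clarity).
def is_success(verdict: str) -> bool:
    # Fire whenever "YES" appears anywhere in the referee's answer
    return "YES" in verdict

print(is_success("YES, all oranges are in the bowl."))  # True
print(is_success("NO, one orange is on the table."))    # False
```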
## The Bridge: Semantic Event Trigger

Now we close the loop. We define an **Event** that fires when the `/referee/verdict` topic contains the word "YES".

```python
from agents.ros import Event

# Define the Success Event
event_task_success = Event(
    referee_verdict.msg.data.contains("YES")  # the topic, attribute and value to check in it
)
```

Finally, we attach this event to the VLA using the `set_termination_trigger` method, setting the mode to `event`.

```python
# Tell the VLA to stop immediately when the event fires
player.set_termination_trigger(
    mode="event",
    stop_event=event_task_success,
    max_timesteps=500  # Fallback: stop if 500 steps pass without success
)
```

```{seealso}
Events are a very powerful concept in EMOS, and you can get infinitely creative with them. For example, imagine setting off the VLA component with a voice command: this can be done by combining the output of a SpeechToText component with an Event that generates an action command. To learn more, check out the [Events & Actions](../events-and-resilience/event-driven-cognition.md) recipe.
```

## Launching the System

When we launch this graph:

- The **VLA** starts moving the robot to pick the orange.
- The **VLM** simultaneously watches the feed.
- Once the oranges are in the bowl, the VLM outputs "YES".
- The **Event** system catches this, interrupts the VLA, and signals that the task is complete.

```python
from agents.ros import Launcher

launcher = Launcher()
launcher.add_pkg(components=[player, referee])
launcher.bringup()
```

You can send the action command to the VLA as described in the previous [VLA Manipulation](vla-manipulation.md) recipe.

## Complete Code

```{code-block} python
:caption: Closed-Loop VLA with VLM Verifier
:linenos:

from agents.components import VLA, VLM
from agents.config import VLAConfig
from agents.clients import LeRobotClient, OllamaClient
from agents.models import LeRobotPolicy, OllamaModel
from agents.ros import Topic, Launcher, FixedInput
from agents.ros import Event

# --- Define Topics ---
state = Topic(name="/isaac_joint_states", msg_type="JointState")
camera1 = Topic(name="/front_camera/image_raw", msg_type="Image")
camera2 = Topic(name="/wrist_camera/image_raw", msg_type="Image")
joints_action = Topic(name="/isaac_joint_command", msg_type="JointState")
referee_verdict = Topic(name="/referee/verdict", msg_type="String")

# --- Setup The Player (VLA) ---
policy = LeRobotPolicy(
    name="my_policy",
    policy_type="smolvla",
    checkpoint="aleph-ra/smolvla_finetune_pick_orange_20000",
    dataset_info_file="https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange/resolve/main/meta/info.json",
)
vla_client = LeRobotClient(model=policy)

# VLA Config (Mappings assumed defined as per previous tutorial)
# joints_map = { ... }
# camera_map = { ... }
config = VLAConfig(
    observation_sending_rate=5,
    action_sending_rate=5,
    joint_names_map=joints_map,
    camera_inputs_map=camera_map,
    robot_urdf_file="./so101_new_calib.urdf"
)

player = VLA(
    inputs=[state, camera1, camera2],
    outputs=[joints_action],
    model_client=vla_client,
    config=config,
    component_name="vla_player",
)

# --- Setup The Referee (VLM) ---
qwen_vl = OllamaModel(name="qwen_vl", checkpoint="qwen2.5vl:7b")
qwen_client = OllamaClient(model=qwen_vl)

# A static prompt for the VLM
question = FixedInput(
    name="prompt",
    msg_type="String",
    fixed="Look at the image. Are all the oranges in the bowl? Answer only with YES or NO."
)

referee = VLM(
    inputs=[question, camera1],
    outputs=[referee_verdict],
    model_client=qwen_client,
    trigger=10.0,  # periodic trigger: judge the scene every 10 seconds
    component_name="vlm_referee"
)

# --- Define the Logic (Event) ---
# Create an event that looks for "YES" in the VLM's output
event_task_success = Event(
    referee_verdict.msg.data.contains("YES")  # the topic, attribute and value to check in it
)

# Link the event to the VLA's stop mechanism
player.set_termination_trigger(
    mode="event",
    stop_event=event_task_success,
    max_timesteps=500  # Failsafe
)

# --- Launch ---
launcher = Launcher()
launcher.add_pkg(components=[player, referee])
launcher.bringup()
```
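One last variation worth noting: the referee does not have to run on a timer. As the note in the Referee section explains, the `trigger` can also be a topic, in which case the VLM judges every incoming frame. This reacts faster but consumes considerably more compute, so treat the sketch below as an option rather than a recommendation.

```python
# Alternative: trigger the referee on every new front-camera frame
# instead of the 10-second timer. Faster detection, higher compute load.
referee = VLM(
    inputs=[question, camera1],
    outputs=[referee_verdict],
    model_client=qwen_client,
    trigger=camera1,  # topic trigger instead of a periodic float trigger
    component_name="vlm_referee"
)
```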