# Event-Driven Cognition

Robots process a massive amount of sensory data. Running a large Vision Language Model (VLM) on every single video frame to ask "What is happening?" is computationally expensive and redundant, even where smaller models make it possible. In this tutorial, we will use the **Event-Driven** nature of EMOS to create a smart "Reflex-Cognition" loop. We will use a lightweight detector to monitor the scene efficiently (the Reflex), and only when a specific object (a person) is found, we will trigger a larger VLM to describe them (the Cognition). This description can then be used to log the robot's observations or be parsed to trigger further actions downstream.

## The Strategy: Reflex and Cognition

1. **Reflex (Vision Component):** A fast, lightweight object detector runs on every frame. It acts as a gatekeeper.
2. **Event (The Trigger):** We define a smart event that fires only when the detector finds a "person" (and hasn't seen one recently).
3. **Cognition (VLM Component):** A more powerful VLM wakes up only when triggered by the event to describe the scene.

### 1. The Reflex: Vision Component

First, we set up the `Vision` component. This component is designed to be lightweight. By enabling the local classifier, we run a small, optimized model contained within the component directly on the edge.

```python
from agents.components import Vision
from agents.config import VisionConfig
from agents.ros import Topic

# Define Topics
camera_image = Topic(name="/image_raw", msg_type="Image")
detections = Topic(name="/detections", msg_type="Detections")  # Output of Vision

# Setup the Vision Component (The Trigger)
# We use a lower threshold to ensure we catch people easily and we use a small embedded model
vision_config = VisionConfig(threshold=0.6, enable_local_classifier=True)

vision_detector = Vision(
    inputs=[camera_image],
    outputs=[detections],
    trigger=camera_image,  # Runs on every frame
    config=vision_config,
    component_name="eye_detector",
)
```

The `trigger=camera_image` argument tells this component to process every single message that arrives on the `/image_raw` topic.

### 2. The Trigger: Smart Events

Now, we need to bridge the gap between detection and description. We don't want the VLM to fire 30 times a second just because a person is standing in the frame.

We use `Event` with `on_change` mode. This event type is perfect for state changes. It monitors a list inside a message (in this case, the `labels` list of the detections).

```python
from agents.ros import Event

# Define the Event
# This event listens to the 'detections' topic.
# It triggers ONLY if the "labels" list inside the message contains "person"
# after not containing a person (within a 5 second interval).
event_person_detected = Event(
    detections.msg.labels.contains_any(["person"]),
    on_change=True,  # Trigger only when a change has occurred to stop repeat triggering
    keep_event_delay=5,  # A delay in seconds
)
```

```{note}
**`keep_event_delay=5`**: This is a debouncing mechanism. It ensures that once the event triggers, it won't trigger again for at least 5 seconds, even if the person remains in the frame. This prevents our VLM from being flooded with requests and is quite useful for suppressing jittery detections, which are especially common on mobile robots.
```

```{seealso}
Events can be used to create arbitrarily complex agent graphs. Check out all the events available in the [Events](../../concepts/events-and-actions.md) documentation.
```
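The same `contains_any` check can watch for several labels at once. As a hypothetical sketch (the extra labels and the longer delay are illustrative assumptions, not part of this recipe), an event for any of several actors of interest could look like this:

```python
# Hypothetical variation: reuse the contains_any check shown above to fire
# when ANY of several labels of interest enters the frame.
# The label list and delay value are illustrative assumptions.
event_actor_detected = Event(
    detections.msg.labels.contains_any(["person", "dog", "cat"]),
    on_change=True,       # Fire on the transition, not on every frame
    keep_event_delay=10,  # Debounce for 10 seconds
)
```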
### 3. The Cognition: VLM Component

Finally, we set up the heavy lifter. We will use a `VLM` component powered by **Qwen-VL** running on Ollama.

Crucially, this component does **not** have a topic trigger like the vision detector. Instead, it is triggered by `event_person_detected`. We also need to tell the VLM _what_ to do when it wakes up. Since there is no user typing a question, we inject a `FixedInput`, a static prompt that acts as a standing order.

```python
from agents.components import VLM
from agents.clients import OllamaClient
from agents.models import OllamaModel
from agents.ros import FixedInput

description_output = Topic(name="/description", msg_type="String")  # Output of VLM

# Setup a model client for the component
qwen_vl = OllamaModel(name="qwen_vl", checkpoint="qwen2.5vl:7b")
ollama_client = OllamaClient(model=qwen_vl)

# We define a fixed prompt that is injected whenever the component runs.
fixed_prompt = FixedInput(
    name="prompt",
    msg_type="String",
    fixed="A person has been detected. Describe their appearance briefly.",
)

visual_describer = VLM(
    inputs=[fixed_prompt, camera_image],  # Takes the fixed prompt + current image
    outputs=[description_output],
    model_client=ollama_client,
    trigger=event_person_detected,  # CRITICAL: Only runs when the event fires
    component_name="visual_describer",
)
```

## Launching the Application

We combine everything into a launcher.

```python
from agents.ros import Launcher

# Launch
launcher = Launcher()
launcher.add_pkg(
    components=[vision_detector, visual_describer],
    multiprocessing=True,
    package_name="automatika_embodied_agents",
)
launcher.bringup()
```

## See the Results in the UI

We can see this recipe in action if we enable the UI. We do so by adding the following line to the launcher before calling `bringup()`:

```python
launcher.enable_ui(outputs=[camera_image, detections, description_output])
```

````{note}
In order to run the client you will need to install [FastHTML](https://www.fastht.ml/) and [MonsterUI](https://github.com/AnswerDotAI/MonsterUI) with

```shell
pip install python-fasthtml monsterui
```
````

The client displays a web UI at **http://localhost:5001** if you run it on your machine, or at **http://\<ROBOT_IP\>:5001** if you run it on the robot.
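As mentioned in the introduction, the descriptions can also feed downstream consumers. Since `/description` is an ordinary ROS topic, any node can subscribe to it. Below is a minimal, hypothetical sketch of a plain `rclpy` logger node; it is not part of EMOS and assumes the topic carries a standard `std_msgs/String` message:

```python
# Minimal sketch of a downstream consumer: a plain rclpy node (outside EMOS)
# that logs every VLM description published on /description.
# Assumes the topic carries a standard std_msgs/String message.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class DescriptionLogger(Node):
    def __init__(self):
        super().__init__("description_logger")
        self.create_subscription(String, "/description", self.on_description, 10)

    def on_description(self, msg: String) -> None:
        # Each message is one event-triggered observation from the VLM
        self.get_logger().info(f"Observation: {msg.data}")


def main():
    rclpy.init()
    node = DescriptionLogger()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```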
### Complete Code

Here is the complete recipe for the Event-Driven Cognition agent:

```{code-block} python
:caption: Event-Driven Cognition
:linenos:

from agents.components import Vision, VLM
from agents.config import VisionConfig
from agents.clients import OllamaClient
from agents.models import OllamaModel
from agents.ros import Launcher, Topic, FixedInput, Event

# Define Topics
camera_image = Topic(name="/image_raw", msg_type="Image")
detections = Topic(name="/detections", msg_type="Detections")  # Output of Vision
description_output = Topic(name="/description", msg_type="String")  # Output of VLM

# Setup the Vision Component (The Trigger)
# We use a lower threshold to ensure we catch people easily and we use a small local model
vision_config = VisionConfig(threshold=0.6, enable_local_classifier=True)

vision_detector = Vision(
    inputs=[camera_image],
    outputs=[detections],
    trigger=camera_image,  # Runs on every frame
    config=vision_config,
    component_name="eye_detector",
)

# Define the Event
# This event listens to the 'detections' topic.
# It triggers ONLY if the "labels" list inside the message contains "person"
# after not containing a person (within a 5 second interval).
event_person_detected = Event(
    detections.msg.labels.contains_any(["person"]),
    on_change=True,  # Trigger only when a change has occurred to stop repeat triggering
    keep_event_delay=5,  # A delay in seconds
)

# Setup the VLM Component (The Responder)
# This component does NOT run continuously. It waits for the event.

# Setup a model client for the component
qwen_vl = OllamaModel(name="qwen_vl", checkpoint="qwen2.5vl:7b")
ollama_client = OllamaClient(model=qwen_vl)

# We define a fixed prompt that is injected whenever the component runs.
fixed_prompt = FixedInput(
    name="prompt",
    msg_type="String",
    fixed="A person has been detected. Describe their appearance briefly.",
)

visual_describer = VLM(
    inputs=[fixed_prompt, camera_image],  # Takes the fixed prompt + current image
    outputs=[description_output],
    model_client=ollama_client,
    trigger=event_person_detected,  # CRITICAL: Only runs when the event fires
    component_name="visual_describer",
)

# Launch
launcher = Launcher()
launcher.enable_ui(outputs=[camera_image, detections, description_output])
launcher.add_pkg(
    components=[vision_detector, visual_describer],
    multiprocessing=True,
    package_name="automatika_embodied_agents",
)
launcher.bringup()
```
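If the 7B checkpoint is too heavy for your hardware, the model client is the only thing that needs to change. As a sketch (the smaller checkpoint tag is an assumption; check which tags are available in your Ollama installation):

```python
# Hypothetical swap: a smaller Ollama checkpoint for constrained hardware.
# The tag "qwen2.5vl:3b" is an assumption; verify it exists in your Ollama registry.
qwen_vl_small = OllamaModel(name="qwen_vl_small", checkpoint="qwen2.5vl:3b")
ollama_client = OllamaClient(model=qwen_vl_small)
```

Everything else in the recipe, including the event wiring, stays the same.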