agents.components.mllm
Module Contents
Classes
MLLM: This component uses multimodal large language models (e.g. Llava) to process text and image data.
API
- class agents.components.mllm.MLLM(*, inputs: List[Union[agents.ros.Topic, agents.ros.FixedInput]], outputs: List[agents.ros.Topic], model_client: agents.clients.model_base.ModelClient, config: Optional[agents.config.MLLMConfig] = None, db_client: Optional[agents.clients.db_base.DBClient] = None, trigger: Union[agents.ros.Topic, List[agents.ros.Topic], float, agents.ros.Event] = 1.0, component_name: str, **kwargs)
Bases: agents.components.llm.LLM
This component uses multimodal large language models (e.g. Llava) to process text and image data.
- Parameters:
inputs (list[Topic | FixedInput]) – The input topics or fixed inputs for the MLLM component. This should be a list of Topic objects or FixedInput instances, limited to String and Image types.
outputs (list[Topic]) – The output topics for the MLLM component. This should be a list of Topic objects. String, Detections2D and PointsOfInterest2D types are handled automatically.
model_client (ModelClient) – The model client for the MLLM component. This should be an instance of ModelClient.
config (MLLMConfig) – Optional configuration for the MLLM component. This should be an instance of MLLMConfig. If not provided, defaults to MLLMConfig().
trigger (Union[Topic, list[Topic], float, Event]) – The trigger for the MLLM component. This can be a single Topic object, a list of Topic objects, a float value for a timed component, or an Event (as listed in the class signature). Defaults to 1.0.
component_name (str) – The name of the MLLM component. This should be a string and defaults to “mllm_component”.
Example usage:
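from agents.clients.model_base import ModelClient
from agents.components.mllm import MLLM
from agents.config import MLLMConfig
from agents.models import TransformersMLLM  # import path assumed
from agents.ros import Topic

# Input topics (text and image) and the output topic (text)
text0 = Topic(name="text0", msg_type="String")
image0 = Topic(name="image0", msg_type="Image")
text1 = Topic(name="text1", msg_type="String")
config = MLLMConfig()
model = TransformersMLLM(name='idefics')
model_client = ModelClient(model=model)
mllm_component = MLLM(inputs=[text0, image0], outputs=[text1], model_client=model_client, config=config, component_name='mllm_component')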
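The type constraints in the parameter descriptions (String and Image for inputs; String, Detections2D and PointsOfInterest2D handled automatically for outputs) can be checked before a component is constructed. The following is a minimal sketch of such a pre-flight check, not part of the library: `TopicSpec` is a hypothetical stand-in for `agents.ros.Topic`, and `names_outside` is a helper name invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Message types named in the parameter descriptions above.
SUPPORTED_INPUT_TYPES = frozenset({"String", "Image"})
AUTO_HANDLED_OUTPUT_TYPES = frozenset({"String", "Detections2D", "PointsOfInterest2D"})


@dataclass(frozen=True)
class TopicSpec:
    """Hypothetical stand-in for agents.ros.Topic (name and message type only)."""

    name: str
    msg_type: str


def names_outside(topics: Iterable[TopicSpec], allowed: frozenset) -> List[str]:
    """Return the names of topics whose msg_type is not in `allowed`."""
    return [t.name for t in topics if t.msg_type not in allowed]


inputs = [
    TopicSpec("text0", "String"),
    TopicSpec("image0", "Image"),
    TopicSpec("scan", "LaserScan"),  # not a String or Image topic
]
outputs = [TopicSpec("text1", "String")]

print(names_outside(inputs, SUPPORTED_INPUT_TYPES))       # ['scan']
print(names_outside(outputs, AUTO_HANDLED_OUTPUT_TYPES))  # []
```

An empty list means every topic falls within the documented types. How the component itself reacts to other types is not stated here, so this check only mirrors the documentation.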
- set_task(task: Literal["general", "pointing", "affordance", "trajectory", "grounding"]) -> None
Set a task for the MLLM component. This is useful when using a multimodal LLM that has been trained on specific tasks. This method can be invoked as an action, in response to an event, to change the task at runtime. For an example, check out RoboBrain2.0, available on RoboML.
- Parameters:
task – The task to set; one of “general”, “pointing”, “affordance”, “trajectory”, or “grounding”.
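Because the task name may arrive at runtime (for example from an event payload), it can help to validate it against the five documented values before passing it on. The sketch below is an assumption-labeled illustration and not the library's own validation: `Task` and `validate_task` are hypothetical names, and only the five task strings come from the signature above.

```python
from typing import Literal, get_args

# The five task names listed in the set_task signature.
Task = Literal["general", "pointing", "affordance", "trajectory", "grounding"]


def validate_task(task: str) -> str:
    """Return `task` unchanged if it is a documented task name, else raise ValueError."""
    allowed = get_args(Task)  # tuple of the literal strings above
    if task not in allowed:
        raise ValueError(f"Unknown task {task!r}; expected one of {allowed}")
    return task


print(validate_task("pointing"))  # pointing
```

A validated name could then be handed to the component, e.g. `mllm_component.set_task(validate_task(name))`, where `mllm_component` is an MLLM instance like the one in the example usage.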