# Inference Clients

Clients are execution backends that instantiate ML models and run inference on them. Certain components in EmbodiedAgents deal with ML models, vector databases, or both. These components take a model client or DB client as one of their initialization parameters. This abstraction enforces _separation of concerns_: whether an ML model is running on the edge hardware, on a powerful compute node in the network, or in the cloud, the components running on the robot can always use the model (or DB) through a client in a standardized way.

This approach makes components independent of the model serving platforms, which may implement various inference optimizations depending on the model type. As a result, developers can choose an ML serving platform that offers the best latency/accuracy tradeoff based on the application's requirements.
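The decoupling described above can be illustrated with a minimal, self-contained sketch. It does not use the EmbodiedAgents API: `InferenceClient`, `EdgeEchoClient`, `RemoteEchoClient`, and `TextComponent` are hypothetical names, and the host and port are placeholder values. The point is only that the component depends on the client interface, not on where the model runs.

```python
from typing import Protocol


class InferenceClient(Protocol):
    """Hypothetical minimal interface; not the EmbodiedAgents API."""

    def inference(self, query: str) -> str: ...


class EdgeEchoClient:
    """Stand-in for a model running on the robot's own hardware."""

    def inference(self, query: str) -> str:
        return f"[edge] {query}"


class RemoteEchoClient:
    """Stand-in for a model served on a remote host (no real network I/O)."""

    def __init__(self, host: str, port: int) -> None:
        self.host, self.port = host, port

    def inference(self, query: str) -> str:
        return f"[{self.host}:{self.port}] {query}"


class TextComponent:
    """Hypothetical component that only knows about the client interface."""

    def __init__(self, client: InferenceClient) -> None:
        self.client = client

    def step(self, text: str) -> str:
        return self.client.inference(text)


# The same component code works with either backend.
on_robot = TextComponent(EdgeEchoClient())
on_server = TextComponent(RemoteEchoClient("10.0.0.5", 8000))
print(on_robot.step("hello"))
print(on_server.step("hello"))
```

Swapping the serving platform then amounts to passing a different client at construction time; the component itself is unchanged.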

All clients implement a connection check. ML clients must implement inference methods, and optionally model initialization and deinitialization methods. The optional methods support scenarios where an embodied agent dynamically switches between models or fine-tuned versions based on environmental events. Similarly, vector DB clients implement standard CRUD methods tailored to vector databases.
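As a rough sketch of the ML client responsibilities listed above, the following toy example shows a connection check, required inference, and optional initialization and deinitialization used to switch models at runtime. All names here (`SketchModelClient`, `InMemoryClient`, `switch_model`) are assumptions made for illustration and are not the actual base classes or method names in EmbodiedAgents; the "models" are plain Python callables rather than served ML models.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchModelClient(ABC):
    """Hypothetical base class mirroring the responsibilities described above."""

    @abstractmethod
    def check_connection(self) -> bool:
        """Return True if the serving platform is reachable."""

    @abstractmethod
    def inference(self, inputs: dict) -> dict:
        """Run the currently loaded model on the inputs."""

    def initialize(self, model_name: str) -> None:
        """Optional: load a model on the serving platform."""

    def deinitialize(self) -> None:
        """Optional: unload the current model."""


class InMemoryClient(SketchModelClient):
    """Toy backend: 'models' are plain Python callables kept in a dict."""

    def __init__(self, models: dict) -> None:
        self._models = models
        self._active: Optional[str] = None

    def check_connection(self) -> bool:
        return True  # nothing to connect to in this toy backend

    def initialize(self, model_name: str) -> None:
        if model_name not in self._models:
            raise KeyError(f"unknown model: {model_name}")
        self._active = model_name

    def deinitialize(self) -> None:
        self._active = None

    def inference(self, inputs: dict) -> dict:
        if self._active is None:
            raise RuntimeError("no model initialized")
        return {"output": self._models[self._active](inputs["query"])}


def switch_model(client: SketchModelClient, new_model: str) -> None:
    """Swap models at runtime, e.g. in response to an environmental event."""
    client.deinitialize()
    client.initialize(new_model)


client = InMemoryClient({"base": str.upper, "finetuned": str.title})
assert client.check_connection()
client.initialize("base")
print(client.inference({"query": "pick up the cup"}))
switch_model(client, "finetuned")
print(client.inference({"query": "pick up the cup"}))
```

Keeping initialization and deinitialization optional means a client for a platform that always serves a single fixed model only has to provide the connection check and inference.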

EmbodiedAgents provides the following clients, designed to cover the most popular open-source model deployment platforms. Creating simple clients for other platforms is straightforward.

```{note}
Some clients require additional dependencies, which are detailed in the table below. If a required dependency is not installed, the user is prompted at runtime.
```

```{list-table}
:widths: 20 20 60
:header-rows: 1
* - Platform
  - Client
  - Description

* - **Generic**
  - GenericHTTPClient
  - A generic client for interacting with any OpenAI-compatible API, including vLLM, ms-swift, lmdeploy, Google Gemini, etc. Supports both standard and streaming responses, and works with LLMs and multimodal LLMs. Supports tool calling.

* - **RoboML**
  - RoboMLHTTPClient
  - An HTTP client for interacting with ML models served on [RoboML](https://github.com/automatika-robotics/roboml). Supports streaming outputs.

* - **RoboML**
  - RoboMLWSClient
  - A WebSocket-based client for persistent interaction with [RoboML](https://github.com/automatika-robotics/roboml)-hosted ML models. Particularly useful for low-latency streaming of audio or text data.

* - **RoboML**
  - RoboMLRESPClient
  - A Redis Serialization Protocol (RESP) based client for ML models served via [RoboML](https://github.com/automatika-robotics/roboml).
    Requires `pip install redis[hiredis]`.

* - **Ollama**
  - OllamaClient
  - An HTTP client for interacting with ML models served on [Ollama](https://ollama.com). Supports LLMs/MLLMs and embedding models. Supports tool calling.
    Requires `pip install ollama`.

* - **LeRobot**
  - LeRobotClient
  - A gRPC-based asynchronous client for vision-language-action (VLA) policies served on the LeRobot Policy Server. Supports various robot action policies available in the LeRobot package by Hugging Face.
    Requires:
    `pip install grpcio`
    `pip install torch --index-url https://download.pytorch.org/whl/cpu`

* - **ChromaDB**
  - ChromaClient
  - An HTTP client for interacting with a ChromaDB instance running as a server.
    Ensure that a ChromaDB server is running. To install ChromaDB and start a server:
    `pip install chromadb`
    `chroma run --path /db_path`
```