From 23aa19e1b04e960cda0453ccc7ea1c9152c072a6 Mon Sep 17 00:00:00 2001 From: AmitChaubey Date: Fri, 12 Jun 2026 16:47:57 +0100 Subject: [PATCH 1/2] docs: add OpenAI-compatible gateway + vLLM tool-calling guide Signed-off-by: AmitChaubey --- docs/user-guide/README.md | 9 ++ .../openai-compatible-gateway-vllm.md | 129 ++++++++++++++++++ examples/modelconfig-openai-gateway-vllm.yaml | 49 +++++++ 3 files changed, 187 insertions(+) create mode 100644 docs/user-guide/README.md create mode 100644 docs/user-guide/openai-compatible-gateway-vllm.md create mode 100644 examples/modelconfig-openai-gateway-vllm.yaml diff --git a/docs/user-guide/README.md b/docs/user-guide/README.md new file mode 100644 index 0000000000..b50790649c --- /dev/null +++ b/docs/user-guide/README.md @@ -0,0 +1,9 @@ +# User Guide + +End-to-end guides for common kagent deployment patterns. + +| Guide | Description | +| --- | --- | +| [OpenAI-compatible gateway with vLLM](openai-compatible-gateway-vllm.md) | Wire kagent to Bifrost, LiteLLM, or another OpenAI-compatible gateway fronting self-hosted vLLM | + +For provider-specific setup published on [kagent.dev](https://kagent.dev/docs/kagent/supported-providers), see the [BYO OpenAI-compatible model](https://kagent.dev/docs/kagent/supported-providers/byo-openai) page. diff --git a/docs/user-guide/openai-compatible-gateway-vllm.md b/docs/user-guide/openai-compatible-gateway-vllm.md new file mode 100644 index 0000000000..89d32708dd --- /dev/null +++ b/docs/user-guide/openai-compatible-gateway-vllm.md @@ -0,0 +1,129 @@ +# OpenAI-compatible gateway with self-hosted vLLM + +This guide covers a common self-hosted pattern that is not spelled out in the hosted-provider docs: + +``` +kagent agent → OpenAI-compatible gateway (Bifrost, LiteLLM, …) → vLLM +``` + +kagent treats the gateway as an OpenAI provider: configure a `ModelConfig` with `provider: OpenAI` and a custom `openAI.baseUrl`. The gateway routes requests to vLLM using its own model identifiers. + +For the general BYO OpenAI-compatible setup (secrets, TLS, Cohere example), see [BYO OpenAI-compatible model](https://kagent.dev/docs/kagent/supported-providers/byo-openai) on kagent.dev. + +## Prerequisites + +- kagent installed and running +- An OpenAI-compatible gateway reachable from agent pods (for example [Bifrost](https://github.com/maximhq/bifrost) or [LiteLLM](https://docs.litellm.ai/)) +- vLLM serving a model behind that gateway +- A Kubernetes namespace for your `ModelConfig` and agent (for example `kagent`) + +## 1. Launch vLLM with tool calling enabled + +kagent's declarative Python runtime always registers at least one built-in tool (`ask_user`) on every agent — even when you configure no tools yourself. As a result, every chat completion request kagent sends includes a non-empty `tools` array with `tool_choice: "auto"`. The vLLM backend **must** support automatic tool choice or every agent turn fails, including on agents with no user-configured tools. + +Start vLLM with at least: + +```bash +vllm serve Qwen/Qwen2.5-7B-Instruct \ + --enable-auto-tool-choice \ + --tool-call-parser hermes +``` + +Notes: + +- `--enable-auto-tool-choice` is required for `tool_choice: "auto"`. +- `--tool-call-parser` must match your model family. The [vLLM tool calling docs](https://docs.vllm.ai/en/latest/features/tool_calling.html) are the source of truth for parser names — at the time of writing, Qwen2.5 uses `hermes` and Llama 3.1 uses `llama3_json`, but parsers are added and renamed across vLLM releases. Always check the docs for your exact model and vLLM version. + +Register the model in your gateway using the identifier your gateway expects (often provider-prefixed). Example LiteLLM-style name: `vllm/Qwen/Qwen2.5-7B-Instruct`. + +## 2. Point the gateway at vLLM + +Configure Bifrost, LiteLLM, or your gateway so the model name above routes to the vLLM OpenAI endpoint (typically `http://:8000/v1`). + +Keep note of: + +- Gateway base URL — the port depends on your gateway (LiteLLM defaults to `4000`, Bifrost to `8080`) and must include the `/v1` path if your gateway exposes the OpenAI API there +- Gateway API key (if required) +- **Gateway model ID** — the string you pass in chat completion `model` requests + +## 3. Create a ModelConfig in kagent + +Use the gateway model ID in `spec.model`, not necessarily the bare Hugging Face name vLLM serves internally. + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: gateway-api-key + namespace: kagent +type: Opaque +stringData: + key: sk-gateway-example # replace with your gateway key, or any placeholder if auth is disabled +--- +apiVersion: kagent.dev/v1alpha2 +kind: ModelConfig +metadata: + name: qwen-vllm-via-gateway + namespace: kagent +spec: + provider: OpenAI + # Use the gateway routing ID, not the raw vLLM model name. + model: vllm/Qwen/Qwen2.5-7B-Instruct + apiKeySecret: gateway-api-key + apiKeySecretKey: key + openAI: + # Adjust host and port for your gateway (LiteLLM defaults to 4000, Bifrost to 8080) + baseUrl: http://litellm.kagent.svc.cluster.local:4000/v1 +``` + +| Field | What to set | +| --- | --- | +| `provider` | Always `OpenAI` for OpenAI-compatible gateways | +| `model` | Gateway routing identifier (for example `vllm/Qwen/...`) | +| `openAI.baseUrl` | Gateway OpenAI API base URL | +| `apiKeySecret` / `apiKeySecretKey` | Secret holding the gateway API key | + +Reference the `ModelConfig` from your agent via `spec.declarative.modelConfig`. The `ModelConfig` must live in the **same namespace** as the agent. + +## 4. Verify + +1. Confirm the gateway can reach vLLM (`curl` the gateway `/v1/models` or equivalent). +2. Create or update an agent that references the `ModelConfig`. +3. Open the agent in the kagent UI and send a message that should invoke a tool. + +If tool calling works, the agent completes turns normally. If vLLM was started without tool-calling flags, requests fail before useful output appears. + +## Troubleshooting + +### `provider API error (status 400)` on every agent message + +**Likely cause:** vLLM was started without `--enable-auto-tool-choice` and a matching `--tool-call-parser`. + +kagent always sends a `tools` array with `tool_choice: "auto"` — the runtime injects a built-in `ask_user` tool on every declarative agent, so this happens even when the agent has no tools configured. vLLM rejects that request shape unless auto tool choice is enabled at startup, and the error surfaces in kagent as a generic 400 with no hint of the cause. + +**Fix:** Restart vLLM with the flags from step 1, then retry. + +### Model not found / empty model dropdown in the UI + +**Likely cause:** `ModelConfig` namespace does not match the agent namespace, or `spec.model` does not match any name the gateway knows. + +**Fix:** + +- Create the `ModelConfig` in the same namespace as the agent. +- Set `spec.model` to the gateway's routing ID (check the gateway config or `GET /v1/models`). + +### Connection errors to the gateway + +- Ensure `openAI.baseUrl` is reachable from agent pods (cluster DNS, correct port, `/v1` suffix). +- For TLS with a private CA, see [TLS configuration](https://kagent.dev/docs/kagent/supported-providers/byo-openai#tls-configuration) on the BYO OpenAI page. + +## Example manifest + +See [`examples/modelconfig-openai-gateway-vllm.yaml`](../../examples/modelconfig-openai-gateway-vllm.yaml) for a copy-paste starting point. + +## Related + +- [BYO OpenAI-compatible model](https://kagent.dev/docs/kagent/supported-providers/byo-openai) — secrets, base URL, TLS +- [ModelConfig TLS examples](../../examples/modelconfig-with-tls.yaml) — LiteLLM with custom CA certificates +- [vLLM tool calling](https://docs.vllm.ai/en/latest/features/tool_calling.html) — parser flags by model family +- [Issue #959](https://github.com/kagent-dev/kagent/issues/959) — native vLLM `ModelConfig` provider (future) diff --git a/examples/modelconfig-openai-gateway-vllm.yaml b/examples/modelconfig-openai-gateway-vllm.yaml new file mode 100644 index 0000000000..933cd228bb --- /dev/null +++ b/examples/modelconfig-openai-gateway-vllm.yaml @@ -0,0 +1,49 @@ +# OpenAI-compatible gateway (Bifrost / LiteLLM) fronting self-hosted vLLM +# +# End-to-end guide: docs/user-guide/openai-compatible-gateway-vllm.md +# +# vLLM must be started with tool calling enabled, for example: +# vllm serve Qwen/Qwen2.5-7B-Instruct \ +# --enable-auto-tool-choice \ +# --tool-call-parser hermes +# +# Use the gateway routing model ID in spec.model (often provider-prefixed), +# not necessarily the bare Hugging Face name vLLM serves internally. + +--- +apiVersion: v1 +kind: Secret +metadata: + name: gateway-api-key + namespace: kagent +type: Opaque +stringData: + key: sk-gateway-example # replace with your gateway API key + +--- +apiVersion: kagent.dev/v1alpha2 +kind: ModelConfig +metadata: + name: qwen-vllm-via-gateway + namespace: kagent +spec: + provider: OpenAI + model: vllm/Qwen/Qwen2.5-7B-Instruct + apiKeySecret: gateway-api-key + apiKeySecretKey: key + openAI: + baseUrl: http://litellm.kagent.svc.cluster.local:4000/v1 + +--- +# Example agent referencing the ModelConfig +apiVersion: kagent.dev/v1alpha2 +kind: Agent +metadata: + name: qwen-assistant + namespace: kagent +spec: + type: Declarative + description: Assistant backed by vLLM through an OpenAI-compatible gateway + declarative: + systemMessage: You are a helpful assistant. + modelConfig: qwen-vllm-via-gateway From 75d213ba9796ff724daeda58ea35e44c46a67c2c Mon Sep 17 00:00:00 2001 From: AmitChaubey Date: Fri, 12 Jun 2026 17:04:16 +0100 Subject: [PATCH 2/2] docs: use unmistakable API key placeholder in gateway examples Signed-off-by: AmitChaubey --- docs/user-guide/openai-compatible-gateway-vllm.md | 4 ++-- examples/modelconfig-openai-gateway-vllm.yaml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/user-guide/openai-compatible-gateway-vllm.md b/docs/user-guide/openai-compatible-gateway-vllm.md index 89d32708dd..20ceca84e6 100644 --- a/docs/user-guide/openai-compatible-gateway-vllm.md +++ b/docs/user-guide/openai-compatible-gateway-vllm.md @@ -58,7 +58,7 @@ metadata: namespace: kagent type: Opaque stringData: - key: sk-gateway-example # replace with your gateway key, or any placeholder if auth is disabled + key: REPLACE_WITH_GATEWAY_API_KEY # replace with your gateway key, or any placeholder if auth is disabled --- apiVersion: kagent.dev/v1alpha2 kind: ModelConfig @@ -126,4 +126,4 @@ See [`examples/modelconfig-openai-gateway-vllm.yaml`](../../examples/modelconfig - [BYO OpenAI-compatible model](https://kagent.dev/docs/kagent/supported-providers/byo-openai) — secrets, base URL, TLS - [ModelConfig TLS examples](../../examples/modelconfig-with-tls.yaml) — LiteLLM with custom CA certificates - [vLLM tool calling](https://docs.vllm.ai/en/latest/features/tool_calling.html) — parser flags by model family -- [Issue #959](https://github.com/kagent-dev/kagent/issues/959) — native vLLM `ModelConfig` provider (future) +- [Issue #959](https://github.com/kagent-dev/kagent/issues/959) — a native vLLM `ModelConfig` provider was proposed and closed as not planned; the OpenAI-compatible gateway pattern in this guide is the supported approach diff --git a/examples/modelconfig-openai-gateway-vllm.yaml b/examples/modelconfig-openai-gateway-vllm.yaml index 933cd228bb..3be805f2ec 100644 --- a/examples/modelconfig-openai-gateway-vllm.yaml +++ b/examples/modelconfig-openai-gateway-vllm.yaml @@ -18,7 +18,7 @@ metadata: namespace: kagent type: Opaque stringData: - key: sk-gateway-example # replace with your gateway API key + key: REPLACE_WITH_GATEWAY_API_KEY # not a real key; replace before applying --- apiVersion: kagent.dev/v1alpha2