Feat/llm verify endpoint by chuckbutkus · Pull Request #3410 · OpenHands/software-agent-sdk

chuckbutkus · 2026-05-27T20:01:15Z

A human has tested these changes.

Why

The agent-server needs a way to let a client (e.g. a "Verify credentials"
button in the GUI) confirm an LLM configuration is valid before it is
saved or used to start a conversation. Verifying client-side is brittle:

CORS blocks many providers from the browser.
Bedrock SigV4 / Azure api_version require provider-specific request
shapes the client would have to re-implement.
The client wouldn't be exercising the same code path the agent uses at
conversation time, so "verified in the browser" wouldn't reliably mean
"the agent will be able to use this".

This PR adds a small, structured probe — at both the SDK and HTTP layer —
that fixes all three problems.

Summary

SDK: new LLM.verify() / LLM.averify() send a single 1-token probe
through the existing completion code path and surface the first provider
error as a typed SDK exception via map_provider_exception. Retry and
fallback_strategy are deliberately bypassed so verify fails fast on the
model the caller actually intends to save.
agent-server: new POST /api/llm/verify accepts an LLM body and
returns a discriminated VerifyLLMResponse (success, auth_error,
rate_limited, timeout, unreachable, bad_request, unknown_error).
Always returns HTTP 200 — clients branch on status, not transport
errors — so the UI has a single decision tree. The probe is hard-capped
at 30 s (vs the SDK's 300 s default) and the inferred LiteLLM provider
name is attributed even on failure so the UI can render "Anthropic auth
failed" rather than an unattributed error.
Hardening: error messages are scrubbed of any SecretStr values
present on the LLM (api_key, aws_access_key_id,
aws_secret_access_key, aws_session_token) and truncated to 512 chars
to defend against providers that echo credentials back or return large
HTML error pages. A small fix in api.py runs the 422 sanitizer output
through jsonable_encoder so Pydantic's non-JSON-serializable
ctx['error'] (raised by LLM's @model_validator(mode='before'))
doesn't itself turn a 422 into a 500.

Issue Number

None.

How to Test

Start the agent server locally:

uv run uvicorn openhands.agent_server.api:app --reload

Send a verify request with real credentials:

curl -sS -X POST http://localhost:8000/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"model": "openai/gpt-4o-mini", "api_key": "'"$OPENAI_API_KEY"'"}'
# → {"status":"success","message":null,"provider":"openai"}

Confirm each failure mode round-trips as HTTP 200 + a typed status:

# invalid key → status="auth_error"
curl -sS -X POST http://localhost:8000/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"model": "openai/gpt-4o-mini", "api_key": "sk-bogus"}'

# unknown model → status="bad_request"
curl -sS -X POST http://localhost:8000/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"model": "openai/does-not-exist", "api_key": "'"$OPENAI_API_KEY"'"}'

# missing model → HTTP 422 (FastAPI validation), not a verify outcome
curl -sS -X POST http://localhost:8000/api/llm/verify \
  -H 'Content-Type: application/json' -d '{}'

Run the new automated coverage:

uv run pytest tests/sdk/llm/test_llm_verify.py \
              tests/agent_server/test_llm_router.py \
              tests/agent_server/test_validation_error_sanitization.py

Video/Screenshots

N/A — backend-only endpoint; behavior is exercised by the curl examples
and tests above.

Type

Notes

Auth posture: POST /api/llm/verify is mounted under the /api/*
group and inherits the existing X-Session-API-Key header dependency,
so it is not anonymously reachable in deployments that configure session
keys. Credentials are caller-supplied per request — the agent server
never carries a shared upstream key on this path.
Telemetry intentionally suppressed: verify probes do not fire
on_request / on_response callbacks; they are not agent steps and
should not contribute to spend tracking, prompt logs, or user-facing
metrics.
Why no retry / fallback on verify: retrying would delay an inevitable
auth failure; falling back to a different LLM would silently report
"success" for a model that isn't the one being saved. Both defeat the
point of "verify before save".

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:0310711-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-0310711-python \
  ghcr.io/openhands/agent-server:0310711-python

All tags pushed for this build

ghcr.io/openhands/agent-server:0310711-golang-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang-amd64
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:0310711-golang-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang-arm64
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:0310711-java-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java-amd64
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:0310711-java-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java-arm64
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:0310711-python-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python-amd64
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:0310711-python-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python-arm64
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:0310711-golang
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:0310711-java
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:0310711-python
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 0310711-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 0310711-python-amd64) are also available if needed

Sends a minimal one-token completion to confirm credentials and connectivity work end-to-end. Reuses _prepare_completion_params so the verify call path exercises the same provider-aware option normalization (Azure max_tokens rename, OpenRouter headers, extended-thinking budgets, etc.) as a real completion, but deliberately bypasses both the retry decorator and the fallback strategy so the first real provider error is surfaced as a typed SDK exception (LLMAuthenticationError, LLMRateLimitError, LLMTimeoutError, LLMServiceUnavailableError, LLMBadRequestError) via map_provider_exception. Why bypass retry/fallback for verify: - Retry: verify should fail fast. Retrying a 401 wastes the user's budget and delays the inevitable auth error. - Fallback: if the user is verifying model X, they want to know if X works. Silently substituting model Y would defeat the entire point of 'verify before save'. The companion agent-server endpoint POST /llm/verify is added in a separate commit. Co-authored-by: openhands <openhands@all-hands.dev>

Exposes LLM.averify over HTTP so clients (notably the agent-canvas frontend) can delegate credential verification to the agent server rather than reimplementing provider-specific routing in the browser. Wire shape: VerifyLLMRequest accepts the LLM fields the settings UI exposes (model, api_key, base_url, api_version, plus the AWS fields for Bedrock). VerifyLLMResponse carries a discriminated status and an optional human-readable message + inferred provider name. Status values: - success: probe succeeded; credentials work. - auth_error: credentials rejected; client should block save. - rate_limited: provider returned 429; credentials are valid. - timeout: request timed out; warn but allow save. - unreachable: network/DNS/5xx; warn but allow save. - bad_request: credentials valid but config (e.g. model name) is wrong. - unknown_error: catch-all so the endpoint never raises. The endpoint always returns HTTP 200 — clients branch on the status field rather than wrapping the request in HTTP error handling. This keeps the frontend state machine to a single switch statement. Why move verify server-side: - All LiteLLM-supported providers become verifiable, including Bedrock with SigV4/IAM and Azure with api_version (both impossible from a browser today). - The verify call path is identical to the conversation-time call path, so 'verified' really means 'the agent will be able to use this'. - No CORS proxy / browser-side provider routing to maintain as LiteLLM adds providers. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-05-27T20:01:51Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-27T20:02:02Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot · 2026-05-27T20:02:53Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

✅ QA Report: PASS

Verified the new LLM verify API and SDK probe end-to-end with a live agent-server and a local OpenAI-compatible provider.

Does this PR achieve its stated goal?

Yes. The PR adds the missing LLM verification capability: on the base branch, POST /api/llm/verify returned 404 and LLM had no verify() method; on the PR commit, the same real HTTP path returned status=success, invalid credentials returned structured status=auth_error with HTTP 200, and both LLM.verify() and LLM.averify() completed successfully. The local provider logs confirm the agent server made real LiteLLM chat-completion probes using the unmasked API key and max_completion_tokens: 1.

Phase	Result
Environment Setup	✅ `make build` completed and installed the repo packages.
CI Status	⏳ Observed 13 passing, 0 failing, 13 pending checks at QA time.
Functional Verification	✅ Base-vs-PR behavior verified through actual HTTP and SDK calls.

Functional Verification

Test 1: New `/api/llm/verify` endpoint exists and performs a real provider probe

Step 1 — Establish baseline without the PR:
Checked out origin/main, synced dependencies, started the real agent server, then ran:

curl -sS -i -X POST http://127.0.0.1:18080/api/llm/verify   -H 'Content-Type: application/json'   -d '{"model":"openai/qa-model","api_key":"sk-valid","base_url":"http://127.0.0.1:18081/v1"}'

Observed:

HTTP/1.1 404 Not Found
content-type: application/json

{"detail":"Not Found"}

This confirms the endpoint did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out a326ca2bb5ae9b4ce2e3a3a79a7e4ae46bb22b01, synced dependencies, started the agent server, and started a local OpenAI-compatible stub at http://127.0.0.1:18081/v1/chat/completions.

Step 3 — Re-run with the PR in place:
Ran the same request with valid credentials:

HTTP/1.1 200 OK
content-type: application/json

{"status":"success","message":null,"provider":"openai"}

Then ran the same request with an invalid key:

HTTP/1.1 200 OK
content-type: application/json

{"status":"auth_error","message":"litellm.BadRequestError: OpenAIException - invalid api key","provider":"openai"}

The mock provider observed the actual requests made by the server:

{"path":"/v1/chat/completions","authorization":"Bearer sk-valid","payload":{"messages":[{"content":[{"type":"text","text":"hi"}],"role":"user"}],"model":"qa-model","max_completion_tokens":1}}
{"path":"/v1/chat/completions","authorization":"Bearer sk-invalid","payload":{"messages":[{"content":[{"type":"text","text":"hi"}],"role":"user"}],"model":"qa-model","max_completion_tokens":1}}

This shows the endpoint works end-to-end through the real agent-server → SDK → LiteLLM → provider HTTP path, including success and structured auth failure behavior.

Test 2: SDK `LLM.verify()` / `LLM.averify()` are available and exercise the provider path

Step 1 — Establish baseline without the PR:
On origin/main, ran:

from pydantic import SecretStr
from openhands.sdk.llm import LLM
llm = LLM(model='openai/qa-model', api_key=SecretStr('sk-valid'), base_url='http://127.0.0.1:18081/v1')
print('has_verify=', hasattr(llm, 'verify'))
try:
    llm.verify()
except Exception as exc:
    print(type(exc).__name__ + ':', str(exc)[:160])

Observed:

has_verify= False
AttributeError: 'LLM' object has no attribute 'verify'

This confirms the SDK method did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out a326ca2bb5ae9b4ce2e3a3a79a7e4ae46bb22b01 with the same local provider stub running.

Step 3 — Re-run with the PR in place:
Ran:

import asyncio
from pydantic import SecretStr
from openhands.sdk.llm import LLM

llm = LLM(model='openai/qa-model', api_key=SecretStr('sk-valid'), base_url='http://127.0.0.1:18081/v1')
print('has_verify=', hasattr(llm, 'verify'))
print('sync_verify_result=', llm.verify())
print('async_verify_result=', asyncio.run(llm.averify()))

Observed:

has_verify= True
sync_verify_result= None
async_verify_result= None

This confirms both SDK entry points are usable and complete successfully against a real OpenAI-compatible HTTP provider.

Test 3: Request validation remains a normal API contract error

With the PR server running, submitted an empty request body:

curl -sS -i -X POST http://127.0.0.1:18080/api/llm/verify   -H 'Content-Type: application/json'   -d '{}'

Observed:

HTTP/1.1 422 Unprocessable Content
content-type: application/json

{"detail":[{"type":"missing","loc":["body","model"],"msg":"Field required","input":{}}]}

This confirms malformed client input still fails as an API contract error instead of being confused with provider verification status.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Code Review: `POST /llm/verify` endpoint

Solid implementation overall. The core design decisions are correct: always-HTTP-200 with a discriminated status field makes for a clean client-side decision tree, the bypass of retry and fallback strategy is intentional and well-tested, and SecretStr fields with extra="ignore" give a forward-compatible, safe API surface. Test coverage is thorough.

Two items worth addressing before merging, plus a couple of minor suggestions:

Verify timeout: the probe inherits the LLM's 300 s default — too long for a user waiting on a credential-check button. Either hardcode a short timeout on the LLM constructed here, or expose it via VerifyLLMRequest. See inline comment.
Provider error messages: str(exc) is returned verbatim to the client; some providers embed raw credential fragments in error bodies. Worth a quick audit of what LiteLLM surfaces. See inline comment.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

github-actions · 2026-05-27T20:15:59Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-agent-server/openhands/agent_server
api.py	251	21	91%	101, 103–108, 110, 112, 114, 148, 160, 175, 181, 456, 459, 463–465, 467, 473
openhands-sdk/openhands/sdk/llm
llm.py	708	107	84%	492, 516, 549, 834–835, 838–842, 844, 852–854, 858, 875–876, 880, 882–883, 885–887, 965, 1028, 1196–1198, 1299, 1368, 1409, 1421–1423, 1426–1429, 1435, 1478, 1521, 1534–1536, 1539–1542, 1548, 1704, 1718–1723, 1839–1840, 2072–2073, 2082, 2088, 2093, 2133, 2135–2140, 2142–2159, 2162–2166, 2168–2169, 2175–2184, 2241, 2243
TOTAL	28918	6527	77%

enyst

Sorry to post on a draft, just would like to point out a detail quick:

"Mirrors the LLM fields the settings UI exposes."

Maybe we can do differently here, like, use the full object, the full profile, or an existing object, instead of a bucket of special case settings?

I think what the agent did at this moment would be pretty bad design, sorry. The UI can change; and, it should not be the only UI we write a REST API for. (?!)

I believe we should not want it to be hardcoded - other people can create UIs and use this API. I would prefer to not limit UIs and customizability. I know technically this isn't a "limit", but it is: for agents it's a clear signal to dump partial settings everywhere.

Can we verify the LLM or profile that the user has?

chuckbutkus · 2026-05-28T03:19:13Z

openhands-agent-server/openhands/agent_server/llm_router.py
Mirrors the LLM fields the settings UI exposes. Additional fields are
accepted but ignored at the API boundary; if you need to verify a more
exotic configuration, persist it via PATCH /settings and then verify
the saved state separately.

@enyst , great call out. Will be fixed in a sec.

Drop the hand-maintained VerifyLLMRequest subset and take the SDK's LLM model as the request body. Removes duplication with LLM, removes the field-by-field rebind (and the SecretStr-masking pitfall it papered over), and lets the endpoint grow new credential fields automatically whenever LLM does. Also fix a pre-existing bug in _sanitize_validation_errors that this refactor surfaces: Pydantic's value_error entries from @model_validator hooks include the original ValueError under ctx['error'], which is not JSON-serializable. Without jsonable_encoder the 422 handler itself raises a TypeError. The new test exercises that path directly. Tests: - Add parametrized AWS SecretStr passthrough test (review feedback) - Inline the SecretStr capture in the api_key passthrough test now that LLM always normalises to SecretStr via _validate_secrets - Add regression test for non-JSON-serializable ctx in the sanitizer Co-authored-by: openhands <openhands@all-hands.dev>

Addresses three open review threads on POST /llm/verify: 1. **Timeout cap.** Wrap llm.averify() in asyncio.wait_for with a hard 30 s ceiling (taken as min with any smaller caller-supplied timeout). The SDK's 300 s LLM.timeout default would otherwise park an interactive UI on a five-minute hang when a provider is unreachable. Maps asyncio.TimeoutError to VerifyLLMStatus.TIMEOUT. 2. **Error message sanitization.** New _sanitize_error_message scrubs any SecretStr values present on the constructed LLM (api_key plus the three AWS SecretStr fields) from the exception's str() representation, then truncates to 512 chars. Defends against providers that echo Authorization headers / query-string keys in error bodies, and bounds the response size against pathological HTML error pages. 3. **Telemetry comment in llm.py.** Explain that _telemetry_ctx is intentionally discarded in _prepare_verify_call so a future reader doesn't 'fix' it back to wiring telemetry callbacks for what is not an agent step. Tests cover all three behaviors: timeout enforcement, caller-supplied timeout honored when smaller, api_key scrubbing from error messages, and 512-char truncation with the ellipsis sentinel. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot · 2026-05-28T04:10:06Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

❌ QA Report: FAIL

The new /api/llm/verify endpoint is present and handles some error paths, but it falsely rejects a real valid LLM configuration that can complete normally.

Does this PR achieve its stated goal?

Partially / No for the core success path. The PR sets out to add an LLM verify endpoint and SDK verify probe; I confirmed the endpoint did not exist on the base commit and exists on this PR. However, with the real configured LLM_MODEL / LLM_API_KEY / LLM_BASE_URL, the verify endpoint returned status=bad_request while a normal SDK completion using the exact same credentials succeeded and returned OK, so the feature can tell users valid credentials are invalid.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the workspace packages
CI Status	🟡 All listed checks were successful except `qa-changes`, which was still in progress; `cleanup-on-approval` was skipped
Functional Verification	❌ Real API + SDK execution found a false-negative verify result for a valid LLM config

Functional Verification

Test 1: Baseline before this PR

Step 1 — Establish baseline without the feature:
Ran git checkout c6347949c4dacbdf9db364fc902e2be216599747, started the agent server with uv run python -m openhands.agent_server --port 8765, then called:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-invalid"}'

Observed:

HTTP_STATUS:404
{"detail":"Not Found"}

Also ran a short SDK import check and observed:

has_verify False
has_averify False

This establishes the baseline: the endpoint and SDK methods were not available before the PR.

Step 2 — Apply the PR changes:
Checked out feat/llm-verify-endpoint at f0af2c3b1e8c922af1a935d9f5ee617d57388aa5.

Test 2: PR endpoint behavior with validation and bad credentials

Started the PR agent server with uv run python -m openhands.agent_server --port 8765 and called GET /server_info; it returned HTTP 200 with the OpenHands Agent Server metadata.

Then called the new verify endpoint with a missing model:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"api_key":"sk-qa-secret-should-redact"}'

Observed:

HTTP_STATUS:422
{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}

This shows malformed input stays a JSON 422 and redacts the submitted API key instead of crashing or echoing the secret.

Then called the endpoint with the configured model/base URL but an intentionally invalid key:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  --data-binary @/tmp/oh_invalid_payload.json

Observed:

HTTP_STATUS:200
{"status":"auth_error","message":"litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed...","provider":"litellm_proxy"}

This shows the endpoint performs a real provider/proxy request and maps invalid credentials to structured HTTP 200 auth_error.

Test 3: PR false-negative on valid credentials

With the real configured LLM_MODEL, LLM_API_KEY, and LLM_BASE_URL, called the new endpoint as a real user would when verifying saved credentials:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  --data-binary @/tmp/oh_valid_payload.json

Observed:

HTTP_STATUS:200
{"status":"bad_request","message":"litellm.BadRequestError: Error code: 400 - {'error': {'message': \"litellm.BadRequestError: OpenAIException - Could not finish the message because max_tokens or model output limit was reached. Please try again with higher max_tokens...","provider":"litellm_proxy"}

To confirm the credentials/model/base URL were actually usable, I ran a normal SDK completion with the exact same environment-provided configuration and a realistic output budget:

uv run python -c "... llm.completion(..., max_completion_tokens=256) ..."

Observed:

completion_succeeded True
assistant_text 'OK'
completion_tokens 16

I also exercised the new SDK method directly with the same valid config:

uv run python -c "... llm.verify() ..."

Observed:

has_verify True has_averify True
verify_result exception
LLMBadRequestError
litellm.BadRequestError: Error code: 400 - {'error': {'message': "litellm.BadRequestError: OpenAIException - Could not finish the message because max_tokens or model output limit was reached...

Interpretation: the PR adds the endpoint/method, but the verify probe itself is too constrained for this valid reasoning-model configuration. The same credentials successfully complete normally, so returning bad_request from verify is a false negative.

Issues Found

🟠 Issue: Valid configured credentials/model/base URL were rejected by /api/llm/verify and LLM.verify() because the one-token probe hits a provider output-limit error, even though normal SDK completion with the same config succeeds.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Solid implementation overall. The POST /llm/verify design is well-considered: HTTP-200-always with a discriminated status enum is exactly the right contract for an interactive UI, and the security controls (timeout cap, credential scrubbing, error truncation) are all present. The bypass of retry and fallback is correctly motivated and tested. The jsonable_encoder fix for 422 responses is clean and targeted. Two issues worth addressing before merge:

_sanitize_error_message duplicates LLM_SECRET_FIELDS — the constant already declared as a single source of truth in llm.py (line 148) should be used here, not re-typed.
infer_litellm_provider sits outside the try block — if it raises on an unusual model string, the endpoint returns an unhandled 500 instead of a structured response, silently violating the documented HTTP-200 contract.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

chuckbutkus · 2026-05-28T04:33:16Z

@OpenHands fix failing sdk tests.

openhands-ai · 2026-05-28T04:33:44Z

I'm on it! chuckbutkus can track my progress at all-hands.dev

…tion_tokens The verify probe was updated to use max_completion_tokens=1024 (instead of 1) for better compatibility with models that require a minimum token budget. Align the test assertion, test name, and docstring to match. Co-authored-by: openhands <openhands@all-hands.dev>

chuckbutkus · 2026-05-28T04:38:15Z

Fixed in c534f76.

The failing test was test_verify_sends_minimal_one_token_probe — it was asserting max_completion_tokens == 1, but the max_completion_tokens in the verify probe was intentionally updated to 1024. I updated the test assertion, test name, and docstring to match.

This comment was created by an AI agent (OpenHands) on behalf of the user.

openhands-ai · 2026-05-28T04:38:56Z

OpenHands encountered an error: Request timeout after 30 seconds to https://kqxtrzjmjhxkkuaz.prod-runtime.all-hands.dev/api/conversations/c7177cce-6a99-4009-9625-735fa15157d5/ask_agent

See the conversation for more information.

chuckbutkus · 2026-05-28T04:42:24Z

@OpenHands address the unresolved all-hands-bot comments.

openhands-ai · 2026-05-28T04:42:55Z

I'm on it! chuckbutkus can track my progress at all-hands.dev

- Use LLM_SECRET_FIELDS instead of hardcoded tuple in _sanitize_error_message so any future secret field added to LLM is automatically scrubbed - Move infer_litellm_provider inside the try block so a failure there returns a structured VerifyLLMResponse (HTTP 200) instead of a 500 - Add parametrized test for AWS credential scrubbing in error messages, covering aws_access_key_id, aws_secret_access_key, and aws_session_token Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-05-28T04:48:38Z

Summary of Changes

All three unresolved all-hands-bot review threads on PR #3410 have been addressed in commit a4045933.

Checklist

LLM_SECRET_FIELDS import — Added from openhands.sdk.llm.llm import LLM_SECRET_FIELDS and replaced the hardcoded 4-field tuple in _sanitize_error_message with LLM_SECRET_FIELDS. Future secret fields added to LLM are now automatically scrubbed without needing a separate update here.
infer_litellm_provider inside try block — Moved infer_litellm_provider(...) inside the try block (initializing provider = None before it). If infer_litellm_provider raises for any reason, the exception is caught by the existing broad handler and a structured VerifyLLMResponse (HTTP 200) is returned — preserving the documented HTTP-200 contract.
AWS credential scrubbing tests — Added test_verify_endpoint_scrubs_aws_secret_from_error_message, a parametrized test covering all three AWS secret fields (aws_access_key_id, aws_secret_access_key, aws_session_token), mirroring the existing api_key scrubbing test.
All 31 tests pass, pre-commit hooks (ruff lint/format, pyright, pycodestyle) pass.
All three review threads replied to and resolved.
Changes are minimal and scoped exactly to the issues raised — no extraneous modifications.

all-hands-bot · 2026-05-28T05:03:19Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

🟡 Acceptable — ready with two fixes before merge

This PR introduces a POST /llm/verify endpoint that accepts a full LLM config, fires a minimal one-token probe through averify(), and returns a discriminated-status HTTP 200 response. The overall architecture is sound: accepting LLM directly as the request body exercises the same code path as real conversation creation, the timeout cap and secret-scrubbing layers are well thought-out, and the test coverage is comprehensive. Previous review rounds caught and addressed the major issues (timeout cap, secret sanitization, hardcoded field list, infer_litellm_provider inside the try block). Two issues remain before this is merge-ready.

[CRITICAL ISSUES]

timeout=0 is silently promoted to _VERIFY_TIMEOUT_S
llm.timeout or _VERIFY_TIMEOUT_S uses Python's falsy check, so a caller passing "timeout": 0 gets a 30-second window instead of the intended immediate timeout. Fix with an explicit None guard:

timeout = min(
    llm.timeout if llm.timeout is not None else _VERIFY_TIMEOUT_S,
    _VERIFY_TIMEOUT_S,
)

logger.exception emits the raw exception for unmapped errors
In the UNKNOWN_ERROR branch, logger.exception(...) has implicit exc_info=True, so the raw exception string — which may include provider-echoed content — lands in server logs unsanitized. The client gets the scrubbed message; the log does not. Risk is low since unmapped exceptions are rare, but for consistency with the rest of the hardening, use the already-scrubbed value:

logger.error("llm.verify failed with an unmapped exception: %s", message)

[TESTING GAPS]

The PR description has no evidence section. Before removing the draft flag, add:

The exact command used to exercise the verify endpoint against a real provider (e.g. local Ollama or a live OpenAI key)
The resulting response body showing "status": "success" (or "rate_limited", which counts as valid credentials)

Unit tests confirm the wiring is correct; they cannot substitute for proof that the real LiteLLM call round-trips to a live provider. This matters especially given that the previous review found a false-negative with the 1-token cap on certain proxies — confirming that max_completion_tokens=1024 works end-to-end with at least one real provider is the minimum bar.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

New public endpoint on the agent server that accepts credentials and makes outbound network calls. The always-200 contract, timeout cap, and secret-sanitization layers are all in place. The two issues above are low-probability in practice (callers won't send timeout=0; unmapped exceptions are rare), but should be fixed before merging given the security sensitivity of the endpoint's purpose.

VERDICT:
❌ Needs rework: Fix the timeout=0 falsy check and the unsanitized server-side log call before removing the draft flag.

KEY INSIGHT:
The or-based falsy check in the timeout calculation is the classic Python pitfall for optional numeric fields — an explicit is not None guard is correct whenever 0 is a meaningful value.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.

Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

Was this review helpful? React with 👍 or 👎 to give feedback.

all-hands-bot

✅ QA Report: PASS

Verified the new LLM verification flow by running the agent server and making real HTTP requests, plus calling the SDK LLM.verify() entrypoint with real credentials.

Does this PR achieve its stated goal?

Yes. The PR appears intended to add an LLM credential/connectivity verification endpoint and SDK probe; on main, POST /api/llm/verify returned 404, while on this PR the same real agent server accepted the endpoint and returned status: success using the configured real LLM credentials. I also verified structured failure behavior over HTTP (auth_error for an invalid token, 422 with redacted input for a malformed body) and confirmed direct LLM.verify() succeeds in the project uv environment.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests/linters were run.
CI Status	🟡 At time checked: 31 passing, 0 failing, 1 pending (`qa-changes`), 3 skipped.
Functional Verification	✅ Real server/API and SDK execution matched the intended behavior.

Functional Verification

Test 1: New `/api/llm/verify` endpoint exists and verifies real credentials

Step 1 — Establish baseline without the PR:
Checked out main, started the real agent server, and posted to the verify endpoint:

git switch main
TMUX_TMPDIR=/tmp/oh-qa-tmux-main OPENHANDS_SUPPRESS_BANNER=1   uv run python -m openhands.agent_server --host 127.0.0.1 --port 8765
curl -sS -o /tmp/oh-main-verify-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8765/api/llm/verify   -H 'Content-Type: application/json'   --data '{"model":"openai/gpt-4o","api_key":"sk-fake"}'

Observed:

MAIN_SERVER_READY_HTTP_STATUS=200
MAIN_VERIFY_HTTP_STATUS=404
MAIN_VERIFY_BODY:
{"detail":"Not Found"}

This confirms the endpoint was not available on main.

Step 2 — Apply the PR's changes:
Checked out feat/llm-verify-endpoint at f3fd93af048b169a4b1298564ccb97bb4fb1a90c and started the real agent server.

Step 3 — Re-run with the PR:
Posted a real verification request using $LLM_MODEL, $LLM_BASE_URL, and $LLM_API_KEY from the QA environment:

git switch feat/llm-verify-endpoint
TMUX_TMPDIR=/tmp/oh-qa-tmux-pr OPENHANDS_SUPPRESS_BANNER=1   uv run python -m openhands.agent_server --host 127.0.0.1 --port 8766
curl -sS -o /tmp/oh-pr-verify-real-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8766/api/llm/verify   -H 'Content-Type: application/json'   --data @/tmp/verify-real-payload.json

Observed:

PR_SERVER_READY_HTTP_STATUS=200
PR_VERIFY_REAL_HTTP_STATUS=200
{"message": null, "provider": "litellm_proxy", "status": "success"}

This confirms the new endpoint is reachable and successfully exercises the real LLM credential path end-to-end.

Test 2: Verify endpoint returns structured failures instead of transport errors

Step 1 — Baseline:
On main, the endpoint is absent as shown in Test 1, so structured verify outcomes are not available.

Step 2 — Apply the PR's changes:
Used the PR branch server.

Step 3 — Exercise failure cases:
Sent an invalid API key with the configured model/base URL:

curl -sS -o /tmp/oh-pr-verify-invalid-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8767/api/llm/verify   -H 'Content-Type: application/json'   --data @/tmp/verify-invalid-payload.json

Observed:

PR_VERIFY_INVALID_HTTP_STATUS=200
{"message_contains_invalid_key": false, "message_excerpt": "litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...-ke", "provider": "litellm_proxy", "status": "auth_error"}

This confirms provider auth failure is represented as a structured status with HTTP 200, and the invalid key was not echoed verbatim.

Sent a malformed body omitting model:

curl -sS -o /tmp/oh-pr-verify-missing-model-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8766/api/llm/verify   -H 'Content-Type: application/json'   --data '{"api_key":"sk-secret-that-should-be-redacted"}'

Observed:

PR_VERIFY_MISSING_MODEL_HTTP_STATUS=422
{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}

This confirms malformed requests still use FastAPI validation semantics while avoiding the validation-error serialization/redaction failure mode.

Test 3: SDK `LLM.verify()` works directly

Step 1 — Baseline:
The feature adds a new SDK probe method used by the server endpoint; the main-branch endpoint baseline above shows the user-facing verify flow did not exist before this PR.

Step 2 — Apply the PR's changes:
Used the PR branch in the repository uv environment.

Step 3 — Call the SDK directly with real credentials:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'import os; from openhands.sdk.llm import LLM; kwargs={"model": os.environ["LLM_MODEL"], "api_key": os.environ["LLM_API_KEY"]}; base=os.environ.get("LLM_BASE_URL"); kwargs.update({"base_url": base} if base else {}); LLM(**kwargs).verify(); print("SDK_VERIFY_RESULT=success")'

Observed:

SDK_VERIFY_RESULT=success

This confirms the SDK entrypoint performs a real provider probe successfully.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

chuckbutkus · 2026-05-28T05:11:20Z

@OpenHands address the unresolved all-hands-bot comments.

openhands-ai · 2026-05-28T05:11:47Z

I'm on it! chuckbutkus can track my progress at all-hands.dev

Fix two issues flagged in the second review pass: 1. Timeout None check: replace 'llm.timeout or _VERIFY_TIMEOUT_S' with an explicit None check so timeout=0 is not falsily promoted to the 30 s cap. 2. Logger scrubbing: replace logger.exception with logger.error so the already-scrubbed message string is what gets recorded, rather than raw exception content via implicit exc_info=True. Co-authored-by: openhands <openhands@all-hands.dev>

chuckbutkus · 2026-05-28T05:14:42Z

Addressed the two remaining all-hands-bot review comments in 2967d88:

Timeout 0 promoted to cap: replaced llm.timeout or _VERIFY_TIMEOUT_S with an explicit None check (llm.timeout if llm.timeout is not None else _VERIFY_TIMEOUT_S) so timeout=0 is honoured rather than silently falling through to the 30 s ceiling.
logger.exception leaking raw exception content to logs: replaced with logger.error("...", message) so the already-scrubbed message string (with credentials replaced by *** and truncated to 512 chars) is what gets recorded, rather than the raw exception body via implicit exc_info=True.

All 31 tests still pass.

This comment was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-05-28T05:18:01Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

✅ QA Report: PASS

Verified the new LLM verification flow by running the agent server, making real HTTP requests to /api/llm/verify, and directly calling the new SDK verify APIs.

Does this PR achieve its stated goal?

Yes. The PR appears intended to add a server-side LLM credential/connectivity verification endpoint backed by SDK LLM.verify/LLM.averify. On main, the same request returned 404 Not Found; on this PR, the endpoint returned structured verification results, successfully reached an OpenAI-compatible provider over HTTP, mapped an invalid real OpenAI credential to auth_error, and returned a sanitized 422 for malformed input instead of leaking the submitted key.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no test/lint suite was run.
CI Status	🟡 Most checks are green; `qa-changes` and Docker image build/push jobs were still in progress when checked.
Functional Verification	✅ Endpoint and SDK verify behavior worked with real server/API execution.

Functional Verification

Test 1: Before/after server endpoint behavior

Step 1 — Establish baseline on origin/main:
Started the base agent server at f305353d4c369b4047e3cfd84e0d5f1dc48a6cab and ran:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"good-key","base_url":"http://127.0.0.1:18111/v1","timeout":5}' \
  http://127.0.0.1:18001/api/llm/verify

Output:

{"detail":"Not Found"}
HTTP_STATUS:404

This confirms the verify endpoint did not exist before the PR.

Step 2 — Apply the PR's changes:
Checked out feat/llm-verify-endpoint at 2967d88fa396bd8a9e59c2349d22fefb016c9b8b, started the agent server, and ran the same endpoint request against a local OpenAI-compatible provider.

Step 3 — Re-run with the PR in place:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"good-key","base_url":"http://127.0.0.1:18111/v1","timeout":5}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"status":"success","message":null,"provider":"openai"}
HTTP_STATUS:200

The local provider log also showed the server made a real completion request:

127.0.0.1 - "POST /v1/chat/completions HTTP/1.1" 200 -

This shows the new endpoint is reachable and performs an actual provider call rather than only parsing input.

Test 2: Real invalid credential handling

Ran a real OpenAI verification attempt with an intentionally invalid key:

curl -sS -m 20 -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-invalid-openhands-qa","timeout":10}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"status":"auth_error","message":"litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: sk-inval***********s-qa. You can find your API key at https://platform.openai.com/account/api-keys.","provider":"openai"}
HTTP_STATUS:200

This confirms provider authentication failures are surfaced as a structured auth_error response with HTTP 200, as a client/UI can branch on.

Test 3: Malformed body returns sanitized validation error

Ran:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"api_key":"sk-real-secret-key-12345"}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}
HTTP_STATUS:422

This confirms malformed requests still fail as validation errors, but the submitted secret is redacted and the response remains JSON-serializable.

Test 4: SDK verify APIs execute directly

Ran a short user-style script importing the SDK and calling the new APIs:

uv run python - <<'PY'
import asyncio
from openhands.sdk.llm import LLM
from openhands.sdk.llm.exceptions import LLMAuthenticationError

llm = LLM(model='openai/gpt-4o', api_key='good-key', base_url='http://127.0.0.1:18111/v1', timeout=5)
print('sync verify:', llm.verify())
print('async verify:', asyncio.run(llm.averify()))

bad = LLM(model='openai/gpt-4o', api_key='sk-invalid-openhands-qa', timeout=10)
try:
    bad.verify()
except LLMAuthenticationError as exc:
    print('invalid OpenAI maps to:', type(exc).__name__)
PY

Output:

sync verify: None
async verify: None
invalid OpenAI maps to: LLMAuthenticationError

This confirms both sync and async SDK entry points work end-to-end and preserve typed authentication error behavior.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Review: `feat/llm-verify-endpoint`

This is a clean, security-conscious addition to the agent server. The decision to accept LLM directly as the request body (instead of a bespoke request model) keeps the code DRY and sidesteps the SecretStr-masking footgun that earlier revisions had with manual field rebinding. All critical issues from previous rounds — 30 s timeout cap, LLM_SECRET_FIELDS-based scrubbing, infer_litellm_provider inside the try block, timeout=0 handling, logger.error instead of logger.exception — are addressed.

Strengths:

HTTP-200-always contract with discriminated status values is the right UX for a verify button.
Defence-in-depth credential scrubbing (_sanitize_error_message + truncation) is solid.
Bypassing retry and fallback strategy by design means the first real error surfaces immediately — exactly what "verify before save" needs.
_prepare_verify_call reuses the same normalisation path as a real completion (Azure rename, OpenRouter headers, etc.) so "verified" means the same code path will work at conversation time.
Test coverage is comprehensive: timeout, credential leakage, large error bodies, AWS credentials, keyless local servers, unknown_error path.

Two small items below.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Add a NOTE explaining that infer_litellm_provider sits inside the try block intentionally, so its failures collapse to UNKNOWN_ERROR alongside genuine provider-side unknown errors. Helps future readers reason about the two distinct failure modes hidden behind a single status. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot · 2026-05-28T05:52:28Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

✅ QA Report: PASS

Verified the new LLM verification capability through the SDK, the agent-server HTTP API, a local OpenAI-compatible provider, and the configured real LiteLLM proxy.

Does this PR achieve its stated goal?

Yes. The PR title indicates a new LLM verify endpoint, and the PR also adds the SDK probe it depends on. On origin/main, LLM.verify is absent and POST /api/llm/verify returns 404; on PR commit f19626cfd8f3654b27b45b044863cd3456a1be06, the SDK probe sends a real chat-completions request, the agent-server endpoint returns structured success, validation errors return 422 without leaking secrets, and a real configured LiteLLM proxy verification returned status: success.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the editable packages.
CI Status	🟡 Latest snapshot: 20 passed, 0 failed, 7 pending, 3 skipped. Pending jobs were QA and Docker image builds.
Functional Verification	✅ SDK and API behavior worked end-to-end with local and real provider probes.

Functional Verification

Test 1: Baseline confirms the feature did not exist before this PR

Step 1 — Establish baseline without the PR:
Ran:

git checkout --detach origin/main
uv run python /tmp/qa_baseline_probe.py

Key output:

BASE_SDK_HAS_VERIFY False
BASE_SDK_VERIFY_CALL AttributeError 'LLM' object has no attribute 'verify'
BASE_ENDPOINT_VERIFY_STATUS 404
BASE_ENDPOINT_VERIFY_BODY {"detail":"Not Found"}

This shows the SDK method and /api/llm/verify endpoint are new behavior in this PR.

Step 2 — Apply the PR changes:
Checked out:

git checkout --detach f19626cfd8f3654b27b45b044863cd3456a1be06

Test 2: SDK `LLM.verify()` sends a real provider request and succeeds

Step 3 — Re-run with the PR:
Ran a QA driver that starts a local OpenAI-compatible HTTP server and calls LLM.verify():

uv run python /tmp/qa_verify_behavior.py

Key output:

PR_SDK_VERIFY_RESULT None
PR_SDK_FAKE_PROVIDER_REQUEST {"authorization_present": true, "first_role": "user", "messages_count": 1, "path": "/v1/chat/completions", "token_cap": 1024}

This confirms the SDK method executes a real HTTP completion probe, includes auth, sends one user message, caps output, and returns None on success.

Test 3: Agent-server `/api/llm/verify` works over HTTP

The same QA driver started agent-server --host 127.0.0.1 --port <free-port> and posted to the new endpoint with a local provider base_url.

Key output:

PR_ENDPOINT_LOCAL_STATUS 200
PR_ENDPOINT_LOCAL_BODY {"message": null, "provider": "openai", "status": "success"}
PR_ENDPOINT_FAKE_PROVIDER_REQUEST {"authorization_present": true, "first_role": "user", "messages_count": 1, "path": "/v1/chat/completions"}

This confirms a real HTTP client can call the new endpoint and the endpoint performs the verify probe through the provider-compatible HTTP path.

Test 4: Bad request path returns 422 and redacts secrets

Ran the same running agent server with a malformed verify payload containing an API key but no model.

Key output:

PR_ENDPOINT_MISSING_MODEL_STATUS 422
PR_ENDPOINT_MISSING_MODEL_REDACTED True
PR_ENDPOINT_MISSING_MODEL_BODY {"detail": [{"ctx": {"error": {}}, "input": {"api_key": "<redacted>"}, "loc": ["body"], "msg": "Value error, model must be specified in LLM", "type": "value_error"}]}

This confirms the validation path returns a client error instead of crashing and does not echo the submitted secret.

Test 5: Real configured provider credentials verify successfully

Using the existing environment (LLM_MODEL=litellm_proxy/openai/gpt-5.5, LLM_BASE_URL=https://llm-proxy.app.all-hands.dev, and injected LLM_API_KEY), the running agent server verified against the real LiteLLM proxy.

Key output:

PR_ENDPOINT_REAL_PROVIDER_STATUS 200
PR_ENDPOINT_REAL_PROVIDER_BODY {"message": null, "provider": "litellm_proxy", "status": "success"}

This is the strongest end-to-end evidence: the new endpoint verified real credentials through the same server path a client would use.

Test 6: Auth failures are structured and redacted

Ran a local rejecting OpenAI-compatible provider and called both SDK LLM.verify() and POST /api/llm/verify.

Key output:

PR_SDK_AUTH_FAILURE_EXCEPTION LLMAuthenticationError
PR_SDK_AUTH_FAILURE_IS_TYPED True
PR_ENDPOINT_AUTH_FAILURE_STATUS_CODE 200
PR_ENDPOINT_AUTH_FAILURE_BODY {"status":"auth_error","message":"litellm.BadRequestError: OpenAIException - invalid api key ***","provider":"openai"}
PR_ENDPOINT_AUTH_FAILURE_SECRET_REDACTED True

This confirms invalid credentials are mapped into the structured auth_error outcome and echoed provider text is scrubbed before reaching the client.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Review: `feat/llm-verify-endpoint`

The implementation is now in strong shape. The iterative review process has addressed all the critical and important issues from previous rounds — explicit None check for the timeout cap, LLM_SECRET_FIELDS-based secret scrubbing (imported rather than duplicated), infer_litellm_provider inside the try block, logger.error with the pre-sanitized message rather than logger.exception, and the correct model_copy guard that skips the allocation when provider is None. Accepting the full LLM object directly as the request body cleanly resolved the API design question raised by @enyst — any LiteLLM-supported configuration works without hard-coded field lists, and the SecretStr masking footgun is gone. Test coverage is thorough across all failure modes, timeout enforcement, retry/fallback bypass, and credential field scrubbing for both api_key and AWS fields.

One item worth tracking before broader exposure:

No request-level rate limit on POST /llm/verify: The endpoint fires a real paid LLM completion on every call. If the agent server becomes reachable by unauthenticated or lightly-authenticated callers (e.g. behind a shared API gateway), a tight loop of verify requests could exhaust provider quota or accumulate unexpected cost. Today the agent server is typically accessed only by authenticated sessions, so this is acceptable, but worth adding a per-session or per-IP throttle as a hardening follow-up before this pattern propagates to a broader UI surface. Inline comment on the handler below.

Everything else is clean and correct. The jsonable_encoder fix in _sanitize_validation_errors is a targeted, well-tested regression patch. The _prepare_verify_call approach of reusing _prepare_completion_params is smart — it means provider-specific option normalization (Azure max_tokens rename, OpenRouter headers, extended-thinking budgets) is exercised during verify, so "verified" genuinely means the agent will be able to use the credentials at conversation time.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot

✅ QA Report: PASS

Verified the new LLM verification capability by running the agent server and SDK against a live local OpenAI-compatible provider plus real error paths.

Does this PR achieve its stated goal?

Yes. The PR adds functional LLM verification behavior: the endpoint was absent on main but on this PR POST /api/llm/verify returned status=success after making a real /v1/chat/completions probe to a local OpenAI-compatible backend. The SDK-facing LLM.verify() and LLM.averify() methods also worked end-to-end, and provider/validation errors came back structured and redacted rather than leaking secrets or crashing.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests/linters were run.
CI Status	⏳ At check time: 20 passing, 6 pending Build & Push jobs, 4 skipped.
Functional Verification	✅ Endpoint, SDK methods, auth-error handling, and malformed-body validation were exercised with real requests.

Functional Verification

Test 1: API endpoint exists and performs a real verify probe

Step 1 — Establish baseline without the fix:
Checked out origin/main, started the agent server, and posted to the new endpoint:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -m openhands.agent_server --host 127.0.0.1 --port 8910
curl -H 'Content-Type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-baseline-placeholder"}' \
  http://127.0.0.1:8910/api/llm/verify

baseline /api/llm/verify HTTP 404
{"detail":"Not Found"}

This confirms the feature did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out feat/llm-verify-endpoint at f19626c, started a local OpenAI-compatible /v1/chat/completions stub, then started the agent server.

Step 3 — Re-run with the fix in place:
Posted a realistic verify payload to the PR server:

curl -H 'Content-Type: application/json' \
  -d '{"model":"openai/qa-local","base_url":"http://127.0.0.1:9912/v1","api_key":"sk-local-ok","timeout":5}' \
  http://127.0.0.1:8911/api/llm/verify

HTTP codes:
success_local 200
success_local status= success provider= openai message_len= 0 secret_present= False redaction_present= False

stub request log:
{"path": "/v1/chat/completions", "model": "qa-local", "messages": [{"content": [{"type": "text", "text": "hi"}], "role": "user"}], "max_completion_tokens": 1024, "max_tokens": null, "auth_present": true}

This shows the endpoint is present, returns the documented structured success response, and actually sends the one-message completion probe to the provider backend.

Test 2: SDK `LLM.verify()` and `LLM.averify()` work for SDK users

Step 1 — Establish baseline without the fix:
On origin/main, imported LLM and checked method availability:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'from openhands.sdk.llm import LLM; print("baseline_has_verify", hasattr(LLM, "verify")); print("baseline_has_averify", hasattr(LLM, "averify"))'

baseline_has_verify False
baseline_has_averify False

This confirms the SDK verification methods are new behavior.

Step 2 — Apply the PR changes:
Checked out the PR branch and reused the same local OpenAI-compatible provider.

Step 3 — Re-run with the fix in place:
Called both methods as an SDK user would:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'import asyncio; from openhands.sdk.llm import LLM; llm=LLM(model="openai/qa-local", base_url="http://127.0.0.1:9912/v1", api_key="sk-sdk-ok", timeout=5); print("verify_return", llm.verify()); print("averify_return", asyncio.run(llm.averify()))'

verify_return None
averify_return None
stub request count 2

This confirms both sync and async SDK entry points complete successfully and perform real provider calls.

Test 3: Error responses are structured and secrets are redacted

With the PR server running, I exercised auth and validation failures:

# Fake official OpenAI key path
auth_openai 200
auth_openai status= auth_error provider= openai message_len= 189 secret_present= False redaction_present= True
auth_openai_message= litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: sk-inval*****-key. You can find your API key at https://platform.openai.com/account/api-keys.

# OpenAI-like local 401 body
http 200 status auth_error provider openai secret_present False redacted True message litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: Bearer ***

# Malformed body missing model
missing_model 422
detail_type list detail_count 1
first_type value_error loc ['body'] input {'api_key': '<redacted>'} ctx_keys ['error']

This shows verify-time failures stay on HTTP 200 with a structured status, while malformed request bodies remain 422 and redact secret inputs.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

VascoSch92 · 2026-05-28T16:16:00Z

+# ``timeout`` to a smaller value in the request body; we take the
+# ``min`` so they can never extend it past what is a reasonable
+# interactive wait.
+_VERIFY_TIMEOUT_S = 30.0


Use Final from typing

VascoSch92 · 2026-05-28T16:16:07Z

+# client. Caps any accidental large payload (truncated HTML error pages,
+# verbose stack traces, etc.) and bounds the worst case if a provider were
+# ever to echo request content back in an error body.
+_MAX_ERROR_MESSAGE_CHARS = 512


Same as above

VascoSch92 · 2026-05-28T16:16:18Z

+# ─────────────────────────────────────────────────────────────────────────────
+
+
+class VerifyLLMStatus(str, Enum):


Use StrEnum

VascoSch92 · 2026-05-28T16:18:12Z

+    ``UNKNOWN_ERROR`` so the endpoint never raises and the frontend always
+    has a structured result to branch on.
+    """
+    message = _sanitize_error_message(exc, llm)


Can we just not add a class method to VerifyLLMStatus?

Something like VerifyLLMStatus.from_exception?

VascoSch92 · 2026-05-28T16:25:06Z

+        the Azure branch of :func:`select_chat_options` renames it to
+        ``max_tokens=1024`` automatically.
+        """
+        probe = [Message(role="user", content=[TextContent(text="hi")])]


Do we want to do that?

Almost every provider has an auth-check endpoint that costs nothing:

┌────────────────────────────────────┬───────────────────────────────────────────────────────────────────┐ │ Provider │ Free probe │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ OpenAI │ GET /v1/models │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Anthropic │ GET /v1/models (added ~2024) │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Google Gemini │ GET /v1/models │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Azure OpenAI │ GET /openai/deployments?api-version=… │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Bedrock │ sts:GetCallerIdentity or bedrock:ListFoundationModels │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ OpenRouter │ GET /api/v1/auth/key (also returns credit balance — bonus UX win) │ ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Mistral / Cohere / Together / Groq │ all have list-models │ └────────────────────────────────────┴───────────────────────────────────────────────────────────────────┘

Taking a look.

but actually I don't know if it checks that you have credits.

This would be just to check that you have a valide API KEY.

Anyway, just an idea. I don't want to block the PR

openhands-agent added 2 commits May 27, 2026 19:02

chuckbutkus requested a review from all-hands-bot May 27, 2026 20:01

all-hands-bot reviewed May 27, 2026

View reviewed changes

enyst reviewed May 27, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

enyst requested changes May 27, 2026

View reviewed changes

openhands-agent and others added 3 commits May 28, 2026 03:40

Merge branch 'main' into feat/llm-verify-endpoint

f0af2c3

chuckbutkus requested a review from all-hands-bot May 28, 2026 04:08

all-hands-bot reviewed May 28, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated

all-hands-bot reviewed May 28, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

Comment thread tests/agent_server/test_llm_router.py

Change max tokens on verify call

12eb2e9

Merge branch 'main' into feat/llm-verify-endpoint

f3fd93a

chuckbutkus requested a review from all-hands-bot May 28, 2026 05:02

all-hands-bot reviewed May 28, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

all-hands-bot reviewed May 28, 2026

View reviewed changes

chuckbutkus requested a review from all-hands-bot May 28, 2026 05:17

OpenHands deleted a comment from openhands-ai Bot May 28, 2026

all-hands-bot reviewed May 28, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py

chuckbutkus and others added 2 commits May 28, 2026 01:33

Review fix

3478f9f

OpenHands deleted a comment from openhands-ai Bot May 28, 2026

chuckbutkus requested a review from all-hands-bot May 28, 2026 05:51

OpenHands deleted a comment from openhands-ai Bot May 28, 2026

all-hands-bot reviewed May 28, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py

chuckbutkus marked this pull request as ready for review May 28, 2026 06:10

chuckbutkus requested a review from enyst May 28, 2026 06:10

all-hands-bot reviewed May 28, 2026

View reviewed changes

Merge branch 'main' into feat/llm-verify-endpoint

0310711

VascoSch92 reviewed May 28, 2026

View reviewed changes

		# ─────────────────────────────────────────────────────────────────────────────


		class VerifyLLMStatus(str, Enum):

Conversation

chuckbutkus commented May 27, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary

Issue Number

How to Test

Video/Screenshots

Type

Notes

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

all-hands-bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: New /api/llm/verify endpoint exists and performs a real provider probe

Test 2: SDK LLM.verify() / LLM.averify() are available and exercise the provider path

Test 3: Request validation remains a normal API contract error

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: POST /llm/verify endpoint

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

enyst left a comment

Choose a reason for hiding this comment

Uh oh!

chuckbutkus commented May 28, 2026

Uh oh!

all-hands-bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

❌ QA Report: FAIL

Does this PR achieve its stated goal?

Test 1: Baseline before this PR

Test 2: PR endpoint behavior with validation and bad credentials

Test 3: PR false-negative on valid credentials

Issues Found

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chuckbutkus commented May 28, 2026

Uh oh!

openhands-ai Bot commented May 28, 2026

Uh oh!

chuckbutkus commented May 28, 2026

Uh oh!

openhands-ai Bot commented May 28, 2026

Uh oh!

chuckbutkus commented May 28, 2026

Uh oh!

openhands-ai Bot commented May 28, 2026

Uh oh!

chuckbutkus commented May 27, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

all-hands-bot commented May 27, 2026 •

edited

Loading

Test 1: New `/api/llm/verify` endpoint exists and performs a real provider probe

Test 2: SDK `LLM.verify()` / `LLM.averify()` are available and exercise the provider path

Code Review: `POST /llm/verify` endpoint

github-actions Bot commented May 27, 2026 •

edited

Loading

all-hands-bot commented May 28, 2026 •

edited

Loading

all-hands-bot commented May 28, 2026 •

edited

Loading

Test 1: New `/api/llm/verify` endpoint exists and verifies real credentials

Test 3: SDK `LLM.verify()` works directly

all-hands-bot commented May 28, 2026 •

edited

Loading

Review: `feat/llm-verify-endpoint`

all-hands-bot commented May 28, 2026 •

edited

Loading

Test 2: SDK `LLM.verify()` sends a real provider request and succeeds

Test 3: Agent-server `/api/llm/verify` works over HTTP