Skip to content

Feat/llm verify endpoint#3410

Open
chuckbutkus wants to merge 13 commits into
mainfrom
feat/llm-verify-endpoint
Open

Feat/llm verify endpoint#3410
chuckbutkus wants to merge 13 commits into
mainfrom
feat/llm-verify-endpoint

Conversation

@chuckbutkus
Copy link
Copy Markdown
Contributor

@chuckbutkus chuckbutkus commented May 27, 2026

  • A human has tested these changes.

Why

The agent-server needs a way to let a client (e.g. a "Verify credentials"
button in the GUI) confirm an LLM configuration is valid before it is
saved or used to start a conversation. Verifying client-side is brittle:

  • CORS blocks many providers from the browser.
  • Bedrock SigV4 / Azure api_version require provider-specific request
    shapes the client would have to re-implement.
  • The client wouldn't be exercising the same code path the agent uses at
    conversation time, so "verified in the browser" wouldn't reliably mean
    "the agent will be able to use this".

This PR adds a small, structured probe — at both the SDK and HTTP layer —
that fixes all three problems.

Summary

  • SDK: new LLM.verify() / LLM.averify() send a single 1-token probe
    through the existing completion code path and surface the first provider
    error as a typed SDK exception via map_provider_exception. Retry and
    fallback_strategy are deliberately bypassed so verify fails fast on the
    model the caller actually intends to save.
  • agent-server: new POST /api/llm/verify accepts an LLM body and
    returns a discriminated VerifyLLMResponse (success, auth_error,
    rate_limited, timeout, unreachable, bad_request, unknown_error).
    Always returns HTTP 200 — clients branch on status, not transport
    errors — so the UI has a single decision tree. The probe is hard-capped
    at 30 s (vs the SDK's 300 s default) and the inferred LiteLLM provider
    name is attributed even on failure so the UI can render "Anthropic auth
    failed" rather than an unattributed error.
  • Hardening: error messages are scrubbed of any SecretStr values
    present on the LLM (api_key, aws_access_key_id,
    aws_secret_access_key, aws_session_token) and truncated to 512 chars
    to defend against providers that echo credentials back or return large
    HTML error pages. A small fix in api.py runs the 422 sanitizer output
    through jsonable_encoder so Pydantic's non-JSON-serializable
    ctx['error'] (raised by LLM's @model_validator(mode='before'))
    doesn't itself turn a 422 into a 500.

Issue Number

None.

How to Test

  1. Start the agent server locally:
    uv run uvicorn openhands.agent_server.api:app --reload
  2. Send a verify request with real credentials:
    curl -sS -X POST http://localhost:8000/api/llm/verify \
      -H 'Content-Type: application/json' \
      -d '{"model": "openai/gpt-4o-mini", "api_key": "'"$OPENAI_API_KEY"'"}'
    # → {"status":"success","message":null,"provider":"openai"}
  3. Confirm each failure mode round-trips as HTTP 200 + a typed status:
    # invalid key → status="auth_error"
    curl -sS -X POST http://localhost:8000/api/llm/verify \
      -H 'Content-Type: application/json' \
      -d '{"model": "openai/gpt-4o-mini", "api_key": "sk-bogus"}'
    
    # unknown model → status="bad_request"
    curl -sS -X POST http://localhost:8000/api/llm/verify \
      -H 'Content-Type: application/json' \
      -d '{"model": "openai/does-not-exist", "api_key": "'"$OPENAI_API_KEY"'"}'
    
    # missing model → HTTP 422 (FastAPI validation), not a verify outcome
    curl -sS -X POST http://localhost:8000/api/llm/verify \
      -H 'Content-Type: application/json' -d '{}'
  4. Run the new automated coverage:
    uv run pytest tests/sdk/llm/test_llm_verify.py \
                  tests/agent_server/test_llm_router.py \
                  tests/agent_server/test_validation_error_sanitization.py

Video/Screenshots

N/A — backend-only endpoint; behavior is exercised by the curl examples
and tests above.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • Auth posture: POST /api/llm/verify is mounted under the /api/*
    group and inherits the existing X-Session-API-Key header dependency,
    so it is not anonymously reachable in deployments that configure session
    keys. Credentials are caller-supplied per request — the agent server
    never carries a shared upstream key on this path.
  • Telemetry intentionally suppressed: verify probes do not fire
    on_request / on_response callbacks; they are not agent steps and
    should not contribute to spend tracking, prompt logs, or user-facing
    metrics.
  • Why no retry / fallback on verify: retrying would delay an inevitable
    auth failure; falling back to a different LLM would silently report
    "success" for a model that isn't the one being saved. Both defeat the
    point of "verify before save".

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:0310711-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-0310711-python \
  ghcr.io/openhands/agent-server:0310711-python

All tags pushed for this build

ghcr.io/openhands/agent-server:0310711-golang-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang-amd64
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:0310711-golang-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang-arm64
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:0310711-java-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java-amd64
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:0310711-java-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java-arm64
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:0310711-python-amd64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python-amd64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python-amd64
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:0310711-python-arm64
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python-arm64
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python-arm64
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:0310711-golang
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-golang
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-golang
ghcr.io/openhands/agent-server:0310711-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:0310711-java
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-java
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-java
ghcr.io/openhands/agent-server:0310711-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:0310711-python
ghcr.io/openhands/agent-server:031071167426752baae44ec947f9c060cbae7fe3-python
ghcr.io/openhands/agent-server:feat-llm-verify-endpoint-python
ghcr.io/openhands/agent-server:0310711-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 0310711-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 0310711-python-amd64) are also available if needed

Sends a minimal one-token completion to confirm credentials and
connectivity work end-to-end. Reuses _prepare_completion_params so the
verify call path exercises the same provider-aware option normalization
(Azure max_tokens rename, OpenRouter headers, extended-thinking budgets,
etc.) as a real completion, but deliberately bypasses both the retry
decorator and the fallback strategy so the first real provider error is
surfaced as a typed SDK exception (LLMAuthenticationError,
LLMRateLimitError, LLMTimeoutError, LLMServiceUnavailableError,
LLMBadRequestError) via map_provider_exception.

Why bypass retry/fallback for verify:
- Retry: verify should fail fast. Retrying a 401 wastes the user's
  budget and delays the inevitable auth error.
- Fallback: if the user is verifying model X, they want to know if X
  works. Silently substituting model Y would defeat the entire point of
  'verify before save'.

The companion agent-server endpoint POST /llm/verify is added in a
separate commit.

Co-authored-by: openhands <openhands@all-hands.dev>
Exposes LLM.averify over HTTP so clients (notably the agent-canvas
frontend) can delegate credential verification to the agent server
rather than reimplementing provider-specific routing in the browser.

Wire shape: VerifyLLMRequest accepts the LLM fields the settings UI
exposes (model, api_key, base_url, api_version, plus the AWS fields for
Bedrock). VerifyLLMResponse carries a discriminated status and an
optional human-readable message + inferred provider name.

Status values:
- success: probe succeeded; credentials work.
- auth_error: credentials rejected; client should block save.
- rate_limited: provider returned 429; credentials are valid.
- timeout: request timed out; warn but allow save.
- unreachable: network/DNS/5xx; warn but allow save.
- bad_request: credentials valid but config (e.g. model name) is wrong.
- unknown_error: catch-all so the endpoint never raises.

The endpoint always returns HTTP 200 — clients branch on the status
field rather than wrapping the request in HTTP error handling. This
keeps the frontend state machine to a single switch statement.

Why move verify server-side:
- All LiteLLM-supported providers become verifiable, including Bedrock
  with SigV4/IAM and Azure with api_version (both impossible from a
  browser today).
- The verify call path is identical to the conversation-time call
  path, so 'verified' really means 'the agent will be able to use
  this'.
- No CORS proxy / browser-side provider routing to maintain as
  LiteLLM adds providers.

Co-authored-by: openhands <openhands@all-hands.dev>
@chuckbutkus chuckbutkus requested a review from all-hands-bot May 27, 2026 20:01
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 27, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new LLM verify API and SDK probe end-to-end with a live agent-server and a local OpenAI-compatible provider.

Does this PR achieve its stated goal?

Yes. The PR adds the missing LLM verification capability: on the base branch, POST /api/llm/verify returned 404 and LLM had no verify() method; on the PR commit, the same real HTTP path returned status=success, invalid credentials returned structured status=auth_error with HTTP 200, and both LLM.verify() and LLM.averify() completed successfully. The local provider logs confirm the agent server made real LiteLLM chat-completion probes using the unmasked API key and max_completion_tokens: 1.

Phase Result
Environment Setup make build completed and installed the repo packages.
CI Status ⏳ Observed 13 passing, 0 failing, 13 pending checks at QA time.
Functional Verification ✅ Base-vs-PR behavior verified through actual HTTP and SDK calls.
Functional Verification

Test 1: New /api/llm/verify endpoint exists and performs a real provider probe

Step 1 — Establish baseline without the PR:
Checked out origin/main, synced dependencies, started the real agent server, then ran:

curl -sS -i -X POST http://127.0.0.1:18080/api/llm/verify   -H 'Content-Type: application/json'   -d '{"model":"openai/qa-model","api_key":"sk-valid","base_url":"http://127.0.0.1:18081/v1"}'

Observed:

HTTP/1.1 404 Not Found
content-type: application/json

{"detail":"Not Found"}

This confirms the endpoint did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out a326ca2bb5ae9b4ce2e3a3a79a7e4ae46bb22b01, synced dependencies, started the agent server, and started a local OpenAI-compatible stub at http://127.0.0.1:18081/v1/chat/completions.

Step 3 — Re-run with the PR in place:
Ran the same request with valid credentials:

HTTP/1.1 200 OK
content-type: application/json

{"status":"success","message":null,"provider":"openai"}

Then ran the same request with an invalid key:

HTTP/1.1 200 OK
content-type: application/json

{"status":"auth_error","message":"litellm.BadRequestError: OpenAIException - invalid api key","provider":"openai"}

The mock provider observed the actual requests made by the server:

{"path":"/v1/chat/completions","authorization":"Bearer sk-valid","payload":{"messages":[{"content":[{"type":"text","text":"hi"}],"role":"user"}],"model":"qa-model","max_completion_tokens":1}}
{"path":"/v1/chat/completions","authorization":"Bearer sk-invalid","payload":{"messages":[{"content":[{"type":"text","text":"hi"}],"role":"user"}],"model":"qa-model","max_completion_tokens":1}}

This shows the endpoint works end-to-end through the real agent-server → SDK → LiteLLM → provider HTTP path, including success and structured auth failure behavior.

Test 2: SDK LLM.verify() / LLM.averify() are available and exercise the provider path

Step 1 — Establish baseline without the PR:
On origin/main, ran:

from pydantic import SecretStr
from openhands.sdk.llm import LLM
llm = LLM(model='openai/qa-model', api_key=SecretStr('sk-valid'), base_url='http://127.0.0.1:18081/v1')
print('has_verify=', hasattr(llm, 'verify'))
try:
    llm.verify()
except Exception as exc:
    print(type(exc).__name__ + ':', str(exc)[:160])

Observed:

has_verify= False
AttributeError: 'LLM' object has no attribute 'verify'

This confirms the SDK method did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out a326ca2bb5ae9b4ce2e3a3a79a7e4ae46bb22b01 with the same local provider stub running.

Step 3 — Re-run with the PR in place:
Ran:

import asyncio
from pydantic import SecretStr
from openhands.sdk.llm import LLM

llm = LLM(model='openai/qa-model', api_key=SecretStr('sk-valid'), base_url='http://127.0.0.1:18081/v1')
print('has_verify=', hasattr(llm, 'verify'))
print('sync_verify_result=', llm.verify())
print('async_verify_result=', asyncio.run(llm.averify()))

Observed:

has_verify= True
sync_verify_result= None
async_verify_result= None

This confirms both SDK entry points are usable and complete successfully against a real OpenAI-compatible HTTP provider.

Test 3: Request validation remains a normal API contract error

With the PR server running, submitted an empty request body:

curl -sS -i -X POST http://127.0.0.1:18080/api/llm/verify   -H 'Content-Type: application/json'   -d '{}'

Observed:

HTTP/1.1 422 Unprocessable Content
content-type: application/json

{"detail":[{"type":"missing","loc":["body","model"],"msg":"Field required","input":{}}]}

This confirms malformed client input still fails as an API contract error instead of being confused with provider verification status.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: POST /llm/verify endpoint

Solid implementation overall. The core design decisions are correct: always-HTTP-200 with a discriminated status field makes for a clean client-side decision tree, the bypass of retry and fallback strategy is intentional and well-tested, and SecretStr fields with extra="ignore" give a forward-compatible, safe API surface. Test coverage is thorough.

Two items worth addressing before merging, plus a couple of minor suggestions:

  1. Verify timeout: the probe inherits the LLM's 300 s default — too long for a user waiting on a credential-check button. Either hardcode a short timeout on the LLM constructed here, or expose it via VerifyLLMRequest. See inline comment.
  2. Provider error messages: str(exc) is returned verbatim to the client; some providers embed raw credential fragments in error bodies. Worth a quick audit of what LiteLLM surfaces. See inline comment.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/llm.py
Comment thread tests/agent_server/test_llm_router.py Outdated
Comment thread tests/sdk/llm/test_llm_verify.py
Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   api.py2512191%101, 103–108, 110, 112, 114, 148, 160, 175, 181, 456, 459, 463–465, 467, 473
openhands-sdk/openhands/sdk/llm
   llm.py70810784%492, 516, 549, 834–835, 838–842, 844, 852–854, 858, 875–876, 880, 882–883, 885–887, 965, 1028, 1196–1198, 1299, 1368, 1409, 1421–1423, 1426–1429, 1435, 1478, 1521, 1534–1536, 1539–1542, 1548, 1704, 1718–1723, 1839–1840, 2072–2073, 2082, 2088, 2093, 2133, 2135–2140, 2142–2159, 2162–2166, 2168–2169, 2175–2184, 2241, 2243
TOTAL28918652777% 

Copy link
Copy Markdown
Member

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to post on a draft, just would like to point out a detail quick:

"Mirrors the LLM fields the settings UI exposes."

Maybe we can do differently here, like, use the full object, the full profile, or an existing object, instead of a bucket of special case settings?

I think what the agent did at this moment would be pretty bad design, sorry. The UI can change; and, it should not be the only UI we write a REST API for. (?!)

I believe we should not want it to be hardcoded - other people can create UIs and use this API. I would prefer to not limit UIs and customizability. I know technically this isn't a "limit", but it is: for agents it's a clear signal to dump partial settings everywhere.

Can we verify the LLM or profile that the user has?

@chuckbutkus
Copy link
Copy Markdown
Contributor Author

openhands-agent-server/openhands/agent_server/llm_router.py
Mirrors the LLM fields the settings UI exposes. Additional fields are
accepted but ignored at the API boundary; if you need to verify a more
exotic configuration, persist it via PATCH /settings and then verify
the saved state separately.

@enyst , great call out. Will be fixed in a sec.

openhands-agent and others added 3 commits May 28, 2026 03:40
Drop the hand-maintained VerifyLLMRequest subset and take the SDK's LLM
model as the request body. Removes duplication with LLM, removes the
field-by-field rebind (and the SecretStr-masking pitfall it papered
over), and lets the endpoint grow new credential fields automatically
whenever LLM does.

Also fix a pre-existing bug in _sanitize_validation_errors that this
refactor surfaces: Pydantic's value_error entries from
@model_validator hooks include the original ValueError under
ctx['error'], which is not JSON-serializable. Without jsonable_encoder
the 422 handler itself raises a TypeError. The new test exercises that
path directly.

Tests:
- Add parametrized AWS SecretStr passthrough test (review feedback)
- Inline the SecretStr capture in the api_key passthrough test now that
  LLM always normalises to SecretStr via _validate_secrets
- Add regression test for non-JSON-serializable ctx in the sanitizer

Co-authored-by: openhands <openhands@all-hands.dev>
Addresses three open review threads on POST /llm/verify:

1. **Timeout cap.** Wrap llm.averify() in asyncio.wait_for with a hard
   30 s ceiling (taken as min with any smaller caller-supplied timeout).
   The SDK's 300 s LLM.timeout default would otherwise park an
   interactive UI on a five-minute hang when a provider is unreachable.
   Maps asyncio.TimeoutError to VerifyLLMStatus.TIMEOUT.

2. **Error message sanitization.** New _sanitize_error_message scrubs
   any SecretStr values present on the constructed LLM (api_key plus
   the three AWS SecretStr fields) from the exception's str()
   representation, then truncates to 512 chars. Defends against
   providers that echo Authorization headers / query-string keys in
   error bodies, and bounds the response size against pathological HTML
   error pages.

3. **Telemetry comment in llm.py.** Explain that _telemetry_ctx is
   intentionally discarded in _prepare_verify_call so a future reader
   doesn't 'fix' it back to wiring telemetry callbacks for what is not
   an agent step.

Tests cover all three behaviors: timeout enforcement, caller-supplied
timeout honored when smaller, api_key scrubbing from error messages,
and 512-char truncation with the ellipsis sentinel.

Co-authored-by: openhands <openhands@all-hands.dev>
@chuckbutkus chuckbutkus requested a review from all-hands-bot May 28, 2026 04:08
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 28, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ QA Report: FAIL

The new /api/llm/verify endpoint is present and handles some error paths, but it falsely rejects a real valid LLM configuration that can complete normally.

Does this PR achieve its stated goal?

Partially / No for the core success path. The PR sets out to add an LLM verify endpoint and SDK verify probe; I confirmed the endpoint did not exist on the base commit and exists on this PR. However, with the real configured LLM_MODEL / LLM_API_KEY / LLM_BASE_URL, the verify endpoint returned status=bad_request while a normal SDK completion using the exact same credentials succeeded and returned OK, so the feature can tell users valid credentials are invalid.

Phase Result
Environment Setup make build completed successfully and installed the workspace packages
CI Status 🟡 All listed checks were successful except qa-changes, which was still in progress; cleanup-on-approval was skipped
Functional Verification ❌ Real API + SDK execution found a false-negative verify result for a valid LLM config
Functional Verification

Test 1: Baseline before this PR

Step 1 — Establish baseline without the feature:
Ran git checkout c6347949c4dacbdf9db364fc902e2be216599747, started the agent server with uv run python -m openhands.agent_server --port 8765, then called:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-invalid"}'

Observed:

HTTP_STATUS:404
{"detail":"Not Found"}

Also ran a short SDK import check and observed:

has_verify False
has_averify False

This establishes the baseline: the endpoint and SDK methods were not available before the PR.

Step 2 — Apply the PR changes:
Checked out feat/llm-verify-endpoint at f0af2c3b1e8c922af1a935d9f5ee617d57388aa5.

Test 2: PR endpoint behavior with validation and bad credentials

Started the PR agent server with uv run python -m openhands.agent_server --port 8765 and called GET /server_info; it returned HTTP 200 with the OpenHands Agent Server metadata.

Then called the new verify endpoint with a missing model:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  -d '{"api_key":"sk-qa-secret-should-redact"}'

Observed:

HTTP_STATUS:422
{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}

This shows malformed input stays a JSON 422 and redacts the submitted API key instead of crashing or echoing the secret.

Then called the endpoint with the configured model/base URL but an intentionally invalid key:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  --data-binary @/tmp/oh_invalid_payload.json

Observed:

HTTP_STATUS:200
{"status":"auth_error","message":"litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed...","provider":"litellm_proxy"}

This shows the endpoint performs a real provider/proxy request and maps invalid credentials to structured HTTP 200 auth_error.

Test 3: PR false-negative on valid credentials

With the real configured LLM_MODEL, LLM_API_KEY, and LLM_BASE_URL, called the new endpoint as a real user would when verifying saved credentials:

curl -X POST http://127.0.0.1:8765/api/llm/verify \
  -H 'Content-Type: application/json' \
  --data-binary @/tmp/oh_valid_payload.json

Observed:

HTTP_STATUS:200
{"status":"bad_request","message":"litellm.BadRequestError: Error code: 400 - {'error': {'message': \"litellm.BadRequestError: OpenAIException - Could not finish the message because max_tokens or model output limit was reached. Please try again with higher max_tokens...","provider":"litellm_proxy"}

To confirm the credentials/model/base URL were actually usable, I ran a normal SDK completion with the exact same environment-provided configuration and a realistic output budget:

uv run python -c "... llm.completion(..., max_completion_tokens=256) ..."

Observed:

completion_succeeded True
assistant_text 'OK'
completion_tokens 16

I also exercised the new SDK method directly with the same valid config:

uv run python -c "... llm.verify() ..."

Observed:

has_verify True has_averify True
verify_result exception
LLMBadRequestError
litellm.BadRequestError: Error code: 400 - {'error': {'message': "litellm.BadRequestError: OpenAIException - Could not finish the message because max_tokens or model output limit was reached...

Interpretation: the PR adds the endpoint/method, but the verify probe itself is too constrained for this valid reasoning-model configuration. The same credentials successfully complete normally, so returning bad_request from verify is a false negative.

Issues Found

  • 🟠 Issue: Valid configured credentials/model/base URL were rejected by /api/llm/verify and LLM.verify() because the one-token probe hits a provider output-limit error, even though normal SDK completion with the same config succeeds.

This review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid implementation overall. The POST /llm/verify design is well-considered: HTTP-200-always with a discriminated status enum is exactly the right contract for an interactive UI, and the security controls (timeout cap, credential scrubbing, error truncation) are all present. The bypass of retry and fallback is correctly motivated and tested. The jsonable_encoder fix for 422 responses is clean and targeted. Two issues worth addressing before merge:

  1. _sanitize_error_message duplicates LLM_SECRET_FIELDS — the constant already declared as a single source of truth in llm.py (line 148) should be used here, not re-typed.
  2. infer_litellm_provider sits outside the try block — if it raises on an unusual model string, the endpoint returns an unhandled 500 instead of a structured response, silently violating the documented HTTP-200 contract.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread tests/agent_server/test_llm_router.py
@chuckbutkus
Copy link
Copy Markdown
Contributor Author

@OpenHands fix failing sdk tests.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 28, 2026

I'm on it! chuckbutkus can track my progress at all-hands.dev

…tion_tokens

The verify probe was updated to use max_completion_tokens=1024 (instead of 1)
for better compatibility with models that require a minimum token budget.
Align the test assertion, test name, and docstring to match.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Fixed in c534f76.

The failing test was test_verify_sends_minimal_one_token_probe — it was asserting max_completion_tokens == 1, but the max_completion_tokens in the verify probe was intentionally updated to 1024. I updated the test assertion, test name, and docstring to match.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 28, 2026

OpenHands encountered an error: Request timeout after 30 seconds to https://kqxtrzjmjhxkkuaz.prod-runtime.all-hands.dev/api/conversations/c7177cce-6a99-4009-9625-735fa15157d5/ask_agent

See the conversation for more information.

@chuckbutkus
Copy link
Copy Markdown
Contributor Author

@OpenHands address the unresolved all-hands-bot comments.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 28, 2026

I'm on it! chuckbutkus can track my progress at all-hands.dev

- Use LLM_SECRET_FIELDS instead of hardcoded tuple in _sanitize_error_message
  so any future secret field added to LLM is automatically scrubbed
- Move infer_litellm_provider inside the try block so a failure there
  returns a structured VerifyLLMResponse (HTTP 200) instead of a 500
- Add parametrized test for AWS credential scrubbing in error messages,
  covering aws_access_key_id, aws_secret_access_key, and aws_session_token

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 28, 2026

Summary of Changes

All three unresolved all-hands-bot review threads on PR #3410 have been addressed in commit a4045933.

Checklist

  • LLM_SECRET_FIELDS import — Added from openhands.sdk.llm.llm import LLM_SECRET_FIELDS and replaced the hardcoded 4-field tuple in _sanitize_error_message with LLM_SECRET_FIELDS. Future secret fields added to LLM are now automatically scrubbed without needing a separate update here.

  • infer_litellm_provider inside try block — Moved infer_litellm_provider(...) inside the try block (initializing provider = None before it). If infer_litellm_provider raises for any reason, the exception is caught by the existing broad handler and a structured VerifyLLMResponse (HTTP 200) is returned — preserving the documented HTTP-200 contract.

  • AWS credential scrubbing tests — Added test_verify_endpoint_scrubs_aws_secret_from_error_message, a parametrized test covering all three AWS secret fields (aws_access_key_id, aws_secret_access_key, aws_session_token), mirroring the existing api_key scrubbing test.

  • All 31 tests pass, pre-commit hooks (ruff lint/format, pyright, pycodestyle) pass.

  • All three review threads replied to and resolved.

  • Changes are minimal and scoped exactly to the issues raised — no extraneous modifications.

@chuckbutkus chuckbutkus requested a review from all-hands-bot May 28, 2026 05:02
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 28, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable — ready with two fixes before merge

This PR introduces a POST /llm/verify endpoint that accepts a full LLM config, fires a minimal one-token probe through averify(), and returns a discriminated-status HTTP 200 response. The overall architecture is sound: accepting LLM directly as the request body exercises the same code path as real conversation creation, the timeout cap and secret-scrubbing layers are well thought-out, and the test coverage is comprehensive. Previous review rounds caught and addressed the major issues (timeout cap, secret sanitization, hardcoded field list, infer_litellm_provider inside the try block). Two issues remain before this is merge-ready.


[CRITICAL ISSUES]

timeout=0 is silently promoted to _VERIFY_TIMEOUT_S
llm.timeout or _VERIFY_TIMEOUT_S uses Python's falsy check, so a caller passing "timeout": 0 gets a 30-second window instead of the intended immediate timeout. Fix with an explicit None guard:

timeout = min(
    llm.timeout if llm.timeout is not None else _VERIFY_TIMEOUT_S,
    _VERIFY_TIMEOUT_S,
)

logger.exception emits the raw exception for unmapped errors
In the UNKNOWN_ERROR branch, logger.exception(...) has implicit exc_info=True, so the raw exception string — which may include provider-echoed content — lands in server logs unsanitized. The client gets the scrubbed message; the log does not. Risk is low since unmapped exceptions are rare, but for consistency with the rest of the hardening, use the already-scrubbed value:

logger.error("llm.verify failed with an unmapped exception: %s", message)

[TESTING GAPS]

The PR description has no evidence section. Before removing the draft flag, add:

  • The exact command used to exercise the verify endpoint against a real provider (e.g. local Ollama or a live OpenAI key)
  • The resulting response body showing "status": "success" (or "rate_limited", which counts as valid credentials)

Unit tests confirm the wiring is correct; they cannot substitute for proof that the real LiteLLM call round-trips to a live provider. This matters especially given that the previous review found a false-negative with the 1-token cap on certain proxies — confirming that max_completion_tokens=1024 works end-to-end with at least one real provider is the minimum bar.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

New public endpoint on the agent server that accepts credentials and makes outbound network calls. The always-200 contract, timeout cap, and secret-sanitization layers are all in place. The two issues above are low-probability in practice (callers won't send timeout=0; unmapped exceptions are rare), but should be fixed before merging given the security sensitivity of the endpoint's purpose.


VERDICT:
Needs rework: Fix the timeout=0 falsy check and the unsanitized server-side log call before removing the draft flag.

KEY INSIGHT:
The or-based falsy check in the timeout calculation is the classic Python pitfall for optional numeric fields — an explicit is not None guard is correct whenever 0 is a meaningful value.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.
  2. Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

Was this review helpful? React with 👍 or 👎 to give feedback.

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new LLM verification flow by running the agent server and making real HTTP requests, plus calling the SDK LLM.verify() entrypoint with real credentials.

Does this PR achieve its stated goal?

Yes. The PR appears intended to add an LLM credential/connectivity verification endpoint and SDK probe; on main, POST /api/llm/verify returned 404, while on this PR the same real agent server accepted the endpoint and returned status: success using the configured real LLM credentials. I also verified structured failure behavior over HTTP (auth_error for an invalid token, 422 with redacted input for a malformed body) and confirmed direct LLM.verify() succeeds in the project uv environment.

Phase Result
Environment Setup make build completed successfully; no tests/linters were run.
CI Status 🟡 At time checked: 31 passing, 0 failing, 1 pending (qa-changes), 3 skipped.
Functional Verification ✅ Real server/API and SDK execution matched the intended behavior.
Functional Verification

Test 1: New /api/llm/verify endpoint exists and verifies real credentials

Step 1 — Establish baseline without the PR:
Checked out main, started the real agent server, and posted to the verify endpoint:

git switch main
TMUX_TMPDIR=/tmp/oh-qa-tmux-main OPENHANDS_SUPPRESS_BANNER=1   uv run python -m openhands.agent_server --host 127.0.0.1 --port 8765
curl -sS -o /tmp/oh-main-verify-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8765/api/llm/verify   -H 'Content-Type: application/json'   --data '{"model":"openai/gpt-4o","api_key":"sk-fake"}'

Observed:

MAIN_SERVER_READY_HTTP_STATUS=200
MAIN_VERIFY_HTTP_STATUS=404
MAIN_VERIFY_BODY:
{"detail":"Not Found"}

This confirms the endpoint was not available on main.

Step 2 — Apply the PR's changes:
Checked out feat/llm-verify-endpoint at f3fd93af048b169a4b1298564ccb97bb4fb1a90c and started the real agent server.

Step 3 — Re-run with the PR:
Posted a real verification request using $LLM_MODEL, $LLM_BASE_URL, and $LLM_API_KEY from the QA environment:

git switch feat/llm-verify-endpoint
TMUX_TMPDIR=/tmp/oh-qa-tmux-pr OPENHANDS_SUPPRESS_BANNER=1   uv run python -m openhands.agent_server --host 127.0.0.1 --port 8766
curl -sS -o /tmp/oh-pr-verify-real-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8766/api/llm/verify   -H 'Content-Type: application/json'   --data @/tmp/verify-real-payload.json

Observed:

PR_SERVER_READY_HTTP_STATUS=200
PR_VERIFY_REAL_HTTP_STATUS=200
{"message": null, "provider": "litellm_proxy", "status": "success"}

This confirms the new endpoint is reachable and successfully exercises the real LLM credential path end-to-end.

Test 2: Verify endpoint returns structured failures instead of transport errors

Step 1 — Baseline:
On main, the endpoint is absent as shown in Test 1, so structured verify outcomes are not available.

Step 2 — Apply the PR's changes:
Used the PR branch server.

Step 3 — Exercise failure cases:
Sent an invalid API key with the configured model/base URL:

curl -sS -o /tmp/oh-pr-verify-invalid-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8767/api/llm/verify   -H 'Content-Type: application/json'   --data @/tmp/verify-invalid-payload.json

Observed:

PR_VERIFY_INVALID_HTTP_STATUS=200
{"message_contains_invalid_key": false, "message_excerpt": "litellm.AuthenticationError: AuthenticationError: Litellm_proxyException - Authentication Error, Invalid proxy server token passed. Received API Key = sk-...-ke", "provider": "litellm_proxy", "status": "auth_error"}

This confirms provider auth failure is represented as a structured status with HTTP 200, and the invalid key was not echoed verbatim.

Sent a malformed body omitting model:

curl -sS -o /tmp/oh-pr-verify-missing-model-body.txt -w '%{http_code}'   -X POST http://127.0.0.1:8766/api/llm/verify   -H 'Content-Type: application/json'   --data '{"api_key":"sk-secret-that-should-be-redacted"}'

Observed:

PR_VERIFY_MISSING_MODEL_HTTP_STATUS=422
{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}

This confirms malformed requests still use FastAPI validation semantics while avoiding the validation-error serialization/redaction failure mode.

Test 3: SDK LLM.verify() works directly

Step 1 — Baseline:
The feature adds a new SDK probe method used by the server endpoint; the main-branch endpoint baseline above shows the user-facing verify flow did not exist before this PR.

Step 2 — Apply the PR's changes:
Used the PR branch in the repository uv environment.

Step 3 — Call the SDK directly with real credentials:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'import os; from openhands.sdk.llm import LLM; kwargs={"model": os.environ["LLM_MODEL"], "api_key": os.environ["LLM_API_KEY"]}; base=os.environ.get("LLM_BASE_URL"); kwargs.update({"base_url": base} if base else {}); LLM(**kwargs).verify(); print("SDK_VERIFY_RESULT=success")'

Observed:

SDK_VERIFY_RESULT=success

This confirms the SDK entrypoint performs a real provider probe successfully.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@chuckbutkus
Copy link
Copy Markdown
Contributor Author

@OpenHands address the unresolved all-hands-bot comments.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 28, 2026

I'm on it! chuckbutkus can track my progress at all-hands.dev

Fix two issues flagged in the second review pass:

1. Timeout None check: replace 'llm.timeout or _VERIFY_TIMEOUT_S' with
   an explicit None check so timeout=0 is not falsily promoted to the
   30 s cap.

2. Logger scrubbing: replace logger.exception with logger.error so the
   already-scrubbed message string is what gets recorded, rather than
   raw exception content via implicit exc_info=True.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Addressed the two remaining all-hands-bot review comments in 2967d88:

  1. Timeout 0 promoted to cap: replaced llm.timeout or _VERIFY_TIMEOUT_S with an explicit None check (llm.timeout if llm.timeout is not None else _VERIFY_TIMEOUT_S) so timeout=0 is honoured rather than silently falling through to the 30 s ceiling.

  2. logger.exception leaking raw exception content to logs: replaced with logger.error("...", message) so the already-scrubbed message string (with credentials replaced by *** and truncated to 512 chars) is what gets recorded, rather than the raw exception body via implicit exc_info=True.

All 31 tests still pass.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@chuckbutkus chuckbutkus requested a review from all-hands-bot May 28, 2026 05:17
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 28, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@OpenHands OpenHands deleted a comment from openhands-ai Bot May 28, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new LLM verification flow by running the agent server, making real HTTP requests to /api/llm/verify, and directly calling the new SDK verify APIs.

Does this PR achieve its stated goal?

Yes. The PR appears intended to add a server-side LLM credential/connectivity verification endpoint backed by SDK LLM.verify/LLM.averify. On main, the same request returned 404 Not Found; on this PR, the endpoint returned structured verification results, successfully reached an OpenAI-compatible provider over HTTP, mapped an invalid real OpenAI credential to auth_error, and returned a sanitized 422 for malformed input instead of leaking the submitted key.

Phase Result
Environment Setup make build completed successfully; no test/lint suite was run.
CI Status 🟡 Most checks are green; qa-changes and Docker image build/push jobs were still in progress when checked.
Functional Verification ✅ Endpoint and SDK verify behavior worked with real server/API execution.
Functional Verification

Test 1: Before/after server endpoint behavior

Step 1 — Establish baseline on origin/main:
Started the base agent server at f305353d4c369b4047e3cfd84e0d5f1dc48a6cab and ran:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"good-key","base_url":"http://127.0.0.1:18111/v1","timeout":5}' \
  http://127.0.0.1:18001/api/llm/verify

Output:

{"detail":"Not Found"}
HTTP_STATUS:404

This confirms the verify endpoint did not exist before the PR.

Step 2 — Apply the PR's changes:
Checked out feat/llm-verify-endpoint at 2967d88fa396bd8a9e59c2349d22fefb016c9b8b, started the agent server, and ran the same endpoint request against a local OpenAI-compatible provider.

Step 3 — Re-run with the PR in place:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"good-key","base_url":"http://127.0.0.1:18111/v1","timeout":5}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"status":"success","message":null,"provider":"openai"}
HTTP_STATUS:200

The local provider log also showed the server made a real completion request:

127.0.0.1 - "POST /v1/chat/completions HTTP/1.1" 200 -

This shows the new endpoint is reachable and performs an actual provider call rather than only parsing input.

Test 2: Real invalid credential handling

Ran a real OpenAI verification attempt with an intentionally invalid key:

curl -sS -m 20 -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-invalid-openhands-qa","timeout":10}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"status":"auth_error","message":"litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: sk-inval***********s-qa. You can find your API key at https://platform.openai.com/account/api-keys.","provider":"openai"}
HTTP_STATUS:200

This confirms provider authentication failures are surfaced as a structured auth_error response with HTTP 200, as a client/UI can branch on.

Test 3: Malformed body returns sanitized validation error

Ran:

curl -sS -w '\nHTTP_STATUS:%{http_code}\n' \
  -H 'content-type: application/json' \
  -d '{"api_key":"sk-real-secret-key-12345"}' \
  http://127.0.0.1:18002/api/llm/verify

Output:

{"detail":[{"type":"value_error","loc":["body"],"msg":"Value error, model must be specified in LLM","input":{"api_key":"<redacted>"},"ctx":{"error":{}}}]}
HTTP_STATUS:422

This confirms malformed requests still fail as validation errors, but the submitted secret is redacted and the response remains JSON-serializable.

Test 4: SDK verify APIs execute directly

Ran a short user-style script importing the SDK and calling the new APIs:

uv run python - <<'PY'
import asyncio
from openhands.sdk.llm import LLM
from openhands.sdk.llm.exceptions import LLMAuthenticationError

llm = LLM(model='openai/gpt-4o', api_key='good-key', base_url='http://127.0.0.1:18111/v1', timeout=5)
print('sync verify:', llm.verify())
print('async verify:', asyncio.run(llm.averify()))

bad = LLM(model='openai/gpt-4o', api_key='sk-invalid-openhands-qa', timeout=10)
try:
    bad.verify()
except LLMAuthenticationError as exc:
    print('invalid OpenAI maps to:', type(exc).__name__)
PY

Output:

sync verify: None
async verify: None
invalid OpenAI maps to: LLMAuthenticationError

This confirms both sync and async SDK entry points work end-to-end and preserve typed authentication error behavior.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat/llm-verify-endpoint

This is a clean, security-conscious addition to the agent server. The decision to accept LLM directly as the request body (instead of a bespoke request model) keeps the code DRY and sidesteps the SecretStr-masking footgun that earlier revisions had with manual field rebinding. All critical issues from previous rounds — 30 s timeout cap, LLM_SECRET_FIELDS-based scrubbing, infer_litellm_provider inside the try block, timeout=0 handling, logger.error instead of logger.exception — are addressed.

Strengths:

  • HTTP-200-always contract with discriminated status values is the right UX for a verify button.
  • Defence-in-depth credential scrubbing (_sanitize_error_message + truncation) is solid.
  • Bypassing retry and fallback strategy by design means the first real error surfaces immediately — exactly what "verify before save" needs.
  • _prepare_verify_call reuses the same normalisation path as a real completion (Azure rename, OpenRouter headers, etc.) so "verified" means the same code path will work at conversation time.
  • Test coverage is comprehensive: timeout, credential leakage, large error bodies, AWS credentials, keyless local servers, unknown_error path.

Two small items below.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/llm_router.py
chuckbutkus and others added 2 commits May 28, 2026 01:33
Add a NOTE explaining that infer_litellm_provider sits inside the try
block intentionally, so its failures collapse to UNKNOWN_ERROR alongside
genuine provider-side unknown errors. Helps future readers reason about
the two distinct failure modes hidden behind a single status.

Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai Bot May 28, 2026
@chuckbutkus chuckbutkus requested a review from all-hands-bot May 28, 2026 05:51
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 28, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@OpenHands OpenHands deleted a comment from openhands-ai Bot May 28, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new LLM verification capability through the SDK, the agent-server HTTP API, a local OpenAI-compatible provider, and the configured real LiteLLM proxy.

Does this PR achieve its stated goal?

Yes. The PR title indicates a new LLM verify endpoint, and the PR also adds the SDK probe it depends on. On origin/main, LLM.verify is absent and POST /api/llm/verify returns 404; on PR commit f19626cfd8f3654b27b45b044863cd3456a1be06, the SDK probe sends a real chat-completions request, the agent-server endpoint returns structured success, validation errors return 422 without leaking secrets, and a real configured LiteLLM proxy verification returned status: success.

Phase Result
Environment Setup make build completed successfully and installed the editable packages.
CI Status 🟡 Latest snapshot: 20 passed, 0 failed, 7 pending, 3 skipped. Pending jobs were QA and Docker image builds.
Functional Verification ✅ SDK and API behavior worked end-to-end with local and real provider probes.
Functional Verification

Test 1: Baseline confirms the feature did not exist before this PR

Step 1 — Establish baseline without the PR:
Ran:

git checkout --detach origin/main
uv run python /tmp/qa_baseline_probe.py

Key output:

BASE_SDK_HAS_VERIFY False
BASE_SDK_VERIFY_CALL AttributeError 'LLM' object has no attribute 'verify'
BASE_ENDPOINT_VERIFY_STATUS 404
BASE_ENDPOINT_VERIFY_BODY {"detail":"Not Found"}

This shows the SDK method and /api/llm/verify endpoint are new behavior in this PR.

Step 2 — Apply the PR changes:
Checked out:

git checkout --detach f19626cfd8f3654b27b45b044863cd3456a1be06

Test 2: SDK LLM.verify() sends a real provider request and succeeds

Step 3 — Re-run with the PR:
Ran a QA driver that starts a local OpenAI-compatible HTTP server and calls LLM.verify():

uv run python /tmp/qa_verify_behavior.py

Key output:

PR_SDK_VERIFY_RESULT None
PR_SDK_FAKE_PROVIDER_REQUEST {"authorization_present": true, "first_role": "user", "messages_count": 1, "path": "/v1/chat/completions", "token_cap": 1024}

This confirms the SDK method executes a real HTTP completion probe, includes auth, sends one user message, caps output, and returns None on success.

Test 3: Agent-server /api/llm/verify works over HTTP

The same QA driver started agent-server --host 127.0.0.1 --port <free-port> and posted to the new endpoint with a local provider base_url.

Key output:

PR_ENDPOINT_LOCAL_STATUS 200
PR_ENDPOINT_LOCAL_BODY {"message": null, "provider": "openai", "status": "success"}
PR_ENDPOINT_FAKE_PROVIDER_REQUEST {"authorization_present": true, "first_role": "user", "messages_count": 1, "path": "/v1/chat/completions"}

This confirms a real HTTP client can call the new endpoint and the endpoint performs the verify probe through the provider-compatible HTTP path.

Test 4: Bad request path returns 422 and redacts secrets

Ran the same running agent server with a malformed verify payload containing an API key but no model.

Key output:

PR_ENDPOINT_MISSING_MODEL_STATUS 422
PR_ENDPOINT_MISSING_MODEL_REDACTED True
PR_ENDPOINT_MISSING_MODEL_BODY {"detail": [{"ctx": {"error": {}}, "input": {"api_key": "<redacted>"}, "loc": ["body"], "msg": "Value error, model must be specified in LLM", "type": "value_error"}]}

This confirms the validation path returns a client error instead of crashing and does not echo the submitted secret.

Test 5: Real configured provider credentials verify successfully

Using the existing environment (LLM_MODEL=litellm_proxy/openai/gpt-5.5, LLM_BASE_URL=https://llm-proxy.app.all-hands.dev, and injected LLM_API_KEY), the running agent server verified against the real LiteLLM proxy.

Key output:

PR_ENDPOINT_REAL_PROVIDER_STATUS 200
PR_ENDPOINT_REAL_PROVIDER_BODY {"message": null, "provider": "litellm_proxy", "status": "success"}

This is the strongest end-to-end evidence: the new endpoint verified real credentials through the same server path a client would use.

Test 6: Auth failures are structured and redacted

Ran a local rejecting OpenAI-compatible provider and called both SDK LLM.verify() and POST /api/llm/verify.

Key output:

PR_SDK_AUTH_FAILURE_EXCEPTION LLMAuthenticationError
PR_SDK_AUTH_FAILURE_IS_TYPED True
PR_ENDPOINT_AUTH_FAILURE_STATUS_CODE 200
PR_ENDPOINT_AUTH_FAILURE_BODY {"status":"auth_error","message":"litellm.BadRequestError: OpenAIException - invalid api key ***","provider":"openai"}
PR_ENDPOINT_AUTH_FAILURE_SECRET_REDACTED True

This confirms invalid credentials are mapped into the structured auth_error outcome and echoed provider text is scrubbed before reaching the client.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat/llm-verify-endpoint

The implementation is now in strong shape. The iterative review process has addressed all the critical and important issues from previous rounds — explicit None check for the timeout cap, LLM_SECRET_FIELDS-based secret scrubbing (imported rather than duplicated), infer_litellm_provider inside the try block, logger.error with the pre-sanitized message rather than logger.exception, and the correct model_copy guard that skips the allocation when provider is None. Accepting the full LLM object directly as the request body cleanly resolved the API design question raised by @enyst — any LiteLLM-supported configuration works without hard-coded field lists, and the SecretStr masking footgun is gone. Test coverage is thorough across all failure modes, timeout enforcement, retry/fallback bypass, and credential field scrubbing for both api_key and AWS fields.

One item worth tracking before broader exposure:

No request-level rate limit on POST /llm/verify: The endpoint fires a real paid LLM completion on every call. If the agent server becomes reachable by unauthenticated or lightly-authenticated callers (e.g. behind a shared API gateway), a tight loop of verify requests could exhaust provider quota or accumulate unexpected cost. Today the agent server is typically accessed only by authenticated sessions, so this is acceptable, but worth adding a per-session or per-IP throttle as a hardening follow-up before this pattern propagates to a broader UI surface. Inline comment on the handler below.

Everything else is clean and correct. The jsonable_encoder fix in _sanitize_validation_errors is a targeted, well-tested regression patch. The _prepare_verify_call approach of reusing _prepare_completion_params is smart — it means provider-specific option normalization (Azure max_tokens rename, OpenRouter headers, extended-thinking budgets) is exercised during verify, so "verified" genuinely means the agent will be able to use the credentials at conversation time.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-agent-server/openhands/agent_server/llm_router.py
@chuckbutkus chuckbutkus marked this pull request as ready for review May 28, 2026 06:10
@chuckbutkus chuckbutkus requested a review from enyst May 28, 2026 06:10
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new LLM verification capability by running the agent server and SDK against a live local OpenAI-compatible provider plus real error paths.

Does this PR achieve its stated goal?

Yes. The PR adds functional LLM verification behavior: the endpoint was absent on main but on this PR POST /api/llm/verify returned status=success after making a real /v1/chat/completions probe to a local OpenAI-compatible backend. The SDK-facing LLM.verify() and LLM.averify() methods also worked end-to-end, and provider/validation errors came back structured and redacted rather than leaking secrets or crashing.

Phase Result
Environment Setup make build completed successfully; no tests/linters were run.
CI Status ⏳ At check time: 20 passing, 6 pending Build & Push jobs, 4 skipped.
Functional Verification ✅ Endpoint, SDK methods, auth-error handling, and malformed-body validation were exercised with real requests.
Functional Verification

Test 1: API endpoint exists and performs a real verify probe

Step 1 — Establish baseline without the fix:
Checked out origin/main, started the agent server, and posted to the new endpoint:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -m openhands.agent_server --host 127.0.0.1 --port 8910
curl -H 'Content-Type: application/json' \
  -d '{"model":"openai/gpt-4o","api_key":"sk-baseline-placeholder"}' \
  http://127.0.0.1:8910/api/llm/verify

baseline /api/llm/verify HTTP 404
{"detail":"Not Found"}

This confirms the feature did not exist on the base branch.

Step 2 — Apply the PR changes:
Checked out feat/llm-verify-endpoint at f19626c, started a local OpenAI-compatible /v1/chat/completions stub, then started the agent server.

Step 3 — Re-run with the fix in place:
Posted a realistic verify payload to the PR server:

curl -H 'Content-Type: application/json' \
  -d '{"model":"openai/qa-local","base_url":"http://127.0.0.1:9912/v1","api_key":"sk-local-ok","timeout":5}' \
  http://127.0.0.1:8911/api/llm/verify

HTTP codes:
success_local 200
success_local status= success provider= openai message_len= 0 secret_present= False redaction_present= False

stub request log:
{"path": "/v1/chat/completions", "model": "qa-local", "messages": [{"content": [{"type": "text", "text": "hi"}], "role": "user"}], "max_completion_tokens": 1024, "max_tokens": null, "auth_present": true}

This shows the endpoint is present, returns the documented structured success response, and actually sends the one-message completion probe to the provider backend.

Test 2: SDK LLM.verify() and LLM.averify() work for SDK users

Step 1 — Establish baseline without the fix:
On origin/main, imported LLM and checked method availability:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'from openhands.sdk.llm import LLM; print("baseline_has_verify", hasattr(LLM, "verify")); print("baseline_has_averify", hasattr(LLM, "averify"))'

baseline_has_verify False
baseline_has_averify False

This confirms the SDK verification methods are new behavior.

Step 2 — Apply the PR changes:
Checked out the PR branch and reused the same local OpenAI-compatible provider.

Step 3 — Re-run with the fix in place:
Called both methods as an SDK user would:

OPENHANDS_SUPPRESS_BANNER=1 uv run python -c 'import asyncio; from openhands.sdk.llm import LLM; llm=LLM(model="openai/qa-local", base_url="http://127.0.0.1:9912/v1", api_key="sk-sdk-ok", timeout=5); print("verify_return", llm.verify()); print("averify_return", asyncio.run(llm.averify()))'

verify_return None
averify_return None
stub request count 2

This confirms both sync and async SDK entry points complete successfully and perform real provider calls.

Test 3: Error responses are structured and secrets are redacted

With the PR server running, I exercised auth and validation failures:

# Fake official OpenAI key path
auth_openai 200
auth_openai status= auth_error provider= openai message_len= 189 secret_present= False redaction_present= True
auth_openai_message= litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: sk-inval*****-key. You can find your API key at https://platform.openai.com/account/api-keys.

# OpenAI-like local 401 body
http 200 status auth_error provider openai secret_present False redacted True message litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: Bearer ***

# Malformed body missing model
missing_model 422
detail_type list detail_count 1
first_type value_error loc ['body'] input {'api_key': '<redacted>'} ctx_keys ['error']

This shows verify-time failures stay on HTTP 200 with a structured status, while malformed request bodies remain 422 and redact secret inputs.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

# ``timeout`` to a smaller value in the request body; we take the
# ``min`` so they can never extend it past what is a reasonable
# interactive wait.
_VERIFY_TIMEOUT_S = 30.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Final from typing

# client. Caps any accidental large payload (truncated HTML error pages,
# verbose stack traces, etc.) and bounds the worst case if a provider were
# ever to echo request content back in an error body.
_MAX_ERROR_MESSAGE_CHARS = 512
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

# ─────────────────────────────────────────────────────────────────────────────


class VerifyLLMStatus(str, Enum):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use StrEnum

``UNKNOWN_ERROR`` so the endpoint never raises and the frontend always
has a structured result to branch on.
"""
message = _sanitize_error_message(exc, llm)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just not add a class method to VerifyLLMStatus?

Something like VerifyLLMStatus.from_exception?

the Azure branch of :func:`select_chat_options` renames it to
``max_tokens=1024`` automatically.
"""
probe = [Message(role="user", content=[TextContent(text="hi")])]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to do that?

Almost every provider has an auth-check endpoint that costs nothing:

 ┌────────────────────────────────────┬───────────────────────────────────────────────────────────────────┐
  │              Provider              │                            Free probe                             │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ OpenAI                             │ GET /v1/models                                                    │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ Anthropic                          │ GET /v1/models (added ~2024)                                      │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ Google Gemini                      │ GET /v1/models                                                    │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ Azure OpenAI                       │ GET /openai/deployments?api-version=…                             │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ Bedrock                            │ sts:GetCallerIdentity or bedrock:ListFoundationModels             │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ OpenRouter                         │ GET /api/v1/auth/key (also returns credit balance — bonus UX win) │
  ├────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
  │ Mistral / Cohere / Together / Groq │ all have list-models                                              │
  └────────────────────────────────────┴───────────────────────────────────────────────────────────────────┘

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but actually I don't know if it checks that you have credits.

This would be just to check that you have a valide API KEY.

Anyway, just an idea. I don't want to block the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants