Skip to content

feat: send x-litellm-session-id header for conversation-level routing affinity#3437

Open
csmith49 wants to merge 3 commits into
mainfrom
csmith/litellm-session-affinity
Open

feat: send x-litellm-session-id header for conversation-level routing affinity#3437
csmith49 wants to merge 3 commits into
mainfrom
csmith/litellm-session-affinity

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 29, 2026

Summary

Every LLM in a conversation now sends x-litellm-session-id: <conversation_id> on all requests. When the LiteLLM proxy has session_affinity enabled in its router settings, this pins all requests within a conversation to the same upstream deployment.

Motivation

LiteLLM supports weighted load balancing across multiple deployments of the same model (e.g., spreading traffic across providers or regions). Without session affinity, each request is independently routed by weight, so a single conversation can bounce between deployments mid-stream. This breaks KV prefix caching (the new deployment has a cold cache for the conversation's history) and can cause subtle behavior differences if deployments run different versions or quantizations.

With this header, the proxy caches the routing decision per session ID. The first request picks a deployment by weight; all subsequent requests in the same conversation stick to it. Conversations still distribute across deployments probabilistically — it's just stable per-conversation rather than per-request.

Implementation

Injected in _ensure_plugins_loaded() alongside the existing LLM registry loop, following the same pattern as _pin_prompt_cache_key() (which already uses conversation_id to shard OpenAI prefix caches). User-supplied extra_headers are preserved and win on conflict. The not in existing guard prevents double-injection.

No-op when unused

If the LiteLLM proxy doesn't have session_affinity enabled, the header is ignored — no behavioral change.


This PR was created by an AI agent (OpenHands) on behalf of @calvinsmith.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9c6449c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9c6449c-python \
  ghcr.io/openhands/agent-server:9c6449c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9c6449c-golang-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang-amd64
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9c6449c-golang-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang-arm64
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9c6449c-java-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java-amd64
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9c6449c-java-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java-arm64
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9c6449c-python-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python-amd64
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:9c6449c-python-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python-arm64
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:9c6449c-golang
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:9c6449c-java
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:9c6449c-python
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 9c6449c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 9c6449c-python-amd64) are also available if needed

Every LLM in a conversation now carries the conversation ID as
x-litellm-session-id in extra_headers. The LiteLLM proxy uses this
to pin all requests in a conversation to the same upstream deployment
(e.g., native MiniMax API vs self-hosted vLLM on Modal).

Injected in _ensure_plugins_loaded alongside existing LLM registration,
following the same pattern as _pin_prompt_cache_key. User-supplied
extra_headers are preserved; the guard prevents double-injection.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the SDK Conversation/LLM request path against a local OpenAI-compatible endpoint: the base branch sent no session header, while this PR sent a stable x-litellm-session-id on every observed LLM request and preserved user-supplied headers.

Does this PR achieve its stated goal?

Yes. The PR’s stated goal is to send x-litellm-session-id: <conversation_id> for conversation-level LiteLLM routing affinity while preserving existing extra_headers. A two-turn Conversation.run() probe captured two real /v1/chat/completions POSTs from the same conversation, and both carried x-litellm-session-id equal to that conversation ID; the same probe on origin/main captured null for that header. I also verified the conflict case: when a user supplied x-litellm-session-id, the request kept the user value and retained another custom header.

Phase Result
Environment Setup uv 0.11.17 and make build completed successfully
CI Status 🟡 GitHub reported 20 successful, 1 skipped, 8 pending, 0 failing checks at review time
Functional Verification ✅ Real SDK conversation traffic showed the expected header behavior
Functional Verification

Test 1: Conversation session header is sent on all observed LLM requests

Step 1 — Establish baseline without the fix:
Ran git checkout origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode default --turns 2 against a local OpenAI-compatible capture server. Relevant output:

{
  "conversation_id": "ddaa2ffe-d903-446d-bdd5-dcf8c4b60c79",
  "llm_extra_headers_after_run": null,
  "request_count": 2,
  "requests": [
    {"path": "/v1/chat/completions", "x_litellm_session_id": null},
    {"path": "/v1/chat/completions", "x_litellm_session_id": null}
  ]
}

This confirms the old behavior: two real SDK LLM calls completed, but neither request included x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode default --turns 2. Relevant output:

{
  "conversation_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5",
  "llm_extra_headers_after_run": {
    "x-litellm-session-id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"
  },
  "request_count": 2,
  "requests": [
    {"path": "/v1/chat/completions", "x_litellm_session_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"},
    {"path": "/v1/chat/completions", "x_litellm_session_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"}
  ]
}

This shows the PR works for the core claim: both LLM POSTs in the same conversation carried the same session ID, and that value matched the SDK conversation ID.

Test 2: User-supplied x-litellm-session-id is not overwritten

Step 1 — Establish baseline without the fix:
Ran git checkout origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode conflict --turns 1. Relevant output:

{
  "conversation_id": "20fbfc8b-2730-4795-a02b-fb4edd354bab",
  "llm_extra_headers_after_run": {
    "x-custom-qa": "kept",
    "x-litellm-session-id": "user-provided-session"
  },
  "requests": [
    {"x_custom": "kept", "x_litellm_session_id": "user-provided-session"}
  ]
}

This establishes that user-provided extra_headers are sent through the real SDK request path.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode conflict. Relevant output:

{
  "conversation_id": "e0633603-c499-4c66-abc7-e356ef7b63c5",
  "llm_extra_headers_after_run": {
    "x-custom-qa": "kept",
    "x-litellm-session-id": "user-provided-session"
  },
  "requests": [
    {"x_custom": "kept", "x_litellm_session_id": "user-provided-session"}
  ]
}

This confirms the PR preserves existing custom headers and lets a user-supplied session ID win on conflict.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@csmith49 csmith49 requested a review from all-hands-bot May 29, 2026 15:54
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 29, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat: send x-litellm-session-id header for conversation-level routing affinity

The implementation is clean and correctly follows the _pin_prompt_cache_key() pattern used for OpenAI prefix-cache sharding. The double-injection guard ("x-litellm-session-id" not in existing) and user-header-wins semantics are both correct. Two gaps are worth addressing:

  1. switch_llm() / switch_profile() miss the injection. _pin_prompt_cache_key() is explicitly called again in switch_llm() to re-apply the prompt-cache pin after an LLM swap. The session-affinity header injection has no equivalent hook, so any LLM installed via switch_llm() mid-conversation will be missing x-litellm-session-id and the affinity will silently break for the remainder of that conversation.

  2. Call-site extra_headers bypass. In select_chat_options(), if any caller passes extra_headers explicitly as a completion kwarg, the entire llm.extra_headers dict (including the injected session ID) is skipped in favor of the caller-supplied dict. The OpenRouter-header injection in that same function sidesteps this by merging after the user dict is applied ({**openrouter_headers, **existing}). The session ID would benefit from the same treatment to survive call-site overrides — though for typical conversations that never pass extra_headers at the call level this is a non-issue.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified a real SDK Conversation against a local OpenAI-compatible capture server: the PR sends the conversation ID as x-litellm-session-id on repeated LLM calls and preserves explicit user headers.

Does this PR achieve its stated goal?

Yes. The stated goal is to add conversation-level routing affinity by sending x-litellm-session-id: <conversation_id> on LLM requests. In a real two-turn Conversation.run() flow, the base branch made two /v1/chat/completions calls with no session header, while PR commit b8532c1e8508c7e9c7c2f65281512f2571133f7d made the same two calls with the header set to the exact conversation ID both times.

Phase Result
Environment Setup make build completed successfully
CI Status 🟡 Observed 32 successful checks, 3 skipped, and 1 pending qa-changes check
Functional Verification ✅ Real SDK conversation execution produced the expected HTTP headers
Functional Verification

Test 1: Conversation-level session header on repeated LLM calls

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran a real SDK conversation against a local HTTP server that implements enough of the OpenAI chat completions API to capture incoming request headers:

git switch --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "d02134ed-40a7-476d-9a99-857f9e138eaa",
  "matches_conversation_id": false,
  "paths": ["/v1/chat/completions", "/v1/chat/completions"],
  "request_count": 2,
  "x_litellm_session_id_values": [null, null]
}

This confirms the old behavior: two real LLM HTTP requests were made during one conversation, but neither included x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out PR commit b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran the same SDK conversation exercise:

git switch --detach b8532c1e8508c7e9c7c2f65281512f2571133f7d
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "f8349153-2baa-45e2-937c-fd869d59a5ac",
  "matches_conversation_id": true,
  "paths": ["/v1/chat/completions", "/v1/chat/completions"],
  "request_count": 2,
  "x_litellm_session_id_values": [
    "f8349153-2baa-45e2-937c-fd869d59a5ac",
    "f8349153-2baa-45e2-937c-fd869d59a5ac"
  ]
}

This confirms the PR behavior: both LLM requests in the same conversation carried the same x-litellm-session-id, and the value exactly matched the SDK conversation ID.

Test 2: Existing user-supplied headers are preserved

With the PR commit still checked out, I ran the same real conversation while configuring the LLM with extra_headers={"x-litellm-session-id": "manual-session-qa", "x-custom-qa-header": "kept"}:

QA_PRESET_SESSION_ID=manual-session-qa OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "0a559190-086e-4460-8c75-975ed6197093",
  "request_count": 2,
  "x_custom_qa_header_values": ["kept", "kept"],
  "x_litellm_session_id_values": ["manual-session-qa", "manual-session-qa"]
}

This confirms the documented conflict behavior: explicit user headers were not overwritten, and unrelated custom headers continued to flow to the LLM endpoint.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Extract _pin_session_affinity_header() helper and call it from both
_ensure_plugins_loaded (initial setup) and switch_llm (mid-conversation
model swap). Without this, switching models mid-conversation would
silently lose the x-litellm-session-id header.

Also documents the known limitation that call-site extra_headers kwargs
bypass llm.extra_headers in select_chat_options — a pre-existing issue
shared with OpenRouter headers, out of scope for this PR.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 requested a review from all-hands-bot May 29, 2026 16:16
Copy link
Copy Markdown
Collaborator

all-hands-bot commented May 29, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified with real SDK conversations against a local OpenAI-compatible HTTP endpoint: the PR now sends a stable x-litellm-session-id per conversation, preserves caller-supplied headers, and also covers switched LLMs.

Does this PR achieve its stated goal?

Yes. The PR's goal is to send x-litellm-session-id: <conversation_id> on LLM requests so LiteLLM can apply conversation-level routing affinity. In the baseline (origin/main), two real conversation turns sent no such header; on this PR, the same two turns sent the conversation UUID on both requests. I also verified the user-supplied header conflict case and the switch_llm() path.

Phase Result
Environment Setup make build completed successfully; no tests/linters were run.
CI Status ⚠️ 31 success, 3 skipped; unresolved-review-threads failing and qa-changes in progress at review time.
Functional Verification ✅ Real SDK Conversation.run() calls sent the expected headers on repeated and switched LLM requests.
Functional Verification

Test 1: Conversation-level header appears on all LLM requests

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_litellm_session_affinity.py, which creates a real Agent/Conversation, points its LLM at a local OpenAI-compatible HTTP server, sends two user messages, and records the actual HTTP request headers:

{
  "captured_headers": [
    null,
    null
  ],
  "conversation_id": "89f6f4b1-0335-4c55-95a1-7c9a3a303fca",
  "x_header_keys": [
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms the old behavior: real LLM HTTP requests did not include x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at 06b3beb7f919a68dd2f3e7bb8deb574014b7705f.

Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_litellm_session_affinity.py:

{
  "captured_headers": [
    "15c29138-188c-45ec-8311-779cdd48ec42",
    "15c29138-188c-45ec-8311-779cdd48ec42"
  ],
  "conversation_id": "15c29138-188c-45ec-8311-779cdd48ec42",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms the new behavior: both LLM requests in the same conversation carry the same session-affinity header, and the value matches conversation.state.id.

Test 2: Caller-supplied extra_headers still win

Ran uv run python /tmp/qa_litellm_session_affinity.py --preset-header user-session on the PR branch:

{
  "captured_headers": [
    "user-session",
    "user-session"
  ],
  "conversation_id": "eb57b806-2d6a-42c0-b0c9-10dd971543a8",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms existing user-supplied x-litellm-session-id values are preserved rather than overwritten.

Test 3: switch_llm() requests also receive the conversation header

Ran uv run python /tmp/qa_litellm_session_affinity.py --switch-llm on the PR branch:

{
  "captured_headers": [
    "331ede81-7cd8-4c78-afbe-b2cdc54029c1",
    "331ede81-7cd8-4c78-afbe-b2cdc54029c1"
  ],
  "conversation_id": "331ede81-7cd8-4c78-afbe-b2cdc54029c1",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms a real conversation using a switched LLM still sends the session-affinity header on subsequent LLM requests.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat: send x-litellm-session-id header for conversation-level routing affinity

This is a well-scoped, correctly implemented feature. The implementation mirrors the _pin_prompt_cache_key() pattern exactly and handles the key edge cases. Approving.

What's well done

Guard semantics: if "x-litellm-session-id" not in existing ensures user-supplied values always win. The new dict literal {"x-litellm-session-id": ..., **existing} also provides a redundant layer of safety — even if the check somehow failed, the spread would override the leading value — though this path is unreachable in practice.

switch_profile() is transitively covered: The previous code review flagged both switch_llm() and switch_profile() as gaps. switch_llm() now calls _pin_session_affinity_header(new_llm) directly, and switch_profile() is covered transitively because it unconditionally delegates to self.switch_llm(cached) (line 786). No separate fix is needed there.

Idempotent helper: _pin_session_affinity_header is safe to call multiple times. This matters because in _ensure_agent_ready it is placed outside the if llm.usage_id not in registered: guard — every LLM in the loop gets the header pinned regardless of whether it was freshly registered or already existed.

Honest docstring: The note about select_chat_options bypassing llm.extra_headers when extra_headers is passed directly to completion() is the right call for transparency. Deferring that fix as out of scope is appropriate.

Minor suggestion

No unit test directly covers _pin_session_affinity_header. The two QA reports provide strong functional evidence, but a small parametrized test — (1) default injection, (2) user-supplied value preserved, (3) other existing headers retained — would guard against future regressions without needing a full conversation.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Remove note about fixing select_chat_options limitation.
@enyst
Copy link
Copy Markdown
Member

enyst commented May 29, 2026

@OpenHands Understand the goal of this PR, investigate on the internet if you need; and /codereview this PR. WDYT?

Respond directly on github as a review, with gh api and set 'event' to any of approved, comment, request-changes as you think. Think well and don't make your choice on nitpicking though.

Mention me in the review so I can see it too.

@openhands-ai
Copy link
Copy Markdown

openhands-ai Bot commented May 29, 2026

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Copy Markdown
Member

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enyst I checked the LiteLLM side too: the proxy extracts x-litellm-session-id into metadata.session_id, and the router's session_affinity pre-call check uses that session key for deployment affinity. So the PR goal and header name are right.

🟡 Taste Rating: Acceptable, but there is one real correctness issue.

I’m leaving this as a COMMENT rather than REQUEST_CHANGES per this repo’s review guidance, but I would not merge as-is. The implementation stores conversation-scoped routing state by mutating LLM.extra_headers, and those LLM/Agent objects can be reused across conversations. See inline.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

The diff is small and the feature is well motivated, but it changes LLM request routing behavior and currently has a silent cross-conversation affinity failure mode when callers reuse SDK objects. It also plausibly affects benchmark/eval behavior through LiteLLM routing/cache locality, so I would want a human maintainer to decide on validation after the correctness fix.

VERDICT:
Needs rework: fix the shared-LLM/session-id reuse case before merging.

KEY INSIGHT:
Conversation-scoped routing state should not be persisted as an unmarked mutation on a potentially shared LLM config object.

This review was created by an AI agent (OpenHands) on behalf of @enyst.

entirely — the same limitation that affects OpenRouter headers.
"""
existing = llm.extra_headers or {}
if "x-litellm-session-id" not in existing:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard conflates a truly user-supplied x-litellm-session-id with one that this helper injected into a shared LLM object for a previous conversation.

LocalConversation stores the passed agent/llm by reference. If an application reuses the same Agent (or LLM) for two conversations, the first conversation mutates llm.extra_headers; the second conversation sees the existing key here and keeps the first conversation's ID. I reproduced that locally with:

llm = LLM(model="openai/gpt-4o-mini", api_key="test")
agent = Agent(llm=llm, tools=[])
conv1 = LocalConversation(agent=agent, workspace=tmp1)
conv1._ensure_agent_ready()
conv2 = LocalConversation(agent=agent, workspace=tmp2)
conv2._ensure_agent_ready()
assert agent.llm.extra_headers["x-litellm-session-id"] == str(conv2.id)  # fails; still conv1.id

That silently pins conversation 2 to conversation 1's LiteLLM affinity bucket, defeating the per-conversation routing guarantee and potentially mixing unrelated conversations onto the same upstream/cache shard.

Please either avoid mutating shared LLM objects for this conversation-scoped value, or track enough state to distinguish an SDK-injected header from a caller-supplied conflict. A regression test for reusing the same Agent/LLM across two LocalConversations would catch this.

@openhands-ai

This comment was marked as duplicate.

Copy link
Copy Markdown
Member

enyst commented May 29, 2026

Following up on the shared-LLM question: I don't think this is only theoretical.

References from this repo:

  • LocalConversation keeps the caller's agent by reference (self.agent = agent), not a defensive copy: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:197.
  • get_default_agent() and get_planning_agent() put the passed llm directly into the returned Agent: openhands-tools/openhands/tools/preset/default.py:87-89, openhands-tools/openhands/tools/preset/planning.py:169-171.
  • examples/01_standalone_sdk/24_planning_agent_workflow.py:44-49,67-72,92-98 creates one llm, then uses it for both a planning conversation and an execution conversation.
  • examples/01_standalone_sdk/31_iterative_refinement.py:373-378,408-412,425-428 creates one llm, then repeatedly creates separate refactoring/critique conversations from it.
  • LocalConversation.fork() already documents this aliasing hazard for _prompt_cache_key: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:389-392 says the fork deep-copies the agent because __init__ mutates agent.llm._prompt_cache_key and a shared/aliased agent would clobber another conversation's cache key.

My thought: adding x-litellm-session-id by mutating LLM.extra_headers has the same aliasing shape as _prompt_cache_key, but the value is even more explicitly conversation-scoped. So if one shared LLM is used in conversation A and then conversation B, the current if "x-litellm-session-id" not in existing guard can preserve A's injected ID as if it were user-supplied, and B will route under A's session affinity bucket.

I think the fix should either make the injected header per-call/per-conversation instead of persistent on the shared LLM, or mark/track SDK-injected values so a later conversation can safely replace only SDK-injected values while still preserving truly user-supplied headers.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@enyst
Copy link
Copy Markdown
Member

enyst commented May 29, 2026

HUMAN:

LocalConversation.fork() already documents this aliasing hazard for _prompt_cache_key: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:389-392 says the fork deep-copies the agent because init mutates agent.llm._prompt_cache_key and a shared/aliased agent would clobber another conversation's cache key.

This precedent, and multiple examples with shared LLM, seem like maybe it's worth checking if this is indeed a issue? Like the agent had a small test script in this comment #3437 (comment)

WDYT?

xingyaoww
xingyaoww previously approved these changes May 29, 2026
Copy link
Copy Markdown
Member

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is greta!

@xingyaoww xingyaoww dismissed their stale review May 29, 2026 17:37

Ah yes, i think we should at least add a test following @enyst's test here
#3437 (comment)

@csmith49
Copy link
Copy Markdown
Collaborator Author

@enyst and @xingyaoww: been looking into this aliasing issue a bit. Definitely a real edge case, but I'm going to suggest fixing it in another PR.

When the session IDs would get weirdly aliased, so would the prompt cache keys. Since these values are doing pretty much the same thing (linking convos to IDs to keep them associated with a particular node) I'd want to keep their behavior identical, and since the prompt cache keys have already been subject to this aliasing for a while it seems the harms are not severe.

But we should definitely fix it. Makes sense to keep these conversation-scoped values outside the shared state in LLM as much as possible. The fix options are all a little fiddly, but here's an issue tracking it that we can resolve separately: #3443

@neubig
Copy link
Copy Markdown
Member

neubig commented May 30, 2026

I agree that it's probably ok to fix in a follow up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants