feat: send x-litellm-session-id header for conversation-level routing affinity by csmith49 · Pull Request #3437 · OpenHands/software-agent-sdk

csmith49 · 2026-05-29T15:40:00Z

Summary

Every LLM in a conversation now sends x-litellm-session-id: <conversation_id> on all requests. When the LiteLLM proxy has session_affinity enabled in its router settings, this pins all requests within a conversation to the same upstream deployment.

Motivation

LiteLLM supports weighted load balancing across multiple deployments of the same model (e.g., spreading traffic across providers or regions). Without session affinity, each request is independently routed by weight, so a single conversation can bounce between deployments mid-stream. This breaks KV prefix caching (the new deployment has a cold cache for the conversation's history) and can cause subtle behavior differences if deployments run different versions or quantizations.

With this header, the proxy caches the routing decision per session ID. The first request picks a deployment by weight; all subsequent requests in the same conversation stick to it. Conversations still distribute across deployments probabilistically — it's just stable per-conversation rather than per-request.

Implementation

Injected in _ensure_plugins_loaded() alongside the existing LLM registry loop, following the same pattern as _pin_prompt_cache_key() (which already uses conversation_id to shard OpenAI prefix caches). User-supplied extra_headers are preserved and win on conflict. The not in existing guard prevents double-injection.

No-op when unused

If the LiteLLM proxy doesn't have session_affinity enabled, the header is ignored — no behavioral change.

This PR was created by an AI agent (OpenHands) on behalf of @calvinsmith.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9c6449c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9c6449c-python \
  ghcr.io/openhands/agent-server:9c6449c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9c6449c-golang-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang-amd64
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9c6449c-golang-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang-arm64
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9c6449c-java-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java-amd64
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9c6449c-java-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java-arm64
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9c6449c-python-amd64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python-amd64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python-amd64
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:9c6449c-python-arm64
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python-arm64
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python-arm64
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:9c6449c-golang
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-golang
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-golang
ghcr.io/openhands/agent-server:9c6449c-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:9c6449c-java
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-java
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-java
ghcr.io/openhands/agent-server:9c6449c-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:9c6449c-python
ghcr.io/openhands/agent-server:9c6449cc46490c99add4a3613501fa0a60927616-python
ghcr.io/openhands/agent-server:csmith-litellm-session-affinity-python
ghcr.io/openhands/agent-server:9c6449c-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 9c6449c-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 9c6449c-python-amd64) are also available if needed

Every LLM in a conversation now carries the conversation ID as x-litellm-session-id in extra_headers. The LiteLLM proxy uses this to pin all requests in a conversation to the same upstream deployment (e.g., native MiniMax API vs self-hosted vLLM on Modal). Injected in _ensure_plugins_loaded alongside existing LLM registration, following the same pattern as _pin_prompt_cache_key. User-supplied extra_headers are preserved; the guard prevents double-injection. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-05-29T15:40:33Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-29T15:40:41Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

✅ QA Report: PASS

Verified the SDK Conversation/LLM request path against a local OpenAI-compatible endpoint: the base branch sent no session header, while this PR sent a stable x-litellm-session-id on every observed LLM request and preserved user-supplied headers.

Does this PR achieve its stated goal?

Yes. The PR’s stated goal is to send x-litellm-session-id: <conversation_id> for conversation-level LiteLLM routing affinity while preserving existing extra_headers. A two-turn Conversation.run() probe captured two real /v1/chat/completions POSTs from the same conversation, and both carried x-litellm-session-id equal to that conversation ID; the same probe on origin/main captured null for that header. I also verified the conflict case: when a user supplied x-litellm-session-id, the request kept the user value and retained another custom header.

Phase	Result
Environment Setup	✅ `uv 0.11.17` and `make build` completed successfully
CI Status	🟡 GitHub reported 20 successful, 1 skipped, 8 pending, 0 failing checks at review time
Functional Verification	✅ Real SDK conversation traffic showed the expected header behavior

Functional Verification

Test 1: Conversation session header is sent on all observed LLM requests

Step 1 — Establish baseline without the fix:
Ran git checkout origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode default --turns 2 against a local OpenAI-compatible capture server. Relevant output:

{
  "conversation_id": "ddaa2ffe-d903-446d-bdd5-dcf8c4b60c79",
  "llm_extra_headers_after_run": null,
  "request_count": 2,
  "requests": [
    {"path": "/v1/chat/completions", "x_litellm_session_id": null},
    {"path": "/v1/chat/completions", "x_litellm_session_id": null}
  ]
}

This confirms the old behavior: two real SDK LLM calls completed, but neither request included x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode default --turns 2. Relevant output:

{
  "conversation_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5",
  "llm_extra_headers_after_run": {
    "x-litellm-session-id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"
  },
  "request_count": 2,
  "requests": [
    {"path": "/v1/chat/completions", "x_litellm_session_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"},
    {"path": "/v1/chat/completions", "x_litellm_session_id": "31441b8a-b2be-46db-b71d-51ad584bd1a5"}
  ]
}

This shows the PR works for the core claim: both LLM POSTs in the same conversation carried the same session ID, and that value matched the SDK conversation ID.

Test 2: User-supplied `x-litellm-session-id` is not overwritten

Step 1 — Establish baseline without the fix:
Ran git checkout origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode conflict --turns 1. Relevant output:

{
  "conversation_id": "20fbfc8b-2730-4795-a02b-fb4edd354bab",
  "llm_extra_headers_after_run": {
    "x-custom-qa": "kept",
    "x-litellm-session-id": "user-provided-session"
  },
  "requests": [
    {"x_custom": "kept", "x_litellm_session_id": "user-provided-session"}
  ]
}

This establishes that user-provided extra_headers are sent through the real SDK request path.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_affinity_probe.py --mode conflict. Relevant output:

{
  "conversation_id": "e0633603-c499-4c66-abc7-e356ef7b63c5",
  "llm_extra_headers_after_run": {
    "x-custom-qa": "kept",
    "x-litellm-session-id": "user-provided-session"
  },
  "requests": [
    {"x_custom": "kept", "x_litellm_session_id": "user-provided-session"}
  ]
}

This confirms the PR preserves existing custom headers and lets a user-supplied session ID win on conflict.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-05-29T15:55:49Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review: `feat: send x-litellm-session-id header for conversation-level routing affinity`

The implementation is clean and correctly follows the _pin_prompt_cache_key() pattern used for OpenAI prefix-cache sharding. The double-injection guard ("x-litellm-session-id" not in existing) and user-header-wins semantics are both correct. Two gaps are worth addressing:

switch_llm() / switch_profile() miss the injection. _pin_prompt_cache_key() is explicitly called again in switch_llm() to re-apply the prompt-cache pin after an LLM swap. The session-affinity header injection has no equivalent hook, so any LLM installed via switch_llm() mid-conversation will be missing x-litellm-session-id and the affinity will silently break for the remainder of that conversation.
Call-site extra_headers bypass. In select_chat_options(), if any caller passes extra_headers explicitly as a completion kwarg, the entire llm.extra_headers dict (including the injected session ID) is skipped in favor of the caller-supplied dict. The OpenRouter-header injection in that same function sidesteps this by merging after the user dict is applied ({**openrouter_headers, **existing}). The session ID would benefit from the same treatment to survive call-site overrides — though for typical conversations that never pass extra_headers at the call level this is a non-issue.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot

✅ QA Report: PASS

Verified a real SDK Conversation against a local OpenAI-compatible capture server: the PR sends the conversation ID as x-litellm-session-id on repeated LLM calls and preserves explicit user headers.

Does this PR achieve its stated goal?

Yes. The stated goal is to add conversation-level routing affinity by sending x-litellm-session-id: <conversation_id> on LLM requests. In a real two-turn Conversation.run() flow, the base branch made two /v1/chat/completions calls with no session header, while PR commit b8532c1e8508c7e9c7c2f65281512f2571133f7d made the same two calls with the header set to the exact conversation ID both times.

Phase	Result
Environment Setup	✅ `make build` completed successfully
CI Status	🟡 Observed 32 successful checks, 3 skipped, and 1 pending `qa-changes` check
Functional Verification	✅ Real SDK conversation execution produced the expected HTTP headers

Functional Verification

Test 1: Conversation-level session header on repeated LLM calls

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran a real SDK conversation against a local HTTP server that implements enough of the OpenAI chat completions API to capture incoming request headers:

git switch --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "d02134ed-40a7-476d-9a99-857f9e138eaa",
  "matches_conversation_id": false,
  "paths": ["/v1/chat/completions", "/v1/chat/completions"],
  "request_count": 2,
  "x_litellm_session_id_values": [null, null]
}

This confirms the old behavior: two real LLM HTTP requests were made during one conversation, but neither included x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out PR commit b8532c1e8508c7e9c7c2f65281512f2571133f7d.

Step 3 — Re-run with the fix in place:
Ran the same SDK conversation exercise:

git switch --detach b8532c1e8508c7e9c7c2f65281512f2571133f7d
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "f8349153-2baa-45e2-937c-fd869d59a5ac",
  "matches_conversation_id": true,
  "paths": ["/v1/chat/completions", "/v1/chat/completions"],
  "request_count": 2,
  "x_litellm_session_id_values": [
    "f8349153-2baa-45e2-937c-fd869d59a5ac",
    "f8349153-2baa-45e2-937c-fd869d59a5ac"
  ]
}

This confirms the PR behavior: both LLM requests in the same conversation carried the same x-litellm-session-id, and the value exactly matched the SDK conversation ID.

Test 2: Existing user-supplied headers are preserved

With the PR commit still checked out, I ran the same real conversation while configuring the LLM with extra_headers={"x-litellm-session-id": "manual-session-qa", "x-custom-qa-header": "kept"}:

QA_PRESET_SESSION_ID=manual-session-qa OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_litellm_session_affinity.py

Concise captured result:

{
  "conversation_id": "0a559190-086e-4460-8c75-975ed6197093",
  "request_count": 2,
  "x_custom_qa_header_values": ["kept", "kept"],
  "x_litellm_session_id_values": ["manual-session-qa", "manual-session-qa"]
}

This confirms the documented conflict behavior: explicit user headers were not overwritten, and unrelated custom headers continued to flow to the LLM endpoint.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Extract _pin_session_affinity_header() helper and call it from both _ensure_plugins_loaded (initial setup) and switch_llm (mid-conversation model swap). Without this, switching models mid-conversation would silently lose the x-litellm-session-id header. Also documents the known limitation that call-site extra_headers kwargs bypass llm.extra_headers in select_chat_options — a pre-existing issue shared with OpenRouter headers, out of scope for this PR. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot · 2026-05-29T16:17:01Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

✅ QA Report: PASS

Verified with real SDK conversations against a local OpenAI-compatible HTTP endpoint: the PR now sends a stable x-litellm-session-id per conversation, preserves caller-supplied headers, and also covers switched LLMs.

Does this PR achieve its stated goal?

Yes. The PR's goal is to send x-litellm-session-id: <conversation_id> on LLM requests so LiteLLM can apply conversation-level routing affinity. In the baseline (origin/main), two real conversation turns sent no such header; on this PR, the same two turns sent the conversation UUID on both requests. I also verified the user-supplied header conflict case and the switch_llm() path.

Phase	Result
Environment Setup	✅ `make build` completed successfully; no tests/linters were run.
CI Status	⚠️ 31 success, 3 skipped; `unresolved-review-threads` failing and `qa-changes` in progress at review time.
Functional Verification	✅ Real SDK `Conversation.run()` calls sent the expected headers on repeated and switched LLM requests.

Functional Verification

Test 1: Conversation-level header appears on all LLM requests

Step 1 — Establish baseline without the fix:
Checked out origin/main and ran uv run python /tmp/qa_litellm_session_affinity.py, which creates a real Agent/Conversation, points its LLM at a local OpenAI-compatible HTTP server, sends two user messages, and records the actual HTTP request headers:

{
  "captured_headers": [
    null,
    null
  ],
  "conversation_id": "89f6f4b1-0335-4c55-95a1-7c9a3a303fca",
  "x_header_keys": [
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms the old behavior: real LLM HTTP requests did not include x-litellm-session-id.

Step 2 — Apply the PR's changes:
Checked out csmith/litellm-session-affinity at 06b3beb7f919a68dd2f3e7bb8deb574014b7705f.

Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_litellm_session_affinity.py:

{
  "captured_headers": [
    "15c29138-188c-45ec-8311-779cdd48ec42",
    "15c29138-188c-45ec-8311-779cdd48ec42"
  ],
  "conversation_id": "15c29138-188c-45ec-8311-779cdd48ec42",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms the new behavior: both LLM requests in the same conversation carry the same session-affinity header, and the value matches conversation.state.id.

Test 2: Caller-supplied `extra_headers` still win

Ran uv run python /tmp/qa_litellm_session_affinity.py --preset-header user-session on the PR branch:

{
  "captured_headers": [
    "user-session",
    "user-session"
  ],
  "conversation_id": "eb57b806-2d6a-42c0-b0c9-10dd971543a8",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms existing user-supplied x-litellm-session-id values are preserved rather than overwritten.

Test 3: `switch_llm()` requests also receive the conversation header

Ran uv run python /tmp/qa_litellm_session_affinity.py --switch-llm on the PR branch:

{
  "captured_headers": [
    "331ede81-7cd8-4c78-afbe-b2cdc54029c1",
    "331ede81-7cd8-4c78-afbe-b2cdc54029c1"
  ],
  "conversation_id": "331ede81-7cd8-4c78-afbe-b2cdc54029c1",
  "x_header_keys": [
    "x-litellm-session-id",
    "x-stainless-arch",
    "x-stainless-async",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-title"
  ]
}

This confirms a real conversation using a switched LLM still sends the session-affinity header on subsequent LLM requests.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot

Code Review: `feat: send x-litellm-session-id header for conversation-level routing affinity`

This is a well-scoped, correctly implemented feature. The implementation mirrors the _pin_prompt_cache_key() pattern exactly and handles the key edge cases. Approving.

What's well done

Guard semantics: if "x-litellm-session-id" not in existing ensures user-supplied values always win. The new dict literal {"x-litellm-session-id": ..., **existing} also provides a redundant layer of safety — even if the check somehow failed, the spread would override the leading value — though this path is unreachable in practice.

switch_profile() is transitively covered: The previous code review flagged both switch_llm() and switch_profile() as gaps. switch_llm() now calls _pin_session_affinity_header(new_llm) directly, and switch_profile() is covered transitively because it unconditionally delegates to self.switch_llm(cached) (line 786). No separate fix is needed there.

Idempotent helper: _pin_session_affinity_header is safe to call multiple times. This matters because in _ensure_agent_ready it is placed outside the if llm.usage_id not in registered: guard — every LLM in the loop gets the header pinned regardless of whether it was freshly registered or already existed.

Honest docstring: The note about select_chat_options bypassing llm.extra_headers when extra_headers is passed directly to completion() is the right call for transparency. Deferring that fix as out of scope is appropriate.

Minor suggestion

No unit test directly covers _pin_session_affinity_header. The two QA reports provide strong functional evidence, but a small parametrized test — (1) default injection, (2) user-supplied value preserved, (3) other existing headers retained — would guard against future regressions without needing a full conversation.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Remove note about fixing select_chat_options limitation.

enyst · 2026-05-29T17:17:54Z

@OpenHands Understand the goal of this PR, investigate on the internet if you need; and /codereview this PR. WDYT?

Respond directly on github as a review, with gh api and set 'event' to any of approved, comment, request-changes as you think. Think well and don't make your choice on nitpicking though.

Mention me in the review so I can see it too.

openhands-ai · 2026-05-29T17:18:10Z

I'm on it! enyst can track my progress at all-hands.dev

enyst

@enyst I checked the LiteLLM side too: the proxy extracts x-litellm-session-id into metadata.session_id, and the router's session_affinity pre-call check uses that session key for deployment affinity. So the PR goal and header name are right.

🟡 Taste Rating: Acceptable, but there is one real correctness issue.

I’m leaving this as a COMMENT rather than REQUEST_CHANGES per this repo’s review guidance, but I would not merge as-is. The implementation stores conversation-scoped routing state by mutating LLM.extra_headers, and those LLM/Agent objects can be reused across conversations. See inline.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

The diff is small and the feature is well motivated, but it changes LLM request routing behavior and currently has a silent cross-conversation affinity failure mode when callers reuse SDK objects. It also plausibly affects benchmark/eval behavior through LiteLLM routing/cache locality, so I would want a human maintainer to decide on validation after the correctness fix.

VERDICT:
❌ Needs rework: fix the shared-LLM/session-id reuse case before merging.

KEY INSIGHT:
Conversation-scoped routing state should not be persisted as an unmarked mutation on a potentially shared LLM config object.

This review was created by an AI agent (OpenHands) on behalf of @enyst.

enyst · 2026-05-29T17:22:46Z

+        entirely — the same limitation that affects OpenRouter headers.
+        """
+        existing = llm.extra_headers or {}
+        if "x-litellm-session-id" not in existing:


This guard conflates a truly user-supplied x-litellm-session-id with one that this helper injected into a shared LLM object for a previous conversation.

LocalConversation stores the passed agent/llm by reference. If an application reuses the same Agent (or LLM) for two conversations, the first conversation mutates llm.extra_headers; the second conversation sees the existing key here and keeps the first conversation's ID. I reproduced that locally with:

llm = LLM(model="openai/gpt-4o-mini", api_key="test") agent = Agent(llm=llm, tools=[]) conv1 = LocalConversation(agent=agent, workspace=tmp1) conv1._ensure_agent_ready() conv2 = LocalConversation(agent=agent, workspace=tmp2) conv2._ensure_agent_ready() assert agent.llm.extra_headers["x-litellm-session-id"] == str(conv2.id) # fails; still conv1.id

That silently pins conversation 2 to conversation 1's LiteLLM affinity bucket, defeating the per-conversation routing guarantee and potentially mixing unrelated conversations onto the same upstream/cache shard.

Please either avoid mutating shared LLM objects for this conversation-scoped value, or track enough state to distinguish an SDK-injected header from a caller-supplied conflict. A regression test for reusing the same Agent/LLM across two LocalConversations would catch this.

enyst · 2026-05-29T17:31:31Z

Following up on the shared-LLM question: I don't think this is only theoretical.

References from this repo:

LocalConversation keeps the caller's agent by reference (self.agent = agent), not a defensive copy: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:197.
get_default_agent() and get_planning_agent() put the passed llm directly into the returned Agent: openhands-tools/openhands/tools/preset/default.py:87-89, openhands-tools/openhands/tools/preset/planning.py:169-171.
examples/01_standalone_sdk/24_planning_agent_workflow.py:44-49,67-72,92-98 creates one llm, then uses it for both a planning conversation and an execution conversation.
examples/01_standalone_sdk/31_iterative_refinement.py:373-378,408-412,425-428 creates one llm, then repeatedly creates separate refactoring/critique conversations from it.
LocalConversation.fork() already documents this aliasing hazard for _prompt_cache_key: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:389-392 says the fork deep-copies the agent because __init__ mutates agent.llm._prompt_cache_key and a shared/aliased agent would clobber another conversation's cache key.

My thought: adding x-litellm-session-id by mutating LLM.extra_headers has the same aliasing shape as _prompt_cache_key, but the value is even more explicitly conversation-scoped. So if one shared LLM is used in conversation A and then conversation B, the current if "x-litellm-session-id" not in existing guard can preserve A's injected ID as if it were user-supplied, and B will route under A's session affinity bucket.

I think the fix should either make the injected header per-call/per-conversation instead of persistent on the shared LLM, or mark/track SDK-injected values so a later conversation can safely replace only SDK-injected values while still preserving truly user-supplied headers.

This comment was created by an AI agent (OpenHands) on behalf of the user.

enyst · 2026-05-29T17:35:32Z

HUMAN:

LocalConversation.fork() already documents this aliasing hazard for _prompt_cache_key: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:389-392 says the fork deep-copies the agent because init mutates agent.llm._prompt_cache_key and a shared/aliased agent would clobber another conversation's cache key.

This precedent, and multiple examples with shared LLM, seem like maybe it's worth checking if this is indeed a issue? Like the agent had a small test script in this comment #3437 (comment)

WDYT?

xingyaoww

LGTM! This is greta!

@enyst

Ah yes, i think we should at least add a test following @enyst's test here
#3437 (comment)

csmith49 · 2026-05-29T18:35:28Z

@enyst and @xingyaoww: been looking into this aliasing issue a bit. Definitely a real edge case, but I'm going to suggest fixing it in another PR.

When the session IDs would get weirdly aliased, so would the prompt cache keys. Since these values are doing pretty much the same thing (linking convos to IDs to keep them associated with a particular node) I'd want to keep their behavior identical, and since the prompt cache keys have already been subject to this aliasing for a while it seems the harms are not severe.

But we should definitely fix it. Makes sense to keep these conversation-scoped values outside the shared state in LLM as much as possible. The fix options are all a little fiddly, but here's an issue tracking it that we can resolve separately: #3443

neubig · 2026-05-30T04:20:54Z

I agree that it's probably ok to fix in a follow up!

all-hands-bot reviewed May 29, 2026

View reviewed changes

csmith49 requested a review from all-hands-bot May 29, 2026 15:54

all-hands-bot reviewed May 29, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated

all-hands-bot reviewed May 29, 2026

View reviewed changes

csmith49 requested a review from all-hands-bot May 29, 2026 16:16

all-hands-bot reviewed May 29, 2026

View reviewed changes

all-hands-bot approved these changes May 29, 2026

View reviewed changes

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

Remove unnecessary note in local_conversation.py

9c6449c

Remove note about fixing select_chat_options limitation.

neubig approved these changes May 29, 2026

View reviewed changes

enyst reviewed May 29, 2026

View reviewed changes

This comment was marked as duplicate.

Sign in to view

xingyaoww previously approved these changes May 29, 2026

View reviewed changes

csmith49 mentioned this pull request May 29, 2026

Shared LLM/Agent objects across conversations inherit stale per-conversation state #3443

Open

Conversation

csmith49 commented May 29, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Implementation

No-op when unused

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Conversation session header is sent on all observed LLM requests

Test 2: User-supplied x-litellm-session-id is not overwritten

Issues Found

Uh oh!

all-hands-bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: feat: send x-litellm-session-id header for conversation-level routing affinity

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Conversation-level session header on repeated LLM calls

Test 2: Existing user-supplied headers are preserved

Issues Found

Uh oh!

all-hands-bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Conversation-level header appears on all LLM requests

Test 2: Caller-supplied extra_headers still win

Test 3: switch_llm() requests also receive the conversation header

Issues Found

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review: feat: send x-litellm-session-id header for conversation-level routing affinity

What's well done

Minor suggestion

Uh oh!

Uh oh!

enyst commented May 29, 2026

Uh oh!

openhands-ai Bot commented May 29, 2026

Uh oh!

enyst left a comment

Choose a reason for hiding this comment

Uh oh!

enyst May 29, 2026

Choose a reason for hiding this comment

Uh oh!

This comment was marked as duplicate.

enyst commented May 29, 2026

Uh oh!

enyst commented May 29, 2026

Uh oh!

xingyaoww left a comment

Choose a reason for hiding this comment

Uh oh!

csmith49 commented May 29, 2026

Uh oh!

neubig commented May 30, 2026

csmith49 commented May 29, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

Test 2: User-supplied `x-litellm-session-id` is not overwritten

all-hands-bot commented May 29, 2026 •

edited

Loading

Code Review: `feat: send x-litellm-session-id header for conversation-level routing affinity`

all-hands-bot commented May 29, 2026 •

edited

Loading

Test 2: Caller-supplied `extra_headers` still win

Test 3: `switch_llm()` requests also receive the conversation header

Code Review: `feat: send x-litellm-session-id header for conversation-level routing affinity`