fix(telemetry): propagate span context in async generators and fix member agent input tracing #1

Open
Eliozhang wants to merge 1 commit into main from fix/telemetry-span-context-propagation

@Eliozhang

Owner

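Problem

1. Span context is lost in async generators

The context manager returned by start_as_current_span is not guaranteed to have its __aexit__ run when an async generator is cancelled (a known behavior of Python async generators). As a result:

  • context.detach() is never called, so the span context is lost
  • Child spans (agent_run, call_llm, execute_tool, and so on) cannot resolve their parent span
  • The trace is broken, and the full call chain is not visible

Reproduction path: when TeamAgent invokes a member agent, the member agent runs as an async generator, and the span context is lost on cancel.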

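2. Member agent trace input is inaccurate

trace_agent always records the agent input from user_content. When a member agent is delegated by TeamAgent, however, user_content is still the original content the user sent to the leader agent, not the override_messages that the leader agent forwarded to the member agent. The input recorded in the trace for the member agent therefore does not match what was actually executed.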

Fix

Fix 1: runners.py + agents/_base_agent.py (span context propagation)

Use start_span + context_api.attach/detach instead of start_as_current_span

  • start_span creates the span but does not make it the current span
  • context_api.attach(set_span_in_context(span, current_ctx)) manually makes the span current and returns a token
  • The body is wrapped in try/finally, and context_api.detach(token) is called in the finally block
  • Key point: try/finally also runs under CancelledError (PEP 492), whereas a context manager's __aexit__ is not guaranteed to
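
The attach/detach pattern in Fix 1 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: a `contextvars.ContextVar` stands in for OpenTelemetry's current-span slot, `set`/`reset` stand in for `context_api.attach`/`context_api.detach`, and the names `_current_span`, `run_agent`, `consume`, and `cancel_demo` are hypothetical.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the "current span" slot kept by the tracing context.
_current_span = contextvars.ContextVar("current_span", default=None)


async def run_agent(name, log):
    """Async generator that makes `name` the current span while it runs."""
    # Analogous to start_span(...) followed by context_api.attach(...).
    token = _current_span.set(name)
    try:
        while True:
            # Work done here sees `name` as the current (parent) span.
            yield _current_span.get()
            await asyncio.sleep(60)  # suspended here when the task is cancelled
    finally:
        # Analogous to context_api.detach(token) followed by span.end().
        _current_span.reset(token)
        log.append(f"detached {name}")


async def consume(gen, seen):
    """Drain the generator and record the current span once it stops."""
    try:
        async for value in gen:
            seen.append(value)
    finally:
        seen.append(_current_span.get())


async def cancel_demo():
    """Cancel the consuming task while the generator is suspended."""
    seen, log = [], []
    task = asyncio.create_task(consume(run_agent("agent_run", log), seen))
    await asyncio.sleep(0.01)  # let the task reach the long sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return seen, log


if __name__ == "__main__":
    print(asyncio.run(cancel_demo()))
```

In this sketch the `finally` block runs when the consuming task is cancelled, and the token is reset in the same context in which it was set, so the current-span slot is back to its default afterwards.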

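Fix 2: telemetry/_trace.py (override_messages takes precedence)

trace_agent now checks invocation_context.override_messages first

  • If it is present, the text parts are extracted from override_messages and used as the input
  • Otherwise it falls back to the existing user_content logic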
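
The precedence rule in Fix 2 might look roughly like the following. This is a minimal sketch, not the SDK's actual types or the real trace_agent: the `Part`, `Content`, and `InvocationContext` dataclasses and the `resolve_trace_input` helper are hypothetical, and joining text parts with newlines is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    # Hypothetical message part; non-text parts have text=None.
    text: Optional[str] = None


@dataclass
class Content:
    parts: List[Part] = field(default_factory=list)


@dataclass
class InvocationContext:
    user_content: Optional[Content] = None
    override_messages: Optional[List[Content]] = None


def _text_parts(content: Content) -> List[str]:
    """Collect the non-empty text parts of one message."""
    return [p.text for p in content.parts if p.text]


def resolve_trace_input(ctx: InvocationContext) -> str:
    """Pick the input to record: override_messages first, else user_content."""
    if ctx.override_messages:
        texts = [t for msg in ctx.override_messages for t in _text_parts(msg)]
        return "\n".join(texts)
    if ctx.user_content is not None:
        return "\n".join(_text_parts(ctx.user_content))
    return ""


if __name__ == "__main__":
    user = Content([Part("plan a trip to Kyoto")])
    forwarded = [Content([Part("find flights"), Part(None), Part("to Osaka")])]
    print(resolve_trace_input(InvocationContext(user, forwarded)))
    print(resolve_trace_input(InvocationContext(user)))
```

With this ordering, a member agent's trace records what the leader forwarded, while an agent run directly (no override_messages) keeps recording the user's content.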

Changed files

| File | Change |
| --- | --- |
| trpc_agent_sdk/runners.py | start_span + attach/detach to propagate span context |
| trpc_agent_sdk/agents/_base_agent.py | Same as above |
| trpc_agent_sdk/telemetry/_trace.py | override_messages takes precedence over user_content |

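Testing

  • Ran the TeamAgent + member agent scenario locally and confirmed the span chain is complete
  • Cancelled a member agent run and confirmed the span closes correctly with no detach token error
  • The member agent's trace input is the content forwarded by the leader, not the original user input
  • Behavior is unchanged outside the TeamAgent scenario (a single agent run directly)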

@Eliozhang Eliozhang force-pushed the fix/telemetry-span-context-propagation branch 3 times, most recently from d9731fe to ec1abfa Compare June 11, 2026 03:11
…mber agent input tracing

- Use start_span + attach/detach instead of start_as_current_span in
  runners.py and _base_agent.py to properly propagate span context
  in async generators (CancelledError safe per PEP 492)
- Fix trace_agent to prefer override_messages over user_content when
  tracing member agents delegated by TeamAgent
@Eliozhang Eliozhang force-pushed the fix/telemetry-span-context-propagation branch from ec1abfa to 7971826 Compare June 11, 2026 06:14