
feat(insights): autonomous Agent Improvement Loop — self-scheduling feedback daemon#1278

Open
raphael-solace wants to merge 1 commit into SolaceLabs:main from raphael-solace:agent-echo



@raphael-solace raphael-solace commented Mar 27, 2026

The Problem This Solves

Right now, when agents fail, fall into apology loops, or quietly miss what users are asking for, nobody finds out. The data is all there — task_events, tasks, feedback — but there is no closed loop between what happens in production and what gets fixed.

We have heard this from multiple customers. They build agents, deploy them, and then fly blind. They only discover that a tool is broken or that users keep asking for a capability the agent doesn't have when someone escalates a support ticket. That lag kills trust and slows iteration.

This PR closes that loop.


What This PR Does

Adds an autonomous Agent Improvement Loop daemon — a standard SAM agent that wakes up on a configurable schedule, analyses its own deployment's task history and feedback, and writes a structured improvement report as a persistent artifact.

No human needs to prompt it. No new infrastructure is required. It runs alongside your existing agents and uses only what SAM already persists today.

How it works

Agent startup
    ↓
Tool initializer registers a repeating SAC timer (daily by default)
    ↓
Timer fires → agent self-publishes an A2A request to its own topic
    ↓
LLM calls: query_agent_stats → query_tool_stats → query_recent_failures
    ↓
Writes insights_report_YYYY-MM-DD.md artifact
    ↓
Repeat

The self-scheduling mechanism reuses the exact same add_timer + publish_a2a_message path already used for health checks and agent card publishing. No threads, no external cron, no new services.


Files Changed

| File | What |
|---|---|
| src/solace_agent_mesh/agent/tools/agent_insights_tools.py | Three read-only query tools + scheduler initializer |
| src/solace_agent_mesh/agent/tools/__init__.py | +1 import line to register the tools |
| preset/agents/agent_insights.yaml | Drop-in agent config — start with sam run |
| tests/unit/agent/tools/test_agent_insights_tools.py | 31 passing unit tests (synthetic SQLite fixtures, no broker required) |

The Three Analysis Tools

query_agent_stats

Per-user task counts, completion rate, avg latency, token usage, negative-feedback count. Gives the executive view: is the mesh healthy overall?
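As an illustration of the kind of aggregate this tool computes, here is a minimal sketch against an in-memory SQLite table. The column names (user_id, status, latency_ms) and the 'completed' status value are assumptions; the real tasks schema may differ.

```python
import sqlite3

def query_agent_stats(conn):
    """Per-user task counts, completion rate, and average latency."""
    rows = conn.execute(
        """
        SELECT user_id,
               COUNT(*) AS total,
               -- completion rate: fraction of tasks that finished successfully
               AVG(CASE WHEN status = 'completed' THEN 1.0 ELSE 0.0 END)
                   AS completion_rate,
               AVG(latency_ms) AS avg_latency_ms
        FROM tasks
        GROUP BY user_id
        ORDER BY total DESC
        """
    ).fetchall()
    return [
        {"user_id": u, "total": t,
         "completion_rate": round(cr, 3), "avg_latency_ms": al}
        for u, t, cr, al in rows
    ]
```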

query_tool_stats

Reads ToolInvocationStartData / ToolResultData signals already stored in task_events.payload. Computes per-tool error rate and p95 latency. Flags tools at ≥20% error rate as flaky, ≥5 s p95 as slow.
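The flagging logic described above can be sketched in pure Python. The input shape (a flat list of per-call tuples) is an assumption made for illustration; the real tool extracts these values from ToolInvocationStartData/ToolResultData payloads in task_events.

```python
from collections import defaultdict

def analyse_tool_calls(calls, flaky_threshold=0.20, slow_p95_s=5.0):
    """calls: iterable of (tool_name, latency_s, ok) tuples."""
    by_tool = defaultdict(list)
    for name, latency, ok in calls:
        by_tool[name].append((latency, ok))
    report = {}
    for name, entries in by_tool.items():
        latencies = sorted(l for l, _ in entries)
        # nearest-rank p95: index ceil(0.95 * n) - 1
        p95 = latencies[max(0, -(-95 * len(latencies) // 100) - 1)]
        error_rate = sum(1 for _, ok in entries if not ok) / len(entries)
        report[name] = {
            "error_rate": error_rate,
            "p95_latency_s": p95,
            "flaky": error_rate >= flaky_threshold,  # >= 20% errors
            "slow": p95 >= slow_p95_s,               # >= 5 s p95
        }
    return report
```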

query_recent_failures

Returns failed/cancelled tasks and completed tasks with thumbs-down feedback, with the user's original request text. This is the signal for missing capabilities: when users ask for something and the agent can't do it, the pattern shows up here.


What the Report Looks Like

Each run saves an insights_report_YYYY-MM-DD.md artifact structured as:

## Executive Summary
3 of 47 tasks failed in the last 24 hours. Tool error rates are elevated
for web_request. 6 users asked about scheduling-related tasks with no
tool call made.

## Key Metrics
| Metric | Value |
|---|---|
| Total tasks | 47 |
| Completion rate | 93.6% |
| Avg latency | 4.2 s |
| Negative feedback | 2 (4.3%) |

## Issues Detected
- **High** — web_request: 35% error rate on 20 calls
- **Medium** — 6 tasks asked about scheduling, zero tool calls made

## Recommendations
1. Investigate web_request network/timeout config — 7 consecutive failures since 14:00 UTC.
2. Add a calendar/scheduling tool — users consistently ask about meeting scheduling.
3. Review the SQL agent instruction for aggregate queries — 4 of 5 failures share the pattern "total / sum / count".

## Next Steps
Monitor web_request error rate. If still elevated after fix, consider circuit-breaker config.
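The Key Metrics table above could be rendered from computed stats with a helper like the following. This is a sketch of the artifact's table format only; in the PR the report body is written by the LLM, not by a fixed template.

```python
def render_metrics_table(metrics):
    """metrics: list of (name, value) pairs -> markdown table string."""
    lines = ["| Metric | Value |", "|---|---|"]
    lines += [f"| {name} | {value} |" for name, value in metrics]
    return "\n".join(lines)
```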

Configuration

# All defaults shown. Override via env vars.
INSIGHTS_DATABASE_URL=sqlite:///gateway.db   # or postgresql://...
INSIGHTS_INTERVAL_S=86400                    # daily (604800 = weekly, 0 = disabled)
INSIGHTS_LOOKBACK_H=24                       # window per report
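A minimal sketch of how the daemon might read these variables (names and defaults from the block above; the actual loader in the PR may differ):

```python
import os

def load_insights_config(env=None):
    """Read the insights settings, falling back to the documented defaults."""
    env = env if env is not None else os.environ
    return {
        "database_url": env.get("INSIGHTS_DATABASE_URL", "sqlite:///gateway.db"),
        "interval_s": int(env.get("INSIGHTS_INTERVAL_S", "86400")),  # 0 = disabled
        "lookback_h": int(env.get("INSIGHTS_LOOKBACK_H", "24")),
    }
```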

To start:

sam run --config preset/agents/agent_insights.yaml

That's it. The agent connects to the broker, registers its timer, and runs autonomously from that point on.


Why This Is a Game Changer

Every agent platform eventually needs to answer: how do I know my agents are actually working well, and how do I improve them without manually reviewing logs?

Today there is no answer in SAM. Customers who ask for observability get told to look at logs. Customers who ask for improvement suggestions get told to review session history manually. This is not scalable and it is not what enterprise customers expect from an AI platform.

This PR turns the deployment model from fire-and-forget into a continuous improvement loop:

  • Flaky tools get flagged before users escalate tickets
  • Missing capabilities get surfaced as data, not guesses
  • The LLM does the log analysis — the same LLM already running the agents — so no external analytics service is needed
  • Reports are artifacts in the same artifact store operators already use
  • The whole thing is opt-in, disabled by default, and adds zero overhead to deployments that don't enable it

We have had direct requests from customers for exactly this: "tell me what my agents can't do", "show me where the mesh is breaking", "give me a weekly report I can act on". This delivers all three with a single YAML file.


Backward Compatibility

  • Off by default — the agent only runs if you start it explicitly with sam run --config preset/agents/agent_insights.yaml
  • No changes to the gateway, broker protocol, or any existing agent
  • No new database tables or migrations
  • No new dependencies — uses only SQLAlchemy and json, both already in pyproject.toml
  • Existing tests: 525 passing, 0 regressions

Test Plan

  • 31 unit tests cover all three tools, the scheduler initializer, timer registration, self-publish payload shape, and all edge cases (empty DB, window filtering, min-call threshold, limit, flaky detection, p95 latency)
  • Tests use in-memory SQLite with synthetic fixtures — no broker, no LLM, no network
  • Full existing test suite: 525 passed, 0 regressions
  • Integration test against a running SAM instance (manual — requires broker + DB)

🤖 Generated with Claude Code

Introduces a self-scheduling feedback-loop agent that continuously
analyses the existing SAM persistence layer (tasks, task_events,
feedback tables) and produces structured improvement reports as
artifacts — no operator intervention required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@raphael-solace (Author)

This is the ticket:
https://sol-jira.atlassian.net/browse/DATAGO-130444

