
feat(insights): autonomous Agent Improvement Loop — self-scheduling feedback daemon#1278

Open
raphael-solace wants to merge 1 commit into SolaceLabs:main from raphael-solace:agent-echo



@raphael-solace raphael-solace commented Mar 27, 2026

The Problem This Solves

Right now, when agents fail, fall into apology loops, or quietly miss what users are asking for, nobody finds out. The data is all there — task_events, tasks, feedback — but there is no closed loop between what happens in production and what gets fixed.

We have heard this from multiple customers. They build agents, deploy them, and then fly blind. They only discover that a tool is broken or that users keep asking for a capability the agent doesn't have when someone escalates a support ticket. That lag kills trust and slows iteration.

This PR closes that loop.


What This PR Does

Adds an autonomous Agent Improvement Loop daemon — a standard SAM agent that wakes up on a configurable schedule, analyses its own deployment's task history and feedback, and writes a structured improvement report as a persistent artifact.

No human needs to prompt it. No new infrastructure is required. It runs alongside your existing agents and uses only what SAM already persists today.

How it works

Agent startup
    ↓
Tool initializer registers a repeating SAC timer (daily by default)
    ↓
Timer fires → agent self-publishes an A2A request to its own topic
    ↓
LLM calls: query_agent_stats → query_tool_stats → query_recent_failures
    ↓
Writes insights_report_YYYY-MM-DD.md artifact
    ↓
Repeat

The self-scheduling mechanism reuses the exact same add_timer + publish_a2a_message path already used for health checks and agent card publishing. No threads, no external cron, no new services.


Files Changed

| File | What |
|---|---|
| src/solace_agent_mesh/agent/tools/agent_insights_tools.py | Three read-only query tools + scheduler initializer |
| src/solace_agent_mesh/agent/tools/__init__.py | +1 import line to register the tools |
| preset/agents/agent_insights.yaml | Drop-in agent config — start with sam run |
| tests/unit/agent/tools/test_agent_insights_tools.py | 31 passing unit tests (synthetic SQLite fixtures, no broker required) |

The Three Analysis Tools

query_agent_stats

Per-user task counts, completion rate, avg latency, token usage, negative-feedback count. Gives the executive view: is the mesh healthy overall?
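As an illustration of the kind of aggregate this tool computes, here is a minimal sketch against an in-memory SQLite table. The column names (user_id, status, latency_ms) and the 'completed' status value are assumptions; the real tasks schema may differ.

```python
import sqlite3

def query_agent_stats(conn):
    """Per-user task counts, completion rate, and average latency."""
    rows = conn.execute(
        """
        SELECT user_id,
               COUNT(*) AS total,
               -- completion rate: fraction of tasks that finished successfully
               AVG(CASE WHEN status = 'completed' THEN 1.0 ELSE 0.0 END)
                   AS completion_rate,
               AVG(latency_ms) AS avg_latency_ms
        FROM tasks
        GROUP BY user_id
        ORDER BY total DESC
        """
    ).fetchall()
    return [
        {"user_id": u, "total": t,
         "completion_rate": round(cr, 3), "avg_latency_ms": al}
        for u, t, cr, al in rows
    ]
```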

query_tool_stats

Reads ToolInvocationStartData / ToolResultData signals already stored in task_events.payload. Computes per-tool error rate and p95 latency. Flags tools at ≥20% error rate as flaky, ≥5 s p95 as slow.
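The flagging logic described above can be sketched in pure Python. The input shape (a flat list of per-call tuples) is an assumption made for illustration; the real tool extracts these values from ToolInvocationStartData/ToolResultData payloads in task_events.

```python
from collections import defaultdict

def analyse_tool_calls(calls, flaky_threshold=0.20, slow_p95_s=5.0):
    """calls: iterable of (tool_name, latency_s, ok) tuples."""
    by_tool = defaultdict(list)
    for name, latency, ok in calls:
        by_tool[name].append((latency, ok))
    report = {}
    for name, entries in by_tool.items():
        latencies = sorted(l for l, _ in entries)
        # nearest-rank p95: index ceil(0.95 * n) - 1
        p95 = latencies[max(0, -(-95 * len(latencies) // 100) - 1)]
        error_rate = sum(1 for _, ok in entries if not ok) / len(entries)
        report[name] = {
            "error_rate": error_rate,
            "p95_latency_s": p95,
            "flaky": error_rate >= flaky_threshold,  # >= 20% errors
            "slow": p95 >= slow_p95_s,               # >= 5 s p95
        }
    return report
```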

query_recent_failures

Returns failed/cancelled tasks and completed tasks with thumbs-down feedback, with the user's original request text. This is the signal for missing capabilities: when users ask for something and the agent can't do it, the pattern shows up here.


What the Report Looks Like

Each run saves an insights_report_YYYY-MM-DD.md artifact structured as:

## Executive Summary
3 of 47 tasks failed in the last 24 hours. Tool error rates are elevated
for web_request. 6 users asked about scheduling-related tasks with no
tool call made.

## Key Metrics
| Metric | Value |
|---|---|
| Total tasks | 47 |
| Completion rate | 93.6% |
| Avg latency | 4.2 s |
| Negative feedback | 2 (4.3%) |

## Issues Detected
- **High** — web_request: 35% error rate on 20 calls
- **Medium** — 6 tasks asked about scheduling, zero tool calls made

## Recommendations
1. Investigate web_request network/timeout config — 7 consecutive failures since 14:00 UTC.
2. Add a calendar/scheduling tool — users consistently ask about meeting scheduling.
3. Review the SQL agent instruction for aggregate queries — 4 of 5 failures share the pattern "total / sum / count".

## Next Steps
Monitor web_request error rate. If still elevated after fix, consider circuit-breaker config.
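The Key Metrics table above could be rendered from computed stats with a helper like the following. This is a sketch of the artifact's table format only; in the PR the report body is written by the LLM, not by a fixed template.

```python
def render_metrics_table(metrics):
    """metrics: list of (name, value) pairs -> markdown table string."""
    lines = ["| Metric | Value |", "|---|---|"]
    lines += [f"| {name} | {value} |" for name, value in metrics]
    return "\n".join(lines)
```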

Configuration

# All defaults shown. Override via env vars.
INSIGHTS_DATABASE_URL=sqlite:///gateway.db   # or postgresql://...
INSIGHTS_INTERVAL_S=86400                    # daily (604800 = weekly, 0 = disabled)
INSIGHTS_LOOKBACK_H=24                       # window per report
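A minimal sketch of how the daemon might read these variables (names and defaults from the block above; the actual loader in the PR may differ):

```python
import os

def load_insights_config(env=None):
    """Read the insights settings, falling back to the documented defaults."""
    env = env if env is not None else os.environ
    return {
        "database_url": env.get("INSIGHTS_DATABASE_URL", "sqlite:///gateway.db"),
        "interval_s": int(env.get("INSIGHTS_INTERVAL_S", "86400")),  # 0 = disabled
        "lookback_h": int(env.get("INSIGHTS_LOOKBACK_H", "24")),
    }
```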

To start:

sam run --config preset/agents/agent_insights.yaml

That's it. The agent connects to the broker, registers its timer, and runs autonomously from that point on.


Why This Is a Game Changer

Every agent platform eventually needs to answer: how do I know my agents are actually working well, and how do I improve them without manually reviewing logs?

Today there is no answer in SAM. Customers who ask for observability get told to look at logs. Customers who ask for improvement suggestions get told to review session history manually. This is not scalable and it is not what enterprise customers expect from an AI platform.

This PR turns the deployment model from fire-and-forget into a continuous improvement loop:

  • Flaky tools get flagged before users escalate tickets
  • Missing capabilities get surfaced as data, not guesses
  • The LLM does the log analysis — the same LLM already running the agents — so no external analytics service is needed
  • Reports are artifacts in the same artifact store operators already use
  • The whole thing is opt-in, disabled by default, and adds zero overhead to deployments that don't enable it

We have had direct requests from customers for exactly this: "tell me what my agents can't do", "show me where the mesh is breaking", "give me a weekly report I can act on". This delivers all three with a single YAML file.


Backward Compatibility

  • Off by default — the agent only runs if you start it explicitly with sam run --config preset/agents/agent_insights.yaml
  • No changes to the gateway, broker protocol, or any existing agent
  • No new database tables or migrations
  • No new dependencies — uses only SQLAlchemy and json, both already in pyproject.toml
  • Existing tests: 525 passing, 0 regressions

Test Plan

  • 31 unit tests cover all three tools, the scheduler initializer, timer registration, self-publish payload shape, and all edge cases (empty DB, window filtering, min-call threshold, limit, flaky detection, p95 latency)
  • Tests use in-memory SQLite with synthetic fixtures — no broker, no LLM, no network
  • Full existing test suite: 525 passed, 0 regressions
  • Integration test against a running SAM instance (manual — requires broker + DB)

🤖 Generated with Claude Code

Introduces a self-scheduling feedback-loop agent that continuously
analyses the existing SAM persistence layer (tasks, task_events,
feedback tables) and produces structured improvement reports as
artifacts — no operator intervention required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@raphael-solace (Author)

This is the ticket:
https://sol-jira.atlassian.net/browse/DATAGO-130444

