Skip to content

ai-bot: the assistant can read a skill directly from the realm server#5344

Open
jurgenwerk wants to merge 18 commits into
mainfrom
cs-11554-loadskill-tool-in-ai-bot-body-references
Open

ai-bot: the assistant can read a skill directly from the realm server#5344
jurgenwerk wants to merge 18 commits into
mainfrom
cs-11554-loadskill-tool-in-ai-bot-body-references

Conversation

@jurgenwerk

@jurgenwerk jurgenwerk commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Here is a video demonstrating:

  1. Informing the AI about a set of available skills and where to get them if needed. In next work, we will create curated skill index files and attach it at the beginning of the system prompt. This is so that AI will be aware of the available skills but won't load them until it actually needs them
  2. The bot pulling the skills on demand from the realm server directly (but still visible as command results in the chat UI). The reasons we want to fetch from the server directly: no Matrix uploads at enable time, no dependency on the user's host tab being open, content is always fresh
  3. AI then receives the skills on its own demand (in a "pull" mechanism, as opposed to our current "push" mechanism where we inject all the attached skills upfront in the prompt)
Screen.Recording.2026-06-30.at.23.01.39.mov

More details:

What this does

The assistant can now grab a file from your realm by itself, while it's answering — usually a skill's instructions, but really any file. Before, every enabled skill's full text was stuffed into the prompt up front. Now the bot reads only what it needs, when it needs it.

How it works (the simple version)

  1. Mid-answer, the bot thinks "I need that file" and calls a tool: readRealmFile(realm, url).
  2. The bot fetches the file itself, using a short-lived, read-only pass made out in your name — so it can only read what you already could. No browser, no round-trip; it happens in the same turn.
  3. It reads the file's contents into its own working memory, keeps going, and answers you.
  4. You see a pill in the chat — "Read file: demo/SKILL.md" — that spins while loading and turns to ✓ when done (or red with a reason if it failed). That pill is just a record; the file's text goes to the model, not into the chat.

Why readRealmFile and not loadSkill

There's nothing skill-specific about it — it just reads a file. So it gets an honest, general name. (It's distinct from the existing host readFile, which runs in your browser; this one runs server-side with a delegated token.)

Mixed tool calls

Bot-run tools (like this) finish this turn; host commands finish next turn. Those two timings don't mix. So if the model asks for a read and a host command in the same round, the read is explicitly rejected with a clear message ("readRealmFile cannot be mixed with host-dispatched commands in the same round") instead of being silently dropped — and the host command still runs.

Safety

  • Only in single-human rooms, so "you" is unambiguous.
  • Read-only and scoped to you — can't read what you can't, can't write.
  • Always live — the current file, never a stale snapshot.
  • A missing file / no access / bad URL comes back as an error the model can read (and a red pill), never a crash.
  • Inert unless the delegation secret is configured (already set in our deployed envs).

Also in here (unrelated)

A one-line CSS fix removing a stray border line that appeared above the AI action bar during diff-apply.

Tests

Unit tests for the fetch/executor (delegated-token retry, 404 / disabled / forbidden), the tool-call classifier (bot vs host, mixed), and the read loop; verified live with single and multiple reads, rendered in order.

jurgenwerk and others added 4 commits June 26, 2026 11:50
First slice of the pull-model loadSkill tool (CS-11554): the bot-side core that
fetches a skill's instructions on demand.

- lib/load-skill.ts: the `loadSkill` tool schema the model will be offered, plus
  `executeLoadSkill` — mints a delegated, user-scoped realm token (CS-11553) and
  GETs skills/<name>/SKILL.md (or references/<path>) as raw source. Read-only and
  scoped to the requesting human, so the bot can't read anything they couldn't.
  Never throws; returns an ok/error result the caller hands back to the model.
- tests/load-skill-test.ts: URL building, token minting, success, 404, and the
  disabled / forbidden delegation paths.

Not yet wired into the response loop (advertise the tool + intercept the call +
feed the result back to the model) — that lands next in this ticket; see the PR
description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Describe what the loadSkill module does as timeless fact rather than citing
issue identifiers, per the evergreen-comments convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make loadSkill a real, bot-executed tool instead of just a building block:

- getResponse offers loadSkill whenever delegated realm sessions are
  configured (gated on the manager being enabled), and accepts a messages
  override so a turn can be re-run with tool results appended.
- response-state drops loadSkill from the emitted command requests, so it is
  never handed to the host to execute (the bot runs it itself).
- main.ts runs the generation as a bounded loop: when a round's only tool
  calls are loadSkill, fetch the skills and generate again with the results in
  context, up to LOAD_SKILL_MAX_ROUNDS; cost is summed across rounds and the
  response is finalized once.
- load-skill-loop.ts holds the pure decision (buildLoadSkillFollowup) with
  tests; the whole path is inert unless AI_BOT_DELEGATION_SECRET is set, so
  behavior is unchanged where delegation is off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@jurgenwerk jurgenwerk left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code 🤖] Self-review — bugs/risks I can substantiate, worst first.

1. (High, security) No single-human-room guard before minting delegation

main.ts runs the loadSkill loop whenever delegatedRealmSessions.enabled, with onBehalfOf: senderMatrixUserId (main.ts:597). In a room with more than one human, the bot mints a read token for whoever sent the message and reads their realms — the confused-deputy case the security design calls out. It's gated by the secret and the guard is a known follow-up, but this must land before the secret is enabled anywhere. Right now nothing stops it.

2. (Medium, correctness) A rejected delegated token is never invalidated

executeLoadSkill (lib/load-skill.ts) returns an error on a non-2xx file response but never calls delegatedRealmSessions.invalidate(...). Tokens are cached for ~30 min; if the user's read is revoked inside that window, the cached token gets a 401/403 and every loadSkill for that (user, realm) keeps failing for up to 30 minutes. Should invalidate on 401/403 and retry once. Note the deps are Pick<…, 'getToken'>, so the fix also has to widen them to include invalidate.

3. (Medium) loadSkill calls can be silently dropped — no result, no error

response-state drops loadSkill from the emitted command requests, and the bot only executes it when buildLoadSkillFollowup returns a followup. But it returns [] when the model mixes loadSkill with a host command, and the loop stops at LOAD_SKILL_MAX_ROUNDS. In those cases the loadSkill call is neither executed nor handed to the host — the model "called a tool" and gets nothing back, so the turn can end with empty/confused content. At minimum it should feed back an error tool result (or stop advertising the tool once at the cap).

4. (Medium, UX — the untested path) Multi-round rendering into one Matrix message

The responder is reused across rounds and ResponseState.updateContent replaces latestContent. Round 1 (the loadSkill round) and round 2 (the answer) stream into the same message, so any visible round-1 content is overwritten and the message can shrink/reset mid-stream. This is the multi-round behavior I haven't exercised live (the single-round happy path was verified). Needs a real >1-round chat before trusting it.

5. (Low) executeLoadSkill follows redirects

Unlike the manual script (redirect: 'manual'), the executor uses default fetch, so a stray 30x is followed silently. A file GET shouldn't redirect, but worth pinning to surface surprises.

6. (Low) No size cap on fetched content

executeLoadSkill returns await response.text() unbounded; a large references/ file is injected wholesale into the next prompt. Fine for SKILL.md, risky for arbitrary reference files.

7. (Low) generationId reflects only the last round

Cost is summed across rounds (good), but generationId used for credit tracking is the last round's id only — minor attribution skew across a multi-round turn.

Nothing here is a crash on the normal (delegation-off) path — that stays byte-identical. #1 gates enabling the feature; #2/#3 are the substantive correctness fixes; #4 is the thing to verify live.

- Single-human-room guard: count non-bot members and only offer/run loadSkill
  when exactly one human is present, so the bot never reads a realm on an
  ambiguous "user"'s behalf in a shared room. The tool is no longer advertised
  there either (getResponse takes an explicit offerLoadSkill flag).
- Invalidate-and-retry: a cached delegated token rejected with 401/403 (access
  revoked inside its staleness window) is dropped and re-minted once, instead
  of failing for the rest of the token's TTL.
- Stop following redirects on the skill fetch (redirect: 'manual') and cap
  fetched content length so a large references file can't blow up the next
  round's prompt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk

jurgenwerk commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

[Claude Code 🤖] Fixes pushed for the review (numbering matches the review items above; written (n) so GitHub doesn't auto-link them as PRs — sorry about the earlier mangled links):

  • (1) single-human-room guard — counts non-bot members; loadSkill is neither offered nor executed unless exactly one human is in the room, so the bot never reads a realm on an ambiguous user's behalf. (getResponse now takes an explicit offerLoadSkill flag.)
  • (2) token invalidation — a cached token rejected with 401/403 is invalidated and re-minted once before failing, instead of being wedged for the rest of its TTL.
  • (5) redirects — the skill fetch uses redirect: 'manual'; (6) size cap — fetched content is bounded so a large references file can't blow up the next round's prompt.

New tests cover the retry and the cap; types/lint/responder suite green.

Still open (intentionally not patched here):

  • (3) — when the model mixes loadSkill with a host command, or exceeds the round cap, the loadSkill call is dropped rather than answered. Closing it cleanly needs a host-vs-bot tool-precedence decision, so I left it as a known limitation.
  • (4) — multi-round rendering into one Matrix message still needs a real >1-round chat to verify; the single-round path is confirmed.

- Remove the fetched-content length cap and the loadSkill round cap: the model
  loads what it asks for and loops until it stops, no arbitrary limits.
- Rename the DelegatedRealmSession* symbols to DelegatedUserRealmSession* (and
  the manager field to delegatedUserRealmSessions) to say plainly that the
  session is delegated for a specific user on a realm. Code-only; the wire
  header values are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Preview deployments

Host Test Results

154 tests   154 ✅  5m 41s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit faf693f.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   10m 56s ⏱️ +40s
1 686 tests +7  1 686 ✅ +7  0 💤 ±0  0 ❌ ±0 
1 765 runs  +7  1 765 ✅ +7  0 💤 ±0  0 ❌ ±0 

Results for commit faf693f. ± Comparison against earlier commit 417a669.

Discovery hands the bot full skill URLs (entry points are refs by realm URL),
so the tool now takes the file `url` directly instead of a skill `name` + an
optional references `path` it had to construct. `realm` stays — the delegated
token is realm-scoped — and the url must be inside it (checked up front).
Drops the skillFileUrl path-construction and its layout assumption.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk jurgenwerk requested a review from Copilot June 29, 2026 11:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@jurgenwerk jurgenwerk requested a review from a team June 29, 2026 11:32
@jurgenwerk jurgenwerk marked this pull request as ready for review June 29, 2026 11:32
@jurgenwerk jurgenwerk requested a review from lukemelia June 29, 2026 11:32
@jurgenwerk

jurgenwerk commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

idea by @lukemelia: when AI requests a tool call, the ai bot can make a command request with a special property named something like handledByServer and then the host can know it doesn't have to execute it (because the ai bot already loaded the skill directly from the server). This is so that we can show a visual indicator in the chat history telling the user the necessary skills got loaded (will also be helpful to debug in case of problems)

@lukemelia

lukemelia commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

idea by @lukemelia: when AI requests a tool call, the ai bot can make a command request with a special property named something like handledByServer and then the host can know it doesn't have to execute it (because the ai bot already loaded the skill directly from the server).

Or, even better might be executedBy: "ai-bot". A little more information dense, and could possibly help us when we scale to multi-user or multi-bot rooms.

@lukemelia lukemelia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's re-implement with the approach we discussed before merging.

@lukemelia

lukemelia commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

One other question to consider: there is nothing specific to skills in the loadSkill tool AFAIK. Should it just be a readFile tool?

The bot-executed tool that pulls a file from a realm at tool-call time had
no skill-specific logic, so rename it loadSkill -> readRealmFile: a generic,
delegated, read-only realm file read. Keeps {realm, url}, the user-scoped
token, and the in-process load-then-continue loop.

Surface each read in the timeline as an executedBy: 'ai-bot' command request:
the host records it but never executes it (skipped across validate/run/
auto-exec/stuck). The bot repurposes the in-flight "thinking" message into a
loading marker and rotates to a fresh event for the answer, so the marker
renders above the answer and animates applying -> applied (or invalid + reason
on a failed read).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copy link
Copy Markdown
Contributor Author

(codex) Review note: I think the current readRealmFile path is carrying too much special-case control flow.

The main issue I see is that mixed tool rounds are still lossy. readRealmFileOnlyToolCalls() only handles the round when every tool call is readRealmFile, while ResponseState also strips readRealmFile out of the host-visible command requests. So if the model emits readRealmFile plus a normal host command in the same round, the read is neither executed nor surfaced to the host.

I think the simpler shape is a single post-completion tool-call pass:

  • if ai-bot has a local executor for a tool, run it and append a tool result
  • otherwise emit a normal host command request
  • keep executedBy: "ai-bot" only as timeline/debug metadata, not as a separate control path

That would remove most of the branching between read-realm-file-loop.ts and main.ts, and it seems closer to the re-implementation direction already being discussed here.

jurgenwerk and others added 2 commits June 30, 2026 10:03
…opping

A round mixing readRealmFile with a host-dispatched command used to drop the
read silently: the all-or-nothing gate ran nothing, and ResponseState strips
readRealmFile from host-visible command requests, so it was neither executed
nor surfaced.

Replace the gate with classifyToolCalls() -> { botToolCalls, hostToolCalls }
and branch explicitly in the generation loop:
- bot-only: run them in-process and loop (marker applying -> done, then answer)
- host-only: the normal host-command path
- mixed: don't loop; emit a visible failed marker for the reads ("readRealmFile
  cannot be mixed with host-dispatched commands in the same round") and let the
  host commands proceed

Bot tools are same-turn, host tools next-turn; keeping those time models
separate (rather than a fake unified dispatch) avoids dragging history
reconstruction into this change. executedBy: 'ai-bot' stays as timeline/debug
metadata, not a control path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The action bar appears above the composer while a diff or command is applying;
its top border drew a stray horizontal line above the bar. Remove just the top
edge — the side and bottom border still tuck it into the composer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk jurgenwerk changed the title ai-bot: loadSkill — pull a skill's instructions on demand ai-bot: the assistant can read a realm file on its own, mid-answer (readRealmFile) Jun 30, 2026
jurgenwerk and others added 2 commits June 30, 2026 10:56
Spell out the reasoning in the doc comment: reusing the current (Thinking)
event keeps the marker in this turn's earliest timeline slot, so it renders
before the answer that streams into the rotated fresh event.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A readRealmFile marker is an executedBy command the host never runs, but
buildMessageCommand still resolved it through the skill list — an
`await store.get` per enabled skill — which left the marker pill blank for
~a second while those reads ran. Short-circuit executedBy commands before the
skill lookup and build them directly (no codeRef, no approval, status straight
from the result event), so the pill paints immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk jurgenwerk changed the title ai-bot: the assistant can read a realm file on its own, mid-answer (readRealmFile) ai-bot: the assistant can read a skill directly from the realm server Jun 30, 2026
Comment thread packages/host/app/lib/matrix-classes/message-builder.ts Outdated
Comment thread packages/host/app/lib/command-auto-execute.ts Outdated
Comment thread packages/host/app/services/command-service.ts Outdated
Comment thread packages/host/app/services/command-service.ts Outdated
Comment thread packages/host/app/services/command-service.ts Outdated
jurgenwerk and others added 2 commits June 30, 2026 12:43
Per review: the host skip checks used `if (command.executedBy)` (truthiness),
which would wrongly skip a command whose executedBy names a different actor —
e.g. 'host', a natural future value for host-run commands. Compare against
AI_BOT_EXECUTOR at all six sites so the host only skips what ai-bot ran itself,
and a 'host' (or any other) executor is evaluated normally. Adds a test that a
non-ai-bot executor isn't short-circuited.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standardize the nomenclature for the timeline element that shows a bot-executed
tool call (readRealmFile) and its result. Drops the informal "pill" and the
ad-hoc "marker" in favor of "command-result indicator" throughout: methods
beginCommandResultIndicator / sendCommandResultIndicator, the indicatorEventId
locals, comments, and test names. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk

Copy link
Copy Markdown
Contributor Author

@lukemelia the implementation should now be more in line with what we discussed - I also attached a video in the PR description

@jurgenwerk jurgenwerk requested a review from lukemelia June 30, 2026 11:43
@jurgenwerk

Copy link
Copy Markdown
Contributor Author

@lukemelia regarding the 2 different flows, it's because of an ordering problem

The ordering problem was where bot does the read and the answer in one turn, and that turn has a single "Thinking…" bubble that the answer streams into. When the bot also tried to show a "Read file…" command request/result for the read, that landed below the answer, which looked weird and out of place

Host commands get correct ordering for free: the bot ends its turn, the host runs the command and posts the command result, and that starts a new turn = a new message. So the request, result, and continuation naturally land in order.

On the other hand, readRealmFile stays in one turn: the bot runs it itself and keeps writing the answer in the same turn. So we replace the "Thinking…" message with the command request(s), post the command result(s), and push the answer into a new message after manually arranging the order

@lukemelia

Copy link
Copy Markdown
Contributor

The ordering problem was where bot does the read and the answer in one turn

I'm going to try creating a patch to this PR we can review that uses a multi-turn approach, keeping host-executed commands and aibot-executed commands more similar to each other and how they are treated my ai-bot's main processing engine. I think it could simpilify things a lot.

…and path

Re-architect readRealmFile so the bot no longer runs reads in an inline,
same-turn loop. Instead it surfaces each read as an `executedBy: 'ai-bot'`
command request, fetches the file, uploads it to the Matrix media repo, and
posts a command-result event carrying the file as `data.attachedFiles`. The
existing host-command result path then reconstructs the content on the next
turn (via the same attachment-download the host `readFile` uses), and the
bot's own result event re-triggers generation.

Because reads now resolve next-turn like host commands, the two timing models
collapse into one: an answer may freely mix reads and host commands, so the
mixed-round rejection, the inline `for(;;)` loop, the "thinking"-event
indicator rotation, and the in-memory followup splicing are all deleted.

Uploads dedupe by content hash, so the same skill read repeatedly is stored
in Matrix once; keying on the hash keeps it version-correct (changed content
re-uploads).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jurgenwerk

Copy link
Copy Markdown
Contributor Author

I noticed another a real problem in this implementation - the skill content is not persisted, and it's only available for the current turn.

jurgenwerk and others added 2 commits June 30, 2026 18:34
The bot fulfills a readRealmFile read by posting a command-result event,
which is meant to re-trigger the bot for the continuation turn. Two things
prevented that continuation from running:

- Fulfillment happened while the room lock was still held. The re-trigger
  has to acquire that lock, so it was dropped. Fulfill after the lock is
  released instead.

- The re-trigger runs on the result event's local echo, before the
  homeserver has indexed it into /messages. getRoomEvents is a server fetch,
  so the continuation's history missed the just-posted result and the read
  looked unresolved (shouldRespond=false). Splice the in-hand result event
  into the history when the fetch didn't include it. Host command results
  arrive via sync (already server-side), so this only affects the bot's own
  results.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-attachment

ai-bot: fulfill readRealmFile via Matrix attachments on the host-command path
@jurgenwerk

Copy link
Copy Markdown
Contributor Author

Working now:

image

@jurgenwerk jurgenwerk marked this pull request as ready for review June 30, 2026 17:01
@jurgenwerk jurgenwerk requested review from lukemelia and removed request for lukemelia June 30, 2026 17:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: faf693f9fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

await publish(deps, {
msgtype: APP_BOXEL_COMMAND_RESULT_WITH_NO_OUTPUT_MSGTYPE,
commandRequestId,
failureReason: error,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Surface readRealmFile failures to the model

When a realm read fails (for example 404, forbidden, or malformed arguments), this stores the useful reason only in failureReason on a no-output command result. The prompt reconstruction path for ordinary tool results ignores failureReason and emits only Tool call invalid. (packages/runtime-common/ai/prompt.ts), so the continuation model cannot see why readRealmFile failed or choose a different URL/access path. Please include the error in the tool-result content for this command, or teach prompt reconstruction to surface failureReason for these results.

Useful? React with 👍 / 👎.

@lukemelia lukemelia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's re-implement with the approach we discussed before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants