ai-bot: the assistant can read a skill directly from the realm server#5344
ai-bot: the assistant can read a skill directly from the realm server#5344jurgenwerk wants to merge 18 commits into
Conversation
First slice of the pull-model loadSkill tool (CS-11554): the bot-side core that fetches a skill's instructions on demand. - lib/load-skill.ts: the `loadSkill` tool schema the model will be offered, plus `executeLoadSkill` — mints a delegated, user-scoped realm token (CS-11553) and GETs skills/<name>/SKILL.md (or references/<path>) as raw source. Read-only and scoped to the requesting human, so the bot can't read anything they couldn't. Never throws; returns an ok/error result the caller hands back to the model. - tests/load-skill-test.ts: URL building, token minting, success, 404, and the disabled / forbidden delegation paths. Not yet wired into the response loop (advertise the tool + intercept the call + feed the result back to the model) — that lands next in this ticket; see the PR description. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Describe what the loadSkill module does as timeless fact rather than citing issue identifiers, per the evergreen-comments convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make loadSkill a real, bot-executed tool instead of just a building block: - getResponse offers loadSkill whenever delegated realm sessions are configured (gated on the manager being enabled), and accepts a messages override so a turn can be re-run with tool results appended. - response-state drops loadSkill from the emitted command requests, so it is never handed to the host to execute (the bot runs it itself). - main.ts runs the generation as a bounded loop: when a round's only tool calls are loadSkill, fetch the skills and generate again with the results in context, up to LOAD_SKILL_MAX_ROUNDS; cost is summed across rounds and the response is finalized once. - load-skill-loop.ts holds the pure decision (buildLoadSkillFollowup) with tests; the whole path is inert unless AI_BOT_DELEGATION_SECRET is set, so behavior is unchanged where delegation is off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ol-in-ai-bot-body-references
jurgenwerk
left a comment
There was a problem hiding this comment.
[Claude Code 🤖] Self-review — bugs/risks I can substantiate, worst first.
1. (High, security) No single-human-room guard before minting delegation
main.ts runs the loadSkill loop whenever delegatedRealmSessions.enabled, with onBehalfOf: senderMatrixUserId (main.ts:597). In a room with more than one human, the bot mints a read token for whoever sent the message and reads their realms — the confused-deputy case the security design calls out. It's gated by the secret and the guard is a known follow-up, but this must land before the secret is enabled anywhere. Right now nothing stops it.
2. (Medium, correctness) A rejected delegated token is never invalidated
executeLoadSkill (lib/load-skill.ts) returns an error on a non-2xx file response but never calls delegatedRealmSessions.invalidate(...). Tokens are cached for ~30 min; if the user's read is revoked inside that window, the cached token gets a 401/403 and every loadSkill for that (user, realm) keeps failing for up to 30 minutes. Should invalidate on 401/403 and retry once. Note the deps are Pick<…, 'getToken'>, so the fix also has to widen them to include invalidate.
3. (Medium) loadSkill calls can be silently dropped — no result, no error
response-state drops loadSkill from the emitted command requests, and the bot only executes it when buildLoadSkillFollowup returns a followup. But it returns [] when the model mixes loadSkill with a host command, and the loop stops at LOAD_SKILL_MAX_ROUNDS. In those cases the loadSkill call is neither executed nor handed to the host — the model "called a tool" and gets nothing back, so the turn can end with empty/confused content. At minimum it should feed back an error tool result (or stop advertising the tool once at the cap).
4. (Medium, UX — the untested path) Multi-round rendering into one Matrix message
The responder is reused across rounds and ResponseState.updateContent replaces latestContent. Round 1 (the loadSkill round) and round 2 (the answer) stream into the same message, so any visible round-1 content is overwritten and the message can shrink/reset mid-stream. This is the multi-round behavior I haven't exercised live (the single-round happy path was verified). Needs a real >1-round chat before trusting it.
5. (Low) executeLoadSkill follows redirects
Unlike the manual script (redirect: 'manual'), the executor uses default fetch, so a stray 30x is followed silently. A file GET shouldn't redirect, but worth pinning to surface surprises.
6. (Low) No size cap on fetched content
executeLoadSkill returns await response.text() unbounded; a large references/ file is injected wholesale into the next prompt. Fine for SKILL.md, risky for arbitrary reference files.
7. (Low) generationId reflects only the last round
Cost is summed across rounds (good), but generationId used for credit tracking is the last round's id only — minor attribution skew across a multi-round turn.
Nothing here is a crash on the normal (delegation-off) path — that stays byte-identical. #1 gates enabling the feature; #2/#3 are the substantive correctness fixes; #4 is the thing to verify live.
- Single-human-room guard: count non-bot members and only offer/run loadSkill when exactly one human is present, so the bot never reads a realm on an ambiguous "user"'s behalf in a shared room. The tool is no longer advertised there either (getResponse takes an explicit offerLoadSkill flag). - Invalidate-and-retry: a cached delegated token rejected with 401/403 (access revoked inside its staleness window) is dropped and re-minted once, instead of failing for the rest of the token's TTL. - Stop following redirects on the skill fetch (redirect: 'manual') and cap fetched content length so a large references file can't blow up the next round's prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
[Claude Code 🤖] Fixes pushed for the review (numbering matches the review items above; written
New tests cover the retry and the cap; types/lint/responder suite green. Still open (intentionally not patched here):
|
- Remove the fetched-content length cap and the loadSkill round cap: the model loads what it asks for and loops until it stops, no arbitrary limits. - Rename the DelegatedRealmSession* symbols to DelegatedUserRealmSession* (and the manager field to delegatedUserRealmSessions) to say plainly that the session is delegated for a specific user on a realm. Code-only; the wire header values are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Preview deploymentsHost Test Results154 tests 154 ✅ 5m 41s ⏱️ Results for commit faf693f. Realm Server Test Results 1 files ±0 1 suites ±0 10m 56s ⏱️ +40s Results for commit faf693f. ± Comparison against earlier commit 417a669. |
Discovery hands the bot full skill URLs (entry points are refs by realm URL), so the tool now takes the file `url` directly instead of a skill `name` + an optional references `path` it had to construct. `realm` stays — the delegated token is realm-scoped — and the url must be inside it (checked up front). Drops the skillFileUrl path-construction and its layout assumption. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
idea by @lukemelia: when AI requests a tool call, the ai bot can make a command request with a special property named something like |
Or, even better might be |
lukemelia
left a comment
There was a problem hiding this comment.
Let's re-implement with the approach we discussed before merging.
|
One other question to consider: there is nothing specific to skills in the |
The bot-executed tool that pulls a file from a realm at tool-call time had
no skill-specific logic, so rename it loadSkill -> readRealmFile: a generic,
delegated, read-only realm file read. Keeps {realm, url}, the user-scoped
token, and the in-process load-then-continue loop.
Surface each read in the timeline as an executedBy: 'ai-bot' command request:
the host records it but never executes it (skipped across validate/run/
auto-exec/stuck). The bot repurposes the in-flight "thinking" message into a
loading marker and rotates to a fresh event for the answer, so the marker
renders above the answer and animates applying -> applied (or invalid + reason
on a failed read).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
(codex) Review note: I think the current The main issue I see is that mixed tool rounds are still lossy. I think the simpler shape is a single post-completion tool-call pass:
That would remove most of the branching between |
…opping
A round mixing readRealmFile with a host-dispatched command used to drop the
read silently: the all-or-nothing gate ran nothing, and ResponseState strips
readRealmFile from host-visible command requests, so it was neither executed
nor surfaced.
Replace the gate with classifyToolCalls() -> { botToolCalls, hostToolCalls }
and branch explicitly in the generation loop:
- bot-only: run them in-process and loop (marker applying -> done, then answer)
- host-only: the normal host-command path
- mixed: don't loop; emit a visible failed marker for the reads ("readRealmFile
cannot be mixed with host-dispatched commands in the same round") and let the
host commands proceed
Bot tools are same-turn, host tools next-turn; keeping those time models
separate (rather than a fake unified dispatch) avoids dragging history
reconstruction into this change. executedBy: 'ai-bot' stays as timeline/debug
metadata, not a control path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The action bar appears above the composer while a diff or command is applying; its top border drew a stray horizontal line above the bar. Remove just the top edge — the side and bottom border still tuck it into the composer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Spell out the reasoning in the doc comment: reusing the current (Thinking) event keeps the marker in this turn's earliest timeline slot, so it renders before the answer that streams into the rotated fresh event. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A readRealmFile marker is an executedBy command the host never runs, but buildMessageCommand still resolved it through the skill list — an `await store.get` per enabled skill — which left the marker pill blank for ~a second while those reads ran. Short-circuit executedBy commands before the skill lookup and build them directly (no codeRef, no approval, status straight from the result event), so the pill paints immediately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ol-in-ai-bot-body-references
Per review: the host skip checks used `if (command.executedBy)` (truthiness), which would wrongly skip a command whose executedBy names a different actor — e.g. 'host', a natural future value for host-run commands. Compare against AI_BOT_EXECUTOR at all six sites so the host only skips what ai-bot ran itself, and a 'host' (or any other) executor is evaluated normally. Adds a test that a non-ai-bot executor isn't short-circuited. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standardize the nomenclature for the timeline element that shows a bot-executed tool call (readRealmFile) and its result. Drops the informal "pill" and the ad-hoc "marker" in favor of "command-result indicator" throughout: methods beginCommandResultIndicator / sendCommandResultIndicator, the indicatorEventId locals, comments, and test names. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@lukemelia the implementation should now be more in line with what we discussed - I also attached a video in the PR description |
|
@lukemelia regarding the 2 different flows, it's because of an ordering problem The ordering problem was where bot does the read and the answer in one turn, and that turn has a single "Thinking…" bubble that the answer streams into. When the bot also tried to show a "Read file…" command request/result for the read, that landed below the answer, which looked weird and out of place Host commands get correct ordering for free: the bot ends its turn, the host runs the command and posts the command result, and that starts a new turn = a new message. So the request, result, and continuation naturally land in order. On the other hand, readRealmFile stays in one turn: the bot runs it itself and keeps writing the answer in the same turn. So we replace the "Thinking…" message with the command request(s), post the command result(s), and push the answer into a new message after manually arranging the order |
I'm going to try creating a patch to this PR we can review that uses a multi-turn approach, keeping host-executed commands and aibot-executed commands more similar to each other and how they are treated my ai-bot's main processing engine. I think it could simpilify things a lot. |
…and path Re-architect readRealmFile so the bot no longer runs reads in an inline, same-turn loop. Instead it surfaces each read as an `executedBy: 'ai-bot'` command request, fetches the file, uploads it to the Matrix media repo, and posts a command-result event carrying the file as `data.attachedFiles`. The existing host-command result path then reconstructs the content on the next turn (via the same attachment-download the host `readFile` uses), and the bot's own result event re-triggers generation. Because reads now resolve next-turn like host commands, the two timing models collapse into one: an answer may freely mix reads and host commands, so the mixed-round rejection, the inline `for(;;)` loop, the "thinking"-event indicator rotation, and the in-memory followup splicing are all deleted. Uploads dedupe by content hash, so the same skill read repeatedly is stored in Matrix once; keying on the hash keeps it version-correct (changed content re-uploads). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
I noticed another a real problem in this implementation - the skill content is not persisted, and it's only available for the current turn. |
The bot fulfills a readRealmFile read by posting a command-result event, which is meant to re-trigger the bot for the continuation turn. Two things prevented that continuation from running: - Fulfillment happened while the room lock was still held. The re-trigger has to acquire that lock, so it was dropped. Fulfill after the lock is released instead. - The re-trigger runs on the result event's local echo, before the homeserver has indexed it into /messages. getRoomEvents is a server fetch, so the continuation's history missed the just-posted result and the read looked unresolved (shouldRespond=false). Splice the in-hand result event into the history when the fetch didn't include it. Host command results arrive via sync (already server-side), so this only affects the bot's own results. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-attachment ai-bot: fulfill readRealmFile via Matrix attachments on the host-command path
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faf693f9fe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await publish(deps, { | ||
| msgtype: APP_BOXEL_COMMAND_RESULT_WITH_NO_OUTPUT_MSGTYPE, | ||
| commandRequestId, | ||
| failureReason: error, |
There was a problem hiding this comment.
Surface readRealmFile failures to the model
When a realm read fails (for example 404, forbidden, or malformed arguments), this stores the useful reason only in failureReason on a no-output command result. The prompt reconstruction path for ordinary tool results ignores failureReason and emits only Tool call invalid. (packages/runtime-common/ai/prompt.ts), so the continuation model cannot see why readRealmFile failed or choose a different URL/access path. Please include the error in the tool-result content for this command, or teach prompt reconstruction to surface failureReason for these results.
Useful? React with 👍 / 👎.
lukemelia
left a comment
There was a problem hiding this comment.
Let's re-implement with the approach we discussed before merging.

Here is a video demonstrating:
Screen.Recording.2026-06-30.at.23.01.39.mov
More details:
What this does
The assistant can now grab a file from your realm by itself, while it's answering — usually a skill's instructions, but really any file. Before, every enabled skill's full text was stuffed into the prompt up front. Now the bot reads only what it needs, when it needs it.
How it works (the simple version)
readRealmFile(realm, url).Why
readRealmFileand notloadSkillThere's nothing skill-specific about it — it just reads a file. So it gets an honest, general name. (It's distinct from the existing host
readFile, which runs in your browser; this one runs server-side with a delegated token.)Mixed tool calls
Bot-run tools (like this) finish this turn; host commands finish next turn. Those two timings don't mix. So if the model asks for a read and a host command in the same round, the read is explicitly rejected with a clear message ("readRealmFile cannot be mixed with host-dispatched commands in the same round") instead of being silently dropped — and the host command still runs.
Safety
Also in here (unrelated)
A one-line CSS fix removing a stray border line that appeared above the AI action bar during diff-apply.
Tests
Unit tests for the fetch/executor (delegated-token retry, 404 / disabled / forbidden), the tool-call classifier (bot vs host, mixed), and the read loop; verified live with single and multiple reads, rendered in order.