⚠️ SCOPE REVISED 2026-05-28 (v4) — pending RE-APPROVAL. Evolution captured in this issue's comments: v2 reframed to the achievable design; v3 added design-review corrections (role-chaining 1h cap, Bedrock credential chain, full client surface); v4 makes the design backend-agnostic across both compute backends (AgentCore Runtime + ECS Fargate).
Reduce cross-task blast radius from a compromised agent session by scoping each task's agent tenant-data access (DynamoDB, trace S3) to short-lived, task_id/user_id-tagged IAM credentials — on both compute backends — instead of the long-lived shared compute role.
Roadmap: "Per-session IAM scoping" (Credentials and authorization). Design: docs/design/SECURITY.md. Resolves the TODO at cdk/src/stacks/agent.ts:376-389.
Backend context
The platform runs the agent under a ComputeStrategy:
AgentCore Runtime (live): agent boots under the per-runtime ExecutionRole. Bedrock already ARN-scoped via grantInvoke.
Both base credentials are themselves assumed-role creds → an agent-side sts:AssumeRole is role chaining on both → 1h cap applies on both.
Design (backend-agnostic)
Per-task SessionRole + refreshable agent-side AssumeRole (portable). Agent (agent/src/) assumes a per-task SessionRole with session tags {user_id, repo, task_id} at startup and uses the derived creds for tenant-data clients only. Identical Python on both backends. SessionRole self-constrains via aws:PrincipalTag/*.
Refreshable credential provider (botocore RefreshableCredentials) re-assumes before the 1h role-chaining cap; tasks run to maxLifetime 8h. A one-shot assume_role() would ExpiredToken mid-task — forbidden.
SessionRole trust policy permits both compute roles (AgentCore exec role + ECS task role) as principals allowed to sts:AssumeRole/sts:TagSession, constrained so only they may pass tag values.
task_id leading-key conditions on TaskTable, TaskEventsTable, TaskApprovalsTable, TaskNudgesTable (dynamodb:LeadingKeys/FirstPartitionKeyValues ← aws:PrincipalTag/task_id). Scan denied/omitted. Cross-table approval TransactWriteItems satisfied (shared task_id). On the SessionRole — backend-agnostic.
S3 trace-prefix condition ← aws:PrincipalTag/user_id (resolves agent.ts:376-389); attachments read scoped. On the SessionRole.
ECS task role: replace Resource:'*' on InvokeModel with the explicit model + inference-profile ARNs (parity with AgentCore). Net-new work.
Compute-role slimming + parity: both compute roles keep baseline (Bedrock, logs, secrets, Memory) plus sts:AssumeRole/sts:TagSession on the SessionRole. Tenant-data grants move OFF the compute roles onto the SessionRole. ECS task role gains the currently-missing grants (Approvals, Nudges, trace, attachments), now expressed only via the SessionRole, so parity is achieved by construction.
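The SessionRole statements described in the design can be sketched as plain policy documents. This is an illustration only, not the CDK code for this issue: the function names, the DynamoDB action list, and the `traces/<user_id>/` key layout are assumptions, and the trust statement omits whatever extra condition is used to constrain who may pass tag values.

```python
# Hypothetical builders for the SessionRole policy statements (names are illustrative).
TASK_TAG = "${aws:PrincipalTag/task_id}"
USER_TAG = "${aws:PrincipalTag/user_id}"


def trust_statement(compute_role_arns):
    """Allow only the listed compute roles to assume and tag the SessionRole."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": sorted(compute_role_arns)},
        "Action": ["sts:AssumeRole", "sts:TagSession"],
    }


def task_table_statement(table_arns):
    """Item-level access limited to partition keys equal to the session's task_id tag."""
    return {
        "Effect": "Allow",
        # Assumed action set; dynamodb:Scan is deliberately omitted.
        "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:Query",
        ],
        "Resource": sorted(table_arns),
        "Condition": {
            "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [TASK_TAG]}
        },
    }


def trace_prefix_statement(bucket_arn, prefix="traces"):
    """PutObject limited to keys under <prefix>/<user_id>/ (layout is an assumption)."""
    return {
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": f"{bucket_arn}/{prefix}/{USER_TAG}/*",
    }
```

Because the policy variables are resolved by IAM at request time, the same statements serve every task; only the session tags differ per assume-role call.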
Tenant-data clients to switch to the session (6) vs. left on compute role (8)
Leave: secrets config.py:33/97 (PAT read once at startup, pre-assume), CloudWatch logs shell.py:67/server.py:153,180/telemetry.py:59,167, AgentCore Memory memory.py:43.
AgentCore Memory session-tag scoping: DEFERRED (not leading-key-able; namespace isolation actorId=repo/sessionId=task_id is the current boundary).
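A minimal sketch of how the six tenant-data clients could share one refreshable session while the other eight keep using the default compute-role credentials. The helper names are hypothetical, boto3/botocore are assumed to be installed, and assigning `_credentials` on the botocore session relies on a private attribute (a common pattern, but not a documented API).

```python
def build_assume_role_kwargs(role_arn, user_id, repo, task_id):
    """Arguments for sts:AssumeRole with the three session tags from the design."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": f"task-{task_id}"[:64],  # STS limits this to 64 chars
        "DurationSeconds": 3600,  # role chaining caps the session at 1h
        "Tags": [
            {"Key": "user_id", "Value": user_id},
            {"Key": "repo", "Value": repo},
            {"Key": "task_id", "Value": task_id},
        ],
    }


def make_refresher(sts_client, kwargs):
    """Return a callable that re-assumes the role and yields botocore-style metadata."""
    def refresh():
        creds = sts_client.assume_role(**kwargs)["Credentials"]
        return {
            "access_key": creds["AccessKeyId"],
            "secret_key": creds["SecretAccessKey"],
            "token": creds["SessionToken"],
            "expiry_time": creds["Expiration"].isoformat(),
        }
    return refresh


def tenant_session(sts_client, kwargs):
    """boto3 Session whose credentials are re-assumed before they expire."""
    import boto3
    from botocore.credentials import RefreshableCredentials
    from botocore.session import get_session

    refresh = make_refresher(sts_client, kwargs)
    creds = RefreshableCredentials.create_from_metadata(
        metadata=refresh(), refresh_using=refresh, method="sts-assume-role"
    )
    botocore_session = get_session()
    botocore_session._credentials = creds  # private attribute: an assumption of this sketch
    return boto3.Session(botocore_session=botocore_session)
```

Tenant-data clients would then be built from the returned session, for example `tenant_session(boto3.client("sts"), kwargs).client("dynamodb")`, while secrets, logs, and Memory clients continue to use plain `boto3.client(...)`.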
Acceptance criteria
Per-task SessionRole; trust policy admits both the AgentCore exec role and ECS task role as assuming principals (synth-verified).
Agent assumes SessionRole with tags {user_id, repo, task_id} via a refreshable provider; test simulates >1h run → auto re-assume (no ExpiredToken). Verifiable via CloudTrail / sts:GetCallerIdentity.
6 tenant-data boto3 constructions use the session; 8 non-tenant remain on the compute role (agent tests assert).
task_id leading-key conditions on the 4 task tables (on SessionRole); Scan denied/omitted; synth tests assert; approval TransactWriteItems still succeeds.
user_id prefix condition on trace-bucket PutObject; agent.ts:376-389 TODO removed.
Both compute roles slimmed to baseline + sts:AssumeRole/TagSession; tenant-data grants live only on the SessionRole; ECS parity (Approvals/Nudges/trace/attachments) achieved via SessionRole.
In-account validation that ${aws:PrincipalTag/...} drives dynamodb:LeadingKeys (policy simulator or live test).
mise //cdk:test, //cdk:synth, //agent:quality pass; no new unsuppressed cdk-nag findings (incl. the dormant ECS construct path).
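The "synth tests assert" criteria could be backed by a small checker over a synthesized policy document. The sketch below is hypothetical (the real CDK tests may be written differently, and `leading_key_violations` is an illustrative name); it only inspects Allow statements that list DynamoDB actions explicitly and does not handle a bare `*` action.

```python
def leading_key_violations(policy_document, tag="task_id"):
    """Return a list of problems found in DynamoDB Allow statements."""
    expected = "${aws:PrincipalTag/" + tag + "}"
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    problems = []
    for i, stmt in enumerate(statements):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        ddb_actions = [a.lower() for a in actions if a.lower().startswith("dynamodb:")]
        if stmt.get("Effect") != "Allow" or not ddb_actions:
            continue  # only DynamoDB Allow statements are checked

        if any(a in ("dynamodb:scan", "dynamodb:*") for a in ddb_actions):
            problems.append(f"statement {i}: allows Scan")

        condition = stmt.get("Condition", {}).get("ForAllValues:StringEquals", {})
        keys = condition.get("dynamodb:LeadingKeys")
        if keys not in (expected, [expected]):
            problems.append(f"statement {i}: missing {tag} leading-key condition")
    return problems
```

An empty list means every checked statement omits Scan and carries the leading-key condition; a live or policy-simulator check is still needed to confirm that the tag actually drives `dynamodb:LeadingKeys` at request time.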
Summary
Reduce cross-task blast radius from a compromised agent session by scoping each task's agent tenant-data access (DynamoDB, trace S3) to short-lived, task_id/user_id-tagged IAM credentials, on both compute backends, instead of the long-lived shared compute role.
Roadmap: "Per-session IAM scoping" (Credentials and authorization). Design: docs/design/SECURITY.md. Resolves the TODO at cdk/src/stacks/agent.ts:376-389.
Backend context
The platform runs the agent under a ComputeStrategy:
AgentCore Runtime (live): agent boots under the per-runtime ExecutionRole. Bedrock already ARN-scoped via grantInvoke.
ECS Fargate (ecs-agent-cluster.ts, currently commented out in agent.ts:540-583; gating tracked in #164, "refactor(compute): gate ECS construct on compute_type context instead of comment toggle"): agent boots under the Fargate task role (ECS credential endpoint). Bedrock is currently Resource:'*'; the task role is also missing Approvals/Nudges/trace/attachments grants.
Both base credentials are themselves assumed-role creds → an agent-side sts:AssumeRole is role chaining on both → 1h cap applies on both.
Design (backend-agnostic)
Per-task SessionRole + refreshable agent-side AssumeRole (portable). Agent (agent/src/) assumes a per-task SessionRole with session tags {user_id, repo, task_id} at startup and uses the derived creds for tenant-data clients only. Identical Python on both backends. SessionRole self-constrains via aws:PrincipalTag/*.
Refreshable credential provider (botocore RefreshableCredentials) re-assumes before the 1h role-chaining cap; tasks run to maxLifetime 8h. A one-shot assume_role() would hit ExpiredToken mid-task, so it is forbidden.
SessionRole trust policy permits both compute roles (AgentCore exec role + ECS task role) as principals allowed to sts:AssumeRole/sts:TagSession, constrained so only they may pass tag values.
task_id leading-key conditions on TaskTable, TaskEventsTable, TaskApprovalsTable, TaskNudgesTable (dynamodb:LeadingKeys/FirstPartitionKeyValues ← aws:PrincipalTag/task_id). Scan denied/omitted. Cross-table approval TransactWriteItems satisfied (shared task_id). On the SessionRole (backend-agnostic).
S3 trace-prefix condition ← aws:PrincipalTag/user_id (resolves agent.ts:376-389); attachments read scoped. On the SessionRole.
ECS task role: replace Resource:'*' on InvokeModel with the explicit model + inference-profile ARNs (parity with AgentCore). Net-new work.
Compute-role slimming + parity: both compute roles keep baseline (Bedrock, logs, secrets, Memory) plus sts:AssumeRole/sts:TagSession on the SessionRole. Tenant-data grants move OFF the compute roles onto the SessionRole. ECS task role gains the currently-missing grants (Approvals, Nudges, trace, attachments), now expressed only via the SessionRole, so parity is achieved by construction.
Tenant-data clients to switch to the session (6) vs. left on compute role (8)
Switch: task_state.py:59/549, nudge_reader.py:82, progress_writer.py:380; S3 trace telemetry.py:456; S3 attachments read attachments.py:61.
Leave: secrets config.py:33/97 (PAT read once at startup, pre-assume), CloudWatch logs shell.py:67/server.py:153,180/telemetry.py:59,167, AgentCore Memory memory.py:43.
AgentCore Memory session-tag scoping: DEFERRED (not leading-key-able; namespace isolation actorId=repo/sessionId=task_id is the current boundary).
Acceptance criteria
Per-task SessionRole; trust policy admits both the AgentCore exec role and ECS task role as assuming principals (synth-verified).
Agent assumes SessionRole with tags {user_id, repo, task_id} via a refreshable provider; test simulates >1h run → auto re-assume (no ExpiredToken). Verifiable via CloudTrail / sts:GetCallerIdentity.
6 tenant-data boto3 constructions use the session; 8 non-tenant remain on the compute role (agent tests assert).
task_id leading-key conditions on the 4 task tables (on SessionRole); Scan denied/omitted; synth tests assert; approval TransactWriteItems still succeeds.
user_id prefix condition on trace-bucket PutObject; agent.ts:376-389 TODO removed.
Resource:'*' replaced with explicit model/inference-profile ARNs; ECS NagSuppression reason (ecs-agent-cluster.ts:141) updated accordingly.
Both compute roles slimmed to baseline + sts:AssumeRole/TagSession; tenant-data grants live only on the SessionRole; ECS parity (Approvals/Nudges/trace/attachments) achieved via SessionRole.
In-account validation that ${aws:PrincipalTag/...} drives dynamodb:LeadingKeys (policy simulator or live test).
mise //cdk:test, //cdk:synth, //agent:quality pass; no new unsuppressed cdk-nag findings (incl. the dormant ECS construct path).
docs/design/SECURITY.md + docs/guides/ROADMAP.md updated; Starlight mirrors regenerated (mise //docs:sync).
Out of scope
GitHub App/Token Vault PAT replacement; MicroVM attestation; layered per-tool derivation; principal-to-repo auth; table remodel to PK=user_id; AgentCore Memory session-tag scoping; enabling the ECS backend itself (#164).
Key references
dynamodb:LeadingKeys = base-table PK only: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/specifying-conditions.html , https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazondynamodb.html
aws:PrincipalTag: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html
Risks