From 933366ac4ef3dfae37a2899fc3f2f483131026a2 Mon Sep 17 00:00:00 2001
From: Eric Conklin
Date: Fri, 5 Jun 2026 23:42:56 -0500
Subject: [PATCH] Add real-pilot replay addendum

---
 ...real-pilot-dev-001-human-review-summary.md | 26 +++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/docs/case-studies/real-pilot-dev-001-human-review-summary.md b/docs/case-studies/real-pilot-dev-001-human-review-summary.md
index 2abb4a8..9b92e69 100644
--- a/docs/case-studies/real-pilot-dev-001-human-review-summary.md
+++ b/docs/case-studies/real-pilot-dev-001-human-review-summary.md
@@ -20,7 +20,7 @@ Raw `scenario.json`, `findings.json`, reviewer-label JSON, logs, account IDs, IA
 - IAMScope verdicts: 15 `validated`, 3 `inconclusive`.
 - Pattern mix: 15 `cross_account_trust`, 3 `admin_reachability`.
 - Severity mix: 5 `critical`, 10 `high`, 3 `medium`.
-- `collection_context` was not provided because the original findings were generated before PR #66 added per-finding collection-context metadata.
+- The original findings predated PR #66, so they did not include per-finding `collection_context`; replayed current-main findings now include complete `collection_context`.
 
 ## Finding Summary
 
@@ -38,6 +38,28 @@ The 18 findings were not treated as a score, benchmark pass/fail result, or owne
 
 These labels are preliminary and not owner-confirmed. They represent a first-pass reviewer classification of sanitized finding rows, not a final authorization or risk determination.
 
+## Current-Main Replay Addendum
+
+The frozen real-pilot scenario was replayed on current main after the `collection_context` and trust-safety fixes. The replay preserved the same result shape: 18 findings, 15 `validated`, and 3 `inconclusive`.
+
+The same human-review labels still applied to all 18 findings:
+
+- 18 labeled, 0 unlabeled.
+- `valid_path`: 11.
+- `expected_benign`: 3.
+- `inconclusive_needs_context`: 3.
+- `needs_more_evidence`: 1.
+
+The scenario counts were unchanged: 26 nodes, 63 edges, 3 constraints, and 6 edge constraints. The replayed findings now include complete per-finding `collection_context`:
+
+- `graph_collection_complete`: true.
+- `has_collection_failures`: false.
+- `has_policy_parse_failures`: false.
+- `related_collection_failures`: empty.
+- `related_policy_parse_failures`: empty.
+
+The sanitized review outputs had no raw 12-digit account IDs and no raw IAM/STS ARNs. Raw replay findings are local-only and may contain raw ARNs or account IDs, so no raw replay artifacts are committed. This strengthens evidence hygiene but does not change the non-claims.
+
 ## What the Pilot Supports
 
 - Most findings were reviewable and meaningful to a human reviewer.
@@ -70,7 +92,7 @@ The review question is whether AWS-managed AdministratorAccess should be treated
 
 - Owner-confirm a small subset of trust findings.
 - Separately test/admin-reachability calibration for AWS-managed AdministratorAccess as a clean admin witness.
-- Optionally replay the frozen scenario with current main to regenerate findings with `collection_context` before any future publication.
+- Use replayed current-main findings with `collection_context` for any future publication, while keeping raw replay artifacts local-only.
 
 ## Non-Claims
 