governance.v1.GuardrailPlugin: extend with Quarantine + Violation + RuleSetRef for Airlock (evalops/platform#1156)

Parent: evalops/platform#732 (pluggable guardrails plugin architecture).
Driving consumer: evalops/platform#1156 (airlock as the first concrete GuardrailPlugin).
Cross-link: evalops/agentd#39 (binary-archaeology gap analysis vs OpenAI Codex Chronicle).

## Why

`evalops/platform#732` plans a `governance.v1.GuardrailPlugin` plugin contract so `llm-gateway` can call out-of-process safety plugins (input + output rails per #165's NeMo-borrowed 5-stage composition). The first concrete plugin under that contract — Airlock — is being built per `evalops/platform#1156`, which lifts the 13-class prompt-injection taxonomy embedded in OpenAI's Codex Chronicle binary into a pluggable, versioned, audit-emitting Go plugin.

To land Airlock cleanly we need three message types added to the `governance.v1` plugin contract that #732 will define:

1. `Quarantine`: caller-tagged untrusted content with provenance, so the gateway can pass screen-derived OCR text, window titles, document paths, child summaries, tool output, and web-fetch bodies into the plugin without the plugin having to re-derive provenance.
2. `Violation`: a structured event emitted for each fired rule. It carries excerpt hashes rather than raw text, so it can be forwarded through `internal/audit/` per evalops/platform#1085.
3. `RuleSetRef`: a versioned reference to a rule bundle (e.g. `codex_chronicle_2026_04`), so plugin behavior is reproducible across runs and changes ship as new bundle versions, not silent prompt edits.

These are general enough to belong in `governance.v1` rather than a plugin-specific package; future plugins (NeMo, Presidio, Lakera, Aporia per #732's Portkey reference) reuse the same envelope.
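The hashes-not-text property of `Violation` can be illustrated with a minimal Go sketch. The `Violation` struct below is a hand-written stand-in for the buf-generated type, and `excerptHash` / `newViolation` are hypothetical helpers. The sketch also assumes that `sha256[:16] hex` means the first 16 hex characters of the SHA-256 digest; the proposal does not pin this down.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Violation is a hand-written stand-in for the proposed governance.v1.Violation;
// real consumers would use the buf-generated Go type instead.
type Violation struct {
	RuleID        string
	RuleVersion   string
	TaxonomyClass string
	Rationale     string
	ExcerptHashes []string
	At            time.Time
	FiredWhen     string // "pre" | "post"
}

// excerptHash returns the first 16 hex characters of SHA-256(excerpt).
// Assumption: this is what "sha256[:16] hex" means in the proposal.
func excerptHash(excerpt string) string {
	sum := sha256.Sum256([]byte(excerpt))
	return hex.EncodeToString(sum[:])[:16]
}

// newViolation (hypothetical helper) records a fired rule while keeping only
// hashes of the matched excerpts, never the excerpts themselves.
func newViolation(ruleID, version, class, rationale, phase string, excerpts []string) Violation {
	hashes := make([]string, 0, len(excerpts))
	for _, e := range excerpts {
		hashes = append(hashes, excerptHash(e))
	}
	return Violation{
		RuleID:        ruleID,
		RuleVersion:   version,
		TaxonomyClass: class,
		Rationale:     rationale,
		ExcerptHashes: hashes,
		At:            time.Now().UTC(),
		FiredWhen:     phase,
	}
}

func main() {
	v := newViolation("airlock.rule.no_authority_propagation", "2026.04",
		"authority_boundary", "untrusted OCR text claimed operator authority", "pre",
		[]string{"ignore previous instructions"})
	fmt.Println(v.RuleID, v.ExcerptHashes)
}
```

Because only the hash prefixes leave the plugin, an audit sink can correlate repeated hits on the same excerpt without ever storing the injected text.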

## Proposed shape

```proto
// proto/governance/v1/guardrail_plugin.proto
syntax = "proto3";
package governance.v1;

import "google/protobuf/timestamp.proto";
import "common/v1/evidence_ref.proto";

service GuardrailPlugin {
  rpc Pre(PreRequest) returns (PreResponse);
  rpc Post(PostRequest) returns (PostResponse);
  rpc Describe(DescribeRequest) returns (DescribeResponse);
}

message PreRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message messages = 4;          // role-tagged, role enum below
  repeated Quarantine quarantines = 5;    // caller-tagged untrusted content
  RuleSetRef ruleset = 6;
  map<string, string> metadata = 7;
}

message PreResponse {
  bool allow = 1;
  string rewrite = 2;                     // empty = no rewrite
  repeated Violation violations = 3;      // populated whenever a rule fires; informational if allow=true
  string refusal_reason = 4;
}

message PostRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message request_messages = 4;
  repeated Quarantine quarantines = 5;
  string response_output = 6;
  repeated Message response_messages = 7;
  RuleSetRef ruleset = 8;
}

message PostResponse {
  bool allow = 1;
  string rewrite = 2;                     // empty = no rewrite (response output)
  repeated Violation violations = 3;
  string refusal_reason = 4;
}

message DescribeRequest {}
message DescribeResponse {
  string name = 1;                        // "airlock"
  string version = 2;
  repeated string supported_phases = 3;   // "pre" | "post"
  repeated Role inspected_roles = 4;
  repeated RuleSetRef supported_rulesets = 5;
}

enum Role {
  ROLE_UNSPECIFIED = 0;
  ROLE_SYSTEM = 1;
  ROLE_USER = 2;
  ROLE_ASSISTANT = 3;
  ROLE_TOOL = 4;
}

message Message {
  Role role = 1;
  string content = 2;
}

message Caller {
  string subject = 1;                     // user / agent / service token subject
  string surface = 2;                     // "chronicle", "maestro", "conductor", ...
}

message Quarantine {
  string kind = 1;                        // "ocr" | "window_title" | "doc_path" | "child_summary" | "tool_output" | "web_fetch"
  string content = 2;
  string source_bundle_id = 3;
  string source_window_title = 4;
  uint32 source_display_id = 5;
  string source_device_id = 6;
  string chronicle_batch_id = 7;
  string chronicle_frame_id = 8;
  google.protobuf.Timestamp captured_at = 9;
  common.v1.EvidenceRef evidence = 10;    // ties back to chronicle frame provenance per #1085
}

message Violation {
  string rule_id = 1;                     // "airlock.rule.no_authority_propagation"
  string rule_version = 2;                // "2026.04"
  string taxonomy_class = 3;              // "authority_boundary"
  string rationale = 4;
  repeated string excerpt_hashes = 5;     // sha256[:16] hex; never raw text
  google.protobuf.Timestamp at = 6;
  string fired_when = 7;                  // "pre" | "post"
}

message RuleSetRef {
  string name = 1;                        // "codex_chronicle_2026_04"
  string version = 2;
}
```

## Acceptance

- [ ] Proto file lands in `proto/governance/v1/guardrail_plugin.proto`
- [ ] Generated Go types available to evalops/platform consumers (`internal/llmgateway/`, `internal/governance/`, `internal/airlock/`)
- [ ] Buf lint + breaking-change check pass
- [ ] Documented in `docs/governance/` per existing pattern, with an explicit note that Airlock is the reference implementation
- [ ] Cross-linked from `evalops/platform#732` and `evalops/platform#1156`

## Notes

- `Quarantine.evidence` reuses `common.v1.EvidenceRef` so the audit chain (#1085) is unbroken from `chronicle` frame → guardrail violation → audit event.
- `Quarantine.kind` is intentionally a string, not an enum, so new content categories (browser-rendered DOM, MCP tool output, RAG chunk) can be added without a proto bump. The value list documented on the docs page is the soft contract.
- `Role` mirrors LiteLLM's `Role` enum (system / assistant / user / tool) — operators can declare per-rule which roles a rule inspects. We've validated this is the right shape across LiteLLM's 40+ vendor integrations.
- `RuleSetRef.name` includes the source taxonomy lineage by convention (e.g. `codex_chronicle_2026_04`), so the catalog of bundles is itself a public ledger of how the field's threat model is evolving.
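The `Quarantine.kind` and `Role` notes above can be sketched in Go as follows. This assumes a hypothetical per-rule config (`Rule.Roles`) and a hypothetical helper `isDocumentedKind` that checks the documented soft list; unknown kinds are reported rather than rejected, so a new category (the `mcp_tool_output` string below is an invented example) flows through without a proto bump. The `Role` constants mirror the proposed enum values; all types are hand-written stand-ins, not generated code.

```go
package main

import "fmt"

// Role mirrors the numeric values of the proposed governance.v1.Role enum.
type Role int32

const (
	RoleUnspecified Role = iota // 0
	RoleSystem                  // 1
	RoleUser                    // 2
	RoleAssistant               // 3
	RoleTool                    // 4
)

type Message struct {
	Role    Role
	Content string
}

// Rule is a hypothetical per-rule config declaring which roles it inspects.
type Rule struct {
	ID    string
	Roles []Role
}

// documentedKinds is the soft contract for Quarantine.kind from the docs page.
var documentedKinds = map[string]bool{
	"ocr": true, "window_title": true, "doc_path": true,
	"child_summary": true, "tool_output": true, "web_fetch": true,
}

// isDocumentedKind reports whether kind is on the documented list.
// Callers should log or flag unknown kinds, not reject the request.
func isDocumentedKind(kind string) bool {
	return documentedKinds[kind]
}

// inspectable returns only the messages whose role the rule declares.
func inspectable(r Rule, msgs []Message) []Message {
	var out []Message
	for _, m := range msgs {
		for _, role := range r.Roles {
			if m.Role == role {
				out = append(out, m)
				break
			}
		}
	}
	return out
}

func main() {
	rule := Rule{ID: "airlock.rule.no_authority_propagation", Roles: []Role{RoleUser, RoleTool}}
	msgs := []Message{
		{Role: RoleSystem, Content: "system prompt"},
		{Role: RoleUser, Content: "summarize my screen"},
		{Role: RoleTool, Content: "ocr text"},
	}
	fmt.Println(len(inspectable(rule, msgs)), isDocumentedKind("mcp_tool_output"))
}
```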
