governance.v1.GuardrailPlugin: extend with Quarantine + Violation + RuleSetRef for Airlock (evalops/platform#1156) #94

@haasonsaas

Parent: evalops/platform#732 (pluggable guardrails plugin architecture).
Driving consumer: evalops/platform#1156 (airlock as the first concrete GuardrailPlugin).
Cross-link: evalops/agentd#39 (binary-archaeology gap analysis vs OpenAI Codex Chronicle).

Why

evalops/platform#732 plans a governance.v1.GuardrailPlugin contract so that llm-gateway can call out-of-process safety plugins (input and output rails, per #165's NeMo-borrowed 5-stage composition). The first concrete plugin under that contract, Airlock, is being built per evalops/platform#1156, which lifts the 13-class prompt-injection taxonomy embedded in OpenAI's Codex Chronicle binary into a pluggable, versioned, audit-emitting Go plugin.

To land Airlock cleanly we need three message types added to the governance.v1 plugin contract that #732 will define:

  1. Quarantine: caller-tagged untrusted content with provenance, so the gateway can pass screen-derived OCR text, window titles, document paths, child summaries, tool output, and web-fetch bodies into the plugin without the plugin having to re-derive provenance.
  2. Violation: a structured event emitted once per fired rule, carrying hashes rather than raw text, suitable for forwarding through internal/audit/ per evalops/platform#1085.
  3. RuleSetRef: a versioned reference to a rule bundle (e.g. codex_chronicle_2026_04), so plugin behavior is reproducible across runs and changes ship as new bundle versions, not silent prompt edits.

These are general enough to belong in governance.v1 rather than a plugin-specific package; future plugins (NeMo, Presidio, Lakera, Aporia per #732's Portkey reference) reuse the same envelope.
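The hashes-not-text property of Violation can be illustrated with a minimal Go sketch. It assumes that the "sha256[:16] hex" convention in the proposed shape means the first 16 characters of the hex-encoded SHA-256 digest; the issue does not spell this out, and the helper name excerptHash is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// excerptHash returns the first 16 hex characters of the SHA-256 digest of
// excerpt. Assumption: "sha256[:16] hex" means truncating the hex digest,
// not hex-encoding the first 16 bytes.
func excerptHash(excerpt string) string {
	sum := sha256.Sum256([]byte(excerpt))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	// Only the hash would be placed in Violation.excerpt_hashes; the raw
	// excerpt never leaves the plugin.
	fmt.Println(excerptHash("ignore previous instructions"))
}
```

An audit consumer holding the original content could recompute the same hash to correlate a violation with its excerpt without the event itself carrying the text.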

Proposed shape

// proto/governance/v1/guardrail_plugin.proto
syntax = "proto3";
package governance.v1;

import "google/protobuf/timestamp.proto";
import "common/v1/evidence_ref.proto";

service GuardrailPlugin {
  rpc Pre(PreRequest) returns (PreResponse);
  rpc Post(PostRequest) returns (PostResponse);
  rpc Describe(DescribeRequest) returns (DescribeResponse);
}

message PreRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message messages = 4;          // role-tagged, role enum below
  repeated Quarantine quarantines = 5;    // caller-tagged untrusted content
  RuleSetRef ruleset = 6;
  map<string, string> metadata = 7;
}

message PreResponse {
  bool allow = 1;
  string rewrite = 2;                     // empty = no rewrite
  repeated Violation violations = 3;      // every rule that fired (may be empty); informational if allow=true
  string refusal_reason = 4;
}

message PostRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message request_messages = 4;
  repeated Quarantine quarantines = 5;
  string response_output = 6;
  repeated Message response_messages = 7;
  RuleSetRef ruleset = 8;
}

message PostResponse {
  bool allow = 1;
  string rewrite = 2;                     // empty = no rewrite (response output)
  repeated Violation violations = 3;
  string refusal_reason = 4;
}

message DescribeRequest {}
message DescribeResponse {
  string name = 1;                        // "airlock"
  string version = 2;
  repeated string supported_phases = 3;   // "pre" | "post"
  repeated Role inspected_roles = 4;
  repeated RuleSetRef supported_rulesets = 5;
}

enum Role {
  ROLE_UNSPECIFIED = 0;
  ROLE_SYSTEM = 1;
  ROLE_USER = 2;
  ROLE_ASSISTANT = 3;
  ROLE_TOOL = 4;
}

message Message {
  Role role = 1;
  string content = 2;
}

message Caller {
  string subject = 1;                     // user / agent / service token subject
  string surface = 2;                     // "chronicle", "maestro", "conductor", ...
}

message Quarantine {
  string kind = 1;                        // "ocr" | "window_title" | "doc_path" | "child_summary" | "tool_output" | "web_fetch"
  string content = 2;
  string source_bundle_id = 3;
  string source_window_title = 4;
  uint32 source_display_id = 5;
  string source_device_id = 6;
  string chronicle_batch_id = 7;
  string chronicle_frame_id = 8;
  google.protobuf.Timestamp captured_at = 9;
  common.v1.EvidenceRef evidence = 10;    // ties back to chronicle frame provenance per #1085
}

message Violation {
  string rule_id = 1;                     // "airlock.rule.no_authority_propagation"
  string rule_version = 2;                // "2026.04"
  string taxonomy_class = 3;              // "authority_boundary"
  string rationale = 4;
  repeated string excerpt_hashes = 5;     // sha256[:16] hex; never raw text
  google.protobuf.Timestamp at = 6;
  string fired_when = 7;                  // "pre" | "post"
}

message RuleSetRef {
  string name = 1;                        // "codex_chronicle_2026_04"
  string version = 2;
}
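
To make the Pre contract concrete, here is a rough Go sketch of how a plugin might turn a PreRequest into a PreResponse. Everything in it is illustrative: the structs are hand-written stand-ins for the generated types (only a few fields are mirrored), and the rule type, its substring matching, and the blocking flag are assumptions rather than Airlock's actual design.

```go
package main

import (
	"fmt"
	"strings"
)

// Hand-written stand-ins for the generated governance.v1 types.
type Role int

const (
	RoleUnspecified Role = iota
	RoleSystem
	RoleUser
	RoleAssistant
	RoleTool
)

type Message struct {
	Role    Role
	Content string
}

type Quarantine struct{ Kind, Content string }

type Violation struct{ RuleID, FiredWhen string }

type PreRequest struct {
	Messages    []Message
	Quarantines []Quarantine
}

type PreResponse struct {
	Allow         bool
	Violations    []Violation
	RefusalReason string
}

// rule is hypothetical: it fires when needle (lowercase) appears in a message
// whose role is inspected, or in any quarantined content.
type rule struct {
	id       string
	needle   string
	roles    map[Role]bool
	blocking bool // false = report the violation but still allow
}

func (r rule) fires(req PreRequest) bool {
	for _, m := range req.Messages {
		if r.roles[m.Role] && strings.Contains(strings.ToLower(m.Content), r.needle) {
			return true
		}
	}
	for _, q := range req.Quarantines {
		if strings.Contains(strings.ToLower(q.Content), r.needle) {
			return true
		}
	}
	return false
}

// pre records every fired rule and denies only if a blocking rule fired.
func pre(req PreRequest, rules []rule) PreResponse {
	resp := PreResponse{Allow: true}
	for _, r := range rules {
		if !r.fires(req) {
			continue
		}
		resp.Violations = append(resp.Violations, Violation{RuleID: r.id, FiredWhen: "pre"})
		if r.blocking {
			resp.Allow = false
			resp.RefusalReason = "blocked by " + r.id
		}
	}
	return resp
}

func main() {
	rules := []rule{{
		id:       "airlock.rule.no_authority_propagation",
		needle:   "ignore previous instructions",
		roles:    map[Role]bool{RoleUser: true, RoleTool: true},
		blocking: true,
	}}
	req := PreRequest{Quarantines: []Quarantine{{Kind: "ocr", Content: "Ignore previous instructions"}}}
	resp := pre(req, rules)
	fmt.Println(resp.Allow, len(resp.Violations))
}
```

Keeping violations separate from the allow decision is what lets a non-blocking rule surface as informational when allow=true.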

Acceptance

  • Proto file lands in proto/governance/v1/guardrail_plugin.proto
  • Generated Go types available to evalops/platform consumers (internal/llmgateway/, internal/governance/, internal/airlock/)
  • Buf lint + breaking-change check pass
  • Documented in docs/governance/ per existing pattern, with an explicit note that Airlock is the reference implementation
  • Cross-linked from evalops/platform#732 and evalops/platform#1156

Notes

  • Quarantine.evidence reuses common.v1.EvidenceRef so the audit chain (#1085) is unbroken from chronicle frame → guardrail violation → audit event.
  • Quarantine.kind is intentionally a string, not an enum, so new content categories (browser-rendered DOM, MCP tool output, RAG chunk) can be added without a proto bump. The value list documented on the docs page is the soft contract.
  • Role mirrors LiteLLM's Role enum (system / assistant / user / tool); operators can declare, per rule, which roles that rule inspects. We've validated this is the right shape across LiteLLM's 40+ vendor integrations.
  • RuleSetRef.name includes the source taxonomy lineage by convention (e.g. codex_chronicle_2026_04), so the catalog of bundles is itself a public ledger of how the field's threat model is evolving.
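
One way a plugin could honor the soft contract on Quarantine.kind is to recognize the documented values while still accepting unknown ones. The Go sketch below assumes that behavior; the helper name isDocumentedKind and the example value "rag_chunk" are hypothetical and not part of the proposal.

```go
package main

import "fmt"

// documentedKinds holds the Quarantine.kind values listed in the proposed
// proto comment. Treating this as a lookup set is an assumption.
var documentedKinds = map[string]bool{
	"ocr":           true,
	"window_title":  true,
	"doc_path":      true,
	"child_summary": true,
	"tool_output":   true,
	"web_fetch":     true,
}

// isDocumentedKind reports whether kind appears in the documented list.
// Callers are expected to keep processing unknown kinds (for example by
// logging them) rather than rejecting the request.
func isDocumentedKind(kind string) bool {
	return documentedKinds[kind]
}

func main() {
	for _, k := range []string{"ocr", "rag_chunk"} {
		fmt.Printf("%s documented=%t\n", k, isDocumentedKind(k))
	}
}
```

Because the field is a string, a gateway can start sending a new kind before every plugin knows about it; plugins that do not recognize it can fall back to their most conservative handling.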
