Parent: evalops/platform#732 (pluggable guardrails plugin architecture).
Driving consumer: evalops/platform#1156 (airlock as the first concrete GuardrailPlugin).
Cross-link: evalops/agentd#39 (binary-archaeology gap analysis vs OpenAI Codex Chronicle).
Why
evalops/platform#732 plans a governance.v1.GuardrailPlugin plugin contract so llm-gateway can call out-of-process safety plugins (input + output rails per #165's NeMo-borrowed 5-stage composition). The first concrete plugin under that contract — Airlock — is being built per evalops/platform#1156, which lifts the 13-class prompt-injection taxonomy embedded in OpenAI's Codex Chronicle binary into a pluggable, versioned, audit-emitting Go plugin.
To land Airlock cleanly we need three message types added to the governance.v1 plugin contract that #732 will define:
Quarantine — caller-tagged untrusted content with provenance, so the gateway can pass screen-derived OCR text, window titles, document paths, child summaries, tool output, and web-fetch bodies into the plugin without the plugin having to re-derive provenance.
Violation — structured per-rule fired event, hashes-not-text, suitable for forwarding through internal/audit/ per evalops/platform#1085.
RuleSetRef — versioned reference to a rule bundle (e.g. codex_chronicle_2026_04), so plugin behavior is reproducible across runs and changes ship as new bundle versions, not silent prompt edits.
These are general enough to belong in governance.v1 rather than a plugin-specific package; future plugins (NeMo, Presidio, Lakera, Aporia per #732's Portkey reference) reuse the same envelope.
Proposed shape
// proto/governance/v1/guardrail_plugin.proto
syntax = "proto3";

package governance.v1;

import "google/protobuf/timestamp.proto";
import "common/v1/evidence_ref.proto";

service GuardrailPlugin {
  rpc Pre(PreRequest) returns (PreResponse);
  rpc Post(PostRequest) returns (PostResponse);
  rpc Describe(DescribeRequest) returns (DescribeResponse);
}

message PreRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message messages = 4;       // role-tagged, role enum below
  repeated Quarantine quarantines = 5; // caller-tagged untrusted content
  RuleSetRef ruleset = 6;
  map<string, string> metadata = 7;
}

message PreResponse {
  bool allow = 1;
  string rewrite = 2;                // empty = no rewrite
  repeated Violation violations = 3; // always populated; informational if allow=true
  string refusal_reason = 4;
}

message PostRequest {
  string org_id = 1;
  Caller caller = 2;
  string model = 3;
  repeated Message request_messages = 4;
  repeated Quarantine quarantines = 5;
  string response_output = 6;
  repeated Message response_messages = 7;
  RuleSetRef ruleset = 8;
}

message PostResponse {
  bool allow = 1;
  string rewrite = 2; // empty = no rewrite (response output)
  repeated Violation violations = 3;
  string refusal_reason = 4;
}

message DescribeRequest {}

message DescribeResponse {
  string name = 1; // "airlock"
  string version = 2;
  repeated string supported_phases = 3; // "pre" | "post"
  repeated Role inspected_roles = 4;
  repeated RuleSetRef supported_rulesets = 5;
}

enum Role {
  ROLE_UNSPECIFIED = 0;
  ROLE_SYSTEM = 1;
  ROLE_USER = 2;
  ROLE_ASSISTANT = 3;
  ROLE_TOOL = 4;
}

message Message {
  Role role = 1;
  string content = 2;
}

message Caller {
  string subject = 1; // user / agent / service token subject
  string surface = 2; // "chronicle", "maestro", "conductor", ...
}

message Quarantine {
  string kind = 1; // "ocr" | "window_title" | "doc_path" | "child_summary" | "tool_output" | "web_fetch"
  string content = 2;
  string source_bundle_id = 3;
  string source_window_title = 4;
  uint32 source_display_id = 5;
  string source_device_id = 6;
  string chronicle_batch_id = 7;
  string chronicle_frame_id = 8;
  google.protobuf.Timestamp captured_at = 9;
  common.v1.EvidenceRef evidence = 10; // ties back to chronicle frame provenance per #1085
}

message Violation {
  string rule_id = 1;        // "airlock.rule.no_authority_propagation"
  string rule_version = 2;   // "2026.04"
  string taxonomy_class = 3; // "authority_boundary"
  string rationale = 4;
  repeated string excerpt_hashes = 5; // sha256[:16] hex; never raw text
  google.protobuf.Timestamp at = 6;
  string fired_when = 7; // "pre" | "post"
}

message RuleSetRef {
  string name = 1; // "codex_chronicle_2026_04"
  string version = 2;
}
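As an illustration of the hashes-not-text rule on Violation.excerpt_hashes, the following Go sketch derives a short fingerprint from a matched excerpt. It assumes that "sha256[:16] hex" means the first 16 hex characters (8 bytes) of the SHA-256 digest; the comment could also be read as the first 16 bytes, so treat the truncation length as an assumption. The excerptHash helper name is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// excerptHash returns a short fingerprint of a matched excerpt so that a
// violation record never carries the raw text.
// Assumption: the fingerprint is the first 16 hex characters of the
// SHA-256 digest of the excerpt bytes.
func excerptHash(excerpt string) string {
	sum := sha256.Sum256([]byte(excerpt))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	// Only the fingerprint would be placed in excerpt_hashes.
	fmt.Println(excerptHash("ignore all previous instructions"))
}
```

A truncated digest keeps audit events compact while still letting two violations on the same excerpt be correlated; it is not meant to be collision-proof at the level of a full digest.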
Notes
Quarantine.evidence reuses common.v1.EvidenceRef so the audit chain (#1085) is unbroken from chronicle frame → guardrail violation → audit event.
Quarantine.kind is intentionally a string, not an enum, so new content categories (browser-rendered DOM, MCP tool output, RAG chunk) can be added without a proto bump. The value list documented on the docs page is the soft contract.
Role mirrors LiteLLM's Role enum (system / assistant / user / tool) — operators can declare per-rule which roles a rule inspects. We've validated this is the right shape across LiteLLM's 40+ vendor integrations.
RuleSetRef.name includes the source taxonomy lineage by convention (e.g. codex_chronicle_2026_04), so the catalog of bundles is itself a public ledger of how the field's threat model is evolving.
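The string-typed Quarantine.kind note above can be sketched from the plugin side in Go. This is a hedged illustration only: the knownKinds set copies the values listed in the proto comment, while the classifyKind helper and the choice to accept unknown kinds instead of rejecting them are assumptions made for this example.

```go
package main

import "fmt"

// knownKinds mirrors the value list documented for Quarantine.kind.
var knownKinds = map[string]bool{
	"ocr":           true,
	"window_title":  true,
	"doc_path":      true,
	"child_summary": true,
	"tool_output":   true,
	"web_fetch":     true,
}

// classifyKind reports whether kind is in the documented list.
// Unknown kinds are not an error, so callers can introduce new categories
// without a proto change; only an empty kind is rejected.
func classifyKind(kind string) (known bool, err error) {
	if kind == "" {
		return false, fmt.Errorf("quarantine kind must not be empty")
	}
	return knownKinds[kind], nil
}

func main() {
	for _, k := range []string{"ocr", "rag_chunk", ""} {
		known, err := classifyKind(k)
		fmt.Printf("%q known=%v err=%v\n", k, known, err)
	}
}
```

A plugin could use the known flag to pick kind-specific rules and fall back to its most conservative screening for anything it does not recognize.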
Acceptance
proto/governance/v1/guardrail_plugin.proto
internal/llmgateway/, internal/governance/, internal/airlock/)
docs/governance/ per existing pattern, with an explicit note that Airlock is the reference implementation
evalops/platform#732 and evalops/platform#1156