Commit cb41fe2

define pod sandbox lifecycle contract
Define the contract for the PodSandbox hooks for the NRI plugins. The Sandbox hooks are based on the CRI-API RPCs, since the OCI runtime only specifies the container lifecycle.

Signed-off-by: Antonio Ojea <aojea@google.com>
1 parent 06a64ce commit cb41fe2

2 files changed

Lines changed: 167 additions & 3 deletions

README.md

Lines changed: 6 additions & 3 deletions
@@ -130,9 +130,12 @@ subscription.

 NRI plugins can subscribe to the following pod lifecycle events:

-- creation
-- stopping
-- removal
+- creation (RunPodSandbox)
+- stopping (StopPodSandbox)
+- removal (RemovePodSandbox)
+
+For detailed specifications of pod sandbox event timing, state requirements, and plugin
+expectations, see [Pod Sandbox Lifecycle Hooks](docs/pod-sandbox-lifecycle.md).

 The following table lists the pod sandbox properties exposed to NRI plugins, together with
 the first NRI, containerd and CRI-O versions each was available in.

docs/pod-sandbox-lifecycle.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
# NRI Pod Sandbox Lifecycle Hooks

## Relationship to CRI API

This specification defines how NRI plugins interact with pod sandbox lifecycle events. The underlying pod sandbox operations are defined by the [Kubernetes CRI API](https://github.com/kubernetes/cri-api):

- **RunPodSandbox (CRI)**: Creates and starts a pod-level sandbox. Runtimes must ensure the sandbox is in the ready state on success.
- **StopPodSandbox (CRI)**: Stops any running process that is part of the sandbox and reclaims network resources.
- **RemovePodSandbox (CRI)**: Removes the sandbox. If there are any running containers, they must be forcibly terminated and removed.

This NRI specification details when and under what conditions NRI plugins receive notifications for these events, ensuring plugins can reliably depend on consistent sandbox state across different runtime implementations.

## Overview

The pod sandbox lifecycle consists of three distinct phases, each with a corresponding NRI event that plugins can subscribe to:

1. **RunPodSandbox**: Fired after the runtime successfully executes CRI RunPodSandbox
2. **StopPodSandbox**: Fired when the runtime initiates CRI StopPodSandbox
3. **RemovePodSandbox**: Fired when the runtime performs CRI RemovePodSandbox

For each event, this specification defines:

- **Sandbox State Contract**: What sandbox infrastructure conditions runtimes MUST satisfy when firing the NRI event
- **Plugin Responsibilities and Capabilities**: What plugins can safely do in response to the event

## RunPodSandbox

**CRI Operation**: RunPodSandbox - Creates and starts a pod-level sandbox.

**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed the CRI RunPodSandbox operation and the sandbox has reached a "Ready" state, but before any workload containers are started.

### Sandbox State Contract

When the runtime fires the RunPodSandbox NRI event, it guarantees:

- The pod-level cgroup hierarchy has been established
- The sandbox namespaces (IPC, Network, UTS) are created and active
- Network setup is complete (network interfaces are up and have addresses assigned)
- The pod IP address (if applicable) is assigned and available
- The "pause" container (if the runtime uses one) is running
- All prerequisite operations for workload container startup are complete

### Plugin Responsibilities and Capabilities

Upon receiving the RunPodSandbox event, plugins can safely:

- Access the network namespace and inspect network configuration
- Perform network-level operations or monitoring
- Inject sandbox-level hardware configurations (e.g., RDMA, RoCEv2)
- Establish plugin-specific tracking or monitoring for the pod
- Store initial state or baseline metrics for later reference

Plugins should treat this as an initialization phase. The sandbox infrastructure will remain accessible throughout the pod's lifetime until StopPodSandbox is called.

## StopPodSandbox

**CRI Operation**: StopPodSandbox - Stops any running process that is part of the sandbox and reclaims network resources.

**NRI Event Timing**: The StopPodSandbox NRI event is fired when the runtime initiates the CRI StopPodSandbox operation.

### Sandbox State Contract

When the runtime fires the StopPodSandbox NRI event, it guarantees:

- Workload containers within the sandbox are stopped or are stopping
- **CRITICAL**: The sandbox infrastructure still exists and remains fully accessible during this hook
- The network namespace is not unmounted or deleted until this hook completes
- The pod's cgroups remain accessible
- All pod-level resources remain stable until this hook returns

### Plugin Responsibilities and Capabilities

StopPodSandbox is the designated cleanup and observation phase for plugins. Upon receiving this event, plugins can:

- Access the pod's network namespace to read final telemetry or metrics
- Collect final state for observability or troubleshooting
- Detach hardware interfaces or reconfigure resources
- Clean up custom firewall configurations, routing rules, or other network-level state
- Perform graceful cleanup or resource release before sandbox teardown

**Important**: Plugin processing must complete within the configured request timeout. Do not assume sandbox access persists after this hook returns or times out.

## RemovePodSandbox

**CRI Operation**: RemovePodSandbox - Removes the sandbox and forcibly terminates any remaining containers.

**NRI Event Timing**: The RemovePodSandbox NRI event is fired when the runtime initiates the CRI RemovePodSandbox operation, during final garbage collection.

### Sandbox State Contract

When the runtime fires the RemovePodSandbox NRI event:

- All workload containers have been removed
- The StopPodSandbox operation has completed
- Network setup teardown may be underway or complete
- The pod's namespaces (Network, IPC, UTS) may have already been deleted
- Pod-level cgroups may be destroyed
- Sandbox infrastructure access is **not guaranteed**

### Plugin Responsibilities and Capabilities

RemovePodSandbox is strictly for plugin-internal cleanup. Plugins MUST NOT attempt to access pod infrastructure (namespaces, cgroups, network configuration) during this hook, as its existence is not guaranteed.

Plugins receiving this event should only:

- Clean up plugin-internal memory caches or object tracking associated with the podSandboxID
- Remove host-level tracking files, database entries, or other locally stored pod references
- Release any plugin resources held for this specific pod
- Perform final accounting or bookkeeping

**Important**: This hook is informational only. Plugins should not assume any pod infrastructure exists. Only clean up information the plugin created or stored internally.
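
As a rough illustration of plugin-internal cleanup, the sketch below keeps a per-pod record in memory and drops it when RemovePodSandbox arrives. The `tracker` and `podRecord` types are hypothetical and not part of the NRI API; the point is only that the removal path touches plugin memory and nothing else.

```go
package main

import (
	"fmt"
	"sync"
)

// podRecord is hypothetical plugin-internal state kept for one pod sandbox.
type podRecord struct {
	Name string
}

// tracker holds plugin-internal records keyed by pod sandbox ID.
type tracker struct {
	mu   sync.Mutex
	pods map[string]podRecord
}

func newTracker() *tracker {
	return &tracker{pods: make(map[string]podRecord)}
}

// add stores a record, for example when RunPodSandbox is received.
func (t *tracker) add(id string, r podRecord) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pods[id] = r
}

// forget drops the record for id and reports whether one existed.
// It only touches plugin memory, never namespaces, cgroups, or network state.
func (t *tracker) forget(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, ok := t.pods[id]
	delete(t.pods, id)
	return ok
}

func main() {
	tr := newTracker()
	tr.add("sandbox-1", podRecord{Name: "demo"})
	fmt.Println(tr.forget("sandbox-1")) // record existed and is now gone
	fmt.Println(tr.forget("sandbox-1")) // repeated removal is harmless
}
```

Making `forget` safe to call twice is a design choice of this sketch: it keeps the cleanup path harmless if the plugin never recorded the pod in the first place.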

## Event Ordering and Guarantees

Runtimes MUST guarantee the following ordering:

1. **RunPodSandbox** NRI event fires after successful CRI RunPodSandbox execution
2. **StopPodSandbox** NRI event fires during CRI StopPodSandbox execution
3. **RemovePodSandbox** NRI event fires during CRI RemovePodSandbox execution
4. These events MUST fire in strict order: RunPodSandbox → StopPodSandbox → RemovePodSandbox
5. No workload containers will be started until after the RunPodSandbox hook completes
6. All workload containers will be stopped before the StopPodSandbox hook is called
7. No network resource reclamation should occur during StopPodSandbox hook execution

See the [CRI API specification](https://github.com/kubernetes/cri-api) for details on each CRI operation.
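
A plugin that wants to check the strict ordering above defensively could track the last event seen per sandbox. The following is a minimal sketch, assuming each event arrives exactly once per pod sandbox ID; the `phase` and `orderChecker` names are illustrative and do not come from the NRI API.

```go
package main

import "fmt"

// phase is the last lifecycle event observed for a pod sandbox.
type phase int

const (
	phaseNone phase = iota
	phaseRun
	phaseStop
	phaseRemove
)

// orderChecker records the last event seen per pod sandbox ID and rejects
// events that do not follow the Run -> Stop -> Remove order.
type orderChecker struct {
	last map[string]phase
}

func newOrderChecker() *orderChecker {
	return &orderChecker{last: make(map[string]phase)}
}

// observe accepts next only if it directly follows the previous phase.
func (c *orderChecker) observe(podID string, next phase) error {
	prev := c.last[podID] // phaseNone for an unknown pod
	if next != prev+1 {
		return fmt.Errorf("pod %s: event %d after %d is out of order", podID, next, prev)
	}
	if next == phaseRemove {
		delete(c.last, podID) // lifecycle finished; forget the pod
	} else {
		c.last[podID] = next
	}
	return nil
}

func main() {
	c := newOrderChecker()
	fmt.Println(c.observe("sandbox-1", phaseRun))    // accepted
	fmt.Println(c.observe("sandbox-1", phaseRemove)) // rejected: Stop was skipped
	fmt.Println(c.observe("sandbox-1", phaseStop))   // accepted
	fmt.Println(c.observe("sandbox-1", phaseRemove)) // accepted
}
```

A rejected event leaves the recorded phase unchanged, so the checker can be used for logging or assertions without disturbing later, correctly ordered events.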

## Plugin Implementation Guidance

### Subscribing to Events

Plugins subscribe to these events during the Configure phase by returning the appropriate event flags in the ConfigureResponse:

- `Event_RUN_POD_SANDBOX` (1 << 0) for RunPodSandbox
- `Event_STOP_POD_SANDBOX` (1 << 1) for StopPodSandbox
- `Event_REMOVE_POD_SANDBOX` (1 << 2) for RemovePodSandbox

Plugins are notified of these events via the StateChange RPC.
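
To make the flag arithmetic concrete, here is a small standalone sketch of building and testing a subscription mask from the bit positions listed above. The `Event`, `EventMask`, `maskOf`, and `isSet` names are defined locally for illustration; they are assumptions of this example and are not the identifiers exported by the NRI Go packages.

```go
package main

import "fmt"

// Event is the bit position of a pod sandbox lifecycle event.
// Values mirror the list above; the type itself is illustrative.
type Event uint

const (
	RunPodSandbox    Event = iota // bit 0
	StopPodSandbox                // bit 1
	RemovePodSandbox              // bit 2
)

// EventMask is a bitmask of subscribed events.
type EventMask int32

// maskOf returns a mask with the bit for each given event set.
func maskOf(events ...Event) EventMask {
	var m EventMask
	for _, e := range events {
		m |= 1 << e
	}
	return m
}

// isSet reports whether the bit for e is set in m.
func (m EventMask) isSet(e Event) bool {
	return m&(1<<e) != 0
}

func main() {
	m := maskOf(RunPodSandbox, RemovePodSandbox)
	fmt.Printf("mask=%#b run=%v stop=%v remove=%v\n",
		m, m.isSet(RunPodSandbox), m.isSet(StopPodSandbox), m.isSet(RemovePodSandbox))
}
```

A plugin that only needs internal bookkeeping might subscribe to RunPodSandbox and RemovePodSandbox as shown, while one that reads final telemetry would also set the StopPodSandbox bit.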

### Timeout Handling

All plugin processing must complete within the configured request timeout. Plugins should plan accordingly:

- **RunPodSandbox**: Failure may result in pod creation failure
- **StopPodSandbox**: Non-blocking for subsequent operations; the plugin should not depend on completion of subsequent teardown
- **RemovePodSandbox**: Non-blocking; removal will proceed regardless of plugin timeout
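
One way a plugin might stay inside the request timeout is to give its own hook work a shorter, self-imposed budget. The sketch below uses Go's standard `context` package; `runWithBudget` and the 20 ms budget are hypothetical, and the real value would need to be chosen below whatever timeout the runtime is configured with.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithBudget runs work and returns its error, or the context's deadline
// error if work does not finish within budget. The context passed to work is
// cancelled when the budget expires so that work can stop early.
func runWithBudget(budget time.Duration, work func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { done <- work(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// slowWorkTimesOut simulates hook work that outlives its budget and reports
// whether the deadline error was returned.
func slowWorkTimesOut() bool {
	err := runWithBudget(20*time.Millisecond, func(ctx context.Context) error {
		select {
		case <-time.After(2 * time.Second): // simulated slow telemetry read
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow work hit the deadline:", slowWorkTimesOut())
}
```

Returning early with a deadline error lets the plugin log what it could not finish instead of relying on sandbox access after the hook has timed out.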

### Error Handling

Plugins should handle errors gracefully and avoid leaving the pod or system in an inconsistent state. Error recovery strategies:

- **RunPodSandbox errors**: Problematic; may block pod creation depending on failure severity and runtime policy
- **StopPodSandbox errors**: May not prevent sandbox termination, depending on runtime policy
- **RemovePodSandbox errors**: Should not prevent sandbox removal

### Multi-Plugin Coordination

When multiple plugins are active:

- All RunPodSandbox hooks complete before the first workload container starts
- Hooks execute in plugin index order; later plugins should not assume earlier plugins' modifications will persist
- RemovePodSandbox hooks are independent; plugins should not rely on side effects from other plugins
