Problem
ActorTemplate currently requires all workload images to be digest-pinned:
- spec.pauseImage
- every spec.containers[].image
The validation message says this is required because changing the image invalidates snapshots. That concern is real: on restore, atelet recreates OCI rootfs bundles before calling runsc restore, so restoring a checkpoint against a different base image can be incorrect or fail.
However, requiring users to provide digest-pinned image references makes demos and day-to-day development awkward. Users often naturally write repo/image:tag, and tooling has to resolve tags before applying ActorTemplates.
Proposal
Allow ActorTemplate images to be specified by tag, but make snapshot restore deterministic by persisting the resolved image reference alongside snapshot data.
When atelet pulls an image for Run or Restore, it should record the exact digest-resolved image reference that was used to build the OCI rootfs. When atelet later checkpoints the actor, it should upload that image-resolution metadata under the snapshot prefix, alongside the existing checkpoint files.
On subsequent restore from that snapshot, atelet should read the snapshot metadata and use the stored digest-resolved references when rebuilding OCI bundles, instead of resolving the current ActorTemplate tags again.
This preserves the safety property the digest requirement was trying to enforce, without forcing users to pre-resolve every image in the ActorTemplate.
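To make "digest-resolved image reference" concrete, the following Go sketch turns a tag reference plus the digest that was actually pulled into a pinned reference. pinRef is a hypothetical helper, not an existing atelet function, and the plain string handling is an assumption made to keep the example dependency-free; a real implementation would more likely use a proper image-reference parser.

```go
package main

import (
	"fmt"
	"strings"
)

// pinRef returns a digest-pinned form of ref, given the digest that was
// actually pulled. Hypothetical helper for illustration only.
// Refs that already carry a digest are returned unchanged so that the
// caller's pin is preserved.
func pinRef(ref, digest string) string {
	if strings.Contains(ref, "@") {
		return ref
	}
	repo := ref
	// A tag is a ":" that appears after the last "/". A ":" before that
	// belongs to a registry port such as localhost:5000.
	lastSlash := strings.LastIndex(ref, "/")
	if i := strings.LastIndex(ref, ":"); i > lastSlash {
		repo = ref[:i]
	}
	return repo + "@" + digest
}

func main() {
	// Made-up digest, used only as a placeholder.
	digest := "sha256:" + strings.Repeat("a", 64)
	for _, ref := range []string{"repo/image:tag", "localhost:5000/img", "repo/image@" + digest} {
		fmt.Println(ref, "->", pinRef(ref, digest))
	}
}
```

The tag is dropped from the stored reference on purpose: restore should depend only on the digest, not on whatever the tag points to later.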
Current code shape
Today:
- The CRD validation requires @ in both Container.Image and ActorTemplateSpec.PauseImage.
- ateapi materializes an atelet WorkloadSpec directly from the current ActorTemplate.
- atelet Restore downloads checkpoint files, then calls prepareOCIBundles using that current WorkloadSpec.
- memorypullcache.Fetch only uses its digest-keyed cache when the requested ref already contains a digest.
- snapshot storage currently contains checkpoint.img.zstd, pages.img.zstd, and pages_meta.img.zstd; there is no snapshot manifest containing image metadata.
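The current validation amounts to a substring check for @. A rough Go equivalent is sketched below to show which fields get rejected today; the real rule is an XValidation in the CRD schema, and isDigestPinned and unpinnedImages are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigestPinned mirrors the current rule: a ref passes only if it contains "@".
// Hypothetical helper; the real check lives in the CRD schema, not in Go.
func isDigestPinned(ref string) bool {
	return strings.Contains(ref, "@")
}

// unpinnedImages returns the field paths that the current validation would reject.
func unpinnedImages(pauseImage string, containerImages []string) []string {
	var rejected []string
	if !isDigestPinned(pauseImage) {
		rejected = append(rejected, "spec.pauseImage")
	}
	for i, img := range containerImages {
		if !isDigestPinned(img) {
			rejected = append(rejected, fmt.Sprintf("spec.containers[%d].image", i))
		}
	}
	return rejected
}

func main() {
	// Placeholder refs; the digest is made up for the example.
	pause := "registry.example/pause@sha256:" + strings.Repeat("0", 64)
	fmt.Println(unpinnedImages(pause, []string{"busybox:latest"}))
}
```

Under the proposal this check goes away, and the same pinning guarantee is enforced at restore time from snapshot metadata instead.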
Implementation sketch
- Relax ActorTemplate image validation:
  - remove the self.contains(@) XValidation from spec.pauseImage
  - remove the same validation from spec.containers[].image
  - update generated CRDs, Helm CRDs, tests, docs, and demos
- Extend the image pull path:
  - change memorypullcache.Fetch or a wrapper around it to return the resolved digest reference in addition to the rootfs tar stream
  - for tag refs, resolve to the platform-specific image digest used by remote.Image(..., remote.WithPlatform(...))
  - for already-pinned refs, preserve the caller’s pinned digest semantics where appropriate, especially for multi-arch/index digests
- Persist local runtime image metadata:
  - when prepareOCIDirectory prepares a bundle, write metadata recording:
    - container name (pause or workload container name)
    - original image ref from the WorkloadSpec
    - resolved digest ref actually used for the rootfs
  - checkpoint should upload this metadata before deleting local actor dirs
- Add a snapshot manifest:
  - store a small JSON object under the snapshot prefix, for example snapshot-manifest.json
  - include a version field for future compatibility
  - include pause image and per-container resolved refs keyed by container name
- Use the manifest on restore:
  - before prepareOCIBundles, fetch the snapshot manifest if present
  - rewrite the restore WorkloadSpec image refs to the manifest’s resolved refs
  - fail clearly if the manifest is missing for snapshots that require tag resolution, or define an explicit backwards-compatibility fallback for old snapshots
- Keep checkpoint accurate:
  - do not resolve tags at checkpoint time from the current ActorTemplate
  - checkpoint should persist the image refs that were used when the actor was actually started/restored
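The metadata, manifest, and restore steps above can be sketched in Go as follows. Everything here is an assumption made for illustration: the ImageRecord and SnapshotManifest types, their JSON field names, the version value 1, the use of "pause" as the pause container's key, and the simplified WorkloadSpec stand-in are not taken from the atelet codebase. For a missing manifest, the sketch picks one of the two options listed: it accepts old snapshots only when every ref in the spec is already digest-pinned, and fails with a clear error otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ImageRecord is a hypothetical per-bundle record written when a rootfs is prepared.
type ImageRecord struct {
	Container   string // "pause" or the workload container name
	OriginalRef string // ref as written in the WorkloadSpec
	ResolvedRef string // digest ref actually used for the rootfs
}

// SnapshotManifest is an assumed layout for snapshot-manifest.json.
type SnapshotManifest struct {
	Version    int               `json:"version"`
	PauseImage string            `json:"pauseImage"`
	Containers map[string]string `json:"containers"` // container name -> resolved ref
}

// Container and WorkloadSpec are simplified stand-ins for the real atelet types.
type Container struct{ Name, Image string }

type WorkloadSpec struct {
	PauseImage string
	Containers []Container
}

// buildManifest runs at checkpoint time. It uses only the records captured when
// the actor was started or restored, never the current ActorTemplate.
func buildManifest(records []ImageRecord) SnapshotManifest {
	m := SnapshotManifest{Version: 1, Containers: map[string]string{}}
	for _, r := range records {
		if r.Container == "pause" {
			m.PauseImage = r.ResolvedRef
		} else {
			m.Containers[r.Container] = r.ResolvedRef
		}
	}
	return m
}

// applyManifest returns a copy of spec with image refs replaced by the manifest's
// resolved refs. A nil manifest models an old snapshot: it is accepted only if
// every ref in spec is already digest-pinned.
func applyManifest(spec WorkloadSpec, m *SnapshotManifest) (WorkloadSpec, error) {
	out := WorkloadSpec{PauseImage: spec.PauseImage}
	out.Containers = append(out.Containers, spec.Containers...)
	if m == nil {
		refs := []string{spec.PauseImage}
		for _, c := range spec.Containers {
			refs = append(refs, c.Image)
		}
		for _, ref := range refs {
			if !strings.Contains(ref, "@") {
				return spec, fmt.Errorf("snapshot has no manifest and image %q is not digest-pinned", ref)
			}
		}
		return out, nil
	}
	if m.Version != 1 {
		return spec, fmt.Errorf("unsupported snapshot manifest version %d", m.Version)
	}
	if m.PauseImage == "" {
		return spec, fmt.Errorf("snapshot manifest has no pause image")
	}
	out.PauseImage = m.PauseImage
	for i, c := range out.Containers {
		ref, ok := m.Containers[c.Name]
		if !ok {
			return spec, fmt.Errorf("snapshot manifest has no image for container %q", c.Name)
		}
		out.Containers[i].Image = ref
	}
	return out, nil
}

func main() {
	// Placeholder refs and digests, made up for the example.
	records := []ImageRecord{
		{"pause", "registry.example/pause:tag", "registry.example/pause@sha256:aaaa"},
		{"app", "busybox:latest", "busybox@sha256:bbbb"},
	}
	manifest := buildManifest(records)
	data, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	spec := WorkloadSpec{
		PauseImage: "registry.example/pause:tag",
		Containers: []Container{{Name: "app", Image: "busybox:latest"}},
	}
	restored, err := applyManifest(spec, &manifest)
	fmt.Println(restored, err)
}
```

Keying the manifest by container name rather than by original ref means a later change to the ActorTemplate's tags does not affect restore; only a renamed or added container does, and that case surfaces as an explicit error.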
Acceptance criteria
- ActorTemplates can be created with tag-based image refs, e.g. busybox:latest.
- The first run resolves tags and records digest-pinned refs.
- Golden snapshots persist the resolved refs used to create the golden actor.
- Restoring from a golden snapshot or actor snapshot uses the stored digest refs, not whatever the tag points to later.
- Existing digest-pinned ActorTemplates continue to work.
- Existing snapshots either continue to restore through a documented fallback or fail with a clear actionable error.
- Unit tests cover:
  - validation accepts tag refs
  - resolved refs are persisted
  - restore prefers snapshot manifest refs over current ActorTemplate refs
  - already-pinned multi-arch refs preserve the intended digest behavior