Allow tag-based ActorTemplate images by persisting resolved digests with snapshots #223

@EItanya

Description

Problem

ActorTemplate currently requires all workload images to be digest-pinned:

  • spec.pauseImage
  • every spec.containers[].image

The validation message says this is required because changing the image invalidates snapshots. That concern is real: on restore, atelet recreates the OCI rootfs bundles before calling runsc restore, so restoring a checkpoint against a different base image can produce an incorrect restore or fail outright.

However, requiring users to provide digest-pinned image references makes demos and day-to-day development awkward. Users naturally write repo/image:tag, so tooling has to resolve every tag to a digest before it can apply an ActorTemplate.

Proposal

Allow ActorTemplate images to be specified by tag, but make snapshot restore deterministic by persisting the resolved image reference alongside snapshot data.

When atelet pulls an image for Run or Restore, it should record the exact digest-resolved image reference that was used to build the OCI rootfs. When atelet later checkpoints the actor, it should upload that image-resolution metadata under the snapshot prefix, alongside the existing checkpoint files.

On subsequent restore from that snapshot, atelet should read the snapshot metadata and use the stored digest-resolved references when rebuilding OCI bundles, instead of resolving the current ActorTemplate tags again.

This preserves the safety property the digest requirement was trying to enforce, without forcing users to pre-resolve every image in the ActorTemplate.
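
As a rough illustration of what "digest-resolved image reference" means here, the sketch below turns a tag ref plus the digest that was actually pulled into a pinned ref. The helper name pinRef and the string-slicing approach are assumptions made for this example only; real code would more likely use a proper image-reference parser, and for refs that are already pinned (especially multi-arch index digests) it may be preferable to keep the caller's digest rather than replace it.

```go
package main

import (
	"fmt"
	"strings"
)

// pinRef returns ref with its tag (and any existing digest) replaced by digest.
// Hypothetical helper for illustration; not part of atelet.
func pinRef(ref, digest string) string {
	// Drop an existing digest suffix, if any.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// Drop a tag: a colon that appears after the last slash.
	// A colon before the last slash is a registry port, not a tag.
	slash := strings.LastIndex(ref, "/")
	if i := strings.LastIndex(ref, ":"); i > slash {
		ref = ref[:i]
	}
	return ref + "@" + digest
}

func main() {
	// Placeholder digest, not a real image digest.
	digest := "sha256:" + strings.Repeat("ab", 32)
	fmt.Println(pinRef("repo/image:tag", digest))
	fmt.Println(pinRef("localhost:5000/app:v1", digest))
}
```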

Current code shape

Today:

  • The CRD validation requires an @ character (the digest separator) in both Container.Image and ActorTemplateSpec.PauseImage.
  • ateapi materializes an atelet WorkloadSpec directly from the current ActorTemplate.
  • atelet Restore downloads checkpoint files, then calls prepareOCIBundles using that current WorkloadSpec.
  • memorypullcache.Fetch only uses its digest-keyed cache when the requested ref already contains a digest.
  • snapshot storage currently contains checkpoint.img.zstd, pages.img.zstd, and pages_meta.img.zstd; there is no snapshot manifest containing image metadata.
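
To make the cache behavior above concrete, here is a minimal sketch of a digest-keyed cache that is only consulted when the requested ref carries a digest. The pullCache type and its lookup method are hypothetical stand-ins and do not reflect the actual memorypullcache code; they only show why a tag ref always misses such a cache.

```go
package main

import (
	"fmt"
	"strings"
)

// pullCache is a hypothetical digest-keyed cache of rootfs tarballs.
type pullCache struct {
	byDigest map[string][]byte
}

// lookup returns the cached rootfs for ref, if ref is digest-pinned and cached.
func (c *pullCache) lookup(ref string) ([]byte, bool) {
	i := strings.LastIndex(ref, "@")
	if i < 0 {
		// Tag refs carry no digest, so there is no cache key to look up.
		return nil, false
	}
	v, ok := c.byDigest[ref[i+1:]]
	return v, ok
}

func main() {
	// "sha256:abc" is a placeholder key, not a real digest.
	c := &pullCache{byDigest: map[string][]byte{"sha256:abc": []byte("rootfs")}}
	for _, ref := range []string{"busybox:latest", "busybox@sha256:abc"} {
		_, hit := c.lookup(ref)
		fmt.Printf("%s hit=%v\n", ref, hit)
	}
}
```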

Implementation sketch

  1. Relax ActorTemplate image validation:

    • remove the self.contains('@') XValidation from spec.pauseImage
    • remove the same validation from spec.containers[].image
    • update generated CRDs, Helm CRDs, tests, docs, and demos
  2. Extend the image pull path:

    • change memorypullcache.Fetch, or a wrapper around it, to return the resolved digest reference in addition to the rootfs tar stream
    • for tag refs, resolve to the platform-specific image digest used by remote.Image(..., remote.WithPlatform(...))
    • for already-pinned refs, preserve the caller’s pinned digest semantics where appropriate, especially for multi-arch/index digests
  3. Persist local runtime image metadata:

    • when prepareOCIDirectory prepares a bundle, write metadata recording:
      • container name (pause or workload container name)
      • original image ref from the WorkloadSpec
      • resolved digest ref actually used for the rootfs
    • checkpoint should upload this metadata before deleting local actor dirs
  4. Add a snapshot manifest:

    • store a small JSON object under the snapshot prefix, for example snapshot-manifest.json
    • include a version field for future compatibility
    • include pause image and per-container resolved refs keyed by container name
  5. Use the manifest on restore:

    • before prepareOCIBundles, fetch the snapshot manifest if present
    • rewrite the restore WorkloadSpec image refs to the manifest’s resolved refs
    • fail clearly if the manifest is missing for snapshots that require tag resolution, or define an explicit backwards-compatibility fallback for old snapshots
  6. Keep checkpoint accurate:

    • do not resolve tags at checkpoint time from the current ActorTemplate
    • checkpoint should persist the image refs that were used when the actor was actually started/restored
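
Steps 4 and 5 could look roughly like the sketch below: a versioned manifest keyed by container name, and a function that rewrites the restore spec to the stored refs. Every type and field name here (snapshotManifest, resolvedImage, workloadSpec, applyManifest, the JSON keys) is an assumption for illustration; the real WorkloadSpec has a different shape, and the digests are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resolvedImage pairs the ref from the WorkloadSpec with the ref actually used.
type resolvedImage struct {
	Original string `json:"original"`
	Resolved string `json:"resolved"`
}

// snapshotManifest is a hypothetical shape for snapshot-manifest.json.
type snapshotManifest struct {
	Version    int                      `json:"version"`
	PauseImage resolvedImage            `json:"pauseImage"`
	Containers map[string]resolvedImage `json:"containers"`
}

// container and workloadSpec are simplified stand-ins for the atelet types.
type container struct{ Name, Image string }

type workloadSpec struct {
	PauseImage string
	Containers []container
}

// applyManifest rewrites the image refs in spec to the manifest's resolved refs.
func applyManifest(spec *workloadSpec, m snapshotManifest) error {
	if m.Version != 1 {
		return fmt.Errorf("unsupported snapshot manifest version %d", m.Version)
	}
	if m.PauseImage.Resolved == "" {
		return fmt.Errorf("snapshot manifest has no resolved pause image")
	}
	spec.PauseImage = m.PauseImage.Resolved
	for i := range spec.Containers {
		c := &spec.Containers[i]
		img, ok := m.Containers[c.Name]
		if !ok || img.Resolved == "" {
			return fmt.Errorf("snapshot manifest has no resolved image for container %q", c.Name)
		}
		c.Image = img.Resolved
	}
	return nil
}

func main() {
	raw := []byte(`{
	  "version": 1,
	  "pauseImage": {"original": "pause:3.9", "resolved": "pause@sha256:aaaa"},
	  "containers": {
	    "app": {"original": "busybox:latest", "resolved": "busybox@sha256:bbbb"}
	  }
	}`)
	var m snapshotManifest
	if err := json.Unmarshal(raw, &m); err != nil {
		panic(err)
	}
	// The spec below mimics what would be built from the current ActorTemplate.
	spec := workloadSpec{
		PauseImage: "pause:3.9",
		Containers: []container{{Name: "app", Image: "busybox:latest"}},
	}
	if err := applyManifest(&spec, m); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)
}
```

Keying containers by name rather than by list index keeps the rewrite stable if the ActorTemplate reorders its containers between checkpoint and restore.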

Acceptance criteria

  • ActorTemplates can be created with tag-based image refs, e.g. busybox:latest.
  • The first run resolves tags and records digest-pinned refs.
  • Golden snapshots persist the resolved refs used to create the golden actor.
  • Restoring from a golden snapshot or actor snapshot uses the stored digest refs, not whatever the tag points to later.
  • Existing digest-pinned ActorTemplates continue to work.
  • Existing snapshots either continue to restore through a documented fallback or fail with a clear actionable error.
  • Unit tests cover:
    • validation accepts tag refs
    • resolved refs are persisted
    • restore prefers snapshot manifest refs over current ActorTemplate refs
    • already-pinned multi-arch refs preserve the intended digest behavior
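
One possible reading of the "documented fallback or clear actionable error" criterion is sketched below: a snapshot without a manifest still restores when every template ref is already digest-pinned, and otherwise fails with an error that says what to do. The function restoreRefs, the error text, and the map-based inputs are all assumptions for this example, not a decided design.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errManifestRequired is a hypothetical error for old snapshots with tag refs.
var errManifestRequired = errors.New(
	"snapshot has no image manifest and the ActorTemplate uses a tag ref; " +
		"re-create the snapshot or pin the image by digest")

// restoreRefs picks the image refs (keyed by container name) to restore with.
// manifestRefs is nil for snapshots taken before manifests existed.
func restoreRefs(templateRefs, manifestRefs map[string]string) (map[string]string, error) {
	if manifestRefs != nil {
		// Prefer the refs recorded with the snapshot over the current template.
		return manifestRefs, nil
	}
	// Fallback: an old snapshot is only safe if the template itself is pinned.
	for name, ref := range templateRefs {
		if !strings.Contains(ref, "@") {
			return nil, fmt.Errorf("container %q (%s): %w", name, ref, errManifestRequired)
		}
	}
	return templateRefs, nil
}

func main() {
	// Tag-based template restoring from a snapshot that predates manifests.
	template := map[string]string{"app": "busybox:latest"}
	if _, err := restoreRefs(template, nil); err != nil {
		fmt.Println("restore refused:", err)
	}
}
```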

Metadata

Labels

area/api: User-facing API changes