From 08d970b7fa1dd58ed1debea8bec6bae9d99d80bb Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Wed, 3 Jun 2026 19:01:37 +0200 Subject: [PATCH 1/6] docs: add JEP-0014 Virtual Scalable Exporters proposal Propose a Virtual Scalable Exporter subsystem for Jumpstarter that manages pools of virtual targets with configurable autoscaling via per-provider CRDs (QEMUExporterPool, AndroidExporterPool, etc.). Co-authored-by: Cursor --- .../JEP-0014-virtual-scalable-exporters.md | 878 ++++++++++++++++++ python/docs/source/contributing/jeps/index.md | 3 +- 2 files changed, 879 insertions(+), 2 deletions(-) create mode 100644 python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md new file mode 100644 index 000000000..974bc8bb9 --- /dev/null +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -0,0 +1,878 @@ +# JEP-0014: Virtual Scalable Exporters + +| Field | Value | +| ----------------- | -------------------------------------------------------------- | +| **JEP** | 0014 | +| **Title** | Virtual Scalable Exporters | +| **Author(s)** | @mangelajo (Miguel Angel Ajo Pelayo) | +| **Status** | Draft | +| **Type** | Standards Track | +| **Created** | 2026-06-03 | +| **Updated** | 2026-06-03 | +| **Discussion** | https://github.com/jumpstarter-dev/jumpstarter/issues/41 | +| **Requires** | | +| **Supersedes** | | +| **Superseded-By** | | + +--- + +## Abstract + +This JEP proposes a Virtual Scalable Exporter subsystem for Jumpstarter that +manages pools of virtual targets with configurable autoscaling. Each virtual +target definition declares a minimum and maximum number of instances; the system +maintains a warm pool of pre-spawned exporters ready for immediate lease +fulfillment, and scales up or down based on demand. 
This enables low-latency +lease acquisition, massive scalability, resource efficiency, and simplified +orchestration of mixed physical/virtual test topologies — while allowing +administrators to tune the trade-off between responsiveness and resource +consumption on a per-target basis. + +## Motivation + +Jumpstarter currently excels at managing scarce, physical hardware targets. +However, testing and development often require a mix of physical devices and +scalable, virtual resources. Today, virtual targets must be manually deployed +as static exporters with a fixed count — there is no mechanism for the system +to maintain or scale a pool of virtual instances based on demand. + +This model has several limitations: + +- **Artificial scarcity:** Virtual targets are treated as a fixed-size pool, + just like physical ones, which defeats their "virtually unlimited" potential. +- **No elasticity:** The pool cannot grow when demand spikes (CI burst) or + shrink when idle, leading to either queuing or waste. +- **Manual lifecycle:** Administrators must manually deploy, monitor, and scale + virtual exporter instances — there is no declarative "desired state" for a + virtual target pool. +- **Cold-start penalty vs. waste trade-off:** Users must choose between + pre-spawning many instances (wasting resources when idle) or spawning on + demand (high latency at lease time). There is no middle ground. + +The core problem is that virtual targets lack a pool manager that can maintain a +configurable warm pool while autoscaling to meet demand. + +### User Stories + +- **As a** CI pipeline author, **I want to** lease N virtual targets instantly + from a warm pool, **so that** my pipeline doesn't block on provisioning + latency during burst periods. + +- **As a** developer, **I want to** lease a virtual target matching a known + physical board's properties with near-zero wait time, **so that** I can + iterate quickly without waiting for scarce hardware. 
+ +- **As a** platform engineer, **I want to** declare a virtual target pool with + `minInstances: 2, maxInstances: 20`, **so that** there are always warm + instances ready while the system scales up on demand and scales down when idle. + +- **As a** cost-conscious operator, **I want to** set `minInstances: 0` for + rarely-used target types, **so that** they consume no resources until actually + requested, accepting a cold-start delay. + +## Proposal + +The proposal introduces **Virtual Scalable Exporters** — a controller-managed +pool of virtual target instances with configurable autoscaling. Rather than +treating virtual targets as purely on-demand or purely static, each virtual +target definition declares scaling parameters that let administrators tune the +trade-off between instant availability and resource consumption. + +### Core Concept: Managed Pools with Scaling + +Each provider type has its own CRD (e.g., `QEMUExporterPool`, +`AndroidExporterPool`, `CorelliumExporterPool`) with provider-specific +configuration fields alongside shared scaling parameters. This gives each +provider a strongly-typed schema rather than a generic bag of config. 
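The split between shared scaling parameters and provider-specific fields can be pictured with a small sketch. This is only an illustration in Python, not the proposed Go types: the class names, the snake_case field names, and the validation rules (`minInstances >= 0`, `maxInstances >= minInstances`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ScalingSpec:
    """Scaling fields every pool kind shares (illustrative names, not the real schema)."""

    max_instances: int      # maxInstances: hard upper bound on pool size
    min_instances: int = 0  # minInstances: warm instances kept ready

    def __post_init__(self) -> None:
        # Assumed validation rules; the JEP does not spell them out.
        if self.min_instances < 0:
            raise ValueError("minInstances must be >= 0")
        if self.max_instances < self.min_instances:
            raise ValueError("maxInstances must be >= minInstances")


@dataclass
class QEMUPoolSpec(ScalingSpec):
    """QEMU-specific fields layered on top of the shared scaling fields."""

    machine_type: str = "virt"
    firmware: str = ""
    labels: dict[str, str] = field(default_factory=dict)


# A pool mirroring the rpi4 example: 2 warm instances, at most 20.
pool = QEMUPoolSpec(max_instances=20, min_instances=2, labels={"board": "rpi4"})
```

Because the shared fields live in one base type, a new provider would only add its own fields and inherit the same validation.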
+ +**Example: QEMU pool** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: QEMUExporterPool +metadata: + name: rpi4-virtual + namespace: jumpstarter +spec: + # Scaling configuration (shared across all pool CRDs) + minInstances: 2 # Always keep 2 warm instances ready + maxInstances: 20 # Scale up to 20 under load + + # Node scheduling (shared across all pool CRDs, optional) + nodeSelector: + node.kubernetes.io/instance-type: bare-metal + jumpstarter.dev/nested-virt: "true" + + # Labels exposed on each instance (for lease matching) + labels: + board: rpi4 + arch: aarch64 + virtual: "true" + + # Pod overrides (shared across all pool CRDs, optional) + podTemplate: + resources: + requests: + cpu: "4" + memory: 5Gi + limits: + cpu: "4" + memory: 5Gi + + # QEMU-specific configuration + machineType: virt + firmware: registry.example.com/firmware/rpi4:latest + resources: + cpu: 4 + memory: 4Gi + storage: 16Gi + + # Exporter template (drivers exposed by each instance) + exporterTemplate: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +``` + +**Example: Android Emulator pool** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: AndroidExporterPool +metadata: + name: pixel7-emulator + namespace: jumpstarter +spec: + minInstances: 0 # Fully on-demand (cold-start OK for this target) + maxInstances: 10 + + labels: + device: pixel7 + os: android + api-level: "34" + virtual: "true" + + # Android-specific configuration + systemImage: system-images;android-34;google_apis;arm64-v8a + avdProfile: pixel_7 + gpu: swiftshader + + exporterTemplate: + drivers: + - type: jumpstarter_driver_android.driver.AdbDriver + - type: jumpstarter_driver_power.driver.EmulatorPower +``` + +**Example: Corellium pool** + +The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages +the full virtual instance lifecycle through the 
Corellium REST API — it creates +instances on power-on and destroys them on power-off. It exposes a power +interface and a websocket-based serial console. The pool controller manages the +exporter Pod and injects API credentials via environment variables +(`CORELLIUM_API_HOST`, `CORELLIUM_API_TOKEN`) from the referenced Secret. + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: CorelliumExporterPool +metadata: + name: rd1ae-corellium + namespace: jumpstarter +spec: + minInstances: 1 + maxInstances: 5 + + labels: + board: rd1ae + flavor: kronos + virtual: "true" + + # Corellium-specific configuration + apiHost: app.corellium.com + apiCredentialsSecret: corellium-api-credentials # Secret with keys: token + projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" + + # Device/instance parameters + deviceFlavor: kronos + deviceOs: "1.1.1" + deviceBuild: "Critical Application Monitor (Baremetal)" + consoleName: "Primary Compute Non-Secure" + + exporterTemplate: + drivers: + - type: jumpstarter_driver_corellium.driver.Corellium + config: + device_name: "{{ .InstanceName }}" +``` + +The pool controller automatically injects the Corellium-specific CRD spec fields +(`projectId`, `deviceFlavor`, `deviceOs`, `deviceBuild`, `consoleName`) into the +driver config at instance creation time. Only fields that vary per instance (like +`device_name` using the `{{ .InstanceName }}` template variable) need to be +specified explicitly in `exporterTemplate`. + +A pool with `minInstances: 0` consumes no resources until a lease is +requested, accepting cold-start latency. A pool with `minInstances: 3` +always has 3 ready-to-lease instances — leases are fulfilled instantly from +the warm pool, and the controller scales up if more are needed. + +### User Experience + +From the user's perspective, virtual scalable exporters appear as regular +exporters in the pool. 
The lease experience is unchanged: + +```bash +# Lease any rpi4 target — may match physical or virtual +jmp lease -l board=rpi4 + +# Lease explicitly virtual targets +jmp lease -l board=rpi4,virtual=true + +``` + +The guiding principle is: **"Get me a target that matches my requirements."** The +distinction between physical and virtual is an implementation detail, not a +primary concern for the user. Virtual exporters simply appear in the same pool +as physical ones, differentiated only by labels. + +### Architecture Overview + +``` + ┌─────────────────────────┐ + │ jumpstarter-controller │ + │ (creates Leases, │ + │ assigns Exporters) │ + └──────────┬──────────────┘ + │ + creates/updates Lease & Exporter objects + │ + ▼ + ┌────────────────────────────────────┐ + │ Kubernetes API │ + │ (Lease CRs, Exporter CRs, │ + │ *ExporterPool CRs) │ + └─┬──────────────┬──────────────┬────┘ + │ │ │ + watches │ watches │ watches │ + Leases + │ Leases + │ Leases + │ + Exporters │ Exporters │ Exporters │ + │ │ │ + ┌─────────────────▼┐ ┌───────────▼──────────┐┌──▼──────────────────────┐ + │ QEMUExporterPool │ │ AndroidExporterPool │ │ CorelliumExporterPool │ + │ Controller │ │ Controller │ │ Controller │ + └────────┬─────────┘ └──────────┬──────────┘ └────────────┬────────────┘ + │ │ │ + │ manages │ manages │ manages + ▼ ▼ ▼ + ┌──────────────────┐ ┌───────────────────────┐ ┌────────────────────────┐ + │ Warm Pool │ │ Warm Pool │ │ Warm Pool │ + │ [inst1][inst2].. │ │ [inst1][inst2].. │ │ [inst1].. │ + └────────┬─────────┘ └───────────┬───────────┘ └────────────┬───────────┘ + │ │ │ + └───────────────────────┼──────────────────────────┘ + │ register as standard Exporter CRs + ▼ + Kubernetes API (Exporters) +``` + +**Scaling Inputs — Watches on Leases and Exporters:** + +Each pool controller watches two key resources to make scaling decisions: + +1. **Leases** — The controller watches for pending Leases whose label selectors + match the pool's labels. 
Pending leases with no available exporter signal + demand and trigger scale-up. +2. **Exporters** — The controller watches the Exporter objects it owns to track + which instances are available (no active lease) vs. occupied (leased). This + determines the current pool utilization. + +Together these inputs feed the scaling logic: if there are pending leases that +match this pool and no available instances to serve them, scale up. If there are +excess idle instances beyond `minInstances` for a sustained period, scale down. + +**Per-Provider Deployments (single image by default):** All provider +controllers are compiled into a single binary. Each Deployment in the cluster +passes a `--provider=` flag to activate the corresponding reconciler. +This gives each provider isolated logs and independent restarts while +maintaining a single image to build and release. The per-provider `image` +override in the operator CR allows administrators to substitute a custom image +for a specific provider (e.g., a third-party provider distributed as its own +image) without affecting other providers. + +The Jumpstarter operator deploys pool controllers based on the `Jumpstarter` +CR configuration. A new `exporterPools` section lists which providers to +enable, following the same pattern as `controller` and `routers`: + +```yaml +apiVersion: operator.jumpstarter.dev/v1alpha1 +kind: Jumpstarter +metadata: + name: jumpstarter + namespace: jumpstarter +spec: + # ... existing controller, routers, authentication config ... 
+ + # Pool controllers configuration (new) + exporterPools: + # Default image shared by all pool controllers (can be overridden per provider) + image: quay.io/jumpstarter-dev/pool-controller:latest + imagePullPolicy: IfNotPresent + + # List of providers to deploy controllers for + providers: + - name: qemu + enabled: true + resources: + requests: + cpu: 100m + memory: 256Mi + - name: android + enabled: true + resources: + requests: + cpu: 100m + memory: 256Mi + - name: corellium + enabled: false + # Override the default image for this provider + image: quay.io/jumpstarter-dev/pool-controller-corellium:latest + imagePullPolicy: Always +``` + +The operator creates one Deployment per enabled provider, passing +`--provider=` to the shared binary. This gives administrators a single +knob to enable/disable pool controllers, and the operator handles RBAC, +service accounts, and Deployment lifecycle. + +**Scaling Logic:** Each pool controller monitors its instances and scales based +on available (unleased) instances: + +- If available instances drop below a threshold (e.g., `minInstances`), scale up. +- If available instances exceed demand for a cooldown period, scale down (never + below `minInstances`). +- Never exceed `maxInstances`. + +**Instance Lifecycle:** + +1. Pool controller creates a Pod from the pool spec using provider-specific + templates. +2. The Pod starts the virtual target (e.g., QEMU VM, Android emulator, or + Corellium API call) and runs the Jumpstarter exporter, registering with + the controller like any other exporter. +3. The instance becomes available in the pool for lease assignment. +4. When a lease is released, the exporter internally handles cleanup/reset + (this is existing exporter behavior). The instance returns to the available + pool automatically. + + +### API / Protocol Changes + +**New CRDs: `*ExporterPool` (one per provider type)** + +Each provider type defines its own CRD. 
All share a common scaling spec +(embedded struct in Go) but have provider-specific configuration fields: + +```yaml +# Common fields shared by all *ExporterPool CRDs: +spec: + # Scaling (common) + minInstances: # Minimum warm pool size (default: 0) + maxInstances: # Maximum pool size (required) + scaleUpThreshold: # Scale up when available < this (default: minInstances) + scaleDownCooldown: # Wait before scaling down (default: 5m) + + # Node scheduling (common, optional) + # Applied to instance Pods. Use it to target bare-metal nodes, nodes with + # nested virtualization, GPU nodes, specific architectures, etc. + nodeSelector: + : + + # Pod overrides (common, optional) + # Customize the exporter Pod container image and resource requests/limits. + # Providers set sensible defaults; these fields allow administrators to + # override them per pool. + podTemplate: + image: # Override the default exporter container image + resources: + requests: + cpu: + memory: + limits: + cpu: + memory: + + # Labels applied to all instances (common) + labels: + : + + # Exporter driver configuration template (common) + exporterTemplate: + drivers: + - type: + config: { ... } + +# Provider-specific fields differ per CRD: +# - QEMUExporterPool: machineType, firmware, resources (cpu/mem/storage), ... +# - AndroidExporterPool: systemImage, avdProfile, gpu, ... +# - CorelliumExporterPool: apiHost, apiCredentialsSecret, projectId, ... +# (CorelliumExporterPool typically does not use nodeSelector/podTemplate: +# it provisions instances through an external API, so the local Pods only +# connect to the Corellium API, and the architecture and characteristics of +# the node they run on do not matter.) 
+``` + +**Status subresource (common to all pool CRDs):** + +```yaml +status: + totalInstances: 5 + readyInstances: 3 + leasedInstances: 2 + conditions: + - type: PoolHealthy + status: "True" + - type: ScalingLimited + status: "False" +``` + +**Changes to existing CRDs:** + +**Lease — new `parameters` field:** + +Leases gain an optional `parameters` dictionary that allows the requester to +pass provider-specific hints (e.g., requesting more CPU, RAM, or other +resources). The semantics of these parameters are entirely defined by the +virtual provider — the `jumpstarter-controller` passes them through without +interpretation. Physical exporters ignore this field. + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: Lease +metadata: + name: my-lease +spec: + selector: + matchLabels: + board: rpi4 + virtual: "true" + # Provider-specific parameters (opaque dictionary) + parameters: + cpu: "8" + memory: "16Gi" + storage: "64Gi" +``` + +**Exporter — new `enabled` field:** + +Exporters gain an `enabled` boolean field (default: `true`). When set to +`false`, the `jumpstarter-controller` will not assign new leases to this +exporter. This is useful for: + +- **Lab operations:** Temporarily taking a physical exporter offline for + maintenance without deleting it. +- **Graceful scale-down:** Pool controllers set `enabled: false` before + terminating an instance, ensuring the controller doesn't race to assign a + lease to an exporter that is about to be deleted. + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: Exporter +metadata: + name: qemu-rpi4-instance-3 +spec: + enabled: false # Controller will not assign new leases to this exporter +``` + +The graceful scale-down sequence becomes: + +1. Pool controller sets `enabled: false` on the target exporter. +2. Pool controller waits to confirm no lease was assigned (watches for + `status.leaseRef` to remain empty). +3. Pool controller deletes the Pod and Exporter CR. 
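The three-step sequence above can be sketched against an in-memory model. This is a minimal illustration, not the controller implementation: the `Exporter` class, the `wait_for_settle` hook (a stand-in for watching `status.leaseRef`), and the choice to re-enable the exporter when a lease slipped in are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Exporter:
    name: str
    enabled: bool = True             # mirrors spec.enabled
    lease_ref: Optional[str] = None  # mirrors status.leaseRef


def graceful_scale_down(
    pool: dict[str, Exporter],
    name: str,
    wait_for_settle: Callable[[Exporter], None] = lambda exporter: None,
) -> bool:
    """Try to remove one exporter; return True if it was deleted."""
    exporter = pool[name]

    # Step 1: stop the controller from assigning new leases.
    exporter.enabled = False

    # Step 2: give any in-flight assignment a chance to show up.
    wait_for_settle(exporter)
    if exporter.lease_ref is not None:
        # Assumed policy: a lease won the race, so abort and keep the instance.
        exporter.enabled = True
        return False

    # Step 3: delete the instance (Pod and Exporter CR in the real system).
    del pool[name]
    return True
```

Disabling first and checking afterwards is what closes the race: once `enabled` is `false`, no new lease can appear after the check has passed.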
+ +### Hardware Considerations + +This proposal is specifically designed to reduce reliance on physical hardware +for scalable testing. However: + +- Virtual targets must faithfully emulate the interfaces exposed by physical + hardware (serial, network, storage, power) through the existing driver model. +- Providers like QEMU/Renode require `/dev/kvm` access for acceptable + performance on the host nodes. +- Timing-sensitive tests (USB/IP latency, boot ROM timeouts) may behave + differently on virtual targets — the system should expose labels indicating + whether a target is physical or virtual so users can filter when fidelity + matters. + +## Design Decisions + +### DD-1: Pool-based scaling vs. purely on-demand provisioning + +**Alternatives considered:** + +1. **Pool-based with configurable min/max** — Maintain a warm pool of + pre-spawned instances; scale between `minInstances` and `maxInstances`. +2. **Purely on-demand** — Spawn a new instance only when a lease request arrives; + destroy it when the lease is released. + +**Decision:** Pool-based with configurable min/max. + +**Rationale:** Purely on-demand provisioning introduces unacceptable latency for +CI pipelines (VM boot + exporter registration can take 30-120s). A warm pool +provides instant lease fulfillment for the common case. Setting `minInstances: 0` +still allows purely on-demand behavior for rarely-used targets, giving operators +full control over the trade-off. + +### DD-2: Pool controller deployment model + +**Alternatives considered:** + +1. **Separate binary per provider** — Each provider is a completely independent + binary/image (e.g., `jumpstarter-qemu-pool-controller`). +2. **Single binary, one deployment per provider** — One image contains all + provider reconcilers; a CLI flag (`--provider=qemu`) selects which one to + activate. Each provider gets its own Deployment in the cluster. +3. **Single binary, single deployment** — One Deployment runs all provider + reconcilers together. +4. 
**Integrated into jumpstarter-controller** — Add pool reconcilers directly + into the existing operator. + +**Decision:** Option 2 — single binary, one Deployment per provider. + +**Rationale:** A single image is cheaper to build, test, and productize — there +is one CI pipeline, one vulnerability scan, one release artifact. Deploying it +as separate Deployments (one per provider) gives operational benefits: each +provider has isolated logs, independent scaling, and can be restarted without +affecting other providers. The `--provider` flag makes it explicit which CRD +a given Deployment reconciles. Adding a new provider type means adding a new +Deployment manifest pointing to the same image with a different flag — no new +image build required. + +### DD-3: CRD per provider vs. generic CRD + +**Alternatives considered:** + +1. **CRD per provider** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) — + Strongly typed, schema-validated, provider-specific fields at the top level. +2. **Single generic CRD** (`VirtualExporterPool`) with a `provider.type` field + and opaque `provider.config` map. +3. **Generic CRD + ConfigMap reference** — Pool CRD references a ConfigMap + containing provider-specific configuration. + +**Decision:** CRD per provider. + +**Rationale:** Strongly-typed CRDs give better UX (IDE completion, webhook +validation, clear documentation per provider). Each provider has fundamentally +different configuration (QEMU needs machine types and firmware images; Corellium +needs API credentials and device models) — a generic map loses type safety and +discoverability. New providers add a new CRD without touching existing ones. 
+ + +## Design Details + +### Reconciliation Loop + +Each pool controller runs a continuous reconciliation loop for its CRD, +triggered by changes to the pool CR, owned Exporters, or matching Leases: + +``` +for each *ExporterPool CR: + ownedExporters = list Exporters owned by this CR + currentInstances = count ownedExporters in Ready state + leasedInstances = count ownedExporters with an active LeaseRef + availableInstances = currentInstances - leasedInstances + pendingLeases = count pending Leases whose labels match this pool's labels + + # Invariant: always maintain minInstances + if currentInstances < spec.minInstances: + scale up to spec.minInstances + + # Demand-driven scale-up: pending leases that we could serve + elif pendingLeases > 0 AND currentInstances < spec.maxInstances: + scale up by min(pendingLeases, spec.maxInstances - currentInstances) + + # Threshold-based scale-up: available pool running low + elif availableInstances < spec.scaleUpThreshold AND currentInstances < spec.maxInstances: + scale up (add instances to restore available pool) + + # Scale-down: excess idle instances beyond what we need + elif availableInstances > spec.scaleUpThreshold AND cooldown elapsed: + graceful scale down: + 1. set exporter.spec.enabled = false + 2. wait until confirmed no lease was assigned (leaseRef remains empty) + 3. delete Pod and Exporter CR + (never below minInstances) +``` + +### Instance States + +Each virtual exporter instance transitions through: + +``` +Provisioning → Ready (warm pool) → Leased → Ready + └→ Terminating → (deleted if available instances>min) +``` + +- **Provisioning:** Pod is starting, VM booting, exporter registering. +- **Ready:** Instance is registered and available for lease. +- **Leased:** Instance is assigned to an active lease. +- **Terminating:** Instance being deleted (scale-down). + +### Component Interaction + +1. Administrator creates a `*ExporterPool` CR (e.g., `QEMUExporterPool`). +2. 
The corresponding pool controller provisions `minInstances` Pods. +3. Each Pod boots the virtual target and runs the Jumpstarter exporter, + registering with the existing `jumpstarter-controller`. +4. Instances appear in the pool as regular exporters with the specified labels. +5. Users lease them normally; the existing controller handles assignment. +6. On lease release, the exporter handles internal cleanup. The instance + returns to the available pool. +7. The pool controller continuously monitors pool utilization and scales + accordingly. + +### Failure Modes + +- **Pod crash:** The pool controller detects the failure via Pod status, + replaces the instance, and maintains the `minInstances` invariant. +- **Resource exhaustion:** The pool cannot scale beyond cluster capacity; it + stays at its current size, and new leases queue as they would for physical + targets. +- **Provider startup failure:** The instance is marked as failed; the pool + controller retries with backoff and reports the failure through conditions + on the pool status. +- **Scaling storm:** Rate limiting on scale-up prevents creating too many + instances simultaneously. + +## Test Plan + + + +### Unit Tests + +Unit tests should meet the project test coverage requirements. + +### Integration Tests + +- End-to-end lease lifecycle with a QEMU provider in a test cluster +- Mixed physical/virtual lease orchestration +- Provider failure and recovery scenarios + +## Acceptance Criteria + +- [ ] `QEMUExporterPool` CRD is defined and validated by the operator +- [ ] Pool controller maintains `minInstances` warm instances for each pool CR +- [ ] Pool controller scales up when available pool is depleted (up to + `maxInstances`) +- [ ] Pool controller scales down idle instances after cooldown (never below + `minInstances`) +- [ ] At least one provider (`QEMUExporterPool`) is fully implemented and tested +- [ ] Virtual instances register as standard exporters and are leasable without + changes to the existing lease flow +- [ ] Pod failures are detected and reported in the pool status 
+- [ ] A pool with `minInstances: 0` provisions instances only on demand +- [ ] Pool status subresource reports instance counts and health conditions +- [ ] Documentation covers pool CRD configuration and provider setup +- [ ] Shared scaling logic is reusable for new provider CRDs + +## Graduation Criteria + +### Experimental + +- `QEMUExporterPool` functional in a development cluster +- Basic pool lifecycle works end-to-end (scale up, lease, release, scale down) +- Community feedback on CRD schema and scaling behavior + +### Stable + +- At least two provider CRDs implemented (e.g., `QEMUExporterPool` + + `AndroidExporterPool`) +- Production usage by at least one team for >1 month +- Performance benchmarks documented (cold-start latency, scaling responsiveness) +- Provider authoring guide published (how to add a new `*ExporterPool` CRD) + +## Backward Compatibility + +- Existing physical-only workflows are unaffected; lease requests without + virtual-specific labels continue to work as before. +- No changes to the existing gRPC protocol for physical exporters. +- New `*ExporterPool` CRDs are additive. +- **Exporter `enabled` field:** Defaults to `true`, so all existing Exporters + continue to behave exactly as before. The `jumpstarter-controller` must be + updated to respect this field (skip disabled exporters during lease + assignment). +- **Lease `parameters` field:** Optional and ignored by the existing controller + for physical exporters. Only pool controllers interpret it. Existing leases + without parameters work unchanged. +- Administrators upgrading to a pool-enabled version see no behavior change + until they explicitly deploy a `*ExporterPool` resource. + +## Consequences + +### Positive + +- **Instant lease fulfillment:** Warm pools eliminate provisioning latency for + virtual targets, making CI pipelines faster and more predictable. 
+- **Elastic scaling:** Pools grow and shrink with demand, avoiding both + resource waste (idle VMs) and artificial queuing. +- **Unified user experience:** Virtual and physical targets are leased through + the same mechanism; users do not need to learn a separate workflow. +- **Operator control:** `minInstances` / `maxInstances` give administrators + simple, declarative knobs to tune the cost-vs-responsiveness trade-off per + target type. +- **Extensible provider model:** New virtual providers (Renode, QEMU, Corellium, Android, + etc.) can be added by defining a new CRD and reconciler without modifying + the core controller or existing providers. + +### Negative + +- **Increased operator complexity:** Pool controllers, scaling logic, and + per-provider CRDs add operational surface area: more components to deploy, + monitor, and debug. +- **Resource consumption:** Warm pools consume cluster resources even when not + actively leased. Misconfigured `minInstances` can lead to waste. +- **CRD proliferation:** Each provider type adds a CRD; clusters with + many providers will have many CRDs to manage and version. + +### Risks + +- **Scaling storms:** A burst of pending leases could trigger rapid scale-up, + exhausting cluster resources. Rate limiting mitigates this but may delay + lease fulfillment under extreme load. +- **Provider startup reliability:** If a virtual provider frequently fails to + start (e.g., firmware download issues, QEMU misconfiguration), the pool + controller may enter a tight crash-replace loop, consuming resources without + making progress. + +## Rejected Alternatives + +- **Static fixed-size pools (status quo):** Cannot scale with demand. Operators + must manually adjust pool sizes, leading to either waste or queuing. + +- **External orchestration (Terraform/Ansible):** Pushes complexity to the user, + breaks the single-pane-of-glass experience, and cannot integrate with + Jumpstarter's lease semantics. 
+ +## Prior Art + +- **LAVA (Linaro Automated Validation Architecture):** Supports virtual DUTs via + QEMU but with static configuration; no on-demand scaling. + +## Unresolved Questions + +- What is the exact scaling algorithm (proportional, step-based, predictive)? + +### Resolved + +- **Observability (JEP-0013):** Pool controllers and virtual exporter instances + emit metrics and logs using the same mechanisms defined in JEP-0013. + Pool-specific metrics (pool size, available/leased counts, scale-up/down + events) are additional metric series following the same conventions. +- **Lease release detection:** Pool controllers watch Lease objects directly. + When a Lease referencing one of their managed Exporters is deleted or + transitions to a released state, the controller triggers scale-down + evaluation if needed. + +## Future Possibilities + +The following extensions are explicitly **not** part of this JEP but are +natural follow-ups enabled by the pool infrastructure: + +- **Corellium provider:** A `CorelliumExporterPool` CRD that provisions + virtual instances via the Corellium REST API, with credentials injected + from Kubernetes Secrets. +- **Renode provider:** A `RenodeExporterPool` CRD leveraging JEP-0010's Renode + integration as another virtual provider type. + +## Implementation Plan + +The implementation is broken into phases. Each phase delivers a usable +increment and can be merged independently. + +### Phase 1: Exporter `enabled` field + +Add the `enabled` boolean field to the Exporter CRD and update the +`jumpstarter-controller` lease assignment logic to skip disabled exporters. 
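The filtering this phase adds can be sketched as a single predicate over candidate exporters. This is a hedged illustration only: the `Exporter` class, its `leased` flag, and the `assignable` helper are hypothetical names, and the real `jumpstarter-controller` assignment logic may weigh other factors.

```python
from dataclasses import dataclass, field


@dataclass
class Exporter:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    enabled: bool = True   # mirrors the proposed spec.enabled field
    leased: bool = False   # simplified stand-in for an active lease reference


def assignable(exporters: list[Exporter], selector: dict[str, str]) -> list[Exporter]:
    """Return exporters that match the selector, are free, and are not disabled."""
    return [
        e
        for e in exporters
        if e.enabled
        and not e.leased
        and all(e.labels.get(key) == value for key, value in selector.items())
    ]


exporters = [
    Exporter("rpi4-physical-1", {"board": "rpi4"}),
    Exporter("qemu-rpi4-instance-3", {"board": "rpi4", "virtual": "true"}, enabled=False),
    Exporter("qemu-rpi4-instance-4", {"board": "rpi4", "virtual": "true"}, leased=True),
]
names = [e.name for e in assignable(exporters, {"board": "rpi4"})]
```

In this example only `rpi4-physical-1` remains a candidate: one virtual instance is disabled and the other is already leased.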
+ +**Deliverables:** + +- [ ] Add `spec.enabled` field to Exporter CRD (default: `true`) +- [ ] Update lease assignment in `jumpstarter-controller` to filter out + disabled exporters +- [ ] Unit tests for the filtering logic +- [ ] Integration test: disable an exporter, verify it gets no new leases + +**Why first:** This is a small, self-contained change that is independently +useful for lab operations (maintenance mode) and is a prerequisite for +graceful scale-down in later phases. + +### Phase 2: Pool controller scaffold and `QEMUExporterPool` CRD + +Build the pool controller binary with the `--provider` flag, define the +`QEMUExporterPool` CRD, and implement the core reconciliation loop. + +**Deliverables:** + +- [ ] Define `QEMUExporterPool` CRD schema (scaling fields, nodeSelector, + podTemplate, labels, QEMU-specific fields, exporterTemplate) +- [ ] Implement the pool controller binary with `--provider=qemu` flag +- [ ] Implement core scaling logic: maintain `minInstances`, scale up when + pool is depleted, graceful scale-down (disable → wait → delete) +- [ ] Instance provisioning: create Pods running the Jumpstarter exporter + with QEMU provider configuration +- [ ] Instance Pods register as standard Exporter CRs +- [ ] Pool status subresource (totalInstances, readyInstances, leasedInstances, + conditions) +- [ ] Watch Leases and Exporters for scaling decisions +- [ ] Add `exporterPools` section to the `Jumpstarter` operator CR spec +- [ ] Operator deploys pool controller Deployments based on enabled providers + (RBAC, service accounts, Deployment lifecycle) +- [ ] Unit tests for reconciliation logic +- [ ] Integration test: deploy a `QEMUExporterPool`, verify instances come up, + lease one, release it, observe scale behavior + +### Phase 3: Additional providers + +Add support for additional provider types using the same binary with different +`--provider` flags. 
+
+**Deliverables:**
+
+- [ ] `AndroidExporterPool` CRD and reconciler
+- [ ] Provider authoring guide documenting how to add a new `*ExporterPool`
+
+### Phase 4: Lease `parameters`
+
+Add the optional `parameters` dictionary to the Lease CRD, allowing users to
+pass provider-specific resource requests. Pool controllers interpret these
+parameters when provisioning new instances on demand.
+
+**Deliverables:**
+
+- [ ] Add `spec.parameters` field to Lease CRD (optional map[string]string)
+- [ ] Update pool controllers to read parameters from pending Leases when
+ deciding how to provision a new instance (e.g., override CPU/memory)
+- [ ] Define how parameters interact with pool defaults (override vs. merge)
+- [ ] Tests: lease with parameters triggers a customized instance
+- [ ] Documentation on supported parameters per provider
+
+**Why last:** This feature builds on top of a working pool system. It requires
+the pool controller to understand per-lease customization, which adds
+complexity. The system is fully functional without it — parameters are a
+power-user feature for dynamic resource sizing.
+ +## Implementation History + +- 2025-10-30: RFE filed upstream (GitHub #41) +- 2026-06-03: JEP proposed + +## References + +- [GitHub Issue #41: RFE: On-Demand Virtual Target Provisioning](https://github.com/jumpstarter-dev/jumpstarter/issues/41) +- [PITCREW-409: jumpstarter JEP: virtual scalable exporters](https://redhat.atlassian.net/browse/PITCREW-409) +- [JEP-0010: Renode Integration](JEP-0010-renode-integration.md) — Related provider +- [JEP-0013: Observability](JEP-0013-observability-telemetry-logs.md) — Integration point + +--- + +This JEP is licensed under the +[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0), diff --git a/python/docs/source/contributing/jeps/index.md b/python/docs/source/contributing/jeps/index.md index 05e205eb8..fa4c90e1f 100644 --- a/python/docs/source/contributing/jeps/index.md +++ b/python/docs/source/contributing/jeps/index.md @@ -37,6 +37,7 @@ For the full process definition, see [JEP-0000](JEP-0000-jep-process.md). | 0010 | [Renode Integration](JEP-0010-renode-integration.md) | Implemented | @vtz (Vinicius Zein) | | 0011 | [Protobuf Introspection and Interface Generation](JEP-0011-protobuf-introspection-interface-generation.md) | Accepted | @kirkbrauer (Kirk Brauer) | | 0013 | [Metrics, Tracing, and Log Observability](JEP-0013-observability-telemetry-logs.md) | Accepted | @mangelajo (Miguel Angel Ajo Pelayo) | +| 0014 | [Virtual Scalable Exporters](JEP-0014-virtual-scalable-exporters.md) | Draft | @mangelajo (Miguel Angel Ajo Pelayo) | ### Informational JEPs @@ -62,5 +63,3 @@ For the full process definition, see [JEP-0000](JEP-0000-jep-process.md). 
| Withdrawn | Author voluntarily withdrew | | Active | Living document, actively maintained (Process JEPs only) | | Superseded | Replaced by a newer JEP | - - From 52056f450883b656fed0e088c8ba1ea46af586ec Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Thu, 4 Jun 2026 10:37:11 +0200 Subject: [PATCH 2/6] docs(jep-0014): reject per-lease parameters in favor of pool flavors Add DD-4 explaining why per-lease parameters are not included in this JEP. The same use case is served by creating separate pools with different resource profiles, avoiding complexity across the Lease CRD, controller, pool controllers, and driver templates. Co-authored-by: Cursor --- .../JEP-0014-virtual-scalable-exporters.md | 80 ++++++++----------- 1 file changed, 32 insertions(+), 48 deletions(-) diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md index 974bc8bb9..76e10e124 100644 --- a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -440,31 +440,6 @@ status: **Changes to existing CRDs:** -**Lease — new `parameters` field:** - -Leases gain an optional `parameters` dictionary that allows the requester to -pass provider-specific hints (e.g., requesting more CPU, RAM, or other -resources). The semantics of these parameters are entirely defined by the -virtual provider — the `jumpstarter-controller` passes them through without -interpretation. Physical exporters ignore this field. - -```yaml -apiVersion: jumpstarter.dev/v1alpha1 -kind: Lease -metadata: - name: my-lease -spec: - selector: - matchLabels: - board: rpi4 - virtual: "true" - # Provider-specific parameters (opaque dictionary) - parameters: - cpu: "8" - memory: "16Gi" - storage: "64Gi" -``` - **Exporter — new `enabled` field:** Exporters gain an `enabled` boolean field (default: `true`). 
When set to @@ -570,6 +545,31 @@ different configuration (QEMU needs machine types and firmware images; Corellium needs API credentials and device models) — a generic map loses type safety and discoverability. New providers add a new CRD without touching existing ones. +### DD-4: Per-lease parameters vs. pool flavors + +**Alternatives considered:** + +1. **Per-lease `parameters` dictionary** — Leases carry an opaque + `map[string]string` that pool controllers interpret when provisioning + instances (e.g., override CPU, memory, or storage). The controller passes + parameters through without interpretation; only pool controllers read them. +2. **Multiple pool flavors** — Administrators create separate pool CRs for + different resource profiles (e.g., `rpi4-virtual-small` with 2 CPU / 2 Gi + and `rpi4-virtual-large` with 8 CPU / 16 Gi). Users select a profile via + label matching at lease time. + +**Decision:** Option 2 — multiple pool flavors via separate pool CRs. + +**Rationale:** Per-lease parameters add complexity across every layer: the Lease +CRD gains a new field, the controller must pass it through, pool controllers +must parse and validate provider-specific keys, driver templates must support +runtime overrides, and the interaction between parameters and pool defaults +(override vs. merge) must be defined and tested. All of this for a use case +that is already satisfied by creating multiple pools with different resource +profiles and letting users select via labels. The pool-flavors approach keeps +the Lease API unchanged, requires no controller modifications, and is +immediately understandable. Per-lease parameters can be revisited in a future +JEP if the pool-flavors model proves insufficient. ## Design Details @@ -700,9 +700,6 @@ Unit tests should meet the project test coverage requirements. continue to behave exactly as before. The `jumpstarter-controller` must be updated to respect this field (skip disabled exporters during lease assignment). 
-- **Lease `parameters` field:** Optional and ignored by the existing controller - for physical exporters. Only pool controllers interpret it. Existing leases - without parameters work unchanged. - Administrators upgrading to a pool-enabled version see no behavior change until they explicitly deploy a `*ExporterPool` resource. @@ -752,6 +749,13 @@ Unit tests should meet the project test coverage requirements. breaks the single-pane-of-glass experience, and cannot integrate with Jumpstarter's lease semantics. +- **Per-lease `parameters` dictionary on the Lease CRD:** Would allow users to + pass provider-specific resource hints (CPU, memory, storage) per lease. + Rejected because it adds complexity to every layer (Lease CRD, controller + pass-through, pool controller parsing, driver template overrides) for a use + case already served by creating separate pools with different resource + profiles. See DD-4. + ## Prior Art - **LAVA (Linaro Automated Validation Architecture):** Supports virtual DUTs via @@ -840,26 +844,6 @@ Add support for additional provider types using the same binary with different - [ ] `AndroidExporterPool` CRD and reconciler - [ ] Provider authoring guide documenting how to add a new `*ExporterPool` -### Phase 4: Lease `parameters` - -Add the optional `parameters` dictionary to the Lease CRD, allowing users to -pass provider-specific resource requests. Pool controllers interpret these -parameters when provisioning new instances on demand. - -**Deliverables:** - -- [ ] Add `spec.parameters` field to Lease CRD (optional map[string]string) -- [ ] Update pool controllers to read parameters from pending Leases when - deciding how to provision a new instance (e.g., override CPU/memory) -- [ ] Define how parameters interact with pool defaults (override vs. merge) -- [ ] Tests: lease with parameters triggers a customized instance -- [ ] Documentation on supported parameters per provider - -**Why last:** This feature builds on top of a working pool system. 
It requires -the pool controller to understand per-lease customization, which adds -complexity. The system is fully functional without it — parameters are a -power-user feature for dynamic resource sizing. - ## Implementation History - 2025-10-30: RFE filed upstream (GitHub #41) From b397149365b2c06394b4eb2b70b0a4094b193f0a Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Thu, 4 Jun 2026 13:01:05 +0200 Subject: [PATCH 3/6] docs(jep-0014): address PR review feedback - Clarify warm pool rationale and cold-start latency range (10-60s) - Rename minInstances/maxInstances to minWarmInstances/maxTotalInstances - Make maxTotalInstances optional (0 or omitted means unlimited) - Add Crossplane to Prior Art with rationale - Resolve scheduled leases question via existing BeginTime mechanism - Add DD-5: built-in scaling vs HPA/KEDA - Add DD-4: per-lease parameters rejected in favor of pool flavors - Add composite exporters and Corellium to Future Possibilities - Clarify instance reuse with recycleStrategy field (ExitAndReplace default) - Add language identifiers to untyped fenced code blocks - Add Apache 2.0 license footer Co-authored-by: Cursor --- .../JEP-0014-virtual-scalable-exporters.md | 148 ++++++++++++------ 1 file changed, 103 insertions(+), 45 deletions(-) diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md index 76e10e124..0fc9bf772 100644 --- a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -22,7 +22,8 @@ This JEP proposes a Virtual Scalable Exporter subsystem for Jumpstarter that manages pools of virtual targets with configurable autoscaling. 
Each virtual target definition declares a minimum and maximum number of instances; the system maintains a warm pool of pre-spawned exporters ready for immediate lease -fulfillment, and scales up or down based on demand. This enables low-latency +fulfillment — avoiding the 10-60s cold-start latency of VM boot and exporter +registration — and scales up or down based on demand. This enables low-latency lease acquisition, massive scalability, resource efficiency, and simplified orchestration of mixed physical/virtual test topologies — while allowing administrators to tune the trade-off between responsiveness and resource @@ -63,10 +64,10 @@ configurable warm pool while autoscaling to meet demand. iterate quickly without waiting for scarce hardware. - **As a** platform engineer, **I want to** declare a virtual target pool with - `minInstances: 2, maxInstances: 20`, **so that** there are always warm + `minWarmInstances: 2, maxTotalInstances: 20`, **so that** there are always warm instances ready while the system scales up on demand and scales down when idle. -- **As a** cost-conscious operator, **I want to** set `minInstances: 0` for +- **As a** cost-conscious operator, **I want to** set `minWarmInstances: 0` for rarely-used target types, **so that** they consume no resources until actually requested, accepting a cold-start delay. 
@@ -95,8 +96,8 @@ metadata: namespace: jumpstarter spec: # Scaling configuration (shared across all pool CRDs) - minInstances: 2 # Always keep 2 warm instances ready - maxInstances: 20 # Scale up to 20 under load + minWarmInstances: 2 # Always keep 2 warm instances ready + maxTotalInstances: 20 # Scale up to 20 under load # Node scheduling (shared across all pool CRDs, optional) nodeSelector: @@ -146,8 +147,8 @@ metadata: name: pixel7-emulator namespace: jumpstarter spec: - minInstances: 0 # Fully on-demand (cold-start OK for this target) - maxInstances: 10 + minWarmInstances: 0 # Fully on-demand (cold-start OK for this target) + maxTotalInstances: 10 labels: device: pixel7 @@ -182,8 +183,8 @@ metadata: name: rd1ae-corellium namespace: jumpstarter spec: - minInstances: 1 - maxInstances: 5 + minWarmInstances: 1 + maxTotalInstances: 5 labels: board: rd1ae @@ -214,8 +215,8 @@ driver config at instance creation time. Only fields that vary per instance (lik `device_name` using the `{{ .InstanceName }}` template variable) need to be specified explicitly in `exporterTemplate`. -A pool with `minInstances: 0` consumes no resources until a lease is -requested, accepting cold-start latency. A pool with `minInstances: 3` +A pool with `minWarmInstances: 0` consumes no resources until a lease is +requested, accepting cold-start latency. A pool with `minWarmInstances: 3` always has 3 ready-to-lease instances — leases are fulfilled instantly from the warm pool, and the controller scales up if more are needed. @@ -240,7 +241,7 @@ as physical ones, differentiated only by labels. ### Architecture Overview -``` +```text ┌─────────────────────────┐ │ jumpstarter-controller │ │ (creates Leases, │ @@ -291,7 +292,7 @@ Each pool controller watches two key resources to make scaling decisions: Together these inputs feed the scaling logic: if there are pending leases that match this pool and no available instances to serve them, scale up. 
If there are -excess idle instances beyond `minInstances` for a sustained period, scale down. +excess idle instances beyond `minWarmInstances` for a sustained period, scale down. **Per-Provider Deployments (single image by default):** All provider controllers are compiled into a single binary. Each Deployment in the cluster @@ -350,10 +351,10 @@ service accounts, and Deployment lifecycle. **Scaling Logic:** Each pool controller monitors its instances and scales based on available (unleased) instances: -- If available instances drop below a threshold (e.g., `minInstances`), scale up. +- If available instances drop below a threshold (e.g., `minWarmInstances`), scale up. - If available instances exceed demand for a cooldown period, scale down (never - below `minInstances`). -- Never exceed `maxInstances`. + below `minWarmInstances`). +- Never exceed `maxTotalInstances` (if set; 0 or omitted means no upper bound). **Instance Lifecycle:** @@ -379,10 +380,10 @@ Each provider type defines its own CRD. All share a common scaling spec # Common fields shared by all *ExporterPool CRDs: spec: # Scaling (common) - minInstances: # Minimum warm pool size (default: 0) - maxInstances: # Maximum pool size (required) - scaleUpThreshold: # Scale up when available < this (default: minInstances) + minWarmInstances: # Minimum available (unleased) instances (default: 0) + maxTotalInstances: # Maximum total instances, warm + leased (0 or omitted = no limit) scaleDownCooldown: # Wait before scaling down (default: 5m) + recycleStrategy: # "ExitAndReplace" (default) or "InPlaceReuse" # Node scheduling (common, optional) # Applied to instance Pods — use to target baremetal nodes, nodes with @@ -489,15 +490,17 @@ for scalable testing. However: **Alternatives considered:** 1. **Pool-based with configurable min/max** — Maintain a warm pool of - pre-spawned instances; scale between `minInstances` and `maxInstances`. + pre-spawned instances; scale between `minWarmInstances` and `maxTotalInstances`. 
2. **Purely on-demand** — Spawn a new instance only when a lease request arrives; destroy it when the lease is released. **Decision:** Pool-based with configurable min/max. -**Rationale:** Purely on-demand provisioning introduces unacceptable latency for -CI pipelines (VM boot + exporter registration can take 30-120s). A warm pool -provides instant lease fulfillment for the common case. Setting `minInstances: 0` +**Rationale:** Purely on-demand provisioning introduces noticeable latency for +CI pipelines (Pod scheduling + image pull + VM boot + exporter registration +typically takes 10-15s, and up to 60s with cold image pulls or heavy +providers). A warm pool +provides instant lease fulfillment for the common case. Setting `minWarmInstances: 0` still allows purely on-demand behavior for rarely-used targets, giving operators full control over the trade-off. @@ -571,6 +574,31 @@ the Lease API unchanged, requires no controller modifications, and is immediately understandable. Per-lease parameters can be revisited in a future JEP if the pool-flavors model proves insufficient. +### DD-5: Built-in scaling vs. HPA / KEDA + +**Alternatives considered:** + +1. **Built-in scaling logic** — Each pool controller implements its own + reconciliation loop that watches pending Leases and owned Exporters to + make scaling decisions. +2. **Kubernetes HPA** — Use the Horizontal Pod Autoscaler with custom metrics + (e.g., pending lease count) to scale exporter Pods. +3. **KEDA** — Use KEDA's event-driven autoscaler with a custom scaler that + reads Jumpstarter lease state. + +**Decision:** Option 1 — built-in scaling logic. + +**Rationale:** Pool controllers need Jumpstarter-specific knowledge that +generic autoscalers cannot express: label matching between pending Leases and +pool labels, the graceful disable-before-delete sequence (`enabled: false` -> +verify no lease assigned -> delete), awareness of exporter readiness states, +and the `minWarmInstances` invariant. 
HPA and KEDA operate on numeric metrics +and target averages — they cannot implement the multi-step graceful shutdown or +lease-aware matching without a custom controller wrapping them, which would +negate their simplicity advantage. Exposing pool metrics for HPA/KEDA-driven +scaling is listed in *Future Possibilities* as a complementary option once the +core pool controller is stable. + ## Design Details ### Reconciliation Loop @@ -578,7 +606,7 @@ JEP if the pool-flavors model proves insufficient. Each pool controller runs a continuous reconciliation loop for its CRD, triggered by changes to the pool CR, owned Exporters, or matching Leases: -``` +```text for each *ExporterPool CR: ownedExporters = list Exporters owned by this CR currentInstances = count ownedExporters in Ready state @@ -586,32 +614,32 @@ for each *ExporterPool CR: availableInstances = currentInstances - leasedInstances pendingLeases = count pending Leases whose labels match this pool's labels - # Invariant: always maintain minInstances - if currentInstances < spec.minInstances: - scale up to spec.minInstances + # Invariant: always maintain minWarmInstances + if currentInstances < spec.minWarmInstances: + scale up to spec.minWarmInstances # Demand-driven scale-up: pending leases that we could serve - elif pendingLeases > 0 AND currentInstances < spec.maxInstances: - scale up by min(pendingLeases, spec.maxInstances - currentInstances) + elif pendingLeases > 0 AND currentInstances < spec.maxTotalInstances: + scale up by min(pendingLeases, spec.maxTotalInstances - currentInstances) # Threshold-based scale-up: available pool running low - elif availableInstances < spec.scaleUpThreshold AND currentInstances < spec.maxInstances: + elif availableInstances < spec.minWarmInstances AND currentInstances < spec.maxTotalInstances: scale up (add instances to restore available pool) # Scale-down: excess idle instances beyond what we need - elif availableInstances > spec.scaleUpThreshold AND cooldown elapsed: + 
elif availableInstances > spec.minWarmInstances AND cooldown elapsed: graceful scale down: 1. set exporter.spec.enabled = false 2. wait until confirmed no lease was assigned (leaseRef remains empty) 3. delete Pod and Exporter CR - (never below minInstances) + (never below minWarmInstances) ``` ### Instance States Each virtual exporter instance transitions through: -``` +```text Provisioning → Ready (warm pool) → Leased → Ready └→ Terminating → (deleted if available instances>min) ``` @@ -624,19 +652,29 @@ Provisioning → Ready (warm pool) → Leased → Ready ### Component Interaction 1. Administrator creates a `*ExporterPool` CR (e.g., `QEMUExporterPool`). -2. The corresponding pool controller provisions `minInstances` Pods. +2. The corresponding pool controller provisions `minWarmInstances` Pods. 3. Each Pod boots the virtual target and runs the Jumpstarter exporter, registering with the existing `jumpstarter-controller`. 4. Instances appear in the pool as regular exporters with the specified labels. 5. Users lease them normally — the existing controller handles assignment. -6. On lease release, the exporter handles internal cleanup. The instance - returns to the available pool. -7. The controller continuously monitors pool utilization and scales accordingly. +6. On lease release, the instance is recycled. Two strategies are supported: + - **Exit-and-replace (default):** The exporter exits after cleanup. The + pool controller detects the Pod termination and creates a fresh + replacement, ensuring a clean state between leases. The cold-start + latency is absorbed by the warm pool — replacement instances are + provisioned proactively to maintain `minWarmInstances`. + - **In-place reuse:** The exporter handles internal cleanup (e.g., power + off the VM, reset state) without exiting. The Pod and exporter process + remain running and the instance transitions back to Ready immediately. 
+ Useful when cold-start latency is high and the provider guarantees + clean state after reset. +7. The pool controller continuously monitors pool utilization and scales + accordingly. ### Failure Modes - **Pod crash:** Controller detects the failure via Pod status, replaces the - instance, maintains `minInstances` invariant. + instance, maintains `minWarmInstances` invariant. - **Resource exhaustion:** Cannot scale beyond cluster capacity; pool stays at current size, new leases queue as they would for physical targets. - **Provider startup failure:** Instance marked as failed, controller retries @@ -660,16 +698,16 @@ Unit tests should meet the project test coverage requirements. ## Acceptance Criteria - [ ] `QEMUExporterPool` CRD is defined and validated by the operator -- [ ] Pool controller maintains `minInstances` warm instances for each pool CR +- [ ] Pool controller maintains `minWarmInstances` warm instances for each pool CR - [ ] Pool controller scales up when available pool is depleted (up to - `maxInstances`) + `maxTotalInstances`) - [ ] Pool controller scales down idle instances after cooldown (never below - `minInstances`) + `minWarmInstances`) - [ ] At least one provider (`QEMUExporterPool`) is fully implemented and tested - [ ] Virtual instances register as standard exporters and are leasable without changes to the existing lease flow - [ ] Pod failures are detected and reported in the pool status. -- [ ] A pool with `minInstances: 0` provisions instances only on demand +- [ ] A pool with `minWarmInstances: 0` provisions instances only on demand - [ ] Pool status subresource reports instance counts and health conditions - [ ] Documentation covers pool CRD configuration and provider setup - [ ] Shared scaling logic is reusable for new provider CRDs @@ -713,7 +751,7 @@ Unit tests should meet the project test coverage requirements. resource waste (idle VMs) and artificial queuing. 
- **Unified user experience:** Virtual and physical targets are leased through the same mechanism — users do not need to learn a separate workflow. -- **Operator control:** `minInstances` / `maxInstances` give administrators a +- **Operator control:** `minWarmInstances` / `maxTotalInstances` give administrators a simple, declarative knob to tune the cost-vs-responsiveness trade-off per target type. - **Extensible provider model:** New virtual providers (Renode, Qemu, Corellium, Android, @@ -726,7 +764,7 @@ Unit tests should meet the project test coverage requirements. per-provider CRDs add operational surface area — more components to deploy, monitor, and debug. - **Resource consumption:** Warm pools consume cluster resources even when not - actively leased. Misconfigured `minInstances` can lead to waste. + actively leased. Misconfigured `minWarmInstances` can lead to waste. - **New CRD proliferation:** Each provider type adds a CRD; clusters with many providers will have many CRDs to manage and version. @@ -761,6 +799,14 @@ Unit tests should meet the project test coverage requirements. - **LAVA (Linaro Automated Validation Architecture):** Supports virtual DUTs via QEMU but with static configuration; no on-demand scaling. +- **Crossplane:** A CNCF project for composing cloud infrastructure as + Kubernetes CRDs. While Crossplane shares the CRD-driven provisioning pattern, + it targets general-purpose cloud resource composition and has no awareness of + Jumpstarter's lease semantics, warm pool management, or exporter registration. + Jumpstarter already has its own CRD model (Leases, Exporters) and operator + framework; adopting Crossplane would add a heavyweight dependency without + replacing the pool-specific scaling and lifecycle logic this JEP requires. + ## Unresolved Questions - What is the exact scaling algorithm (proportional, step-based, predictive)? @@ -775,6 +821,14 @@ Unit tests should meet the project test coverage requirements. 
When a Lease referencing one of their managed Exporters is deleted or transitions to a released state, the controller triggers scale-down evaluation if needed. +- **Scheduled (future-dated) leases:** The existing `jumpstarter-controller` + already supports `Spec.BeginTime` for scheduled leases. The controller does + not attempt to acquire an exporter until `BeginTime` arrives; it requeues + with a delay. Once `BeginTime` passes and no exporter is available, the + controller sets a `Pending` condition (e.g., reason `NotAvailable`). Pool + controllers watch for pending leases with matching labels as their scaling + input, so they naturally do not scale up for future-dated leases until the + controller makes them effective. ## Future Possibilities @@ -786,6 +840,10 @@ natural follow-ups enabled by the pool infrastructure: from Kubernetes Secrets. - **Renode provider:** A `RenodeExporterPool` CRD leveraging JEP-0010's Renode integration as another virtual provider type. +- **Composite leases:** Linking multiple exporters (potentially from + different pools) into a single logical target — e.g., a QEMU VM paired + with a network emulator as one leasable unit. This would require + multi-exporter lease semantics and coordinated lifecycle management. 
## Implementation Plan @@ -819,7 +877,7 @@ Build the pool controller binary with the `--provider` flag, define the - [ ] Define `QEMUExporterPool` CRD schema (scaling fields, nodeSelector, podTemplate, labels, QEMU-specific fields, exporterTemplate) - [ ] Implement the pool controller binary with `--provider=qemu` flag -- [ ] Implement core scaling logic: maintain `minInstances`, scale up when +- [ ] Implement core scaling logic: maintain `minWarmInstances`, scale up when pool is depleted, graceful scale-down (disable → wait → delete) - [ ] Instance provisioning: create Pods running the Jumpstarter exporter with QEMU provider configuration From 0c8b7431db95b71f40064c0f8eeb9611fe9cee71 Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Fri, 5 Jun 2026 10:47:10 +0200 Subject: [PATCH 4/6] docs(jeps): re-add toctree to fix Sphinx warning for new JEP files New JEP files not listed in any toctree cause Sphinx build warnings, which fail the check-warnings CI job. Co-authored-by: Cursor --- python/docs/source/contributing/jeps/index.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/python/docs/source/contributing/jeps/index.md b/python/docs/source/contributing/jeps/index.md index fa4c90e1f..352e6eb23 100644 --- a/python/docs/source/contributing/jeps/index.md +++ b/python/docs/source/contributing/jeps/index.md @@ -63,3 +63,13 @@ For the full process definition, see [JEP-0000](JEP-0000-jep-process.md). 
| Withdrawn | Author voluntarily withdrew | | Active | Living document, actively maintained (Process JEPs only) | | Superseded | Replaced by a newer JEP | + +```{toctree} +:hidden: + +JEP-0000-jep-process.md +JEP-0010-renode-integration.md +JEP-0011-protobuf-introspection-interface-generation.md +JEP-0013-observability-telemetry-logs.md +JEP-0014-virtual-scalable-exporters.md +``` From 355013d733c0440e11ec436e35d7470b5da42776 Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Thu, 18 Jun 2026 15:56:47 +0200 Subject: [PATCH 5/6] docs(jep-0014): add end-to-end flow and revised ExporterSet model Document admin, controller, and user actions for the QEMU warm-pool scenario. Adopt ExporterSet + VirtualTargetClass reference graph, simplify homogeneous QEMU pools to avoid per-instance claims, and align examples with the phased lifecycle. Co-authored-by: Cursor --- .../JEP-0014-virtual-scalable-exporters.md | 1361 +++++++++++------ 1 file changed, 861 insertions(+), 500 deletions(-) diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md index 0fc9bf772..85ff2e650 100644 --- a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -8,7 +8,7 @@ | **Status** | Draft | | **Type** | Standards Track | | **Created** | 2026-06-03 | -| **Updated** | 2026-06-03 | +| **Updated** | 2026-06-18 | | **Discussion** | https://github.com/jumpstarter-dev/jumpstarter/issues/41 | | **Requires** | | | **Supersedes** | | @@ -19,15 +19,16 @@ ## Abstract This JEP proposes a Virtual Scalable Exporter subsystem for Jumpstarter that -manages pools of virtual targets with configurable autoscaling. 
Each virtual -target definition declares a minimum and maximum number of instances; the system -maintains a warm pool of pre-spawned exporters ready for immediate lease -fulfillment — avoiding the 10-60s cold-start latency of VM boot and exporter -registration — and scales up or down based on demand. This enables low-latency -lease acquisition, massive scalability, resource efficiency, and simplified -orchestration of mixed physical/virtual test topologies — while allowing -administrators to tune the trade-off between responsiveness and resource -consumption on a per-target basis. +manages pools of virtual targets with configurable autoscaling. Conceptually, +the system scales **virtual targets**; the **Exporter** is the scheduling and +leasing unit (the Pod analog). Each `ExporterSet` declares scaling bounds using +familiar Kubernetes vocabulary (`minReplicas`, `maxReplicas`, +`minAvailableReplicas`); the controller maintains a warm pool of ready exporters +to absorb the 10-60s cold-start latency of VM boot and exporter registration. +This enables low-latency lease acquisition, massive scalability, resource +efficiency, and simplified orchestration of mixed physical/virtual test +topologies — while allowing administrators to tune the trade-off between +responsiveness and resource consumption on a per-target basis. ## Motivation @@ -53,6 +54,24 @@ This model has several limitations: The core problem is that virtual targets lack a pool manager that can maintain a configurable warm pool while autoscaling to meet demand. +### Fidelity / Cost Ladder + +One logical target can be served by multiple backends at different fidelity and +cost tiers. 
Users select via labels through `jmp lease`; the same workflow +applies regardless of backend: + +| class (provisioner) | fidelity | scale/cost | role | +| --- | --- | --- | --- | +| container sim (`qemu.jumpstarter.dev`) | low | cheap / CI-scale | functional checks | +| cloud virtual device (`corellium.jumpstarter.dev`) | high | metered | higher-fidelity behavior | +| real hardware (Exporter) | full | scarce | ground truth | + +For example, a target that needs GPU or specialized I/O can run functional +checks cheaply on a QEMU class in CI, validate higher-fidelity behavior on a +cloud-backed virtual device, and use real hardware as ground truth. The +`VirtualTargetClass` / `*VirtualTarget` abstraction makes this ladder explicit +without changing the lease experience. + ### User Stories - **As a** CI pipeline author, **I want to** lease N virtual targets instantly @@ -63,162 +82,264 @@ configurable warm pool while autoscaling to meet demand. physical board's properties with near-zero wait time, **so that** I can iterate quickly without waiting for scarce hardware. -- **As a** platform engineer, **I want to** declare a virtual target pool with - `minWarmInstances: 2, maxTotalInstances: 20`, **so that** there are always warm +- **As a** platform engineer, **I want to** declare an `ExporterSet` with + `minAvailableReplicas: 2, maxReplicas: 20`, **so that** there are always warm instances ready while the system scales up on demand and scales down when idle. -- **As a** cost-conscious operator, **I want to** set `minWarmInstances: 0` for - rarely-used target types, **so that** they consume no resources until actually - requested, accepting a cold-start delay. +- **As a** cost-conscious operator, **I want to** set `minAvailableReplicas: 0` + for rarely-used target types, **so that** they consume no resources until + actually requested, accepting a cold-start delay. 
## Proposal The proposal introduces **Virtual Scalable Exporters** — a controller-managed pool of virtual target instances with configurable autoscaling. Rather than -treating virtual targets as purely on-demand or purely static, each virtual -target definition declares scaling parameters that let administrators tune the +treating virtual targets as purely on-demand or purely static, each +`ExporterSet` declares scaling parameters that let administrators tune the trade-off between instant availability and resource consumption. -### Core Concept: Managed Pools with Scaling +### Resource Hierarchy + +Virtual scalable exporters are modeled on familiar Kubernetes workload +primitives: + +```text +VirtualTargetClass ←── referenced by ── ExporterSet + │ + ▼ + Exporter ──► Pod + (exporter sidecar + target runtime) + +# API-backed / static / multi-device cases may also use: +VirtualTargetClass ←── referenced by ── *VirtualTarget (typed claim) + ↑ +ExporterSet ──► Exporter ────────────────────┘ +``` + +- **`VirtualTargetClass`** — cluster-scoped configuration for a backend + (`provisioner`, credentials, scheduling, binding mode, device parameters). + Admins own classes; claim authors never touch credentials. +- **`*VirtualTarget`** — optional strongly-typed claim for backends where each + instance has distinct identity (API-backed devices, static benches). Not + required for homogeneous container-backed pools. +- **`ExporterSet`** — generic scaling resource with `selector` + inline + `template`. References a `VirtualTargetClass` (or optionally a `*VirtualTarget` + claim). One mental model for all backends. +- **`Exporter`** — the minimum leased unit. Exposes drivers that connect to the + virtual target provisioned from the class (or claim). -Each provider type has its own CRD (e.g., `QEMUExporterPool`, -`AndroidExporterPool`, `CorelliumExporterPool`) with provider-specific -configuration fields alongside shared scaling parameters. 
This gives each -provider a strongly-typed schema rather than a generic bag of config. +### Core Concept: ExporterSet with Kubernetes-Native Scaling -**Example: QEMU pool** +`ExporterSet` is a generic CRD (ReplicaSet + HPA analog) with familiar scaling +vocabulary. Provider typing lives in `VirtualTargetClass` and `*VirtualTarget`, +not in the pool CRD itself. + +**Example: VirtualTargetClass (cluster-scoped, StorageClass analog)** ```yaml apiVersion: jumpstarter.dev/v1alpha1 -kind: QEMUExporterPool +kind: VirtualTargetClass metadata: - name: rpi4-virtual - namespace: jumpstarter + name: qemu-rpi4 spec: - # Scaling configuration (shared across all pool CRDs) - minWarmInstances: 2 # Always keep 2 warm instances ready - maxTotalInstances: 20 # Scale up to 20 under load - - # Node scheduling (shared across all pool CRDs, optional) - nodeSelector: - node.kubernetes.io/instance-type: bare-metal - jumpstarter.dev/nested-virt: "true" - - # Labels exposed on each instance (for lease matching) - labels: - board: rpi4 - arch: aarch64 - virtual: "true" - - # Pod overrides (shared across all pool CRDs, optional) - podTemplate: + provisioner: qemu.jumpstarter.dev + bindingMode: Immediate # warm pool; WaitForFirstConsumer = on-demand + reclaimPolicy: Delete + scheduling: # inherited by rendered exporter Pods + nodeSelector: + kubernetes.io/arch: arm64 + tolerations: + - key: jumpstarter.dev/kvm + operator: Exists + effect: NoSchedule resources: - requests: - cpu: "4" - memory: 5Gi limits: - cpu: "4" - memory: 5Gi - - # QEMU-specific configuration - machineType: virt - firmware: registry.example.com/firmware/rpi4:latest - resources: + devices.kubevirt.io/kvm: "1" + parameters: + machineType: virt + firmware: registry.example.com/firmware/rpi4:latest cpu: 4 memory: 4Gi storage: 16Gi - - # Exporter template (drivers exposed by each instance) - exporterTemplate: - drivers: - - type: jumpstarter_driver_power.driver.QemuPower - - type: jumpstarter_driver_network.driver.TcpNetwork - 
config: - port: 22 - - type: jumpstarter_driver_serial.driver.QemuSerial ``` -**Example: Android Emulator pool** +**Example: QEMUVirtualTarget (optional typed claim)** + +For homogeneous QEMU pools, admins configure `VirtualTargetClass` + `ExporterSet` +only (see *End-to-End Flow*). A per-instance claim is optional — useful for +static benches or when per-instance sizing differs from the class defaults: ```yaml apiVersion: jumpstarter.dev/v1alpha1 -kind: AndroidExporterPool +kind: QEMUVirtualTarget metadata: - name: pixel7-emulator + name: rpi4-target-01 namespace: jumpstarter spec: - minWarmInstances: 0 # Fully on-demand (cold-start OK for this target) - maxTotalInstances: 10 - - labels: - device: pixel7 - os: android - api-level: "34" - virtual: "true" - - # Android-specific configuration - systemImage: system-images;android-34;google_apis;arm64-v8a - avdProfile: pixel_7 - gpu: swiftshader - - exporterTemplate: - drivers: - - type: jumpstarter_driver_android.driver.AdbDriver - - type: jumpstarter_driver_power.driver.EmulatorPower + virtualTargetClassName: qemu-rpi4 + resources: + cpu: 8 # override class default + memory: 8Gi + storage: 32Gi ``` -**Example: Corellium pool** +**Example: ExporterSet (generic scaling resource)** -The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages -the full virtual instance lifecycle through the Corellium REST API — it creates -instances on power-on and destroys them on power-off. It exposes a power -interface and a websocket-based serial console. The pool controller manages the -exporter Pod and injects API credentials via environment variables -(`CORELLIUM_API_HOST`, `CORELLIUM_API_TOKEN`) from the referenced Secret. 
+```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: rpi4-virtual + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 20 + minAvailableReplicas: 2 # PDB-style warm buffer (ready & unleased) + scaleDownCooldown: 5m + recycleStrategy: ExitAndReplace # or InPlaceReuse + virtualTargetClassName: qemu-rpi4 # references VirtualTargetClass above + selector: + matchLabels: + board: rpi4 + template: # embedded template (Deployment idiom) + metadata: + labels: + board: rpi4 + arch: aarch64 + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +status: + replicas: 5 + readyReplicas: 3 + availableReplicas: 1 # warm (ready & unleased) + leasedReplicas: 2 +# scale subresource: specReplicasPath=.spec.maxReplicas +``` + +**Example: Corellium VirtualTargetClass + claim** ```yaml apiVersion: jumpstarter.dev/v1alpha1 -kind: CorelliumExporterPool +kind: VirtualTargetClass +metadata: + name: corellium-kronos +spec: + provisioner: corellium.jumpstarter.dev + credentialsSecretRef: + name: corellium-creds + namespace: jumpstarter + bindingMode: WaitForFirstConsumer # provision on lease + reclaimPolicy: Delete + parameters: + apiHost: app.corellium.com + projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" +--- +apiVersion: jumpstarter.dev/v1alpha1 +kind: CorelliumVirtualTarget metadata: - name: rd1ae-corellium + name: rd1ae-kronos-01 namespace: jumpstarter spec: - minWarmInstances: 1 - maxTotalInstances: 5 - - labels: - board: rd1ae - flavor: kronos - virtual: "true" - - # Corellium-specific configuration - apiHost: app.corellium.com - apiCredentialsSecret: corellium-api-credentials # Secret with keys: token - projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" - - # Device/instance parameters + virtualTargetClassName: corellium-kronos deviceFlavor: kronos deviceOs: "1.1.1" deviceBuild: 
"Critical Application Monitor (Baremetal)" consoleName: "Primary Compute Non-Secure" - - exporterTemplate: - drivers: - - type: jumpstarter_driver_corellium.driver.Corellium - config: - device_name: "{{ .InstanceName }}" ``` -The pool controller automatically injects the Corellium-specific CRD spec fields -(`projectId`, `deviceFlavor`, `deviceOs`, `deviceBuild`, `consoleName`) into the -driver config at instance creation time. Only fields that vary per instance (like -`device_name` using the `{{ .InstanceName }}` template variable) need to be -specified explicitly in `exporterTemplate`. +The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages +the full virtual instance lifecycle through the Corellium REST API — it creates +instances on power-on and destroys them on power-off. The provisioner injects +API credentials from `VirtualTargetClass.credentialsSecretRef` into the +exporter Pod; claim authors never see credentials. + +**Example: Android ExporterSet** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: pixel7-emulator + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 10 + minAvailableReplicas: 0 # fully on-demand + selector: + matchLabels: + device: pixel7 + template: + metadata: + labels: + device: pixel7 + os: android + api-level: "34" + virtual: "true" + spec: + virtualTargetRef: + apiVersion: jumpstarter.dev/v1alpha1 + kind: AndroidVirtualTarget + name: pixel7-template + drivers: + - type: jumpstarter_driver_android.driver.AdbDriver + - type: jumpstarter_driver_power.driver.EmulatorPower +``` + +An `ExporterSet` with `minAvailableReplicas: 0` consumes no resources until a +lease is requested, accepting cold-start latency. An `ExporterSet` with +`minAvailableReplicas: 3` always has 3 ready-to-lease exporters — leases are +fulfilled instantly from the warm pool, and the controller scales up if more are +needed. 
+ +### Container-Backed Targets: Sidecar Pattern + +For container-backed provisioners (`qemu.jumpstarter.dev`, Android emulator, etc.), +the provisioner renders each instance Pod from independently shipped artifacts: + +```yaml +# rendered by qemu.jumpstarter.dev provisioner +spec: + initContainers: + - name: exporter # native sidecar (starts first, drains last) + restartPolicy: Always + image: quay.io/jumpstarter-dev/exporter:latest + containers: + - name: target-runtime # QEMU/Cuttlefish — independent image + image: quay.io/jumpstarter-dev/qemu-runtime:latest + volumeMounts: + - name: os + mountPath: /os + - name: shared + mountPath: /shared + volumes: + - name: os + image: + reference: registry.example.com/os/rpi4:latest # OS as OCI artifact + - name: shared + emptyDir: {} +``` + +Benefits: + +- **Independent release cadence** — exporter, runtime, and OS image version + independently. +- **Fault isolation** — exporter survives target-runtime crashes and can drain + or report failure. +- **Standard interfaces** — drivers attach over virtio (serial/SPI/CAN/GPIO) or + Unix sockets on shared volumes; same driver code works physical + virtual. +- **Unprivileged Pods** — virtio-backed guests avoid privileged containers when + the host supports it. -A pool with `minWarmInstances: 0` consumes no resources until a lease is -requested, accepting cold-start latency. A pool with `minWarmInstances: 3` -always has 3 ready-to-lease instances — leases are fulfilled instantly from -the warm pool, and the controller scales up if more are needed. +The exporter sidecar communicates with the target-runtime container via Unix +sockets on a shared `emptyDir` volume (QMP for QEMU control, serial console, +launcher socket for dynamic argv). API-backed provisioners (`corellium`, `ec2`) +skip the runtime container and connect out to external APIs. 
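
The sidecar-to-runtime control channel described above can be illustrated with a minimal QMP exchange over a Unix socket. This is a sketch, not the exporter's actual implementation: the socket path `/shared/qmp.sock` and the helper names (`encode_qmp`, `wait_reply`, `qmp_handshake`, `query_status`) are assumptions made for illustration. Only the QMP wire behavior is taken from QEMU itself (a greeting, then `qmp_capabilities`, then commands, with asynchronous events interleaved among replies).

```python
import json
import socket


def encode_qmp(command, arguments=None):
    """Serialize one QMP command as a newline-terminated JSON object."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def wait_reply(reader):
    """Return the next command reply, skipping asynchronous QMP events."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        message = json.loads(line)
        if "event" in message:
            continue  # events may arrive at any time; ignore them here
        if "error" in message:
            raise RuntimeError(message["error"].get("desc", "QMP error"))
        return message.get("return")


def qmp_handshake(sock):
    """Consume the greeting and leave capabilities-negotiation mode."""
    reader = sock.makefile("r", encoding="utf-8")
    greeting = json.loads(reader.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting")
    sock.sendall(encode_qmp("qmp_capabilities"))
    wait_reply(reader)
    return reader


def query_status(path="/shared/qmp.sock"):
    """Ask the runtime whether the guest is running (path is hypothetical)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        reader = qmp_handshake(sock)
        sock.sendall(encode_qmp("query-status"))
        return wait_reply(reader)
```

Because the socket lives on the shared `emptyDir`, a helper along these lines would need no network access and no privileges beyond read/write on that volume.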
### User Experience @@ -232,6 +353,8 @@ jmp lease -l board=rpi4 # Lease explicitly virtual targets jmp lease -l board=rpi4,virtual=true +# Prefer ground truth when fidelity matters +jmp lease -l board=rpi4,fidelity=full ``` The guiding principle is: **"Get me a target that matches my requirements."** The @@ -239,6 +362,293 @@ distinction between physical and virtual is an implementation detail, not a primary concern for the user. Virtual exporters simply appear in the same pool as physical ones, differentiated only by labels. +### End-to-End Flow (QEMU Example) + +This section walks through a complete QEMU warm-pool scenario: what each actor +does, which CRDs are involved, and how control passes between components. It +uses the **reference graph** (not a strict ownership tree) for relationships +between resources: + +```text +VirtualTargetClass ←── referenced by ── ExporterSet + │ + ▼ + Exporter ──► Pod + (exporter sidecar + QEMU runtime) +``` + +For **homogeneous QEMU pools** (same CPU/RAM/disk for every replica, no +per-lease parameterization), configuration flows through `VirtualTargetClass` + +`ExporterSet` only. The provisioner materializes Pods from those two resources; +per-instance `QEMUVirtualTarget` claims are **not** required in this case (they +remain useful for API-backed backends, static benches, or future multi-device +exporters — see *Future Possibilities*). 
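
Before the walkthrough, the warm-pool arithmetic it relies on can be sketched in a few lines. This is an illustrative reading of the proposal, not the controller's code: the function name and the choice to fold pending leases and the warm buffer into a single shortfall are assumptions. The inputs mirror the `ExporterSet` fields (`minAvailableReplicas`, `maxReplicas`) and status counters (`replicas`, `availableReplicas`).

```python
def replicas_to_create(replicas, available, pending_leases,
                       min_available, max_replicas=0):
    """Return how many new exporters a reconcile pass would create.

    replicas        total exporters owned by the set (warm + leased)
    available       ready and unleased exporters
    pending_leases  matching leases still waiting for an exporter
    min_available   warm buffer to maintain (minAvailableReplicas)
    max_replicas    ceiling; 0 is treated as "no upper bound"
    """
    # Exporters needed to serve pending leases and still keep the warm buffer.
    shortfall = max(0, min_available + pending_leases - available)
    if max_replicas > 0:
        # Never grow past the ceiling, even if demand is higher.
        headroom = max(0, max_replicas - replicas)
        shortfall = min(shortfall, headroom)
    return shortfall
```

With `min_available=0` and no pending leases the function returns 0, matching the fully on-demand case; with `min_available=2` and an empty set it returns 2, matching the two warm exporters created in the walkthrough.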
+ +#### Actors + +| Actor | Component | Responsibility | +| --- | --- | --- | +| **Administrator** | Human / GitOps | Cluster bootstrap, class + set CRs | +| **Jumpstarter operator** | `Jumpstarter` CR | Deploys `jumpstarter-controller`, routers, exporter-set controllers | +| **Exporter-set controller** | `qemu.jumpstarter.dev` Deployment | Reconciles `ExporterSet`, creates Exporters/Pods, scales pool | +| **Jumpstarter controller** | Existing controller | Assigns `Lease` → `Exporter`, unchanged lease semantics | +| **User** | CLI / CI (`jmp lease`, drivers) | Requests leases, flashes images, runs tests | + +#### Phase 0 — Cluster bootstrap (admin, one-time) + +**Admin actions:** + +1. Install Jumpstarter operator (if not already present). +2. Configure the `Jumpstarter` CR with `spec.exporterSets.provisioners` listing + `qemu.jumpstarter.dev` (and any other provisioners). + +**Controller actions:** + +- Operator creates the exporter-set controller Deployment + (`--provisioner=qemu.jumpstarter.dev`). +- Operator ensures `jumpstarter-controller` is running (existing behavior). + +**Result:** Provisioner controller is watching for `ExporterSet` CRs whose +templates reference QEMU virtual targets (via `virtualTargetClassName` or +`*VirtualTarget` claims). + +#### Phase 1 — Define the virtual target profile (admin) + +**Admin actions:** + +1. Create a cluster-scoped `VirtualTargetClass` describing the QEMU backend: + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: qemu-rpi4 +spec: + provisioner: qemu.jumpstarter.dev + bindingMode: Immediate + reclaimPolicy: Delete + scheduling: + nodeSelector: + kubernetes.io/arch: arm64 + tolerations: + - key: jumpstarter.dev/kvm + operator: Exists + effect: NoSchedule + resources: + limits: + devices.kubevirt.io/kvm: "1" + parameters: + machineType: virt + firmware: registry.example.com/firmware/rpi4:latest + cpu: 4 + memory: 4Gi + storage: 16Gi +``` + +2. 
Create an `ExporterSet` that references the class and declares scaling + + lease-matching labels: + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: rpi4-virtual + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 20 + minAvailableReplicas: 2 + scaleDownCooldown: 5m + recycleStrategy: ExitAndReplace + virtualTargetClassName: qemu-rpi4 + selector: + matchLabels: + board: rpi4 + template: + metadata: + labels: + board: rpi4 + arch: aarch64 + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +``` + +**User actions:** None. + +**Controller actions:** None yet (waiting for `ExporterSet` to be observed). + +#### Phase 2 — Warm pool provisioning (exporter-set controller) + +**Trigger:** `ExporterSet` CR created or updated; `minAvailableReplicas: 2`. + +**Exporter-set controller actions (reconcile loop):** + +1. Read `ExporterSet` spec and referenced `VirtualTargetClass`. +2. Count owned `Exporter` CRs: `replicas`, `readyReplicas`, `leasedReplicas`, + `availableReplicas` (= ready − leased). +3. If `availableReplicas < minAvailableReplicas` and `replicas < maxReplicas`, + scale up by creating new instances. For each new instance: + - Create an `Exporter` CR with labels from `spec.template.metadata` and + drivers from `spec.template.spec`. + - Render a Kubernetes Pod (sidecar pattern): + - **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts + first, registers with `jumpstarter-controller`. + - **QEMU runtime container** — started by provisioner; exporter talks to + it via Unix sockets on a shared `emptyDir` (QMP, serial, launcher). + - Apply scheduling from `VirtualTargetClass.scheduling` to the Pod. + - Apply device parameters from `VirtualTargetClass.parameters` when + constructing the QEMU command line. +4. 
Update `ExporterSet.status` (`replicas`, `readyReplicas`, `availableReplicas`, + `leasedReplicas`, conditions). + +**Jumpstarter-controller actions:** + +- Accepts exporter registrations from the sidecar processes (existing gRPC flow). +- Marks exporters as available for lease assignment when ready. + +**User actions:** None. + +**Result:** Two warm exporters appear in the pool, labeled `board=rpi4, +virtual=true`. `ExporterSet.status.availableReplicas: 2`. + +```text +ExporterSet rpi4-virtual +├── Exporter rpi4-virtual-aaa [Ready, unleased] → Pod (exporter + QEMU) +└── Exporter rpi4-virtual-bbb [Ready, unleased] → Pod (exporter + QEMU) +``` + +#### Phase 3 — User requests a lease (user + jumpstarter-controller) + +**User actions:** + +```bash +jmp lease -l board=rpi4,virtual=true +``` + +**Jumpstarter-controller actions:** + +1. Create a `Lease` CR with `spec.selector.matchLabels: {board: rpi4, virtual: "true"}`. +2. Scan available `Exporter` CRs matching the selector (enabled, no active + `leaseRef`, ready). +3. Pick one (e.g. `rpi4-virtual-aaa`) and set `Exporter.status.leaseRef` to the + lease name. +4. Return connection details to the user (existing flow). + +**Exporter-set controller actions:** + +- Observes `leasedReplicas` increased, `availableReplicas` decreased. +- If `availableReplicas < minAvailableReplicas`, begins scale-up (create another + instance to refill the warm buffer). +- Does **not** participate in lease assignment. + +**Result:** User holds an active lease on `rpi4-virtual-aaa`. Pool still +maintains warm capacity via background scale-up. + +#### Phase 4 — User session (user + exporter sidecar) + +**User actions** (via leased client — same as physical targets): + +```python +with env() as client: + client.storage.flash("/path/to/image.raw") # write disk image + client.power.on() # boot QEMU via QemuPower driver + client.serial.read() # interact over serial + # ... run tests ... 
+``` + +**Exporter sidecar actions:** + +- `storage.flash` writes the image to shared storage (or tells QEMU runtime via + QMP/`blockdev-add` in sidecar mode). +- `power.on` sends QEMU start via QMP or launcher socket on shared volume. +- Serial/network drivers proxy to the QEMU runtime container. + +**Controller actions:** None during the session (lease is held). + +#### Phase 5 — Lease release and recycle (user + controllers) + +**User actions:** + +```bash +jmp delete-lease # or lease TTL expires +``` + +**Jumpstarter-controller actions:** + +1. Clear `Exporter.status.leaseRef` on `rpi4-virtual-aaa`. +2. Mark lease as released. + +**Exporter-set controller actions:** + +1. Observe exporter is unleased; update `availableReplicas` / `leasedReplicas`. +2. Apply `recycleStrategy`: + - **ExitAndReplace (default):** exporter sidecar exits after cleanup → Pod + terminates → controller deletes `Exporter` CR → creates a fresh replacement + to maintain `minAvailableReplicas`. + - **InPlaceReuse:** exporter resets QEMU state in place → same Pod returns + to Ready without restart. +3. If `availableReplicas > minAvailableReplicas` for longer than + `scaleDownCooldown`, gracefully scale down an excess replica: + - Set `Exporter.spec.enabled: false` + - Wait until no lease assigned + - Delete Pod + `Exporter` CR + +**Result:** Pool returns to steady state with `minAvailableReplicas` warm, +unleased exporters. + +#### Phase 6 — Demand spike (scale-up under load) + +**Trigger:** Three users (or CI jobs) request leases simultaneously; only one +warm exporter remains. + +**User actions:** Three concurrent `jmp lease -l board=rpi4,virtual=true`. + +**Jumpstarter-controller actions:** + +- Assigns the one available exporter immediately. +- Sets `Pending` condition on the other two leases (existing behavior when no + exporter is available). + +**Exporter-set controller actions:** + +1. Sees pending leases matching `spec.selector` with no available exporters. +2. 
Scales up: creates new `Exporter` + Pod instances (up to `maxReplicas`). +3. As new exporters register and become ready, jumpstarter-controller assigns + pending leases. + +**Result:** Pool grows to meet demand, then shrinks back after cooldown when +leases are released. + +#### Summary: who touches which CRD + +| CRD | Created by | Observed by | User-visible? | +| --- | --- | --- | --- | +| `Jumpstarter` | Admin | Operator | No | +| `VirtualTargetClass` | Admin | Exporter-set controller | No | +| `ExporterSet` | Admin | Exporter-set controller | No (admin/kubectl) | +| `Exporter` | Exporter-set controller | Jumpstarter-controller, exporter-set controller | Indirectly (via lease) | +| `Lease` | User (via CLI) | Jumpstarter-controller, exporter-set controller | Yes | +| `Pod` | Exporter-set controller | Kubernetes, exporter-set controller | No | + +#### QEMU vs API-backed backends + +The flow above applies to **container-backed** provisioners (`qemu.jumpstarter.dev`). +For **API-backed** backends (e.g. `corellium.jumpstarter.dev`): + +- `VirtualTargetClass` holds `credentialsSecretRef` and API parameters. +- A typed `*VirtualTarget` claim (e.g. `CorelliumVirtualTarget`) may be created + per instance when the backend provisions an external device with its own + lifecycle and identity. +- The exporter Pod is lighter (API client only; no QEMU runtime container). + +The `ExporterSet` + `jumpstarter-controller` lease flow is identical. + ### Architecture Overview ```text @@ -253,8 +663,8 @@ as physical ones, differentiated only by labels. ▼ ┌────────────────────────────────────┐ │ Kubernetes API │ - │ (Lease CRs, Exporter CRs, │ - │ *ExporterPool CRs) │ + │ (Lease, Exporter, ExporterSet, │ + │ VirtualTargetClass, *VirtualTarget)│ └─┬──────────────┬──────────────┬────┘ │ │ │ watches │ watches │ watches │ @@ -262,15 +672,16 @@ as physical ones, differentiated only by labels. 
Exporters │ Exporters │ Exporters │ │ │ │ ┌─────────────────▼┐ ┌───────────▼──────────┐┌──▼──────────────────────┐ - │ QEMUExporterPool │ │ AndroidExporterPool │ │ CorelliumExporterPool │ - │ Controller │ │ Controller │ │ Controller │ + │ qemu provisioner │ │ android provisioner │ │ corellium provisioner │ + │ (ExporterSet │ │ (ExporterSet │ │ (ExporterSet │ + │ controller) │ │ controller) │ │ controller) │ └────────┬─────────┘ └──────────┬──────────┘ └────────────┬────────────┘ │ │ │ │ manages │ manages │ manages ▼ ▼ ▼ ┌──────────────────┐ ┌───────────────────────┐ ┌────────────────────────┐ │ Warm Pool │ │ Warm Pool │ │ Warm Pool │ - │ [inst1][inst2].. │ │ [inst1][inst2].. │ │ [inst1].. │ + │ [Exporter].. │ │ [Exporter].. │ │ [Exporter].. │ └────────┬─────────┘ └───────────┬───────────┘ └────────────┬───────────┘ │ │ │ └───────────────────────┼──────────────────────────┘ @@ -281,31 +692,29 @@ as physical ones, differentiated only by labels. **Scaling Inputs — Watches on Leases and Exporters:** -Each pool controller watches two key resources to make scaling decisions: +Each `ExporterSet` controller watches two key resources to make scaling decisions: 1. **Leases** — The controller watches for pending Leases whose label selectors - match the pool's labels. Pending leases with no available exporter signal + match the set's selector. Pending leases with no available exporter signal demand and trigger scale-up. -2. **Exporters** — The controller watches the Exporter objects it owns to track - which instances are available (no active lease) vs. occupied (leased). This +2. **Exporters** — The controller watches owned Exporter objects to track which + instances are available (no active lease) vs. occupied (leased). This determines the current pool utilization. Together these inputs feed the scaling logic: if there are pending leases that -match this pool and no available instances to serve them, scale up. 
If there are -excess idle instances beyond `minWarmInstances` for a sustained period, scale down. +match this set and no available instances to serve them, scale up. If there are +excess idle instances beyond `minAvailableReplicas` for a sustained period, scale +down. -**Per-Provider Deployments (single image by default):** All provider +**Per-Provisioner Deployments (single image by default):** All provisioner controllers are compiled into a single binary. Each Deployment in the cluster -passes a `--provider=` flag to activate the corresponding reconciler. -This gives each provider isolated logs and independent restarts while -maintaining a single image to build and release. The per-provider `image` -override in the operator CR allows administrators to substitute a custom image -for a specific provider (e.g., a third-party provider distributed as its own -image) without affecting other providers. +passes a `--provisioner=` flag to activate the corresponding reconciler +(e.g., `qemu.jumpstarter.dev`). This gives each provisioner isolated logs and +independent restarts while maintaining a single image to build and release. -The Jumpstarter operator deploys pool controllers based on the `Jumpstarter` -CR configuration. A new `exporterPools` section lists which providers to -enable, following the same pattern as `controller` and `routers`: +The Jumpstarter operator deploys provisioner controllers based on the +`Jumpstarter` CR configuration. A new `exporterSets` section lists which +provisioners to enable: ```yaml apiVersion: operator.jumpstarter.dev/v1alpha1 @@ -316,129 +725,120 @@ metadata: spec: # ... existing controller, routers, authentication config ... 
- # Pool controllers configuration (new) - exporterPools: - # Default image shared by all pool controllers (can be overridden per provider) - image: quay.io/jumpstarter-dev/pool-controller:latest + exporterSets: + image: quay.io/jumpstarter-dev/exporter-set-controller:latest imagePullPolicy: IfNotPresent - - # List of providers to deploy controllers for - providers: - - name: qemu - enabled: true - resources: - requests: - cpu: 100m - memory: 256Mi - - name: android + provisioners: + - name: qemu.jumpstarter.dev enabled: true - resources: - requests: - cpu: 100m - memory: 256Mi - - name: corellium + - name: corellium.jumpstarter.dev enabled: false - # Override the default image for this provider - image: quay.io/jumpstarter-dev/pool-controller-corellium:latest - imagePullPolicy: Always + image: quay.io/jumpstarter-dev/exporter-set-controller-corellium:latest ``` -The operator creates one Deployment per enabled provider, passing -`--provider=` to the shared binary. This gives administrators a single -knob to enable/disable pool controllers, and the operator handles RBAC, -service accounts, and Deployment lifecycle. - -**Scaling Logic:** Each pool controller monitors its instances and scales based -on available (unleased) instances: +**Scaling Logic:** Each `ExporterSet` controller monitors its instances and scales +based on available (unleased) replicas: -- If available instances drop below a threshold (e.g., `minWarmInstances`), scale up. -- If available instances exceed demand for a cooldown period, scale down (never - below `minWarmInstances`). -- Never exceed `maxTotalInstances` (if set; 0 or omitted means no upper bound). +- If `availableReplicas` drops below `minAvailableReplicas`, scale up. +- If `availableReplicas` exceeds demand for a cooldown period, scale down (never + below `minAvailableReplicas`). +- Never exceed `maxReplicas` (if set; 0 or omitted means no upper bound). 
+- `kubectl scale exporterset/ --replicas=N` works via the `scale` + subresource (`specReplicasPath=.spec.maxReplicas`). **Instance Lifecycle:** -1. Pool controller creates a Pod from the pool spec using provider-specific - templates. -2. The Pod starts the virtual target (e.g., QEMU VM, Android emulator, or - Corellium API call) and runs the Jumpstarter exporter, registering with - the controller like any other exporter. +1. `ExporterSet` controller creates an Exporter + `*VirtualTarget` from the set + template (provisioner renders the Pod). +2. The Pod starts the virtual target (sidecar pattern for container backends, or + API call for external backends) and runs the Jumpstarter exporter, registering + with the controller like any other exporter. 3. The instance becomes available in the pool for lease assignment. -4. When a lease is released, the exporter internally handles cleanup/reset - (this is existing exporter behavior). The instance returns to the available - pool automatically. - +4. When a lease is released, the exporter handles cleanup/reset per + `recycleStrategy`. The instance returns to the available pool or is replaced. ### API / Protocol Changes -**New CRDs: `*ExporterPool` (one per provider type)** +**New CRDs** + +| CRD | Scope | Role | +| --- | --- | --- | +| `VirtualTargetClass` | Cluster | StorageClass analog — provisioner, credentials, scheduling, binding | +| `QEMUVirtualTarget` | Namespaced | Typed claim for QEMU backends | +| `CorelliumVirtualTarget` | Namespaced | Typed claim for Corellium backends | +| `AndroidVirtualTarget` | Namespaced | Typed claim for Android emulator backends | +| `ExporterSet` | Namespaced | Generic scaling resource (ReplicaSet + HPA analog) | -Each provider type defines its own CRD. 
All share a common scaling spec -(embedded struct in Go) but have provider-specific configuration fields: +**VirtualTargetClass (common fields):** ```yaml -# Common fields shared by all *ExporterPool CRDs: spec: - # Scaling (common) - minWarmInstances: # Minimum available (unleased) instances (default: 0) - maxTotalInstances: # Maximum total instances, warm + leased (0 or omitted = no limit) - scaleDownCooldown: # Wait before scaling down (default: 5m) - recycleStrategy: # "ExitAndReplace" (default) or "InPlaceReuse" - - # Node scheduling (common, optional) - # Applied to instance Pods — use to target baremetal nodes, nodes with - # nested virtualization, GPU nodes, specific architectures, etc. - nodeSelector: + provisioner: # e.g. qemu.jumpstarter.dev + credentialsSecretRef: # optional; for API-backed provisioners + name: + namespace: + parameters: # opaque to orchestration; provisioner-specific : - - # Pod overrides (common, optional) - # Customize the exporter Pod container image and resource requests/limits. - # Providers set sensible defaults; these fields allow administrators to - # override them per pool. - podTemplate: - image: # Override the default exporter container image + bindingMode: Immediate | WaitForFirstConsumer + reclaimPolicy: Delete | Retain + scheduling: # inherited by rendered exporter Pods + nodeSelector: + : + nodeAffinity: { ... } + tolerations: [ ... ] resources: - requests: - cpu: - memory: limits: - cpu: - memory: - - # Labels applied to all instances (common) - labels: - : - - # Exporter driver configuration template (common) - exporterTemplate: - drivers: - - type: - config: { ... } - -# Provider-specific fields differ per CRD: -# - QEMUExporterPool: machineType, firmware, resources (cpu/mem/storage), ... -# - AndroidExporterPool: systemImage, avdProfile, gpu, ... -# - CorelliumExporterPool: apiHost, apiCredentialsSecret, projectId, ... 
-# (CorelliumExporterPool typically does not use nodeSelector/podTemplate -# since it provisions instances via external API, so local pods connect to the -# corellium api, and the architecture/characteristics of the running node do not -# matter.) + devices.kubevirt.io/kvm: "1" +``` + +**ExporterSet (common fields):** + +```yaml +spec: + minReplicas: # floor (default: 0) + maxReplicas: # ceiling (0 or omitted = no limit) + minAvailableReplicas: # warm buffer: ready & unleased (default: 0) + scaleDownCooldown: # default: 5m + recycleStrategy: ExitAndReplace | InPlaceReuse + selector: + matchLabels: + : + template: + metadata: + labels: { ... } + spec: + virtualTargetRef: { ... } # reference or inline *VirtualTarget spec + drivers: [ ... ] ``` -**Status subresource (common to all pool CRDs):** +**Status subresource (ExporterSet):** ```yaml status: - totalInstances: 5 - readyInstances: 3 - leasedInstances: 2 + replicas: 5 + readyReplicas: 3 + availableReplicas: 1 # warm (ready & unleased) + leasedReplicas: 2 conditions: - - type: PoolHealthy + - type: SetHealthy status: "True" - type: ScalingLimited status: "False" ``` +**Scale subresource:** `specReplicasPath=.spec.maxReplicas` enables +`kubectl scale` and HPA/KEDA interoperability. + +**Pluggable provisioners:** + +```text +VirtualTargetClass.provisioner → + qemu.jumpstarter.dev → k8s container (+ OS OCI image volume) + ec2.jumpstarter.dev → AWS API + corellium.jumpstarter.dev → Corellium REST API +# one typed *VirtualTarget claim interface; backend is pluggable +``` + **Changes to existing CRDs:** **Exporter — new `enabled` field:** @@ -449,7 +849,7 @@ exporter. This is useful for: - **Lab operations:** Temporarily taking a physical exporter offline for maintenance without deleting it. 
-- **Graceful scale-down:** Pool controllers set `enabled: false` before +- **Graceful scale-down:** `ExporterSet` controllers set `enabled: false` before terminating an instance, ensuring the controller doesn't race to assign a lease to an exporter that is about to be deleted. @@ -464,10 +864,10 @@ spec: The graceful scale-down sequence becomes: -1. Pool controller sets `enabled: false` on the target exporter. -2. Pool controller waits to confirm no lease was assigned (watches for +1. `ExporterSet` controller sets `enabled: false` on the target exporter. +2. Controller waits to confirm no lease was assigned (watches for `status.leaseRef` to remain empty). -3. Pool controller deletes the Pod and Exporter CR. +3. Controller deletes the Pod, Exporter CR, and associated `*VirtualTarget`. ### Hardware Considerations @@ -476,8 +876,8 @@ for scalable testing. However: - Virtual targets must faithfully emulate the interfaces exposed by physical hardware (serial, network, storage, power) through the existing driver model. -- Providers like QEMU/Renode require `/dev/kvm` access for acceptable - performance on the host nodes. +- Container-backed provisioners require `/dev/kvm` or equivalent; scheduling is + expressed on `VirtualTargetClass.scheduling`. - Timing-sensitive tests (USB/IP latency, boot ROM timeouts) may behave differently on virtual targets — the system should expose labels indicating whether a target is physical or virtual so users can filter when fidelity @@ -490,7 +890,7 @@ for scalable testing. However: **Alternatives considered:** 1. **Pool-based with configurable min/max** — Maintain a warm pool of - pre-spawned instances; scale between `minWarmInstances` and `maxTotalInstances`. + pre-spawned instances; scale between `minAvailableReplicas` and `maxReplicas`. 2. **Purely on-demand** — Spawn a new instance only when a lease request arrives; destroy it when the lease is released. @@ -499,140 +899,140 @@ for scalable testing. 
However: **Rationale:** Purely on-demand provisioning introduces noticeable latency for CI pipelines (Pod scheduling + image pull + VM boot + exporter registration typically takes 10-15s, and up to 60s with cold image pulls or heavy -providers). A warm pool -provides instant lease fulfillment for the common case. Setting `minWarmInstances: 0` -still allows purely on-demand behavior for rarely-used targets, giving operators -full control over the trade-off. +provisioners). A warm pool provides instant lease fulfillment for the common +case. Setting `minAvailableReplicas: 0` still allows purely on-demand behavior +for rarely-used targets. `VirtualTargetClass.bindingMode: WaitForFirstConsumer` +maps to on-demand provisioning; `Immediate` maps to warm pools. -### DD-2: Pool controller deployment model +### DD-2: Provisioner controller deployment model **Alternatives considered:** -1. **Separate binary per provider** — Each provider is a completely independent - binary/image (e.g., `jumpstarter-qemu-pool-controller`). -2. **Single binary, one deployment per provider** — One image contains all - provider reconcilers; a CLI flag (`--provider=qemu`) selects which one to - activate. Each provider gets its own Deployment in the cluster. -3. **Single binary, single deployment** — One Deployment runs all provider - reconcilers together. -4. **Integrated into jumpstarter-controller** — Add pool reconcilers directly - into the existing operator. - -**Decision:** Option 2 — single binary, one Deployment per provider. - -**Rationale:** A single image is cheaper to build, test, and productize — there -is one CI pipeline, one vulnerability scan, one release artifact. Deploying it -as separate Deployments (one per provider) gives operational benefits: each -provider has isolated logs, independent scaling, and can be restarted without -affecting other providers. The `--provider` flag makes it explicit which CRD -a given Deployment reconciles. 
Adding a new provider type means adding a new -Deployment manifest pointing to the same image with a different flag — no new -image build required. - -### DD-3: CRD per provider vs. generic CRD +1. **Separate binary per provisioner** — Each provisioner is a completely + independent binary/image. +2. **Single binary, one deployment per provisioner** — One image contains all + provisioner reconcilers; a CLI flag (`--provisioner=qemu.jumpstarter.dev`) + selects which one to activate. +3. **Single binary, single deployment** — One Deployment runs all provisioners. +4. **Integrated into jumpstarter-controller** — Add reconcilers directly into + the existing operator. + +**Decision:** Option 2 — single binary, one Deployment per provisioner. + +**Rationale:** A single image is cheaper to build, test, and productize. +Deploying as separate Deployments gives operational benefits: isolated logs, +independent restarts, and explicit `--provisioner` selection. Adding a new +backend means adding a Deployment manifest with a different flag — no new image +build required. + +### DD-3: Pluggable provisioner vs. CRD-per-pool **Alternatives considered:** -1. **CRD per provider** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) — - Strongly typed, schema-validated, provider-specific fields at the top level. -2. **Single generic CRD** (`VirtualExporterPool`) with a `provider.type` field - and opaque `provider.config` map. -3. **Generic CRD + ConfigMap reference** — Pool CRD references a ConfigMap - containing provider-specific configuration. +1. **CRD per provider pool** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) + — provider typing at the pool CRD level. +2. **Generic `ExporterSet` + pluggable `VirtualTargetClass.provisioner`** — + orchestration generic; device backend selected by provisioner string; typed + `*VirtualTarget` claims retain strong typing. +3. **Fully generic opaque config** — single CRD with `provider.config` map. -**Decision:** CRD per provider. 
+**Decision:** Option 2 — generic `ExporterSet` + pluggable provisioner on +`VirtualTargetClass`, with typed `*VirtualTarget` claims. -**Rationale:** Strongly-typed CRDs give better UX (IDE completion, webhook -validation, clear documentation per provider). Each provider has fundamentally -different configuration (QEMU needs machine types and firmware images; Corellium -needs API credentials and device models) — a generic map loses type safety and -discoverability. New providers add a new CRD without touching existing ones. +**Rationale:** Separating orchestration (scaling, lease matching, graceful +shutdown) from provisioning (QEMU container, Corellium API, EC2) lets each +provisioner implement backend-appropriate scaling logic while exposing an +identical scaling surface (`minReplicas`/`maxReplicas`/`minAvailableReplicas`). +Typed `*VirtualTarget` claims preserve schema validation per provider without +proliferating pool CRDs. New backends add a claim kind + provisioner string, not +pool-tier changes. ### DD-4: Per-lease parameters vs. pool flavors **Alternatives considered:** -1. **Per-lease `parameters` dictionary** — Leases carry an opaque - `map[string]string` that pool controllers interpret when provisioning - instances (e.g., override CPU, memory, or storage). The controller passes - parameters through without interpretation; only pool controllers read them. -2. **Multiple pool flavors** — Administrators create separate pool CRs for - different resource profiles (e.g., `rpi4-virtual-small` with 2 CPU / 2 Gi - and `rpi4-virtual-large` with 8 CPU / 16 Gi). Users select a profile via - label matching at lease time. - -**Decision:** Option 2 — multiple pool flavors via separate pool CRs. 
- -**Rationale:** Per-lease parameters add complexity across every layer: the Lease -CRD gains a new field, the controller must pass it through, pool controllers -must parse and validate provider-specific keys, driver templates must support -runtime overrides, and the interaction between parameters and pool defaults -(override vs. merge) must be defined and tested. All of this for a use case -that is already satisfied by creating multiple pools with different resource -profiles and letting users select via labels. The pool-flavors approach keeps -the Lease API unchanged, requires no controller modifications, and is -immediately understandable. Per-lease parameters can be revisited in a future -JEP if the pool-flavors model proves insufficient. +1. **Per-lease `parameters` dictionary** — Leases carry opaque hints (CPU, + memory, storage) interpreted by provisioners. +2. **Multiple `ExporterSet` flavors** — Administrators create separate sets for + different resource profiles; users select via label matching. + +**Decision:** Option 2 — multiple set flavors via separate `ExporterSet` CRs. + +**Rationale:** Per-lease parameters add complexity across every layer for a use +case already satisfied by separate sets with different labels and +`VirtualTargetClass` parameters. Per-lease parameters can be revisited in a +future JEP if needed. ### DD-5: Built-in scaling vs. HPA / KEDA **Alternatives considered:** -1. **Built-in scaling logic** — Each pool controller implements its own - reconciliation loop that watches pending Leases and owned Exporters to - make scaling decisions. -2. **Kubernetes HPA** — Use the Horizontal Pod Autoscaler with custom metrics - (e.g., pending lease count) to scale exporter Pods. -3. **KEDA** — Use KEDA's event-driven autoscaler with a custom scaler that - reads Jumpstarter lease state. - -**Decision:** Option 1 — built-in scaling logic. 
- -**Rationale:** Pool controllers need Jumpstarter-specific knowledge that -generic autoscalers cannot express: label matching between pending Leases and -pool labels, the graceful disable-before-delete sequence (`enabled: false` -> -verify no lease assigned -> delete), awareness of exporter readiness states, -and the `minWarmInstances` invariant. HPA and KEDA operate on numeric metrics -and target averages — they cannot implement the multi-step graceful shutdown or -lease-aware matching without a custom controller wrapping them, which would -negate their simplicity advantage. Exposing pool metrics for HPA/KEDA-driven -scaling is listed in *Future Possibilities* as a complementary option once the -core pool controller is stable. +1. **Built-in scaling logic** — Each provisioner implements lease-aware + reconciliation with a consistent scaling API. +2. **Kubernetes HPA** — Horizontal Pod Autoscaler with custom metrics. +3. **KEDA** — Event-driven autoscaler with a custom Jumpstarter scaler. + +**Decision:** Option 1 — built-in scaling logic with consistent API surface; +HPA/KEDA as complementary via `scale` subresource and exposed metrics. + +**Rationale:** Each provisioner should implement autoscaling appropriate to its +backend (local container churn vs. EC2 quotas vs. external API rate limits). A +single generic autoscaler cannot express lease-aware matching, graceful +disable-before-delete, or `minAvailableReplicas` invariants. However, the +**same scaling vocabulary** (`minReplicas`/`maxReplicas`/`minAvailableReplicas`) +and the `scale` subresource apply across all provisioners — one mental model for +users, backend-specific logic underneath. Pool metrics for HPA/KEDA are listed +in *Future Possibilities*. + +### DD-6: VirtualTargetClass vs. inline credentials + +**Alternatives considered:** + +1. **Inline credentials in every `ExporterSet`** — simple but duplicates secrets + across pools sharing the same backend account. +2. 
**`VirtualTargetClass` (StorageClass analog)** — cluster-scoped class holds + credentials, parameters, scheduling; claims reference the class. +3. **Separate `ProviderConfig` CRD** — lighter-weight credential sharing without + full class semantics. + +**Decision:** Option 2 — `VirtualTargetClass` with optional future +`ProviderConfig` for multi-account credential reuse. + +**Rationale:** The CSI StorageClass/PVC pattern is well understood by cluster +admins. `bindingMode` and `reclaimPolicy` map naturally to warm-pool vs. +on-demand and expensive external target retention. Credentials never appear on +namespaced claims. ## Design Details ### Reconciliation Loop -Each pool controller runs a continuous reconciliation loop for its CRD, -triggered by changes to the pool CR, owned Exporters, or matching Leases: +Each `ExporterSet` controller runs a continuous reconciliation loop, triggered by +changes to the set CR, owned Exporters, or matching Leases: ```text -for each *ExporterPool CR: +for each ExporterSet CR: ownedExporters = list Exporters owned by this CR - currentInstances = count ownedExporters in Ready state - leasedInstances = count ownedExporters with an active LeaseRef - availableInstances = currentInstances - leasedInstances - pendingLeases = count pending Leases whose labels match this pool's labels + replicas = count ownedExporters in Ready state + leasedReplicas = count ownedExporters with an active LeaseRef + availableReplicas = replicas - leasedReplicas + pendingLeases = count pending Leases matching spec.selector - # Invariant: always maintain minWarmInstances - if currentInstances < spec.minWarmInstances: - scale up to spec.minWarmInstances + # Invariant: maintain minAvailableReplicas warm buffer + if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas: + scale up to restore availableReplicas - # Demand-driven scale-up: pending leases that we could serve - elif pendingLeases > 0 AND currentInstances < spec.maxTotalInstances: 
- scale up by min(pendingLeases, spec.maxTotalInstances - currentInstances) + # Demand-driven scale-up + elif pendingLeases > 0 AND replicas < spec.maxReplicas: + scale up by min(pendingLeases, spec.maxReplicas - replicas) - # Threshold-based scale-up: available pool running low - elif availableInstances < spec.minWarmInstances AND currentInstances < spec.maxTotalInstances: - scale up (add instances to restore available pool) - - # Scale-down: excess idle instances beyond what we need - elif availableInstances > spec.minWarmInstances AND cooldown elapsed: + # Scale-down: excess idle replicas + elif availableReplicas > spec.minAvailableReplicas AND cooldown elapsed: graceful scale down: 1. set exporter.spec.enabled = false - 2. wait until confirmed no lease was assigned (leaseRef remains empty) - 3. delete Pod and Exporter CR - (never below minWarmInstances) + 2. wait until confirmed that leaseRef remains empty + 3. delete Pod, Exporter CR, and *VirtualTarget + (never below minAvailableReplicas) ``` ### Instance States @@ -641,44 +1041,38 @@ Each virtual exporter instance transitions through: ```text Provisioning → Ready (warm pool) → Leased → Ready - └→ Terminating → (deleted if available instances>min) + └→ Terminating → (deleted if available>min) ``` -- **Provisioning:** Pod is starting, VM booting, exporter registering. -- **Ready:** Instance is registered and available for lease. -- **Leased:** Instance is assigned to an active lease. +- **Provisioning:** Pod starting, virtual target provisioning, exporter registering. +- **Ready:** Exporter registered and available for lease. +- **Leased:** Exporter assigned to an active lease. - **Terminating:** Instance being deleted (scale-down). ### Component Interaction -1. Administrator creates a `*ExporterPool` CR (e.g., `QEMUExporterPool`). -2. The corresponding pool controller provisions `minWarmInstances` Pods. -3. Each Pod boots the virtual target and runs the Jumpstarter exporter, +1.
Administrator creates `VirtualTargetClass` and `ExporterSet` resources. +2. The provisioner controller provisions `minAvailableReplicas` Exporters (each + owning a `*VirtualTarget`). +3. Each instance Pod boots the virtual target and runs the Jumpstarter exporter, registering with the existing `jumpstarter-controller`. -4. Instances appear in the pool as regular exporters with the specified labels. +4. Instances appear as regular exporters with labels from `spec.template.metadata`. 5. Users lease them normally — the existing controller handles assignment. -6. On lease release, the instance is recycled. Two strategies are supported: - - **Exit-and-replace (default):** The exporter exits after cleanup. The - pool controller detects the Pod termination and creates a fresh - replacement, ensuring a clean state between leases. The cold-start - latency is absorbed by the warm pool — replacement instances are - provisioned proactively to maintain `minWarmInstances`. - - **In-place reuse:** The exporter handles internal cleanup (e.g., power - off the VM, reset state) without exiting. The Pod and exporter process - remain running and the instance transitions back to Ready immediately. - Useful when cold-start latency is high and the provider guarantees - clean state after reset. -7. The pool controller continuously monitors pool utilization and scales - accordingly. +6. On lease release, the instance is recycled per `recycleStrategy`: + - **Exit-and-replace (default):** Exporter exits; controller replaces the + instance proactively to maintain `minAvailableReplicas`. + - **In-place reuse:** Exporter resets internal state without exiting; Pod + remains running and transitions back to Ready immediately. +7. The `ExporterSet` controller continuously monitors utilization and scales. ### Failure Modes -- **Pod crash:** Controller detects the failure via Pod status, replaces the - instance, maintains `minWarmInstances` invariant. 
-- **Resource exhaustion:** Cannot scale beyond cluster capacity; pool stays at - current size, new leases queue as they would for physical targets. -- **Provider startup failure:** Instance marked as failed, controller retries - with backoff, alerts via conditions on the pool status. +- **Pod crash:** Controller detects failure via Pod status, replaces the instance, + maintains `minAvailableReplicas` invariant. +- **Resource exhaustion:** Cannot scale beyond cluster capacity; set stays at + current size, new leases queue as for physical targets. +- **Provisioner startup failure:** Instance marked failed, controller retries with + backoff, alerts via conditions on the set status. - **Scaling storm:** Rate limiting on scale-up prevents creating too many instances simultaneously. @@ -691,121 +1085,99 @@ Unit tests should meet the project test coverage requirements. ### Integration Tests -- End-to-end lease lifecycle with a QEMU provider in a test cluster +- End-to-end lease lifecycle with QEMU provisioner in a test cluster - Mixed physical/virtual lease orchestration -- Provider failure and recovery scenarios +- Provisioner failure and recovery scenarios +- `VirtualTargetClass` credential injection and claim binding ## Acceptance Criteria -- [ ] `QEMUExporterPool` CRD is defined and validated by the operator -- [ ] Pool controller maintains `minWarmInstances` warm instances for each pool CR -- [ ] Pool controller scales up when available pool is depleted (up to - `maxTotalInstances`) -- [ ] Pool controller scales down idle instances after cooldown (never below - `minWarmInstances`) -- [ ] At least one provider (`QEMUExporterPool`) is fully implemented and tested +- [ ] `VirtualTargetClass`, `QEMUVirtualTarget`, and `ExporterSet` CRDs defined +- [ ] `ExporterSet` controller maintains `minAvailableReplicas` warm buffer +- [ ] Controller scales up when available pool is depleted (up to `maxReplicas`) +- [ ] Controller scales down idle replicas after cooldown (never 
below + `minAvailableReplicas`) +- [ ] QEMU provisioner (`qemu.jumpstarter.dev`) fully implemented and tested - [ ] Virtual instances register as standard exporters and are leasable without changes to the existing lease flow -- [ ] Pod failures are detected and reported in the pool status. -- [ ] A pool with `minWarmInstances: 0` provisions instances only on demand -- [ ] Pool status subresource reports instance counts and health conditions -- [ ] Documentation covers pool CRD configuration and provider setup -- [ ] Shared scaling logic is reusable for new provider CRDs +- [ ] Pod failures detected and reported in `ExporterSet` status +- [ ] An `ExporterSet` with `minAvailableReplicas: 0` provisions on demand only +- [ ] Status subresource reports Deployment-style counters and health conditions +- [ ] `scale` subresource enables `kubectl scale` interoperability +- [ ] Documentation covers `VirtualTargetClass`, `*VirtualTarget`, and + `ExporterSet` configuration ## Graduation Criteria ### Experimental -- `QEMUExporterPool` functional in a development cluster -- Basic pool lifecycle works end-to-end (scale up, lease, release, scale down) +- QEMU provisioner functional in a development cluster +- Basic set lifecycle works end-to-end (scale up, lease, release, scale down) - Community feedback on CRD schema and scaling behavior ### Stable -- At least two provider CRDs implemented (e.g., `QEMUExporterPool` + - `AndroidExporterPool`) +- At least two provisioners implemented (e.g., `qemu.jumpstarter.dev` + + `corellium.jumpstarter.dev`) - Production usage by at least one team for >1 month - Performance benchmarks documented (cold-start latency, scaling responsiveness) -- Provider authoring guide published (how to add a new `*ExporterPool` CRD) +- Provisioner authoring guide published (how to add a new provisioner + claim kind) ## Backward Compatibility - Existing physical-only workflows are unaffected; lease requests without virtual-specific labels continue to work as 
before. - No changes to the existing gRPC protocol for physical exporters. -- New `*ExporterPool` CRDs are additive. +- New CRDs (`VirtualTargetClass`, `*VirtualTarget`, `ExporterSet`) are additive. - **Exporter `enabled` field:** Defaults to `true`, so all existing Exporters - continue to behave exactly as before. The `jumpstarter-controller` must be - updated to respect this field (skip disabled exporters during lease - assignment). -- Administrators upgrading to a pool-enabled version see no behavior change - until they explicitly deploy a `*ExporterPool` resource. + continue to behave exactly as before. +- Administrators upgrading see no behavior change until they explicitly deploy + `ExporterSet` and `VirtualTargetClass` resources. ## Consequences ### Positive -- **Instant lease fulfillment:** Warm pools eliminate provisioning latency for - virtual targets, making CI pipelines faster and more predictable. -- **Elastic scaling:** Pools grow and shrink with demand, avoiding both - resource waste (idle VMs) and artificial queuing. -- **Unified user experience:** Virtual and physical targets are leased through - the same mechanism — users do not need to learn a separate workflow. -- **Operator control:** `minWarmInstances` / `maxTotalInstances` give administrators a - simple, declarative knob to tune the cost-vs-responsiveness trade-off per - target type. -- **Extensible provider model:** New virtual providers (Renode, Qemu, Corellium, Android, - etc.) can be added by defining a new CRD and reconciler without modifying - the core controller or existing providers. +- **Instant lease fulfillment:** Warm pools eliminate provisioning latency. +- **Elastic scaling:** Sets grow and shrink with demand. +- **Unified user experience:** Virtual and physical targets leased the same way. +- **Kubernetes-native UX:** `minReplicas`/`maxReplicas`/`minAvailableReplicas`, + Deployment-style status, `kubectl scale` — familiar to cluster admins. 
+- **Pluggable backends:** New provisioners add a claim kind + provisioner string. +- **Credential separation:** `VirtualTargetClass` keeps secrets off namespaced claims. +- **Fidelity ladder:** Same lease flow across sim, cloud virtual, and hardware tiers. ### Negative -- **Increased operator complexity:** Pool controllers, scaling logic, and - per-provider CRDs add operational surface area — more components to deploy, - monitor, and debug. -- **Resource consumption:** Warm pools consume cluster resources even when not - actively leased. Misconfigured `minWarmInstances` can lead to waste. -- **New CRD proliferation:** Each provider type adds a CRD; clusters with - many providers will have many CRDs to manage and version. +- **Increased CRD surface:** `VirtualTargetClass`, typed `*VirtualTarget`, + and `ExporterSet` add more resources to manage than a single pool CRD per provider. +- **Resource consumption:** Warm pools consume cluster resources when idle. +- **Sidecar complexity:** Container-backed provisioners require multi-container + Pod orchestration and shared-volume protocols. ### Risks -- **Scaling storms:** A burst of pending leases could trigger rapid scale-up, - exhausting cluster resources. Rate limiting mitigates this but may delay - lease fulfillment under extreme load. -- **Provider startup reliability:** If a virtual provider frequently fails to - start (e.g., firmware download issues, QEMU misconfiguration), the pool - controller may enter a tight crash-replace loop, consuming resources without - making progress. +- **Scaling storms:** Burst demand could exhaust cluster resources; rate limiting + mitigates but may delay lease fulfillment. +- **Provisioner reliability:** Failed startups can cause crash-replace loops. ## Rejected Alternatives -- **Static fixed-size pools (status quo):** Cannot scale with demand. Operators - must manually adjust pool sizes, leading to either waste or queuing. 
- -- **External orchestration (Terraform/Ansible):** Pushes complexity to the user, - breaks the single-pane-of-glass experience, and cannot integrate with - Jumpstarter's lease semantics. - -- **Per-lease `parameters` dictionary on the Lease CRD:** Would allow users to - pass provider-specific resource hints (CPU, memory, storage) per lease. - Rejected because it adds complexity to every layer (Lease CRD, controller - pass-through, pool controller parsing, driver template overrides) for a use - case already served by creating separate pools with different resource - profiles. See DD-4. +- **Static fixed-size pools (status quo):** Cannot scale with demand. +- **External orchestration (Terraform/Ansible):** Breaks lease semantics integration. +- **Per-lease `parameters` dictionary:** See DD-4. +- **CRD-per-pool without VirtualTarget separation:** Couples scaling and provider + config; rejected in favor of generic `ExporterSet` + pluggable provisioner. ## Prior Art -- **LAVA (Linaro Automated Validation Architecture):** Supports virtual DUTs via - QEMU but with static configuration; no on-demand scaling. - -- **Crossplane:** A CNCF project for composing cloud infrastructure as - Kubernetes CRDs. While Crossplane shares the CRD-driven provisioning pattern, - it targets general-purpose cloud resource composition and has no awareness of - Jumpstarter's lease semantics, warm pool management, or exporter registration. - Jumpstarter already has its own CRD model (Leases, Exporters) and operator - framework; adopting Crossplane would add a heavyweight dependency without - replacing the pool-specific scaling and lifecycle logic this JEP requires. +- **LAVA:** Virtual DUTs via QEMU with static configuration; no on-demand scaling. +- **Crossplane:** General-purpose cloud composition; no Jumpstarter lease semantics. + Useful reference for external API integration (e.g., Corellium) but does not + replace pool-specific scaling logic. 
+- **CSI (StorageClass/PVC):** Pattern adopted for `VirtualTargetClass`/`*VirtualTarget`. +- **KubeVirt:** VM orchestration with pre-mounted images; Jumpstarter differs by + flash-at-runtime model and exporter-as-sidecar pattern. ## Unresolved Questions @@ -813,37 +1185,34 @@ Unit tests should meet the project test coverage requirements. ### Resolved -- **Observability (JEP-0013):** Pool controllers and virtual exporter instances - emit metrics and logs using the same mechanisms defined in JEP-0013. - Pool-specific metrics (pool size, available/leased counts, scale-up/down - events) are additional metric series following the same conventions. -- **Lease release detection:** Pool controllers watch Lease objects directly. - When a Lease referencing one of their managed Exporters is deleted or - transitions to a released state, the controller triggers scale-down - evaluation if needed. -- **Scheduled (future-dated) leases:** The existing `jumpstarter-controller` - already supports `Spec.BeginTime` for scheduled leases. The controller does - not attempt to acquire an exporter until `BeginTime` arrives; it requeues - with a delay. Once `BeginTime` passes and no exporter is available, the - controller sets a `Pending` condition (e.g., reason `NotAvailable`). Pool - controllers watch for pending leases with matching labels as their scaling - input, so they naturally do not scale up for future-dated leases until the - controller makes them effective. +- **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. +- **Lease release detection:** Controllers watch Lease objects directly. +- **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated + leases until effective. 
## Future Possibilities -The following extensions are explicitly **not** part of this JEP but are -natural follow-ups enabled by the pool infrastructure: - -- **Corellium provider:** A `CorelliumExporterPool` CRD that provisions - virtual instances via the Corellium REST API, with credentials injected - from Kubernetes Secrets. -- **Renode provider:** A `RenodeExporterPool` CRD leveraging JEP-0010's Renode - integration as another virtual provider type. -- **Composite leases:** Linking multiple exporters (potentially from - different pools) into a single logical target — e.g., a QEMU VM paired - with a network emulator as one leasable unit. This would require - multi-exporter lease semantics and coordinated lifecycle management. +The following extensions are explicitly **not** part of this JEP but the model +stays open to them: + +- **Disaggregated/cross-node accelerators** — ARM64 runtime bridged to a remote + GPU via virtio-gpu/RDMA. +- **Separate `ProviderConfig` CRD** — multi-account credential reuse and rotation + referenced by multiple `VirtualTargetClass` resources. +- **Realized-instance CRD (PV analog)** — for static/pre-provisioned devices that + exist outside the dynamic provisioning flow. +- **`ExporterDeployment` rollout tier** — Deployment analog for rolling updates + across pool instances (versioned template changes). +- **Multiple/spawned-on-lease VirtualTargets per Exporter** — composite benches + and multi-device topologies. +- **Universal physical+virtual `Target` abstraction** — single resource type + spanning hardware and virtual backends. +- **Priority selectors / DeviceClass** — ordered label fallback ("prefer hardware, + fall back to QEMU") at lease time. +- **HPA/KEDA metric exposure** — complementary external autoscaling once core + provisioner controllers are stable. +- **Renode provider** — `renode.jumpstarter.dev` provisioner leveraging JEP-0010. +- **Composite leases** — multiple exporters linked into one logical lease. 
## Implementation Plan @@ -863,49 +1232,41 @@ Add the `enabled` boolean field to the Exporter CRD and update the - [ ] Unit tests for the filtering logic - [ ] Integration test: disable an exporter, verify it gets no new leases -**Why first:** This is a small, self-contained change that is independently -useful for lab operations (maintenance mode) and is a prerequisite for -graceful scale-down in later phases. - -### Phase 2: Pool controller scaffold and `QEMUExporterPool` CRD +### Phase 2: Core CRDs and QEMU provisioner -Build the pool controller binary with the `--provider` flag, define the -`QEMUExporterPool` CRD, and implement the core reconciliation loop. +Define `VirtualTargetClass`, `QEMUVirtualTarget`, and `ExporterSet` CRDs. +Implement the `qemu.jumpstarter.dev` provisioner with sidecar Pod rendering and +core reconciliation loop. **Deliverables:** -- [ ] Define `QEMUExporterPool` CRD schema (scaling fields, nodeSelector, - podTemplate, labels, QEMU-specific fields, exporterTemplate) -- [ ] Implement the pool controller binary with `--provider=qemu` flag -- [ ] Implement core scaling logic: maintain `minWarmInstances`, scale up when - pool is depleted, graceful scale-down (disable → wait → delete) -- [ ] Instance provisioning: create Pods running the Jumpstarter exporter - with QEMU provider configuration -- [ ] Instance Pods register as standard Exporter CRs -- [ ] Pool status subresource (totalInstances, readyInstances, leasedInstances, - conditions) +- [ ] Define `VirtualTargetClass`, `QEMUVirtualTarget`, `ExporterSet` CRD schemas +- [ ] Implement exporter-set controller binary with `--provisioner=qemu.jumpstarter.dev` +- [ ] Sidecar Pod rendering (exporter native sidecar + QEMU runtime container) +- [ ] Core scaling logic: `minAvailableReplicas`, demand-driven scale-up, graceful + scale-down +- [ ] Deployment-style status + `scale` subresource - [ ] Watch Leases and Exporters for scaling decisions -- [ ] Add `exporterPools` section to the 
`Jumpstarter` operator CR spec -- [ ] Operator deploys pool controller Deployments based on enabled providers - (RBAC, service accounts, Deployment lifecycle) -- [ ] Unit tests for reconciliation logic -- [ ] Integration test: deploy a `QEMUExporterPool`, verify instances come up, - lease one, release it, observe scale behavior +- [ ] Add `exporterSets` section to `Jumpstarter` operator CR +- [ ] Integration test: deploy `ExporterSet`, lease, release, observe scaling -### Phase 3: Additional providers +### Phase 3: Additional provisioners -Add support for additional provider types using the same binary with different -`--provider` flags. +Add Corellium and Android provisioners using the same binary with different +`--provisioner` flags. **Deliverables:** -- [ ] `AndroidExporterPool` CRD and reconciler -- [ ] Provider authoring guide documenting how to add a new `*ExporterPool` +- [ ] `corellium.jumpstarter.dev` provisioner + `CorelliumVirtualTarget` CRD +- [ ] `android.jumpstarter.dev` provisioner + `AndroidVirtualTarget` CRD +- [ ] Provisioner authoring guide ## Implementation History - 2025-10-30: RFE filed upstream (GitHub #41) - 2026-06-03: JEP proposed +- 2026-06-18: Revised per review — ExporterSet, VirtualTargetClass, pluggable + provisioner model; added end-to-end flow section ## References From 15086cce2b1aabb5967dcdff90a74734edf6c190 Mon Sep 17 00:00:00 2001 From: Miguel Angel Ajo Pelayo Date: Fri, 19 Jun 2026 16:59:04 +0200 Subject: [PATCH 6/6] docs(jep-0014): align with meeting consensus on virtual exporters Simplify to a 2-CRD namespaced model with nested parameters, flash-at-lease semantics, off-cluster provisioning, and a QEMU-first implementation plan. 
Co-authored-by: Cursor --- .../JEP-0014-virtual-scalable-exporters.md | 655 +++++++++++++----- 1 file changed, 476 insertions(+), 179 deletions(-) diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md index 85ff2e650..a3378e2e2 100644 --- a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -69,7 +69,7 @@ applies regardless of backend: For example, a target that needs GPU or specialized I/O can run functional checks cheaply on a QEMU class in CI, validate higher-fidelity behavior on a cloud-backed virtual device, and use real hardware as ground truth. The -`VirtualTargetClass` / `*VirtualTarget` abstraction makes this ladder explicit +`VirtualTargetClass` abstraction makes this ladder explicit without changing the lease experience. ### User Stories @@ -109,38 +109,33 @@ VirtualTargetClass ←── referenced by ── ExporterSet ▼ Exporter ──► Pod (exporter sidecar + target runtime) - -# API-backed / static / multi-device cases may also use: -VirtualTargetClass ←── referenced by ── *VirtualTarget (typed claim) - ↑ -ExporterSet ──► Exporter ────────────────────┘ ``` -- **`VirtualTargetClass`** — cluster-scoped configuration for a backend - (`provisioner`, credentials, scheduling, binding mode, device parameters). - Admins own classes; claim authors never touch credentials. -- **`*VirtualTarget`** — optional strongly-typed claim for backends where each - instance has distinct identity (API-backed devices, static benches). Not - required for homogeneous container-backed pools. -- **`ExporterSet`** — generic scaling resource with `selector` + inline - `template`. References a `VirtualTargetClass` (or optionally a `*VirtualTarget` - claim). One mental model for all backends. 
+- **`VirtualTargetClass`** — **namespaced** configuration for a backend + (`provisioner`, nested `parameters`, credentials, scheduling, binding mode). + Lives in the same namespace as referencing `ExporterSet` resources. Admins own + classes; `ExporterSet` authors never touch credentials. +- **`ExporterSet`** — namespaced generic scaling resource with `selector` + inline + `template`. References a `VirtualTargetClass` by name in the **same + namespace**. Optional nested `parameters` deep-merge over the class defaults. + One mental model for all backends. - **`Exporter`** — the minimum leased unit. Exposes drivers that connect to the - virtual target provisioned from the class (or claim). + virtual target provisioned from the class. ### Core Concept: ExporterSet with Kubernetes-Native Scaling `ExporterSet` is a generic CRD (ReplicaSet + HPA analog) with familiar scaling -vocabulary. Provider typing lives in `VirtualTargetClass` and `*VirtualTarget`, -not in the pool CRD itself. +vocabulary. Provider typing lives in `VirtualTargetClass`, not in the pool CRD +itself. -**Example: VirtualTargetClass (cluster-scoped, StorageClass analog)** +**Example: VirtualTargetClass (namespaced backend profile)** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: qemu-rpi4 + namespace: jumpstarter spec: provisioner: qemu.jumpstarter.dev bindingMode: Immediate # warm pool; WaitForFirstConsumer = on-demand @@ -155,32 +150,15 @@ spec: resources: limits: devices.kubevirt.io/kvm: "1" - parameters: + parameters: # nested object; provisioner interprets machineType: virt - firmware: registry.example.com/firmware/rpi4:latest - cpu: 4 - memory: 4Gi - storage: 16Gi -``` - -**Example: QEMUVirtualTarget (optional typed claim)** - -For homogeneous QEMU pools, admins configure `VirtualTargetClass` + `ExporterSet` -only (see *End-to-End Flow*). 
A per-instance claim is optional — useful for -static benches or when per-instance sizing differs from the class defaults: - -```yaml -apiVersion: jumpstarter.dev/v1alpha1 -kind: QEMUVirtualTarget -metadata: - name: rpi4-target-01 - namespace: jumpstarter -spec: - virtualTargetClassName: qemu-rpi4 - resources: - cpu: 8 # override class default - memory: 8Gi - storage: 32Gi + firmware: + url: registry.example.com/firmware/rpi4:latest + digest: sha256:abc... + resources: + cpu: 4 + memory: 4Gi + storage: 16Gi ``` **Example: ExporterSet (generic scaling resource)** @@ -197,7 +175,10 @@ spec: minAvailableReplicas: 2 # PDB-style warm buffer (ready & unleased) scaleDownCooldown: 5m recycleStrategy: ExitAndReplace # or InPlaceReuse - virtualTargetClassName: qemu-rpi4 # references VirtualTargetClass above + virtualTargetClassName: qemu-rpi4 # same-namespace VirtualTargetClass name + parameters: # optional; deep-merged over class parameters + resources: + memory: 8Gi # override only memory; cpu/storage inherited selector: matchLabels: board: rpi4 @@ -222,42 +203,37 @@ status: # scale subresource: specReplicasPath=.spec.maxReplicas ``` -**Example: Corellium VirtualTargetClass + claim** +**Example: Corellium VirtualTargetClass** ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: corellium-kronos + namespace: jumpstarter spec: provisioner: corellium.jumpstarter.dev credentialsSecretRef: - name: corellium-creds - namespace: jumpstarter + name: corellium-creds # Secret in same namespace bindingMode: WaitForFirstConsumer # provision on lease reclaimPolicy: Delete parameters: - apiHost: app.corellium.com - projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" ---- -apiVersion: jumpstarter.dev/v1alpha1 -kind: CorelliumVirtualTarget -metadata: - name: rd1ae-kronos-01 - namespace: jumpstarter -spec: - virtualTargetClassName: corellium-kronos - deviceFlavor: kronos - deviceOs: "1.1.1" - deviceBuild: "Critical Application Monitor (Baremetal)" - 
consoleName: "Primary Compute Non-Secure" + api: + host: app.corellium.com + projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" + device: + flavor: kronos + os: "1.1.1" + build: "Critical Application Monitor (Baremetal)" ``` The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages the full virtual instance lifecycle through the Corellium REST API — it creates -instances on power-on and destroys them on power-off. The provisioner injects -API credentials from `VirtualTargetClass.credentialsSecretRef` into the -exporter Pod; claim authors never see credentials. +instances on power-on and destroys them on power-off. Device parameters live in +`VirtualTargetClass.spec.parameters` and may be overridden per pool via +`ExporterSet.spec.parameters` (deep-merged). The provisioner injects API +credentials from `VirtualTargetClass.credentialsSecretRef` into the exporter +Pod; `ExporterSet` authors never see credentials. **Example: Android ExporterSet** @@ -271,6 +247,7 @@ spec: minReplicas: 0 maxReplicas: 10 minAvailableReplicas: 0 # fully on-demand + virtualTargetClassName: android-pixel7 selector: matchLabels: device: pixel7 @@ -282,10 +259,6 @@ spec: api-level: "34" virtual: "true" spec: - virtualTargetRef: - apiVersion: jumpstarter.dev/v1alpha1 - kind: AndroidVirtualTarget - name: pixel7-template drivers: - type: jumpstarter_driver_android.driver.AdbDriver - type: jumpstarter_driver_power.driver.EmulatorPower @@ -300,7 +273,11 @@ needed. ### Container-Backed Targets: Sidecar Pattern For container-backed provisioners (`qemu.jumpstarter.dev`, Android emulator, etc.), -the provisioner renders each instance Pod from independently shipped artifacts: +the provisioner renders each instance Pod from independently shipped artifacts. +The sketch below uses **native sidecar init containers** (`restartPolicy: Always`, +[KEP-753](https://github.com/kubernetes/enhancements/issues/753)) as the +**proposed** co-location model — **init containers vs. 
lifecycle hooks** is +unresolved; see *Unresolved Questions*. ```yaml # rendered by qemu.jumpstarter.dev provisioner @@ -339,7 +316,8 @@ Benefits: The exporter sidecar communicates with the target-runtime container via Unix sockets on a shared `emptyDir` volume (QMP for QEMU control, serial console, launcher socket for dynamic argv). API-backed provisioners (`corellium`, `ec2`) -skip the runtime container and connect out to external APIs. +and off-cluster provisioners (`qemu-baremetal.jumpstarter.dev`) skip the +in-cluster runtime container — see *External and Off-Cluster Provisioning*. ### User Experience @@ -364,10 +342,19 @@ as physical ones, differentiated only by labels. ### End-to-End Flow (QEMU Example) -This section walks through a complete QEMU warm-pool scenario: what each actor -does, which CRDs are involved, and how control passes between components. It -uses the **reference graph** (not a strict ownership tree) for relationships -between resources: +This section walks through a complete **in-cluster QEMU warm-pool** scenario: +what each actor does, which CRDs are involved, and how control passes between +components. The flow uses only **two admin-configured CRDs** — no per-instance +claim resources: + +| Admin CRD | Role in this flow | +| --- | --- | +| `VirtualTargetClass` | Backend profile: provisioner, scheduling, nested `parameters` | +| `ExporterSet` | Pool scaling, labels, drivers, optional parameter overrides | + +Everything else (`Exporter`, `Lease`, `Pod`) is created and managed by +controllers at runtime. Relationships use a **reference graph** (not a strict +ownership tree): ```text VirtualTargetClass ←── referenced by ── ExporterSet @@ -377,12 +364,10 @@ VirtualTargetClass ←── referenced by ── ExporterSet (exporter sidecar + QEMU runtime) ``` -For **homogeneous QEMU pools** (same CPU/RAM/disk for every replica, no -per-lease parameterization), configuration flows through `VirtualTargetClass` + -`ExporterSet` only. 
The provisioner materializes Pods from those two resources; -per-instance `QEMUVirtualTarget` claims are **not** required in this case (they -remain useful for API-backed backends, static benches, or future multi-device -exporters — see *Future Possibilities*). +Homogeneous QEMU pools configure **`VirtualTargetClass` + `ExporterSet` only**. +The provisioner deep-merges parameters, materializes Pods, and registers +`Exporter` CRs. **OS images are not pre-selected by the pool** — lessees flash +and boot what they need after leasing (see Phase 4 and DD-7). #### Actors @@ -409,20 +394,21 @@ exporters — see *Future Possibilities*). - Operator ensures `jumpstarter-controller` is running (existing behavior). **Result:** Provisioner controller is watching for `ExporterSet` CRs whose -templates reference QEMU virtual targets (via `virtualTargetClassName` or -`*VirtualTarget` claims). +`virtualTargetClassName` references a class handled by that provisioner. -#### Phase 1 — Define the virtual target profile (admin) +#### Phase 1 — Define the virtual target profile (admin, two CRs) **Admin actions:** -1. Create a cluster-scoped `VirtualTargetClass` describing the QEMU backend: +1. Create a `VirtualTargetClass` describing the QEMU backend (same namespace as + the `ExporterSet` that will reference it): ```yaml apiVersion: jumpstarter.dev/v1alpha1 kind: VirtualTargetClass metadata: name: qemu-rpi4 + namespace: jumpstarter spec: provisioner: qemu.jumpstarter.dev bindingMode: Immediate @@ -439,14 +425,17 @@ spec: devices.kubevirt.io/kvm: "1" parameters: machineType: virt - firmware: registry.example.com/firmware/rpi4:latest - cpu: 4 - memory: 4Gi - storage: 16Gi + firmware: + url: registry.example.com/firmware/rpi4:latest + digest: sha256:abc... + resources: + cpu: 4 + memory: 4Gi + storage: 16Gi ``` -2. Create an `ExporterSet` that references the class and declares scaling + - lease-matching labels: +2. 
Create an `ExporterSet` in the **same namespace** that references the class + by name and declares scaling + lease-matching labels: ```yaml apiVersion: jumpstarter.dev/v1alpha1 @@ -461,6 +450,9 @@ spec: scaleDownCooldown: 5m recycleStrategy: ExitAndReplace virtualTargetClassName: qemu-rpi4 + parameters: + resources: + memory: 8Gi selector: matchLabels: board: rpi4 @@ -481,7 +473,8 @@ spec: **User actions:** None. -**Controller actions:** None yet (waiting for `ExporterSet` to be observed). +**Controller actions:** None yet — exporter-set controller waits until +`ExporterSet` exists and resolves `virtualTargetClassName` to the class above. #### Phase 2 — Warm pool provisioning (exporter-set controller) @@ -489,7 +482,9 @@ spec: **Exporter-set controller actions (reconcile loop):** -1. Read `ExporterSet` spec and referenced `VirtualTargetClass`. +1. Resolve `ExporterSet.spec.virtualTargetClassName` to `VirtualTargetClass` + `qemu-rpi4` in the same namespace; compute merged parameters (deep-merge of + class + set overrides). 2. Count owned `Exporter` CRs: `replicas`, `readyReplicas`, `leasedReplicas`, `availableReplicas` (= ready − leased). 3. If `availableReplicas < minAvailableReplicas` and `replicas < maxReplicas`, @@ -499,11 +494,12 @@ spec: - Render a Kubernetes Pod (sidecar pattern): - **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts first, registers with `jumpstarter-controller`. - - **QEMU runtime container** — started by provisioner; exporter talks to - it via Unix sockets on a shared `emptyDir` (QMP, serial, launcher). + - **QEMU runtime container** — baseline virt machine from merged + `parameters` (CPU, memory, firmware blob); **empty disk** ready for + user flash at lease time. + - Exporter talks to runtime via Unix sockets on a shared `emptyDir` (QMP, + serial, launcher). - Apply scheduling from `VirtualTargetClass.scheduling` to the Pod. 
- - Apply device parameters from `VirtualTargetClass.parameters` when - constructing the QEMU command line. 4. Update `ExporterSet.status` (`replicas`, `readyReplicas`, `availableReplicas`, `leasedReplicas`, conditions). @@ -550,9 +546,13 @@ jmp lease -l board=rpi4,virtual=true **Result:** User holds an active lease on `rpi4-virtual-aaa`. Pool still maintains warm capacity via background scale-up. -#### Phase 4 — User session (user + exporter sidecar) +#### Phase 4 — User session: flash, boot, test (user + exporter sidecar) -**User actions** (via leased client — same as physical targets): +The warm pool provides **instant lease assignment**; image selection happens +**after** lease — same workflow as a physical bench (DD-7). The pool does not +pre-flash an OS onto instances. + +**User actions** (via leased client): ```python with env() as client: @@ -565,7 +565,7 @@ with env() as client: **Exporter sidecar actions:** - `storage.flash` writes the image to shared storage (or tells QEMU runtime via - QMP/`blockdev-add` in sidecar mode). + QMP/`blockdev-add`). - `power.on` sends QEMU start via QMP or launcher socket on shared volume. - Serial/network drivers proxy to the QEMU runtime container. @@ -590,9 +590,10 @@ jmp delete-lease # or lease TTL expires 2. Apply `recycleStrategy`: - **ExitAndReplace (default):** exporter sidecar exits after cleanup → Pod terminates → controller deletes `Exporter` CR → creates a fresh replacement - to maintain `minAvailableReplicas`. + with empty baseline storage to maintain `minAvailableReplicas` (next lessee + flashes again). - **InPlaceReuse:** exporter resets QEMU state in place → same Pod returns - to Ready without restart. + to Ready without restart (lessee may re-flash before next session). 3. If `availableReplicas > minAvailableReplicas` for longer than `scaleDownCooldown`, gracefully scale down an excess replica: - Set `Exporter.spec.enabled: false` @@ -625,29 +626,48 @@ warm exporter remains. 
**Result:** Pool grows to meet demand, then shrinks back after cooldown when leases are released. -#### Summary: who touches which CRD +#### Summary: CRDs and runtime objects + +**Admin-configured (2 CRDs — the full pool definition):** + +| CRD | Scope | Created by | Observed by | Relationship | +| --- | --- | --- | --- | --- | +| `VirtualTargetClass` | Namespaced | Admin | Exporter-set controller | Referenced by `ExporterSet` (same namespace) | +| `ExporterSet` | Namespaced | Admin | Exporter-set controller | References class; owns runtime objects below | + +**Platform and runtime (created by controllers):** -| CRD | Created by | Observed by | User-visible? | +| Resource | Created by | Observed by | User-visible? | | --- | --- | --- | --- | | `Jumpstarter` | Admin | Operator | No | -| `VirtualTargetClass` | Admin | Exporter-set controller | No | -| `ExporterSet` | Admin | Exporter-set controller | No (admin/kubectl) | | `Exporter` | Exporter-set controller | Jumpstarter-controller, exporter-set controller | Indirectly (via lease) | | `Lease` | User (via CLI) | Jumpstarter-controller, exporter-set controller | Yes | | `Pod` | Exporter-set controller | Kubernetes, exporter-set controller | No | -#### QEMU vs API-backed backends +#### QEMU vs API-backed vs off-cluster backends + +The flow above applies to **in-cluster container-backed** provisioners +(`qemu.jumpstarter.dev`). Other provisioner strings reuse the same +`ExporterSet` + `jumpstarter-controller` lease flow with different placement: + +| Topology | Example provisioner | Where the target runs | +| --- | --- | --- | +| In-cluster container | `qemu.jumpstarter.dev` | Pod on Kubernetes (sidecar + runtime) | +| API-backed cloud | `corellium.jumpstarter.dev` | External SaaS API; lightweight exporter Pod | +| Off-cluster bare metal | `qemu-baremetal.jumpstarter.dev` | QEMU/emulator on lab hosts outside the cluster | -The flow above applies to **container-backed** provisioners (`qemu.jumpstarter.dev`). 
-For **API-backed** backends (e.g. `corellium.jumpstarter.dev`): +For **API-backed** backends: -- `VirtualTargetClass` holds `credentialsSecretRef` and API parameters. -- A typed `*VirtualTarget` claim (e.g. `CorelliumVirtualTarget`) may be created - per instance when the backend provisions an external device with its own - lifecycle and identity. +- `VirtualTargetClass` holds `credentialsSecretRef` and shared backend + `parameters`. +- Per-pool overrides are expressed via `ExporterSet.spec.parameters` + (deep-merged over the class). - The exporter Pod is lighter (API client only; no QEMU runtime container). -The `ExporterSet` + `jumpstarter-controller` lease flow is identical. +For **off-cluster** backends, see *External and Off-Cluster Provisioning*. + +The `ExporterSet` + `jumpstarter-controller` lease flow is identical for all +topologies. ### Architecture Overview @@ -664,7 +684,7 @@ The `ExporterSet` + `jumpstarter-controller` lease flow is identical. ┌────────────────────────────────────┐ │ Kubernetes API │ │ (Lease, Exporter, ExporterSet, │ - │ VirtualTargetClass, *VirtualTarget)│ + │ VirtualTargetClass) │ └─┬──────────────┬──────────────┬────┘ │ │ │ watches │ watches │ watches │ @@ -748,8 +768,8 @@ based on available (unleased) replicas: **Instance Lifecycle:** -1. `ExporterSet` controller creates an Exporter + `*VirtualTarget` from the set - template (provisioner renders the Pod). +1. `ExporterSet` controller creates an `Exporter` from the set template + (provisioner renders the Pod). 2. The Pod starts the virtual target (sidecar pattern for container backends, or API call for external backends) and runs the Jumpstarter exporter, registering with the controller like any other exporter. 
@@ -763,22 +783,23 @@ based on available (unleased) replicas: | CRD | Scope | Role | | --- | --- | --- | -| `VirtualTargetClass` | Cluster | StorageClass analog — provisioner, credentials, scheduling, binding | -| `QEMUVirtualTarget` | Namespaced | Typed claim for QEMU backends | -| `CorelliumVirtualTarget` | Namespaced | Typed claim for Corellium backends | -| `AndroidVirtualTarget` | Namespaced | Typed claim for Android emulator backends | +| `VirtualTargetClass` | Namespaced | Backend profile — provisioner, credentials, scheduling, binding, nested `parameters` | | `ExporterSet` | Namespaced | Generic scaling resource (ReplicaSet + HPA analog) | +**Reference rule:** `ExporterSet.spec.virtualTargetClassName` must name a +`VirtualTargetClass` in the **same namespace**. Cross-namespace references are +rejected at admission. `credentialsSecretRef.name` must refer to a Secret in that +same namespace. + **VirtualTargetClass (common fields):** ```yaml spec: provisioner: # e.g. qemu.jumpstarter.dev credentialsSecretRef: # optional; for API-backed provisioners - name: - namespace: - parameters: # opaque to orchestration; provisioner-specific - : + name: # Secret in same namespace as this class + parameters: # nested YAML object; provisioner-specific + : bindingMode: Immediate | WaitForFirstConsumer reclaimPolicy: Delete | Retain scheduling: # inherited by rendered exporter Pods @@ -800,6 +821,9 @@ spec: minAvailableReplicas: # warm buffer: ready & unleased (default: 0) scaleDownCooldown: # default: 5m recycleStrategy: ExitAndReplace | InPlaceReuse + virtualTargetClassName: # VirtualTargetClass name in same namespace + parameters: # optional nested overrides (deep-merged with class) + : selector: matchLabels: : @@ -807,10 +831,63 @@ spec: metadata: labels: { ... } spec: - virtualTargetRef: { ... } # reference or inline *VirtualTarget spec drivers: [ ... 
] ``` +### Dictionary-Based Parameters + +Both `VirtualTargetClass` and `ExporterSet` expose a `spec.parameters` field +carrying provisioner-specific configuration as a **nested YAML object** (maps, +lists, and scalars) — not a flat `map[string]string`. This reads like normal +exporter/driver config rather than CSI's intentionally opaque string map. + +**CRD representation:** The field is schemaless at the API level +(`type: object` with `x-kubernetes-preserve-unknown-fields: true`, or +`apiextensionsv1.JSON` in Go). OpenAPI does not validate nested structure at +`kubectl apply` time. + +**Validation:** The active provisioner validates merged parameters during +reconcile and sets `ExporterSet` status conditions on error. Optional future: +`VirtualTargetClass.spec.parametersSchemaRef` pointing to a JSON Schema +ConfigMap per provisioner. + +**Merge semantics:** When provisioning an instance, the controller computes: + +```text +mergedParameters = deepMerge(VirtualTargetClass.spec.parameters, + ExporterSet.spec.parameters) +``` + +- **Maps** merge recursively — set keys override class keys at the same path. +- **Scalars and lists** in `ExporterSet.spec.parameters` replace the class + value at that path entirely (lists are not concatenated). + +**Example:** + +```yaml +# VirtualTargetClass.spec.parameters +resources: + cpu: 4 + memory: 4Gi + storage: 16Gi +firmware: + url: registry.example.com/firmware/rpi4:v1 + digest: sha256:abc... + +# ExporterSet.spec.parameters (override memory only) +resources: + memory: 8Gi + +# mergedParameters passed to provisioner +resources: + cpu: 4 # inherited from class + memory: 8Gi # overridden by set + storage: 16Gi # inherited from class +firmware: # unchanged — set did not specify firmware + url: registry.example.com/firmware/rpi4:v1 + digest: sha256:abc... 
+``` + **Status subresource (ExporterSet):** ```yaml @@ -833,10 +910,11 @@ status: ```text VirtualTargetClass.provisioner → - qemu.jumpstarter.dev → k8s container (+ OS OCI image volume) - ec2.jumpstarter.dev → AWS API - corellium.jumpstarter.dev → Corellium REST API -# one typed *VirtualTarget claim interface; backend is pluggable + qemu.jumpstarter.dev → k8s Pod (sidecar + runtime container) + qemu-baremetal.jumpstarter.dev → QEMU on off-cluster lab hosts (SSH/API) + ec2.jumpstarter.dev → AWS API + corellium.jumpstarter.dev → Corellium REST API +# backend is pluggable via provisioner string ``` **Changes to existing CRDs:** @@ -867,7 +945,7 @@ The graceful scale-down sequence becomes: 1. `ExporterSet` controller sets `enabled: false` on the target exporter. 2. Controller waits to confirm no lease was assigned (watches for `status.leaseRef` to remain empty). -3. Controller deletes the Pod, Exporter CR, and associated `*VirtualTarget`. +3. Controller deletes the Pod and Exporter CR. ### Hardware Considerations @@ -883,6 +961,119 @@ for scalable testing. However: whether a target is physical or virtual so users can filter when fidelity matters. +### External and Off-Cluster Provisioning + +Provisioners are **not** limited to in-cluster Pods. The same +`VirtualTargetClass` + `ExporterSet` model applies whether the virtual target +runs as a Kubernetes Pod, on a cloud virtual-device API, or on **bare-metal lab +hosts** outside the cluster. `VirtualTargetClass.provisioner` selects the +backend implementation; `credentialsSecretRef` and nested `parameters` carry +everything the provisioner needs to reach remote infrastructure (API tokens, +SSH keys, host lists, board profiles). + +**Design intent:** Scale a **logical pool** of exporters through familiar +`ExporterSet` semantics while placing workloads where fidelity or hardware +requires it — e.g. 
a high-fidelity automotive emulator that needs bare-metal +KVM, GPU passthrough, or vendor-specific tooling unavailable in the cluster. + +**What stays the same:** + +- Users lease with labels (`jmp lease -l board=sa8295,fidelity=high`) — no + awareness of placement. +- Each pool member registers as a standard `Exporter` CR with + `jumpstarter-controller`. +- Lessees flash and boot images via existing drivers after lease (see DD-7). + +**What differs per provisioner:** + +- **In-cluster (`qemu.jumpstarter.dev`):** exporter-set controller creates Pod + + sidecar; scheduling from `VirtualTargetClass.scheduling`. +- **API-backed (`corellium.jumpstarter.dev`):** exporter Pod is a thin API + client; cloud device lifecycle managed externally. +- **Off-cluster (`qemu-baremetal.jumpstarter.dev`):** exporter-set controller + provisions exporter + QEMU (or vendor emulator) on remote hosts via SSH or a + lab agent API; may run exporter as a local process on the host rather than a + Pod. The controller still owns `Exporter` CRs in the cluster for lease + assignment. + +**Automotive example — Qualcomm reference board on bare metal:** + +An automotive team runs SA8295-class targets on dedicated lab servers for +higher-fidelity behavior than in-cluster QEMU. The cluster hosts +orchestration only; emulators run on the bench network. 
+ +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: qcom-sa8295-baremetal + namespace: jumpstarter +spec: + provisioner: qemu-baremetal.jumpstarter.dev + credentialsSecretRef: + name: automotive-lab-ssh + bindingMode: Immediate + parameters: + hosts: + - name: bench-01.automotive.example.com + arch: aarch64 + slots: 2 # concurrent instances per host + - name: bench-02.automotive.example.com + arch: aarch64 + slots: 2 + runtime: + binary: /usr/bin/qemu-system-aarch64 + kvm: true + board: + soc: sa8295 +--- +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: qcom-sa8295-hifi + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 4 + minAvailableReplicas: 1 + virtualTargetClassName: qcom-sa8295-baremetal + parameters: + board: + fidelity: high # deep-merged over class board defaults + selector: + matchLabels: + board: sa8295 + fidelity: high + virtual: "true" + template: + metadata: + labels: + board: sa8295 + fidelity: high + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +``` + +**Provisioner actions (off-cluster):** + +1. Read merged `parameters` and `credentialsSecretRef`. +2. Select a host with free capacity (`slots`). +3. Deploy or attach exporter + runtime on the host (SSH, systemd, or lab agent). +4. Create an `Exporter` CR in the cluster with template labels; register with + `jumpstarter-controller`. +5. On scale-down or failure, tear down the remote instance and delete the + `Exporter` CR. + +Physical reference boards on the same lab network can coexist in the pool — +users distinguish them with labels (`virtual=false` vs `virtual=true`) without +changing the lease workflow. + ## Design Decisions ### DD-1: Pool-based scaling vs. 
purely on-demand provisioning @@ -925,27 +1116,33 @@ independent restarts, and explicit `--provisioner` selection. Adding a new backend means adding a Deployment manifest with a different flag — no new image build required. -### DD-3: Pluggable provisioner vs. CRD-per-pool +### DD-3: Pluggable provisioner vs. CRD-per-pool vs. typed claims **Alternatives considered:** 1. **CRD per provider pool** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) — provider typing at the pool CRD level. -2. **Generic `ExporterSet` + pluggable `VirtualTargetClass.provisioner`** — - orchestration generic; device backend selected by provisioner string; typed - `*VirtualTarget` claims retain strong typing. -3. **Fully generic opaque config** — single CRD with `provider.config` map. +2. **Generic `ExporterSet` + pluggable `VirtualTargetClass.provisioner` + + nested `parameters`** — orchestration generic; backend selected by provisioner + string; device config as nested YAML on class + set (deep-merge). +3. **Typed `*VirtualTarget` CRDs per provider** (`QEMUVirtualTarget`, + `CorelliumVirtualTarget`, etc.) — strong schema per backend, referenced from + `ExporterSet`. +4. **Fully generic opaque config** — single CRD with flat `provider.config` map. **Decision:** Option 2 — generic `ExporterSet` + pluggable provisioner on -`VirtualTargetClass`, with typed `*VirtualTarget` claims. +`VirtualTargetClass` with **dictionary-based nested `parameters`**. Reject +options 1 and 3. **Rationale:** Separating orchestration (scaling, lease matching, graceful -shutdown) from provisioning (QEMU container, Corellium API, EC2) lets each -provisioner implement backend-appropriate scaling logic while exposing an +shutdown) from provisioning (QEMU container, Corellium API, off-cluster hosts) +lets each provisioner implement backend-appropriate scaling while exposing an identical scaling surface (`minReplicas`/`maxReplicas`/`minAvailableReplicas`). 
-Typed `*VirtualTarget` claims preserve schema validation per provider without -proliferating pool CRDs. New backends add a claim kind + provisioner string, not -pool-tier changes. +Nested `parameters` on `VirtualTargetClass` and optional `ExporterSet` overrides +replace per-provider claim CRDs — homogeneous pools need only two admin CRDs. +Typed `*VirtualTarget` claims add maintenance overhead without benefit when +pools share one backend profile (2026-06 team review). New backends add a +provisioner string and parameter conventions, not pool-tier or claim-kind changes. ### DD-4: Per-lease parameters vs. pool flavors @@ -990,18 +1187,61 @@ in *Future Possibilities*. 1. **Inline credentials in every `ExporterSet`** — simple but duplicates secrets across pools sharing the same backend account. -2. **`VirtualTargetClass` (StorageClass analog)** — cluster-scoped class holds - credentials, parameters, scheduling; claims reference the class. +2. **`VirtualTargetClass` (namespaced backend profile)** — class in the same + namespace as the referencing `ExporterSet` holds credentials, nested + `parameters`, and scheduling; `ExporterSet.spec.virtualTargetClassName` + references the class by local name. 3. **Separate `ProviderConfig` CRD** — lighter-weight credential sharing without full class semantics. -**Decision:** Option 2 — `VirtualTargetClass` with optional future +**Decision:** Option 2 — **namespaced** `VirtualTargetClass` with optional future `ProviderConfig` for multi-account credential reuse. -**Rationale:** The CSI StorageClass/PVC pattern is well understood by cluster -admins. `bindingMode` and `reclaimPolicy` map naturally to warm-pool vs. -on-demand and expensive external target retention. Credentials never appear on -namespaced claims. +**Rationale:** Unlike CSI `StorageClass` (cluster-scoped), `VirtualTargetClass` +is **namespaced** so teams define isolated backend profiles, credentials, and +scheduling per namespace without cluster-admin involvement. 
`ExporterSet` may +only reference a class in the **same namespace**; `credentialsSecretRef` points +to a Secret in that namespace — credentials never appear on `ExporterSet`. +`bindingMode` and `reclaimPolicy` still map to warm-pool vs. on-demand and +external target retention. The StorageClass/PVC *separation of class and consumer* +is retained; only scope differs. + +### DD-7: Instance TTL and image refresh (deferred) + +**Alternatives considered:** + +1. **`ExporterSet.spec.ttl` with image refresh** — declarative `maxAge`, + `maxIdleAge`, and `imageRefreshPolicy` on the pool CRD; controller recycles + instances and re-pulls container/firmware images to keep warm pools fresh. +2. **Manual / CronJob pool flush** — operators restart pools or delete Pods on a + schedule outside Jumpstarter. +3. **Admin-pinned images in `parameters`** — declare expected OS/firmware refs on + `VirtualTargetClass` / `ExporterSet`; provisioner always boots those images. +4. **User flash at lease time (v1)** — warm pool instances are provisioned with + baseline runtime only; the lessee flashes and boots the image they want via + existing drivers (`storage.flash`, power cycle) — same workflow as physical + targets. +5. **Separate lifecycle controller (future)** — a cross-cutting controller that + periodically visits **physical and virtual** exporters and flashes the + expected image, without virtual-only fields on `ExporterSet`. + +**Decision:** Reject options 1–3 for v1 — **no TTL, image-refresh, or +admin-pinned boot images on `ExporterSet` / `VirtualTargetClass`**. Option 4 +matches current Jumpstarter behavior: users flash and boot what they need after +leasing. Option 5 remains the preferred direction for automated image hygiene +later. + +**Rationale:** Time-based Pod recycle and provisioner-driven image re-pull are +virtual-pool mechanics that **physical exporters do not share**. 
Physical machines +have no `maxAge`; their OS changes when someone flashes them, not when a pool +controller rotates Pods. Putting TTL or pinned boot images on `ExporterSet` alone +would split the lease experience. In v1, virtual targets in the warm pool behave +like physical benches: the lessee selects and flashes the desired image. A future +**separate lifecycle controller** can watch `Exporter` resources regardless of +origin and apply uniform policies — e.g. periodic flash of a lab-defined expected +image to idle exporters, scheduled maintenance windows — combining long-lived +(non-refreshed) exporter instances with automated image updates when operators +choose to enable them. ## Design Details @@ -1012,6 +1252,7 @@ changes to the set CR, owned Exporters, or matching Leases: ```text for each ExporterSet CR: + mergedParameters = deepMerge(class.parameters, set.parameters) ownedExporters = list Exporters owned by this CR replicas = count ownedExporters in Ready state leasedReplicas = count ownedExporters with an active LeaseRef @@ -1031,7 +1272,7 @@ for each ExporterSet CR: graceful scale down: 1. set exporter.spec.enabled = false 2. wait until leaseRef remains empty - 3. delete Pod, Exporter CR, and *VirtualTarget + 3. delete Pod and Exporter CR (never below minAvailableReplicas) ``` @@ -1047,13 +1288,12 @@ Provisioning → Ready (warm pool) → Leased → Ready - **Provisioning:** Pod starting, virtual target provisioning, exporter registering. - **Ready:** Exporter registered and available for lease. - **Leased:** Exporter assigned to an active lease. -- **Terminating:** Instance being deleted (scale-down). +- **Terminating:** Instance being deleted (scale-down or failure replace). ### Component Interaction 1. Administrator creates `VirtualTargetClass` and `ExporterSet` resources. -2. The provisioner controller provisions `minAvailableReplicas` Exporters (each - owning a `*VirtualTarget`). +2. The provisioner controller provisions `minAvailableReplicas` Exporters. 
3. Each instance Pod boots the virtual target and runs the Jumpstarter exporter, registering with the existing `jumpstarter-controller`. 4. Instances appear as regular exporters with labels from `spec.template.metadata`. @@ -1088,11 +1328,12 @@ Unit tests should meet the project test coverage requirements. - End-to-end lease lifecycle with QEMU provisioner in a test cluster - Mixed physical/virtual lease orchestration - Provisioner failure and recovery scenarios -- `VirtualTargetClass` credential injection and claim binding +- Parameter deep-merge and provisioner-side validation +- `VirtualTargetClass` credential injection ## Acceptance Criteria -- [ ] `VirtualTargetClass`, `QEMUVirtualTarget`, and `ExporterSet` CRDs defined +- [ ] `VirtualTargetClass` and `ExporterSet` CRDs defined - [ ] `ExporterSet` controller maintains `minAvailableReplicas` warm buffer - [ ] Controller scales up when available pool is depleted (up to `maxReplicas`) - [ ] Controller scales down idle replicas after cooldown (never below @@ -1104,8 +1345,9 @@ Unit tests should meet the project test coverage requirements. - [ ] An `ExporterSet` with `minAvailableReplicas: 0` provisions on demand only - [ ] Status subresource reports Deployment-style counters and health conditions - [ ] `scale` subresource enables `kubectl scale` interoperability -- [ ] Documentation covers `VirtualTargetClass`, `*VirtualTarget`, and - `ExporterSet` configuration +- [ ] `parameters` deep-merge produces correct merged config for provisioner +- [ ] Provisioner validates merged `parameters` and surfaces errors via conditions +- [ ] Documentation covers `VirtualTargetClass` and `ExporterSet` configuration ## Graduation Criteria @@ -1117,18 +1359,18 @@ Unit tests should meet the project test coverage requirements. 
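The acceptance criterion that the `parameters` deep-merge "produces correct merged config" can be made concrete with a small sketch. The following is a minimal Python illustration of `deepMerge(class.parameters, set.parameters)`, not the controller's actual implementation. It assumes that nested mappings merge key by key and that any other value on the `ExporterSet` side (scalars and lists) replaces the class value. The parameter names (`machine`, `arch`, `memory`, `cpus`) are hypothetical.

```python
from copy import deepcopy
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` merged into `base`.

    Nested mappings are merged recursively. Any other value in `override`
    replaces the value in `base` (assumption: lists are replaced, not
    concatenated). Neither input is mutated.
    """
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical VirtualTargetClass.parameters (backend-wide defaults).
class_parameters = {"machine": {"arch": "aarch64", "memory": "2Gi"}, "cpus": 2}
# Hypothetical ExporterSet.parameters (per-pool overrides).
set_parameters = {"machine": {"memory": "4Gi"}}

merged_parameters = deep_merge(class_parameters, set_parameters)
```

With these inputs the pool keeps the class-level `arch` and `cpus` and overrides only `machine.memory`. Whether lists should be replaced or concatenated is a design choice the JEP text does not specify; replacement is assumed here because it keeps overrides predictable.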
### Stable -- At least two provisioners implemented (e.g., `qemu.jumpstarter.dev` + - `corellium.jumpstarter.dev`) +- QEMU reference provisioner (`qemu.jumpstarter.dev`) production-ready; at least + one additional topology validated (e.g. off-cluster bare metal or API-backed) - Production usage by at least one team for >1 month - Performance benchmarks documented (cold-start latency, scaling responsiveness) -- Provisioner authoring guide published (how to add a new provisioner + claim kind) +- Provisioner authoring guide published (how to add a new provisioner) ## Backward Compatibility - Existing physical-only workflows are unaffected; lease requests without virtual-specific labels continue to work as before. - No changes to the existing gRPC protocol for physical exporters. -- New CRDs (`VirtualTargetClass`, `*VirtualTarget`, `ExporterSet`) are additive. +- New CRDs (`VirtualTargetClass`, `ExporterSet`) are additive. - **Exporter `enabled` field:** Defaults to `true`, so all existing Exporters continue to behave exactly as before. - Administrators upgrading see no behavior change until they explicitly deploy @@ -1143,14 +1385,14 @@ Unit tests should meet the project test coverage requirements. - **Unified user experience:** Virtual and physical targets leased the same way. - **Kubernetes-native UX:** `minReplicas`/`maxReplicas`/`minAvailableReplicas`, Deployment-style status, `kubectl scale` — familiar to cluster admins. -- **Pluggable backends:** New provisioners add a claim kind + provisioner string. -- **Credential separation:** `VirtualTargetClass` keeps secrets off namespaced claims. +- **Pluggable backends:** New provisioners add a provisioner string. +- **Credential separation:** `VirtualTargetClass` keeps secrets off `ExporterSet` resources. - **Fidelity ladder:** Same lease flow across sim, cloud virtual, and hardware tiers. 
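The backward-compatibility rule for the Exporter `enabled` field (it defaults to `true`, so existing Exporters behave as before) can be illustrated with a short lease-filtering sketch. This is a hedged Python illustration rather than the controller's real code: the resources are modelled as plain dictionaries, and placing `leaseRef` under `status` is an assumption made for the example.

```python
from typing import Any


def is_leasable(exporter: dict[str, Any]) -> bool:
    """Return True if the exporter may receive a new lease.

    A missing `spec.enabled` is treated as True, which mirrors the stated
    default and leaves pre-existing Exporters unaffected. An exporter with
    a non-empty `leaseRef` is already leased and is skipped.
    """
    spec = exporter.get("spec", {})
    status = exporter.get("status", {})
    enabled = bool(spec.get("enabled", True))
    return enabled and not status.get("leaseRef")


# Hypothetical Exporter objects for illustration.
legacy_exporter = {"spec": {}, "status": {}}                     # no `enabled` field
draining_exporter = {"spec": {"enabled": False}, "status": {}}   # graceful scale-down
leased_exporter = {"spec": {"enabled": True}, "status": {"leaseRef": "lease-1"}}

candidates = [e for e in (legacy_exporter, draining_exporter, leased_exporter) if is_leasable(e)]
```

Only the legacy exporter remains a candidate here, which matches the intent of the graceful scale-down steps: a disabled exporter receives no new leases, while an exporter that predates the field keeps working unchanged.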
### Negative -- **Increased CRD surface:** `VirtualTargetClass`, typed `*VirtualTarget`, - and `ExporterSet` add more resources to manage than a single pool CRD per provider. +- **Increased CRD surface:** `VirtualTargetClass` and `ExporterSet` add more + resources to manage than a single pool CRD per provider. - **Resource consumption:** Warm pools consume cluster resources when idle. - **Sidecar complexity:** Container-backed provisioners require multi-container Pod orchestration and shared-volume protocols. @@ -1168,6 +1410,10 @@ Unit tests should meet the project test coverage requirements. - **Per-lease `parameters` dictionary:** See DD-4. - **CRD-per-pool without VirtualTarget separation:** Couples scaling and provider config; rejected in favor of generic `ExporterSet` + pluggable provisioner. +- **Typed `*VirtualTarget` CRDs per provider:** Rejected at 2026-06 team review; + see DD-3. Dictionary `parameters` on class + set suffice for homogeneous pools. +- **`ExporterSet.spec.ttl` and image refresh:** Rejected for v1; see DD-7. Would + create virtual-only lifecycle semantics unlike physical exporters. ## Prior Art @@ -1175,13 +1421,18 @@ Unit tests should meet the project test coverage requirements. - **Crossplane:** General-purpose cloud composition; no Jumpstarter lease semantics. Useful reference for external API integration (e.g., Corellium) but does not replace pool-specific scaling logic. -- **CSI (StorageClass/PVC):** Pattern adopted for `VirtualTargetClass`/`*VirtualTarget`. +- **CSI (StorageClass/PVC):** Class/consumer separation adopted; scope is + namespaced rather than cluster-scoped (see DD-6). - **KubeVirt:** VM orchestration with pre-mounted images; Jumpstarter differs by flash-at-runtime model and exporter-as-sidecar pattern. ## Unresolved Questions - What is the exact scaling algorithm (proportional, step-based, predictive)? 
+- **Pod initialization for container-backed provisioners:** Native sidecar init + containers (`restartPolicy: Always`, KEP-753) vs. lifecycle hooks vs. other + co-location patterns for exporter + target-runtime. The sidecar sketch in this + JEP is provisional; resolve in the QEMU provisioner implementation PR. ### Resolved @@ -1212,12 +1463,26 @@ stays open to them: - **HPA/KEDA metric exposure** — complementary external autoscaling once core provisioner controllers are stable. - **Renode provider** — `renode.jumpstarter.dev` provisioner leveraging JEP-0010. +- **Additional cloud/container provisioners** — `corellium.jumpstarter.dev`, + `android.jumpstarter.dev`, `ec2.jumpstarter.dev` (no typed claim CRDs). - **Composite leases** — multiple exporters linked into one logical lease. +- **Cross-cutting lifecycle controller** — periodic flash of lab-defined expected + images to idle **physical and virtual** exporters (see DD-7); long-lived pool + instances combined with optional automated image updates, not virtual-only TTL + on `ExporterSet`. ## Implementation Plan The implementation is broken into phases. Each phase delivers a usable -increment and can be merged independently. +increment and can be merged independently. **v1 focuses on the QEMU reference +implementation**; additional provisioners and lifecycle automation are deferred. + +| Phase | Scope | Status | +| --- | --- | --- | +| 1 | Exporter `enabled` field | Near-term | +| 2 | `VirtualTargetClass` + `ExporterSet` CRDs; nested `parameters`; `qemu.jumpstarter.dev` | Near-term (v1) | +| 3 | External/off-cluster provisioning (`qemu-baremetal.jumpstarter.dev`) | Near-term | +| 4+ | Lifecycle controller, Corellium/Android, etc. 
| Deferred — see *Future phases* | ### Phase 1: Exporter `enabled` field @@ -1232,41 +1497,73 @@ Add the `enabled` boolean field to the Exporter CRD and update the - [ ] Unit tests for the filtering logic - [ ] Integration test: disable an exporter, verify it gets no new leases -### Phase 2: Core CRDs and QEMU provisioner +### Phase 2: Core CRDs and QEMU reference provisioner -Define `VirtualTargetClass`, `QEMUVirtualTarget`, and `ExporterSet` CRDs. -Implement the `qemu.jumpstarter.dev` provisioner with sidecar Pod rendering and -core reconciliation loop. +Define namespaced `VirtualTargetClass` and `ExporterSet` CRDs. Implement +**only** the `qemu.jumpstarter.dev` in-cluster provisioner — the reference +implementation for the 2-CRD model, parameter deep-merge, warm pool, and +flash-at-lease workflow (DD-7). **Deliverables:** -- [ ] Define `VirtualTargetClass`, `QEMUVirtualTarget`, `ExporterSet` CRD schemas +- [ ] Define `VirtualTargetClass` and `ExporterSet` CRD schemas (namespaced; + nested `parameters` with schemaless object fields; same-namespace reference + rule) +- [ ] Implement parameter deep-merge and provisioner-side validation - [ ] Implement exporter-set controller binary with `--provisioner=qemu.jumpstarter.dev` -- [ ] Sidecar Pod rendering (exporter native sidecar + QEMU runtime container) +- [ ] Sidecar Pod rendering (provisional init-container model — see Unresolved + Questions) - [ ] Core scaling logic: `minAvailableReplicas`, demand-driven scale-up, graceful scale-down - [ ] Deployment-style status + `scale` subresource - [ ] Watch Leases and Exporters for scaling decisions - [ ] Add `exporterSets` section to `Jumpstarter` operator CR -- [ ] Integration test: deploy `ExporterSet`, lease, release, observe scaling +- [ ] Integration test: deploy `ExporterSet`, lease, flash, boot, release, + observe scaling -### Phase 3: Additional provisioners +### Phase 3: External / off-cluster provisioning -Add Corellium and Android provisioners using the same 
binary with different -`--provisioner` flags. +Extend the exporter-set controller with an off-cluster QEMU provisioner to +validate the pluggable backend model beyond in-cluster Pods. Documents and +implements the flow in *External and Off-Cluster Provisioning*. **Deliverables:** -- [ ] `corellium.jumpstarter.dev` provisioner + `CorelliumVirtualTarget` CRD -- [ ] `android.jumpstarter.dev` provisioner + `AndroidVirtualTarget` CRD +- [ ] `qemu-baremetal.jumpstarter.dev` provisioner (or equivalent off-cluster + stub) using the same binary with `--provisioner=qemu-baremetal.jumpstarter.dev` +- [ ] Remote host selection, SSH/agent deploy, and `Exporter` CR registration from + off-cluster instances +- [ ] Example `VirtualTargetClass` + `ExporterSet` manifests for lab bare-metal + (automotive profile) +- [ ] Integration test or documented manual test plan for off-cluster scale-up + and lease + +### Future phases (deferred) + +The following are **explicitly out of v1** scope. They reuse the same +`VirtualTargetClass` + `ExporterSet` CRDs and nested `parameters` — no typed +claim CRDs. + +**Additional provisioners** + +- [ ] `corellium.jumpstarter.dev` — API-backed cloud virtual devices +- [ ] `android.jumpstarter.dev` — in-cluster Android emulator pools +- [ ] `ec2.jumpstarter.dev` — AWS-backed targets - [ ] Provisioner authoring guide +**Cross-cutting lifecycle controller (DD-7)** + +- [ ] Separate controller for periodic flash / maintenance on **physical and + virtual** exporters — not `ExporterSet.spec.ttl` + ## Implementation History - 2025-10-30: RFE filed upstream (GitHub #41) - 2026-06-03: JEP proposed - 2026-06-18: Revised per review — ExporterSet, VirtualTargetClass, pluggable provisioner model; added end-to-end flow section +- 2026-06-18: Team review — dictionary `parameters`, removed typed VirtualTarget + CRDs, namespaced `VirtualTargetClass`, deferred TTL (DD-7) ## References