diff --git a/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md new file mode 100644 index 000000000..a3378e2e2 --- /dev/null +++ b/python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md @@ -0,0 +1,1578 @@ +# JEP-0014: Virtual Scalable Exporters + +| Field | Value | +| ----------------- | -------------------------------------------------------------- | +| **JEP** | 0014 | +| **Title** | Virtual Scalable Exporters | +| **Author(s)** | @mangelajo (Miguel Angel Ajo Pelayo) | +| **Status** | Draft | +| **Type** | Standards Track | +| **Created** | 2026-06-03 | +| **Updated** | 2026-06-18 | +| **Discussion** | https://github.com/jumpstarter-dev/jumpstarter/issues/41 | +| **Requires** | | +| **Supersedes** | | +| **Superseded-By** | | + +--- + +## Abstract + +This JEP proposes a Virtual Scalable Exporter subsystem for Jumpstarter that +manages pools of virtual targets with configurable autoscaling. Conceptually, +the system scales **virtual targets**; the **Exporter** is the scheduling and +leasing unit (the Pod analog). Each `ExporterSet` declares scaling bounds using +familiar Kubernetes vocabulary (`minReplicas`, `maxReplicas`, +`minAvailableReplicas`); the controller maintains a warm pool of ready exporters +to absorb the 10-60s cold-start latency of VM boot and exporter registration. +This enables low-latency lease acquisition, massive scalability, resource +efficiency, and simplified orchestration of mixed physical/virtual test +topologies — while allowing administrators to tune the trade-off between +responsiveness and resource consumption on a per-target basis. + +## Motivation + +Jumpstarter currently excels at managing scarce, physical hardware targets. +However, testing and development often require a mix of physical devices and +scalable, virtual resources. 
Today, virtual targets must be manually deployed +as static exporters with a fixed count — there is no mechanism for the system +to maintain or scale a pool of virtual instances based on demand. + +This model has several limitations: + +- **Artificial scarcity:** Virtual targets are treated as a fixed-size pool, + just like physical ones, which defeats their "virtually unlimited" potential. +- **No elasticity:** The pool cannot grow when demand spikes (CI burst) or + shrink when idle, leading to either queuing or waste. +- **Manual lifecycle:** Administrators must manually deploy, monitor, and scale + virtual exporter instances — there is no declarative "desired state" for a + virtual target pool. +- **Cold-start penalty vs. waste trade-off:** Users must choose between + pre-spawning many instances (wasting resources when idle) or spawning on + demand (high latency at lease time). There is no middle ground. + +The core problem is that virtual targets lack a pool manager that can maintain a +configurable warm pool while autoscaling to meet demand. + +### Fidelity / Cost Ladder + +One logical target can be served by multiple backends at different fidelity and +cost tiers. Users select via labels through `jmp lease`; the same workflow +applies regardless of backend: + +| class (provisioner) | fidelity | scale/cost | role | +| --- | --- | --- | --- | +| container sim (`qemu.jumpstarter.dev`) | low | cheap / CI-scale | functional checks | +| cloud virtual device (`corellium.jumpstarter.dev`) | high | metered | higher-fidelity behavior | +| real hardware (Exporter) | full | scarce | ground truth | + +For example, a target that needs GPU or specialized I/O can run functional +checks cheaply on a QEMU class in CI, validate higher-fidelity behavior on a +cloud-backed virtual device, and use real hardware as ground truth. The +`VirtualTargetClass` abstraction makes this ladder explicit +without changing the lease experience. 
+ +### User Stories + +- **As a** CI pipeline author, **I want to** lease N virtual targets instantly + from a warm pool, **so that** my pipeline doesn't block on provisioning + latency during burst periods. + +- **As a** developer, **I want to** lease a virtual target matching a known + physical board's properties with near-zero wait time, **so that** I can + iterate quickly without waiting for scarce hardware. + +- **As a** platform engineer, **I want to** declare an `ExporterSet` with + `minAvailableReplicas: 2, maxReplicas: 20`, **so that** there are always warm + instances ready while the system scales up on demand and scales down when idle. + +- **As a** cost-conscious operator, **I want to** set `minAvailableReplicas: 0` + for rarely-used target types, **so that** they consume no resources until + actually requested, accepting a cold-start delay. + +## Proposal + +The proposal introduces **Virtual Scalable Exporters** — a controller-managed +pool of virtual target instances with configurable autoscaling. Rather than +treating virtual targets as purely on-demand or purely static, each +`ExporterSet` declares scaling parameters that let administrators tune the +trade-off between instant availability and resource consumption. + +### Resource Hierarchy + +Virtual scalable exporters are modeled on familiar Kubernetes workload +primitives: + +```text +VirtualTargetClass ←── referenced by ── ExporterSet + │ + ▼ + Exporter ──► Pod + (exporter sidecar + target runtime) +``` + +- **`VirtualTargetClass`** — **namespaced** configuration for a backend + (`provisioner`, nested `parameters`, credentials, scheduling, binding mode). + Lives in the same namespace as referencing `ExporterSet` resources. Admins own + classes; `ExporterSet` authors never touch credentials. +- **`ExporterSet`** — namespaced generic scaling resource with `selector` + inline + `template`. References a `VirtualTargetClass` by name in the **same + namespace**. 
Optional nested `parameters` deep-merge over the class defaults. + One mental model for all backends. +- **`Exporter`** — the minimum leased unit. Exposes drivers that connect to the + virtual target provisioned from the class. + +### Core Concept: ExporterSet with Kubernetes-Native Scaling + +`ExporterSet` is a generic CRD (ReplicaSet + HPA analog) with familiar scaling +vocabulary. Provider typing lives in `VirtualTargetClass`, not in the pool CRD +itself. + +**Example: VirtualTargetClass (namespaced backend profile)** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: qemu-rpi4 + namespace: jumpstarter +spec: + provisioner: qemu.jumpstarter.dev + bindingMode: Immediate # warm pool; WaitForFirstConsumer = on-demand + reclaimPolicy: Delete + scheduling: # inherited by rendered exporter Pods + nodeSelector: + kubernetes.io/arch: arm64 + tolerations: + - key: jumpstarter.dev/kvm + operator: Exists + effect: NoSchedule + resources: + limits: + devices.kubevirt.io/kvm: "1" + parameters: # nested object; provisioner interprets + machineType: virt + firmware: + url: registry.example.com/firmware/rpi4:latest + digest: sha256:abc... 
+ resources: + cpu: 4 + memory: 4Gi + storage: 16Gi +``` + +**Example: ExporterSet (generic scaling resource)** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: rpi4-virtual + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 20 + minAvailableReplicas: 2 # PDB-style warm buffer (ready & unleased) + scaleDownCooldown: 5m + recycleStrategy: ExitAndReplace # or InPlaceReuse + virtualTargetClassName: qemu-rpi4 # same-namespace VirtualTargetClass name + parameters: # optional; deep-merged over class parameters + resources: + memory: 8Gi # override only memory; cpu/storage inherited + selector: + matchLabels: + board: rpi4 + template: # embedded template (Deployment idiom) + metadata: + labels: + board: rpi4 + arch: aarch64 + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +status: + replicas: 5 + readyReplicas: 3 + availableReplicas: 1 # warm (ready & unleased) + leasedReplicas: 2 +# scale subresource: specReplicasPath=.spec.maxReplicas +``` + +**Example: Corellium VirtualTargetClass** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: corellium-kronos + namespace: jumpstarter +spec: + provisioner: corellium.jumpstarter.dev + credentialsSecretRef: + name: corellium-creds # Secret in same namespace + bindingMode: WaitForFirstConsumer # provision on lease + reclaimPolicy: Delete + parameters: + api: + host: app.corellium.com + projectId: "778f00af-5e9b-40e6-8e7f-c4f14b632e9c" + device: + flavor: kronos + os: "1.1.1" + build: "Critical Application Monitor (Baremetal)" +``` + +The Corellium driver (`jumpstarter_driver_corellium.driver.Corellium`) manages +the full virtual instance lifecycle through the Corellium REST API — it creates +instances on power-on and destroys them on power-off. 
Device parameters live in +`VirtualTargetClass.spec.parameters` and may be overridden per pool via +`ExporterSet.spec.parameters` (deep-merged). The provisioner injects API +credentials from `VirtualTargetClass.credentialsSecretRef` into the exporter +Pod; `ExporterSet` authors never see credentials. + +**Example: Android ExporterSet** + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: pixel7-emulator + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 10 + minAvailableReplicas: 0 # fully on-demand + virtualTargetClassName: android-pixel7 + selector: + matchLabels: + device: pixel7 + template: + metadata: + labels: + device: pixel7 + os: android + api-level: "34" + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_android.driver.AdbDriver + - type: jumpstarter_driver_power.driver.EmulatorPower +``` + +An `ExporterSet` with `minAvailableReplicas: 0` consumes no resources until a +lease is requested, accepting cold-start latency. An `ExporterSet` with +`minAvailableReplicas: 3` always has 3 ready-to-lease exporters — leases are +fulfilled instantly from the warm pool, and the controller scales up if more are +needed. + +### Container-Backed Targets: Sidecar Pattern + +For container-backed provisioners (`qemu.jumpstarter.dev`, Android emulator, etc.), +the provisioner renders each instance Pod from independently shipped artifacts. +The sketch below uses **native sidecar init containers** (`restartPolicy: Always`, +[KEP-753](https://github.com/kubernetes/enhancements/issues/753)) as the +**proposed** co-location model — **init containers vs. lifecycle hooks** is +unresolved; see *Unresolved Questions*. 
+ +```yaml +# rendered by qemu.jumpstarter.dev provisioner +spec: + initContainers: + - name: exporter # native sidecar (starts first, drains last) + restartPolicy: Always + image: quay.io/jumpstarter-dev/exporter:latest + containers: + - name: target-runtime # QEMU/Cuttlefish — independent image + image: quay.io/jumpstarter-dev/qemu-runtime:latest + volumeMounts: + - name: os + mountPath: /os + - name: shared + mountPath: /shared + volumes: + - name: os + image: + reference: registry.example.com/os/rpi4:latest # OS as OCI artifact + - name: shared + emptyDir: {} +``` + +Benefits: + +- **Independent release cadence** — exporter, runtime, and OS image version + independently. +- **Fault isolation** — exporter survives target-runtime crashes and can drain + or report failure. +- **Standard interfaces** — drivers attach over virtio (serial/SPI/CAN/GPIO) or + Unix sockets on shared volumes; same driver code works physical + virtual. +- **Unprivileged Pods** — virtio-backed guests avoid privileged containers when + the host supports it. + +The exporter sidecar communicates with the target-runtime container via Unix +sockets on a shared `emptyDir` volume (QMP for QEMU control, serial console, +launcher socket for dynamic argv). API-backed provisioners (`corellium`, `ec2`) +and off-cluster provisioners (`qemu-baremetal.jumpstarter.dev`) skip the +in-cluster runtime container — see *External and Off-Cluster Provisioning*. + +### User Experience + +From the user's perspective, virtual scalable exporters appear as regular +exporters in the pool. 
The lease experience is unchanged: + +```bash +# Lease any rpi4 target — may match physical or virtual +jmp lease -l board=rpi4 + +# Lease explicitly virtual targets +jmp lease -l board=rpi4,virtual=true + +# Prefer ground truth when fidelity matters +jmp lease -l board=rpi4,fidelity=full +``` + +The guiding principle is: **"Get me a target that matches my requirements."** The +distinction between physical and virtual is an implementation detail, not a +primary concern for the user. Virtual exporters simply appear in the same pool +as physical ones, differentiated only by labels. + +### End-to-End Flow (QEMU Example) + +This section walks through a complete **in-cluster QEMU warm-pool** scenario: +what each actor does, which CRDs are involved, and how control passes between +components. The flow uses only **two admin-configured CRDs** — no per-instance +claim resources: + +| Admin CRD | Role in this flow | +| --- | --- | +| `VirtualTargetClass` | Backend profile: provisioner, scheduling, nested `parameters` | +| `ExporterSet` | Pool scaling, labels, drivers, optional parameter overrides | + +Everything else (`Exporter`, `Lease`, `Pod`) is created and managed by +controllers at runtime. Relationships use a **reference graph** (not a strict +ownership tree): + +```text +VirtualTargetClass ←── referenced by ── ExporterSet + │ + ▼ + Exporter ──► Pod + (exporter sidecar + QEMU runtime) +``` + +Homogeneous QEMU pools configure **`VirtualTargetClass` + `ExporterSet` only**. +The provisioner deep-merges parameters, materializes Pods, and registers +`Exporter` CRs. **OS images are not pre-selected by the pool** — lessees flash +and boot what they need after leasing (see Phase 4 and DD-7). 
+ +#### Actors + +| Actor | Component | Responsibility | +| --- | --- | --- | +| **Administrator** | Human / GitOps | Cluster bootstrap, class + set CRs | +| **Jumpstarter operator** | `Jumpstarter` CR | Deploys `jumpstarter-controller`, routers, exporter-set controllers | +| **Exporter-set controller** | `qemu.jumpstarter.dev` Deployment | Reconciles `ExporterSet`, creates Exporters/Pods, scales pool | +| **Jumpstarter controller** | Existing controller | Assigns `Lease` → `Exporter`, unchanged lease semantics | +| **User** | CLI / CI (`jmp lease`, drivers) | Requests leases, flashes images, runs tests | + +#### Phase 0 — Cluster bootstrap (admin, one-time) + +**Admin actions:** + +1. Install Jumpstarter operator (if not already present). +2. Configure the `Jumpstarter` CR with `spec.exporterSets.provisioners` listing + `qemu.jumpstarter.dev` (and any other provisioners). + +**Controller actions:** + +- Operator creates the exporter-set controller Deployment + (`--provisioner=qemu.jumpstarter.dev`). +- Operator ensures `jumpstarter-controller` is running (existing behavior). + +**Result:** Provisioner controller is watching for `ExporterSet` CRs whose +`virtualTargetClassName` references a class handled by that provisioner. + +#### Phase 1 — Define the virtual target profile (admin, two CRs) + +**Admin actions:** + +1. Create a `VirtualTargetClass` describing the QEMU backend (same namespace as + the `ExporterSet` that will reference it): + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: qemu-rpi4 + namespace: jumpstarter +spec: + provisioner: qemu.jumpstarter.dev + bindingMode: Immediate + reclaimPolicy: Delete + scheduling: + nodeSelector: + kubernetes.io/arch: arm64 + tolerations: + - key: jumpstarter.dev/kvm + operator: Exists + effect: NoSchedule + resources: + limits: + devices.kubevirt.io/kvm: "1" + parameters: + machineType: virt + firmware: + url: registry.example.com/firmware/rpi4:latest + digest: sha256:abc... 
+ resources: + cpu: 4 + memory: 4Gi + storage: 16Gi +``` + +2. Create an `ExporterSet` in the **same namespace** that references the class + by name and declares scaling + lease-matching labels: + +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: rpi4-virtual + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 20 + minAvailableReplicas: 2 + scaleDownCooldown: 5m + recycleStrategy: ExitAndReplace + virtualTargetClassName: qemu-rpi4 + parameters: + resources: + memory: 8Gi + selector: + matchLabels: + board: rpi4 + template: + metadata: + labels: + board: rpi4 + arch: aarch64 + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +``` + +**User actions:** None. + +**Controller actions:** None yet — exporter-set controller waits until +`ExporterSet` exists and resolves `virtualTargetClassName` to the class above. + +#### Phase 2 — Warm pool provisioning (exporter-set controller) + +**Trigger:** `ExporterSet` CR created or updated; `minAvailableReplicas: 2`. + +**Exporter-set controller actions (reconcile loop):** + +1. Resolve `ExporterSet.spec.virtualTargetClassName` to `VirtualTargetClass` + `qemu-rpi4` in the same namespace; compute merged parameters (deep-merge of + class + set overrides). +2. Count owned `Exporter` CRs: `replicas`, `readyReplicas`, `leasedReplicas`, + `availableReplicas` (= ready − leased). +3. If `availableReplicas < minAvailableReplicas` and `replicas < maxReplicas`, + scale up by creating new instances. For each new instance: + - Create an `Exporter` CR with labels from `spec.template.metadata` and + drivers from `spec.template.spec`. + - Render a Kubernetes Pod (sidecar pattern): + - **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts + first, registers with `jumpstarter-controller`. 
+ - **QEMU runtime container** — baseline virt machine from merged + `parameters` (CPU, memory, firmware blob); **empty disk** ready for + user flash at lease time. + - Exporter talks to runtime via Unix sockets on a shared `emptyDir` (QMP, + serial, launcher). + - Apply scheduling from `VirtualTargetClass.scheduling` to the Pod. +4. Update `ExporterSet.status` (`replicas`, `readyReplicas`, `availableReplicas`, + `leasedReplicas`, conditions). + +**Jumpstarter-controller actions:** + +- Accepts exporter registrations from the sidecar processes (existing gRPC flow). +- Marks exporters as available for lease assignment when ready. + +**User actions:** None. + +**Result:** Two warm exporters appear in the pool, labeled `board=rpi4, +virtual=true`. `ExporterSet.status.availableReplicas: 2`. + +```text +ExporterSet rpi4-virtual +├── Exporter rpi4-virtual-aaa [Ready, unleased] → Pod (exporter + QEMU) +└── Exporter rpi4-virtual-bbb [Ready, unleased] → Pod (exporter + QEMU) +``` + +#### Phase 3 — User requests a lease (user + jumpstarter-controller) + +**User actions:** + +```bash +jmp lease -l board=rpi4,virtual=true +``` + +**Jumpstarter-controller actions:** + +1. Create a `Lease` CR with `spec.selector.matchLabels: {board: rpi4, virtual: "true"}`. +2. Scan available `Exporter` CRs matching the selector (enabled, no active + `leaseRef`, ready). +3. Pick one (e.g. `rpi4-virtual-aaa`) and set `Exporter.status.leaseRef` to the + lease name. +4. Return connection details to the user (existing flow). + +**Exporter-set controller actions:** + +- Observes `leasedReplicas` increased, `availableReplicas` decreased. +- If `availableReplicas < minAvailableReplicas`, begins scale-up (create another + instance to refill the warm buffer). +- Does **not** participate in lease assignment. + +**Result:** User holds an active lease on `rpi4-virtual-aaa`. Pool still +maintains warm capacity via background scale-up. 
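
The counting and scale-up rule from Phases 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the controller's implementation: `ExporterView`, `summarize`, and `scale_up_count` are hypothetical names introduced here, and the sketch assumes that leased exporters are a subset of ready ones and that `maxReplicas: 0` means "no upper bound", as stated elsewhere in this JEP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExporterView:
    """Hypothetical, simplified view of one owned Exporter CR."""

    name: str
    ready: bool
    lease_ref: Optional[str] = None  # mirrors Exporter.status.leaseRef


def summarize(exporters: list[ExporterView]) -> dict[str, int]:
    """Compute the ExporterSet status counters from the owned exporters."""
    ready = [e for e in exporters if e.ready]
    leased = [e for e in ready if e.lease_ref is not None]
    return {
        "replicas": len(exporters),
        "readyReplicas": len(ready),
        "leasedReplicas": len(leased),
        # availableReplicas = ready - leased (warm, unleased instances)
        "availableReplicas": len(ready) - len(leased),
    }


def scale_up_count(status: dict[str, int], min_available: int, max_replicas: int) -> int:
    """Instances to create so the warm buffer is refilled, capped by maxReplicas."""
    shortfall = max(0, min_available - status["availableReplicas"])
    if max_replicas > 0:  # assumption: 0 means unbounded
        headroom = max(0, max_replicas - status["replicas"])
        return min(shortfall, headroom)
    return shortfall


# State right after Phase 3: one exporter leased, one still warm.
pool = [
    ExporterView("rpi4-virtual-aaa", ready=True, lease_ref="lease-1"),
    ExporterView("rpi4-virtual-bbb", ready=True),
]
status = summarize(pool)
to_create = scale_up_count(status, min_available=2, max_replicas=20)
```

With the Phase 3 state, `to_create` is 1, which matches the "create another instance to refill the warm buffer" step. A real reconciler would probably also have to account for instances that are created but not yet ready, so that it does not create duplicates on every reconcile pass; this sketch leaves that out.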
+ +#### Phase 4 — User session: flash, boot, test (user + exporter sidecar) + +The warm pool provides **instant lease assignment**; image selection happens +**after** lease — same workflow as a physical bench (DD-7). The pool does not +pre-flash an OS onto instances. + +**User actions** (via leased client): + +```python +with env() as client: + client.storage.flash("/path/to/image.raw") # write disk image + client.power.on() # boot QEMU via QemuPower driver + client.serial.read() # interact over serial + # ... run tests ... +``` + +**Exporter sidecar actions:** + +- `storage.flash` writes the image to shared storage (or tells QEMU runtime via + QMP/`blockdev-add`). +- `power.on` sends QEMU start via QMP or launcher socket on shared volume. +- Serial/network drivers proxy to the QEMU runtime container. + +**Controller actions:** None during the session (lease is held). + +#### Phase 5 — Lease release and recycle (user + controllers) + +**User actions:** + +```bash +jmp delete-lease # or lease TTL expires +``` + +**Jumpstarter-controller actions:** + +1. Clear `Exporter.status.leaseRef` on `rpi4-virtual-aaa`. +2. Mark lease as released. + +**Exporter-set controller actions:** + +1. Observe exporter is unleased; update `availableReplicas` / `leasedReplicas`. +2. Apply `recycleStrategy`: + - **ExitAndReplace (default):** exporter sidecar exits after cleanup → Pod + terminates → controller deletes `Exporter` CR → creates a fresh replacement + with empty baseline storage to maintain `minAvailableReplicas` (next lessee + flashes again). + - **InPlaceReuse:** exporter resets QEMU state in place → same Pod returns + to Ready without restart (lessee may re-flash before next session). +3. 
If `availableReplicas > minAvailableReplicas` for longer than + `scaleDownCooldown`, gracefully scale down an excess replica: + - Set `Exporter.spec.enabled: false` + - Wait until no lease assigned + - Delete Pod + `Exporter` CR + +**Result:** Pool returns to steady state with `minAvailableReplicas` warm, +unleased exporters. + +#### Phase 6 — Demand spike (scale-up under load) + +**Trigger:** Three users (or CI jobs) request leases simultaneously; only one +warm exporter remains. + +**User actions:** Three concurrent `jmp lease -l board=rpi4,virtual=true`. + +**Jumpstarter-controller actions:** + +- Assigns the one available exporter immediately. +- Sets `Pending` condition on the other two leases (existing behavior when no + exporter is available). + +**Exporter-set controller actions:** + +1. Sees pending leases matching `spec.selector` with no available exporters. +2. Scales up: creates new `Exporter` + Pod instances (up to `maxReplicas`). +3. As new exporters register and become ready, jumpstarter-controller assigns + pending leases. + +**Result:** Pool grows to meet demand, then shrinks back after cooldown when +leases are released. + +#### Summary: CRDs and runtime objects + +**Admin-configured (2 CRDs — the full pool definition):** + +| CRD | Scope | Created by | Observed by | Relationship | +| --- | --- | --- | --- | --- | +| `VirtualTargetClass` | Namespaced | Admin | Exporter-set controller | Referenced by `ExporterSet` (same namespace) | +| `ExporterSet` | Namespaced | Admin | Exporter-set controller | References class; owns runtime objects below | + +**Platform and runtime (created by controllers):** + +| Resource | Created by | Observed by | User-visible? 
| +| --- | --- | --- | --- | +| `Jumpstarter` | Admin | Operator | No | +| `Exporter` | Exporter-set controller | Jumpstarter-controller, exporter-set controller | Indirectly (via lease) | +| `Lease` | User (via CLI) | Jumpstarter-controller, exporter-set controller | Yes | +| `Pod` | Exporter-set controller | Kubernetes, exporter-set controller | No | + +#### QEMU vs API-backed vs off-cluster backends + +The flow above applies to **in-cluster container-backed** provisioners +(`qemu.jumpstarter.dev`). Other provisioner strings reuse the same +`ExporterSet` + `jumpstarter-controller` lease flow with different placement: + +| Topology | Example provisioner | Where the target runs | +| --- | --- | --- | +| In-cluster container | `qemu.jumpstarter.dev` | Pod on Kubernetes (sidecar + runtime) | +| API-backed cloud | `corellium.jumpstarter.dev` | External SaaS API; lightweight exporter Pod | +| Off-cluster bare metal | `qemu-baremetal.jumpstarter.dev` | QEMU/emulator on lab hosts outside the cluster | + +For **API-backed** backends: + +- `VirtualTargetClass` holds `credentialsSecretRef` and shared backend + `parameters`. +- Per-pool overrides are expressed via `ExporterSet.spec.parameters` + (deep-merged over the class). +- The exporter Pod is lighter (API client only; no QEMU runtime container). + +For **off-cluster** backends, see *External and Off-Cluster Provisioning*. + +The `ExporterSet` + `jumpstarter-controller` lease flow is identical for all +topologies. 
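
As a recap of Phases 5 and 6, the sketch below shows one way the scale-down cooldown and the demand-driven scale-up could be expressed. It is illustrative only: the function names are hypothetical, and the policy of creating one instance per pending lease plus the warm buffer is an assumption, since the text only says the controller scales up "up to `maxReplicas`".

```python
from datetime import datetime, timedelta
from typing import Optional


def replicas_to_add(
    pending_leases: int,
    available: int,
    replicas: int,
    min_available: int,
    max_replicas: int,
) -> int:
    """Phase 6: cover pending leases and refill the warm buffer (assumed policy)."""
    wanted = max(0, pending_leases + min_available - available)
    if max_replicas > 0:  # assumption: 0 means no upper bound
        wanted = min(wanted, max(0, max_replicas - replicas))
    return wanted


def scale_down_allowed(
    available: int,
    min_available: int,
    replicas: int,
    min_replicas: int,
    excess_since: Optional[datetime],
    now: datetime,
    cooldown: timedelta,
) -> bool:
    """Phase 5: remove one excess replica only after scaleDownCooldown has elapsed."""
    if available <= min_available or replicas <= min_replicas:
        return False  # nothing in excess, or already at the floor
    if excess_since is None:
        return False  # excess has only just been observed
    return now - excess_since >= cooldown


now = datetime(2026, 1, 1, 12, 0)
cooldown = timedelta(minutes=5)

# Phase 6: two leases pending, warm pool empty, three instances running.
burst = replicas_to_add(pending_leases=2, available=0, replicas=3,
                        min_available=2, max_replicas=20)

# Phase 5: four warm instances against a buffer of two, idle for six minutes.
shrink = scale_down_allowed(available=4, min_available=2, replicas=4, min_replicas=0,
                            excess_since=now - timedelta(minutes=6),
                            now=now, cooldown=cooldown)
```

Keeping the two decisions in separate pure functions makes the cooldown easy to test without a cluster, because time is passed in rather than read from the clock.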
+ +### Architecture Overview + +```text + ┌─────────────────────────┐ + │ jumpstarter-controller │ + │ (creates Leases, │ + │ assigns Exporters) │ + └──────────┬──────────────┘ + │ + creates/updates Lease & Exporter objects + │ + ▼ + ┌────────────────────────────────────┐ + │ Kubernetes API │ + │ (Lease, Exporter, ExporterSet, │ + │ VirtualTargetClass) │ + └─┬──────────────┬──────────────┬────┘ + │ │ │ + watches │ watches │ watches │ + Leases + │ Leases + │ Leases + │ + Exporters │ Exporters │ Exporters │ + │ │ │ + ┌─────────────────▼┐ ┌───────────▼──────────┐┌──▼──────────────────────┐ + │ qemu provisioner │ │ android provisioner │ │ corellium provisioner │ + │ (ExporterSet │ │ (ExporterSet │ │ (ExporterSet │ + │ controller) │ │ controller) │ │ controller) │ + └────────┬─────────┘ └──────────┬──────────┘ └────────────┬────────────┘ + │ │ │ + │ manages │ manages │ manages + ▼ ▼ ▼ + ┌──────────────────┐ ┌───────────────────────┐ ┌────────────────────────┐ + │ Warm Pool │ │ Warm Pool │ │ Warm Pool │ + │ [Exporter].. │ │ [Exporter].. │ │ [Exporter].. │ + └────────┬─────────┘ └───────────┬───────────┘ └────────────┬───────────┘ + │ │ │ + └───────────────────────┼──────────────────────────┘ + │ register as standard Exporter CRs + ▼ + Kubernetes API (Exporters) +``` + +**Scaling Inputs — Watches on Leases and Exporters:** + +Each `ExporterSet` controller watches two key resources to make scaling decisions: + +1. **Leases** — The controller watches for pending Leases whose label selectors + match the set's selector. Pending leases with no available exporter signal + demand and trigger scale-up. +2. **Exporters** — The controller watches owned Exporter objects to track which + instances are available (no active lease) vs. occupied (leased). This + determines the current pool utilization. + +Together these inputs feed the scaling logic: if there are pending leases that +match this set and no available instances to serve them, scale up. 
If there are +excess idle instances beyond `minAvailableReplicas` for longer than +`scaleDownCooldown`, scale down. + +**Per-Provisioner Deployments (single image by default):** All provisioner +controllers are compiled into a single binary. Each Deployment in the cluster +passes a `--provisioner=` flag to activate the corresponding reconciler +(e.g., `qemu.jumpstarter.dev`). This gives each provisioner isolated logs and +independent restarts while maintaining a single image to build and release. + +The Jumpstarter operator deploys provisioner controllers based on the +`Jumpstarter` CR configuration. A new `exporterSets` section lists which +provisioners to enable: + +```yaml +apiVersion: operator.jumpstarter.dev/v1alpha1 +kind: Jumpstarter +metadata: + name: jumpstarter + namespace: jumpstarter +spec: + # ... existing controller, routers, authentication config ... + + exporterSets: + image: quay.io/jumpstarter-dev/exporter-set-controller:latest + imagePullPolicy: IfNotPresent + provisioners: + - name: qemu.jumpstarter.dev + enabled: true + - name: corellium.jumpstarter.dev + enabled: false + image: quay.io/jumpstarter-dev/exporter-set-controller-corellium:latest +``` + +**Scaling Logic:** Each `ExporterSet` controller monitors its instances and scales +based on available (unleased) replicas: + +- If `availableReplicas` drops below `minAvailableReplicas`, scale up. +- If `availableReplicas` stays above `minAvailableReplicas` for longer than + `scaleDownCooldown`, scale down (never below `minAvailableReplicas`). +- Never exceed `maxReplicas` (if set; 0 or omitted means no upper bound). +- `kubectl scale exporterset/ --replicas=N` works via the `scale` + subresource (`specReplicasPath=.spec.maxReplicas`). + +**Instance Lifecycle:** + +1. `ExporterSet` controller creates an `Exporter` from the set template + (provisioner renders the Pod). +2. 
The Pod starts the virtual target (sidecar pattern for container backends, or + API call for external backends) and runs the Jumpstarter exporter, registering + with the controller like any other exporter. +3. The instance becomes available in the pool for lease assignment. +4. When a lease is released, the exporter handles cleanup/reset per + `recycleStrategy`. The instance returns to the available pool or is replaced. + +### API / Protocol Changes + +**New CRDs** + +| CRD | Scope | Role | +| --- | --- | --- | +| `VirtualTargetClass` | Namespaced | Backend profile — provisioner, credentials, scheduling, binding, nested `parameters` | +| `ExporterSet` | Namespaced | Generic scaling resource (ReplicaSet + HPA analog) | + +**Reference rule:** `ExporterSet.spec.virtualTargetClassName` must name a +`VirtualTargetClass` in the **same namespace**. Cross-namespace references are +rejected at admission. `credentialsSecretRef.name` must refer to a Secret in that +same namespace. + +**VirtualTargetClass (common fields):** + +```yaml +spec: + provisioner: # e.g. qemu.jumpstarter.dev + credentialsSecretRef: # optional; for API-backed provisioners + name: # Secret in same namespace as this class + parameters: # nested YAML object; provisioner-specific + : + bindingMode: Immediate | WaitForFirstConsumer + reclaimPolicy: Delete | Retain + scheduling: # inherited by rendered exporter Pods + nodeSelector: + : + nodeAffinity: { ... } + tolerations: [ ... 
] + resources: + limits: + devices.kubevirt.io/kvm: "1" +``` + +**ExporterSet (common fields):** + +```yaml +spec: + minReplicas: # floor (default: 0) + maxReplicas: # ceiling (0 or omitted = no limit) + minAvailableReplicas: # warm buffer: ready & unleased (default: 0) + scaleDownCooldown: # default: 5m + recycleStrategy: ExitAndReplace | InPlaceReuse + virtualTargetClassName: # VirtualTargetClass name in same namespace + parameters: # optional nested overrides (deep-merged with class) + : + selector: + matchLabels: + : + template: + metadata: + labels: { ... } + spec: + drivers: [ ... ] +``` + +### Dictionary-Based Parameters + +Both `VirtualTargetClass` and `ExporterSet` expose a `spec.parameters` field +carrying provisioner-specific configuration as a **nested YAML object** (maps, +lists, and scalars) — not a flat `map[string]string`. This reads like normal +exporter/driver config rather than CSI's intentionally opaque string map. + +**CRD representation:** The field is schemaless at the API level +(`type: object` with `x-kubernetes-preserve-unknown-fields: true`, or +`apiextensionsv1.JSON` in Go). OpenAPI does not validate nested structure at +`kubectl apply` time. + +**Validation:** The active provisioner validates merged parameters during +reconcile and sets `ExporterSet` status conditions on error. Optional future: +`VirtualTargetClass.spec.parametersSchemaRef` pointing to a JSON Schema +ConfigMap per provisioner. + +**Merge semantics:** When provisioning an instance, the controller computes: + +```text +mergedParameters = deepMerge(VirtualTargetClass.spec.parameters, + ExporterSet.spec.parameters) +``` + +- **Maps** merge recursively — set keys override class keys at the same path. +- **Scalars and lists** in `ExporterSet.spec.parameters` replace the class + value at that path entirely (lists are not concatenated). 
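
The merge rules above can be sketched as a small recursive function. This is a minimal Python illustration under the stated semantics, not the controller's code (which may well be written in another language); `deep_merge` is a hypothetical helper name, and the sample values mirror the `qemu-rpi4` class and `rpi4-virtual` set used in this document.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: maps merge recursively, everything else is replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge key by key at this path.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists replace the class value entirely (no concatenation).
            merged[key] = deepcopy(value)
    return merged


class_parameters = {
    "resources": {"cpu": 4, "memory": "4Gi", "storage": "16Gi"},
    "firmware": {"url": "registry.example.com/firmware/rpi4:v1"},
}
set_parameters = {"resources": {"memory": "8Gi"}}

merged_parameters = deep_merge(class_parameters, set_parameters)
```

Here `merged_parameters["resources"]` keeps `cpu` and `storage` from the class and takes `memory` from the set, while `class_parameters` itself is left unmodified because the function works on copies.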
+ +**Example:** + +```yaml +# VirtualTargetClass.spec.parameters +resources: + cpu: 4 + memory: 4Gi + storage: 16Gi +firmware: + url: registry.example.com/firmware/rpi4:v1 + digest: sha256:abc... + +# ExporterSet.spec.parameters (override memory only) +resources: + memory: 8Gi + +# mergedParameters passed to provisioner +resources: + cpu: 4 # inherited from class + memory: 8Gi # overridden by set + storage: 16Gi # inherited from class +firmware: # unchanged — set did not specify firmware + url: registry.example.com/firmware/rpi4:v1 + digest: sha256:abc... +``` + +**Status subresource (ExporterSet):** + +```yaml +status: + replicas: 5 + readyReplicas: 3 + availableReplicas: 1 # warm (ready & unleased) + leasedReplicas: 2 + conditions: + - type: SetHealthy + status: "True" + - type: ScalingLimited + status: "False" +``` + +**Scale subresource:** `specReplicasPath=.spec.maxReplicas` enables +`kubectl scale` and HPA/KEDA interoperability. + +**Pluggable provisioners:** + +```text +VirtualTargetClass.provisioner → + qemu.jumpstarter.dev → k8s Pod (sidecar + runtime container) + qemu-baremetal.jumpstarter.dev → QEMU on off-cluster lab hosts (SSH/API) + ec2.jumpstarter.dev → AWS API + corellium.jumpstarter.dev → Corellium REST API +# backend is pluggable via provisioner string +``` + +**Changes to existing CRDs:** + +**Exporter — new `enabled` field:** + +Exporters gain an `enabled` boolean field (default: `true`). When set to +`false`, the `jumpstarter-controller` will not assign new leases to this +exporter. This is useful for: + +- **Lab operations:** Temporarily taking a physical exporter offline for + maintenance without deleting it. +- **Graceful scale-down:** `ExporterSet` controllers set `enabled: false` before + terminating an instance, ensuring the controller doesn't race to assign a + lease to an exporter that is about to be deleted. 
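
To make the effect of `enabled` concrete, the sketch below combines it with the other assignment conditions listed in Phase 3 (ready, no active `leaseRef`, labels matching the lease selector). It is a simplified Python illustration with hypothetical names (`ExporterView`, `matches`, `assignable`); the actual `jumpstarter-controller` logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExporterView:
    """Hypothetical, simplified view of an Exporter CR."""

    name: str
    labels: dict[str, str]
    enabled: bool = True  # proposed spec.enabled field
    ready: bool = True
    lease_ref: Optional[str] = None  # mirrors status.leaseRef


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def assignable(exporters: list[ExporterView], selector: dict[str, str]) -> list[str]:
    """Names of exporters that may receive a new lease for this selector."""
    return [
        e.name
        for e in exporters
        if e.enabled and e.ready and e.lease_ref is None and matches(selector, e.labels)
    ]


virtual = {"board": "rpi4", "virtual": "true"}
pool = [
    ExporterView("rpi4-virtual-aaa", virtual, lease_ref="lease-1"),  # already leased
    ExporterView("rpi4-virtual-bbb", virtual, enabled=False),        # draining
    ExporterView("rpi4-virtual-ccc", virtual),
    ExporterView("rpi4-bench-01", {"board": "rpi4"}),                # physical board
]

virtual_only = assignable(pool, {"board": "rpi4", "virtual": "true"})
any_rpi4 = assignable(pool, {"board": "rpi4"})
```

The disabled exporter is skipped for both selectors, which is what lets an `ExporterSet` controller drain it without racing against lease assignment.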
+ +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: Exporter +metadata: + name: qemu-rpi4-instance-3 +spec: + enabled: false # Controller will not assign new leases to this exporter +``` + +The graceful scale-down sequence becomes: + +1. `ExporterSet` controller sets `enabled: false` on the target exporter. +2. Controller waits to confirm no lease was assigned (watches for + `status.leaseRef` to remain empty). +3. Controller deletes the Pod and Exporter CR. + +### Hardware Considerations + +This proposal is specifically designed to reduce reliance on physical hardware +for scalable testing. However: + +- Virtual targets must faithfully emulate the interfaces exposed by physical + hardware (serial, network, storage, power) through the existing driver model. +- Container-backed provisioners require `/dev/kvm` or equivalent; scheduling is + expressed on `VirtualTargetClass.scheduling`. +- Timing-sensitive tests (USB/IP latency, boot ROM timeouts) may behave + differently on virtual targets — the system should expose labels indicating + whether a target is physical or virtual so users can filter when fidelity + matters. + +### External and Off-Cluster Provisioning + +Provisioners are **not** limited to in-cluster Pods. The same +`VirtualTargetClass` + `ExporterSet` model applies whether the virtual target +runs as a Kubernetes Pod, on a cloud virtual-device API, or on **bare-metal lab +hosts** outside the cluster. `VirtualTargetClass.provisioner` selects the +backend implementation; `credentialsSecretRef` and nested `parameters` carry +everything the provisioner needs to reach remote infrastructure (API tokens, +SSH keys, host lists, board profiles). + +**Design intent:** Scale a **logical pool** of exporters through familiar +`ExporterSet` semantics while placing workloads where fidelity or hardware +requires it — e.g. a high-fidelity automotive emulator that needs bare-metal +KVM, GPU passthrough, or vendor-specific tooling unavailable in the cluster. 
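One way to picture provisioner selection is a registry keyed by the provisioner string. The sketch below is illustrative only: the `Provisioner` protocol, the `register` decorator, and the two toy backend classes are hypothetical names, and real backends would call Kubernetes, SSH, or a vendor API instead of returning strings.

```python
from typing import Callable, Protocol


class Provisioner(Protocol):
    """Hypothetical backend interface; the method name is an assumption."""
    def provision(self, name: str, parameters: dict) -> str: ...


REGISTRY: dict[str, Callable[[], Provisioner]] = {}


def register(provisioner_name: str):
    """Class decorator that maps a provisioner string to a backend factory."""
    def wrap(cls):
        REGISTRY[provisioner_name] = cls
        return cls
    return wrap


@register("qemu.jumpstarter.dev")
class InClusterQemu:
    def provision(self, name: str, parameters: dict) -> str:
        return f"pod/{name}"  # placeholder for creating a Pod


@register("qemu-baremetal.jumpstarter.dev")
class BareMetalQemu:
    def provision(self, name: str, parameters: dict) -> str:
        host = parameters["hosts"][0]["name"]  # naive host choice, for illustration
        return f"ssh://{host}/{name}"


def provisioner_for(provisioner_name: str) -> Provisioner:
    """Look up the backend named by the class's provisioner string."""
    try:
        return REGISTRY[provisioner_name]()
    except KeyError:
        raise ValueError(f"unknown provisioner: {provisioner_name}") from None


backend = provisioner_for("qemu-baremetal.jumpstarter.dev")
target = backend.provision(
    "example-instance-0",  # hypothetical instance name
    {"hosts": [{"name": "bench-01.automotive.example.com", "slots": 2}]},
)
```

Because orchestration code only sees the `Provisioner` interface, in-cluster and off-cluster backends stay interchangeable from the scaling logic's point of view.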
+ +**What stays the same:** + +- Users lease with labels (`jmp lease -l board=sa8295,fidelity=high`) — no + awareness of placement. +- Each pool member registers as a standard `Exporter` CR with + `jumpstarter-controller`. +- Lessees flash and boot images via existing drivers after lease (see DD-7). + +**What differs per provisioner:** + +- **In-cluster (`qemu.jumpstarter.dev`):** exporter-set controller creates Pod + + sidecar; scheduling from `VirtualTargetClass.scheduling`. +- **API-backed (`corellium.jumpstarter.dev`):** exporter Pod is a thin API + client; cloud device lifecycle managed externally. +- **Off-cluster (`qemu-baremetal.jumpstarter.dev`):** exporter-set controller + provisions exporter + QEMU (or vendor emulator) on remote hosts via SSH or a + lab agent API; may run exporter as a local process on the host rather than a + Pod. The controller still owns `Exporter` CRs in the cluster for lease + assignment. + +**Automotive example — Qualcomm reference board on bare metal:** + +An automotive team runs SA8295-class targets on dedicated lab servers for +higher-fidelity behavior than in-cluster QEMU. The cluster hosts +orchestration only; emulators run on the bench network. 
+ +```yaml +apiVersion: jumpstarter.dev/v1alpha1 +kind: VirtualTargetClass +metadata: + name: qcom-sa8295-baremetal + namespace: jumpstarter +spec: + provisioner: qemu-baremetal.jumpstarter.dev + credentialsSecretRef: + name: automotive-lab-ssh + bindingMode: Immediate + parameters: + hosts: + - name: bench-01.automotive.example.com + arch: aarch64 + slots: 2 # concurrent instances per host + - name: bench-02.automotive.example.com + arch: aarch64 + slots: 2 + runtime: + binary: /usr/bin/qemu-system-aarch64 + kvm: true + board: + soc: sa8295 +--- +apiVersion: jumpstarter.dev/v1alpha1 +kind: ExporterSet +metadata: + name: qcom-sa8295-hifi + namespace: jumpstarter +spec: + minReplicas: 0 + maxReplicas: 4 + minAvailableReplicas: 1 + virtualTargetClassName: qcom-sa8295-baremetal + parameters: + board: + fidelity: high # deep-merged over class board defaults + selector: + matchLabels: + board: sa8295 + fidelity: high + virtual: "true" + template: + metadata: + labels: + board: sa8295 + fidelity: high + virtual: "true" + spec: + drivers: + - type: jumpstarter_driver_power.driver.QemuPower + - type: jumpstarter_driver_network.driver.TcpNetwork + config: + port: 22 + - type: jumpstarter_driver_serial.driver.QemuSerial +``` + +**Provisioner actions (off-cluster):** + +1. Read merged `parameters` and `credentialsSecretRef`. +2. Select a host with free capacity (`slots`). +3. Deploy or attach exporter + runtime on the host (SSH, systemd, or lab agent). +4. Create an `Exporter` CR in the cluster with template labels; register with + `jumpstarter-controller`. +5. On scale-down or failure, tear down the remote instance and delete the + `Exporter` CR. + +Physical reference boards on the same lab network can coexist in the pool — +users distinguish them with labels (`virtual=false` vs `virtual=true`) without +changing the lease workflow. + +## Design Decisions + +### DD-1: Pool-based scaling vs. purely on-demand provisioning + +**Alternatives considered:** + +1. 
**Pool-based with configurable min/max** — Maintain a warm pool of + pre-spawned instances; scale between `minAvailableReplicas` and `maxReplicas`. +2. **Purely on-demand** — Spawn a new instance only when a lease request arrives; + destroy it when the lease is released. + +**Decision:** Pool-based with configurable min/max. + +**Rationale:** Purely on-demand provisioning introduces noticeable latency for +CI pipelines (Pod scheduling + image pull + VM boot + exporter registration +typically takes 10-15s, and up to 60s with cold image pulls or heavy +provisioners). A warm pool provides instant lease fulfillment for the common +case. Setting `minAvailableReplicas: 0` still allows purely on-demand behavior +for rarely-used targets. `VirtualTargetClass.bindingMode: WaitForFirstConsumer` +maps to on-demand provisioning; `Immediate` maps to warm pools. + +### DD-2: Provisioner controller deployment model + +**Alternatives considered:** + +1. **Separate binary per provisioner** — Each provisioner is a completely + independent binary/image. +2. **Single binary, one deployment per provisioner** — One image contains all + provisioner reconcilers; a CLI flag (`--provisioner=qemu.jumpstarter.dev`) + selects which one to activate. +3. **Single binary, single deployment** — One Deployment runs all provisioners. +4. **Integrated into jumpstarter-controller** — Add reconcilers directly into + the existing operator. + +**Decision:** Option 2 — single binary, one Deployment per provisioner. + +**Rationale:** A single image is cheaper to build, test, and productize. +Deploying as separate Deployments gives operational benefits: isolated logs, +independent restarts, and explicit `--provisioner` selection. Adding a new +backend means adding a Deployment manifest with a different flag — no new image +build required. + +### DD-3: Pluggable provisioner vs. CRD-per-pool vs. typed claims + +**Alternatives considered:** + +1. 
**CRD per provider pool** (`QEMUExporterPool`, `AndroidExporterPool`, etc.) + — provider typing at the pool CRD level. +2. **Generic `ExporterSet` + pluggable `VirtualTargetClass.provisioner` + + nested `parameters`** — orchestration generic; backend selected by provisioner + string; device config as nested YAML on class + set (deep-merge). +3. **Typed `*VirtualTarget` CRDs per provider** (`QEMUVirtualTarget`, + `CorelliumVirtualTarget`, etc.) — strong schema per backend, referenced from + `ExporterSet`. +4. **Fully generic opaque config** — single CRD with flat `provider.config` map. + +**Decision:** Option 2 — generic `ExporterSet` + pluggable provisioner on +`VirtualTargetClass` with **dictionary-based nested `parameters`**. Reject +options 1 and 3. + +**Rationale:** Separating orchestration (scaling, lease matching, graceful +shutdown) from provisioning (QEMU container, Corellium API, off-cluster hosts) +lets each provisioner implement backend-appropriate scaling while exposing an +identical scaling surface (`minReplicas`/`maxReplicas`/`minAvailableReplicas`). +Nested `parameters` on `VirtualTargetClass` and optional `ExporterSet` overrides +replace per-provider claim CRDs — homogeneous pools need only two admin CRDs. +Typed `*VirtualTarget` claims add maintenance overhead without benefit when +pools share one backend profile (2026-06 team review). New backends add a +provisioner string and parameter conventions, not pool-tier or claim-kind changes. + +### DD-4: Per-lease parameters vs. pool flavors + +**Alternatives considered:** + +1. **Per-lease `parameters` dictionary** — Leases carry opaque hints (CPU, + memory, storage) interpreted by provisioners. +2. **Multiple `ExporterSet` flavors** — Administrators create separate sets for + different resource profiles; users select via label matching. + +**Decision:** Option 2 — multiple set flavors via separate `ExporterSet` CRs. 
+ +**Rationale:** Per-lease parameters add complexity across every layer for a use +case already satisfied by separate sets with different labels and +`VirtualTargetClass` parameters. Per-lease parameters can be revisited in a +future JEP if needed. + +### DD-5: Built-in scaling vs. HPA / KEDA + +**Alternatives considered:** + +1. **Built-in scaling logic** — Each provisioner implements lease-aware + reconciliation with a consistent scaling API. +2. **Kubernetes HPA** — Horizontal Pod Autoscaler with custom metrics. +3. **KEDA** — Event-driven autoscaler with a custom Jumpstarter scaler. + +**Decision:** Option 1 — built-in scaling logic with consistent API surface; +HPA/KEDA as complementary via `scale` subresource and exposed metrics. + +**Rationale:** Each provisioner should implement autoscaling appropriate to its +backend (local container churn vs. EC2 quotas vs. external API rate limits). A +single generic autoscaler cannot express lease-aware matching, graceful +disable-before-delete, or `minAvailableReplicas` invariants. However, the +**same scaling vocabulary** (`minReplicas`/`maxReplicas`/`minAvailableReplicas`) +and the `scale` subresource apply across all provisioners — one mental model for +users, backend-specific logic underneath. Pool metrics for HPA/KEDA are listed +in *Future Possibilities*. + +### DD-6: VirtualTargetClass vs. inline credentials + +**Alternatives considered:** + +1. **Inline credentials in every `ExporterSet`** — simple but duplicates secrets + across pools sharing the same backend account. +2. **`VirtualTargetClass` (namespaced backend profile)** — class in the same + namespace as the referencing `ExporterSet` holds credentials, nested + `parameters`, and scheduling; `ExporterSet.spec.virtualTargetClassName` + references the class by local name. +3. **Separate `ProviderConfig` CRD** — lighter-weight credential sharing without + full class semantics. 
+ +**Decision:** Option 2 — **namespaced** `VirtualTargetClass` with optional future +`ProviderConfig` for multi-account credential reuse. + +**Rationale:** Unlike CSI `StorageClass` (cluster-scoped), `VirtualTargetClass` +is **namespaced** so teams define isolated backend profiles, credentials, and +scheduling per namespace without cluster-admin involvement. `ExporterSet` may +only reference a class in the **same namespace**; `credentialsSecretRef` points +to a Secret in that namespace — credentials never appear on `ExporterSet`. +`bindingMode` and `reclaimPolicy` still map to warm-pool vs. on-demand and +external target retention. The StorageClass/PVC *separation of class and consumer* +is retained; only scope differs. + +### DD-7: Instance TTL and image refresh (deferred) + +**Alternatives considered:** + +1. **`ExporterSet.spec.ttl` with image refresh** — declarative `maxAge`, + `maxIdleAge`, and `imageRefreshPolicy` on the pool CRD; controller recycles + instances and re-pulls container/firmware images to keep warm pools fresh. +2. **Manual / CronJob pool flush** — operators restart pools or delete Pods on a + schedule outside Jumpstarter. +3. **Admin-pinned images in `parameters`** — declare expected OS/firmware refs on + `VirtualTargetClass` / `ExporterSet`; provisioner always boots those images. +4. **User flash at lease time (v1)** — warm pool instances are provisioned with + baseline runtime only; the lessee flashes and boots the image they want via + existing drivers (`storage.flash`, power cycle) — same workflow as physical + targets. +5. **Separate lifecycle controller (future)** — a cross-cutting controller that + periodically visits **physical and virtual** exporters and flashes the + expected image, without virtual-only fields on `ExporterSet`. + +**Decision:** Reject options 1–3 for v1 — **no TTL, image-refresh, or +admin-pinned boot images on `ExporterSet` / `VirtualTargetClass`**. 
Option 4 +matches current Jumpstarter behavior: users flash and boot what they need after +leasing. Option 5 remains the preferred direction for automated image hygiene +later. + +**Rationale:** Time-based Pod recycle and provisioner-driven image re-pull are +virtual-pool mechanics that **physical exporters do not share**. Physical machines +have no `maxAge`; their OS changes when someone flashes them, not when a pool +controller rotates Pods. Putting TTL or pinned boot images on `ExporterSet` alone +would split the lease experience. In v1, virtual targets in the warm pool behave +like physical benches: the lessee selects and flashes the desired image. A future +**separate lifecycle controller** can watch `Exporter` resources regardless of +origin and apply uniform policies — e.g. periodic flash of a lab-defined expected +image to idle exporters, scheduled maintenance windows — combining long-lived +(non-refreshed) exporter instances with automated image updates when operators +choose to enable them. 
+ +## Design Details + +### Reconciliation Loop + +Each `ExporterSet` controller runs a continuous reconciliation loop, triggered by +changes to the set CR, owned Exporters, or matching Leases: + +```text +for each ExporterSet CR: + mergedParameters = deepMerge(class.parameters, set.parameters) + ownedExporters = list Exporters owned by this CR + replicas = count ownedExporters (any state, including Provisioning) + readyReplicas = count ownedExporters that are Ready or Leased + leasedReplicas = count ownedExporters with an active LeaseRef + availableReplicas = readyReplicas - leasedReplicas + pendingLeases = count pending Leases matching spec.selector + + # Invariant: maintain minAvailableReplicas warm buffer + if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas: + scale up to restore availableReplicas + + # Demand-driven scale-up + elif pendingLeases > 0 AND replicas < spec.maxReplicas: + scale up by min(pendingLeases, spec.maxReplicas - replicas) + + # Scale-down: excess idle replicas + elif availableReplicas > spec.minAvailableReplicas AND cooldown elapsed: + graceful scale down: + 1. set exporter.spec.enabled = false + 2. confirm leaseRef is still empty + 3. delete Pod and Exporter CR + (never below minAvailableReplicas) +``` + +### Instance States + +Each virtual exporter instance transitions through: + +```text +Provisioning → Ready (warm pool) → Leased → Ready + └→ Terminating → (deleted if available>min) +``` + +- **Provisioning:** Pod starting, virtual target provisioning, exporter registering. +- **Ready:** Exporter registered and available for lease. +- **Leased:** Exporter assigned to an active lease. +- **Terminating:** Instance being deleted (scale-down or failure replace). + +### Component Interaction + +1. Administrator creates `VirtualTargetClass` and `ExporterSet` resources. +2. The provisioner controller provisions `minAvailableReplicas` Exporters. +3. Each instance Pod boots the virtual target and runs the Jumpstarter exporter, + registering with the existing `jumpstarter-controller`. +4. 
Instances appear as regular exporters with labels from `spec.template.metadata`. +5. Users lease them normally — the existing controller handles assignment. +6. On lease release, the instance is recycled per `recycleStrategy`: + - **Exit-and-replace (default):** Exporter exits; controller replaces the + instance proactively to maintain `minAvailableReplicas`. + - **In-place reuse:** Exporter resets internal state without exiting; Pod + remains running and transitions back to Ready immediately. +7. The `ExporterSet` controller continuously monitors utilization and scales. + +### Failure Modes + +- **Pod crash:** Controller detects failure via Pod status, replaces the instance, + maintains `minAvailableReplicas` invariant. +- **Resource exhaustion:** Cannot scale beyond cluster capacity; set stays at + current size, new leases queue as for physical targets. +- **Provisioner startup failure:** Instance marked failed, controller retries with + backoff, alerts via conditions on the set status. +- **Scaling storm:** Rate limiting on scale-up prevents creating too many + instances simultaneously. + +## Test Plan + + + +### Unit Tests +Unit tests should meet the project test coverage requirements. 
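
As one example of what such tests could cover, the scaling rules from the Reconciliation Loop section can be isolated in a pure function that needs no cluster. The sketch below is one possible shape, not the planned implementation: `PoolState`, `Bounds`, and `decide` are hypothetical names. Counting still-provisioning instances toward the warm buffer and applying `minReplicas` as a floor are assumptions made here; the pseudocode in this JEP leaves both implicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    replicas: int        # all owned instances, including ones still provisioning
    ready: int           # registered exporters, leased or not
    leased: int          # exporters with an active lease
    pending_leases: int  # pending leases matching the set's selector


@dataclass(frozen=True)
class Bounds:
    min_replicas: int = 0
    max_replicas: int = 0   # 0 means no limit
    min_available: int = 0  # warm buffer: ready and unleased


def decide(state: PoolState, bounds: Bounds, cooldown_elapsed: bool) -> int:
    """Return instances to add (> 0) or remove (< 0); 0 means no change."""
    available = state.ready - state.leased
    in_flight = state.replicas - state.ready  # provisioning, not yet registered
    upcoming = available + in_flight

    def capped(n: int) -> int:
        if bounds.max_replicas == 0:
            return n
        return max(0, min(n, bounds.max_replicas - state.replicas))

    # Warm buffer and floor first; in-flight instances count so they are not re-requested.
    shortfall = max(bounds.min_available - upcoming, bounds.min_replicas - state.replicas)
    if shortfall > 0:
        return capped(shortfall)
    # Demand-driven scale-up for pending leases nothing ready or booting can serve.
    unserved = state.pending_leases - upcoming
    if unserved > 0:
        return capped(unserved)
    # Scale-down after cooldown: drop idle excess, respecting buffer and floor.
    excess = min(available - bounds.min_available, state.replicas - bounds.min_replicas)
    if excess > 0 and cooldown_elapsed and state.pending_leases == 0:
        return -excess
    return 0


def test_decide() -> None:
    warm = Bounds(max_replicas=4, min_available=1)
    on_demand = Bounds(max_replicas=4)
    # Empty pool: restore the warm buffer.
    assert decide(PoolState(0, 0, 0, 0), warm, False) == 1
    # One instance already booting: do not request another.
    assert decide(PoolState(1, 0, 0, 0), warm, False) == 0
    # Burst of 5 pending leases with 2 leased: capped by maxReplicas.
    assert decide(PoolState(2, 2, 2, 5), on_demand, False) == 2
    # Three idle instances after cooldown: shrink to the buffer.
    assert decide(PoolState(3, 3, 0, 0), warm, True) == -2
    # Same pool before cooldown: hold.
    assert decide(PoolState(3, 3, 0, 0), warm, False) == 0


test_decide()
```

Keeping the decision free of Kubernetes clients means edge cases such as `maxReplicas: 0` or a pool that is still booting can be checked in milliseconds, leaving integration tests to cover the graceful disable-before-delete sequence.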
+ +### Integration Tests + +- End-to-end lease lifecycle with QEMU provisioner in a test cluster +- Mixed physical/virtual lease orchestration +- Provisioner failure and recovery scenarios +- Parameter deep-merge and provisioner-side validation +- `VirtualTargetClass` credential injection + +## Acceptance Criteria + +- [ ] `VirtualTargetClass` and `ExporterSet` CRDs defined +- [ ] `ExporterSet` controller maintains `minAvailableReplicas` warm buffer +- [ ] Controller scales up when available pool is depleted (up to `maxReplicas`) +- [ ] Controller scales down idle replicas after cooldown (never below + `minAvailableReplicas`) +- [ ] QEMU provisioner (`qemu.jumpstarter.dev`) fully implemented and tested +- [ ] Virtual instances register as standard exporters and are leasable without + changes to the existing lease flow +- [ ] Pod failures detected and reported in `ExporterSet` status +- [ ] An `ExporterSet` with `minAvailableReplicas: 0` provisions on demand only +- [ ] Status subresource reports Deployment-style counters and health conditions +- [ ] `scale` subresource enables `kubectl scale` interoperability +- [ ] `parameters` deep-merge produces correct merged config for provisioner +- [ ] Provisioner validates merged `parameters` and surfaces errors via conditions +- [ ] Documentation covers `VirtualTargetClass` and `ExporterSet` configuration + +## Graduation Criteria + +### Experimental + +- QEMU provisioner functional in a development cluster +- Basic set lifecycle works end-to-end (scale up, lease, release, scale down) +- Community feedback on CRD schema and scaling behavior + +### Stable + +- QEMU reference provisioner (`qemu.jumpstarter.dev`) production-ready; at least + one additional topology validated (e.g. 
off-cluster bare metal or API-backed) +- Production usage by at least one team for >1 month +- Performance benchmarks documented (cold-start latency, scaling responsiveness) +- Provisioner authoring guide published (how to add a new provisioner) + +## Backward Compatibility + +- Existing physical-only workflows are unaffected; lease requests without + virtual-specific labels continue to work as before. +- No changes to the existing gRPC protocol for physical exporters. +- New CRDs (`VirtualTargetClass`, `ExporterSet`) are additive. +- **Exporter `enabled` field:** Defaults to `true`, so all existing Exporters + continue to behave exactly as before. +- Administrators upgrading see no behavior change until they explicitly deploy + `ExporterSet` and `VirtualTargetClass` resources. + +## Consequences + +### Positive + +- **Instant lease fulfillment:** Warm pools eliminate provisioning latency. +- **Elastic scaling:** Sets grow and shrink with demand. +- **Unified user experience:** Virtual and physical targets leased the same way. +- **Kubernetes-native UX:** `minReplicas`/`maxReplicas`/`minAvailableReplicas`, + Deployment-style status, `kubectl scale` — familiar to cluster admins. +- **Pluggable backends:** New provisioners add a provisioner string. +- **Credential separation:** `VirtualTargetClass` keeps secrets off `ExporterSet` resources. +- **Fidelity ladder:** Same lease flow across sim, cloud virtual, and hardware tiers. + +### Negative + +- **Increased CRD surface:** `VirtualTargetClass` and `ExporterSet` add more + resources to manage than a single pool CRD per provider. +- **Resource consumption:** Warm pools consume cluster resources when idle. +- **Sidecar complexity:** Container-backed provisioners require multi-container + Pod orchestration and shared-volume protocols. + +### Risks + +- **Scaling storms:** Burst demand could exhaust cluster resources; rate limiting + mitigates but may delay lease fulfillment. 
+- **Provisioner reliability:** Failed startups can cause crash-replace loops. + +## Rejected Alternatives + +- **Static fixed-size pools (status quo):** Cannot scale with demand. +- **External orchestration (Terraform/Ansible):** Breaks lease semantics integration. +- **Per-lease `parameters` dictionary:** See DD-4. +- **CRD-per-pool without VirtualTarget separation:** Couples scaling and provider + config; rejected in favor of generic `ExporterSet` + pluggable provisioner. +- **Typed `*VirtualTarget` CRDs per provider:** Rejected at 2026-06 team review; + see DD-3. Dictionary `parameters` on class + set suffice for homogeneous pools. +- **`ExporterSet.spec.ttl` and image refresh:** Rejected for v1; see DD-7. Would + create virtual-only lifecycle semantics unlike physical exporters. + +## Prior Art + +- **LAVA:** Virtual DUTs via QEMU with static configuration; no on-demand scaling. +- **Crossplane:** General-purpose cloud composition; no Jumpstarter lease semantics. + Useful reference for external API integration (e.g., Corellium) but does not + replace pool-specific scaling logic. +- **CSI (StorageClass/PVC):** Class/consumer separation adopted; scope is + namespaced rather than cluster-scoped (see DD-6). +- **KubeVirt:** VM orchestration with pre-mounted images; Jumpstarter differs by + flash-at-runtime model and exporter-as-sidecar pattern. + +## Unresolved Questions + +- What is the exact scaling algorithm (proportional, step-based, predictive)? +- **Pod initialization for container-backed provisioners:** Native sidecar init + containers (`restartPolicy: Always`, KEP-753) vs. lifecycle hooks vs. other + co-location patterns for exporter + target-runtime. The sidecar sketch in this + JEP is provisional; resolve in the QEMU provisioner implementation PR. + +### Resolved + +- **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. +- **Lease release detection:** Controllers watch Lease objects directly. 
+- **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated + leases until effective. + +## Future Possibilities + +The following extensions are explicitly **not** part of this JEP but the model +stays open to them: + +- **Disaggregated/cross-node accelerators** — ARM64 runtime bridged to a remote + GPU via virtio-gpu/RDMA. +- **Separate `ProviderConfig` CRD** — multi-account credential reuse and rotation + referenced by multiple `VirtualTargetClass` resources. +- **Realized-instance CRD (PV analog)** — for static/pre-provisioned devices that + exist outside the dynamic provisioning flow. +- **`ExporterDeployment` rollout tier** — Deployment analog for rolling updates + across pool instances (versioned template changes). +- **Multiple/spawned-on-lease VirtualTargets per Exporter** — composite benches + and multi-device topologies. +- **Universal physical+virtual `Target` abstraction** — single resource type + spanning hardware and virtual backends. +- **Priority selectors / DeviceClass** — ordered label fallback ("prefer hardware, + fall back to QEMU") at lease time. +- **HPA/KEDA metric exposure** — complementary external autoscaling once core + provisioner controllers are stable. +- **Renode provider** — `renode.jumpstarter.dev` provisioner leveraging JEP-0010. +- **Additional cloud/container provisioners** — `corellium.jumpstarter.dev`, + `android.jumpstarter.dev`, `ec2.jumpstarter.dev` (no typed claim CRDs). +- **Composite leases** — multiple exporters linked into one logical lease. +- **Cross-cutting lifecycle controller** — periodic flash of lab-defined expected + images to idle **physical and virtual** exporters (see DD-7); long-lived pool + instances combined with optional automated image updates, not virtual-only TTL + on `ExporterSet`. + +## Implementation Plan + +The implementation is broken into phases. Each phase delivers a usable +increment and can be merged independently. 
**v1 focuses on the QEMU reference +implementation**; additional provisioners and lifecycle automation are deferred. + +| Phase | Scope | Status | +| --- | --- | --- | +| 1 | Exporter `enabled` field | Near-term | +| 2 | `VirtualTargetClass` + `ExporterSet` CRDs; nested `parameters`; `qemu.jumpstarter.dev` | Near-term (v1) | +| 3 | External/off-cluster provisioning (`qemu-baremetal.jumpstarter.dev`) | Near-term | +| 4+ | Lifecycle controller, Corellium/Android, etc. | Deferred — see *Future phases* | + +### Phase 1: Exporter `enabled` field + +Add the `enabled` boolean field to the Exporter CRD and update the +`jumpstarter-controller` lease assignment logic to skip disabled exporters. + +**Deliverables:** + +- [ ] Add `spec.enabled` field to Exporter CRD (default: `true`) +- [ ] Update lease assignment in `jumpstarter-controller` to filter out + disabled exporters +- [ ] Unit tests for the filtering logic +- [ ] Integration test: disable an exporter, verify it gets no new leases + +### Phase 2: Core CRDs and QEMU reference provisioner + +Define namespaced `VirtualTargetClass` and `ExporterSet` CRDs. Implement +**only** the `qemu.jumpstarter.dev` in-cluster provisioner — the reference +implementation for the 2-CRD model, parameter deep-merge, warm pool, and +flash-at-lease workflow (DD-7). 
+ +**Deliverables:** + +- [ ] Define `VirtualTargetClass` and `ExporterSet` CRD schemas (namespaced; + nested `parameters` with schemaless object fields; same-namespace reference + rule) +- [ ] Implement parameter deep-merge and provisioner-side validation +- [ ] Implement exporter-set controller binary with `--provisioner=qemu.jumpstarter.dev` +- [ ] Sidecar Pod rendering (provisional init-container model — see Unresolved + Questions) +- [ ] Core scaling logic: `minAvailableReplicas`, demand-driven scale-up, graceful + scale-down +- [ ] Deployment-style status + `scale` subresource +- [ ] Watch Leases and Exporters for scaling decisions +- [ ] Add `exporterSets` section to `Jumpstarter` operator CR +- [ ] Integration test: deploy `ExporterSet`, lease, flash, boot, release, + observe scaling + +### Phase 3: External / off-cluster provisioning + +Extend the exporter-set controller with an off-cluster QEMU provisioner to +validate the pluggable backend model beyond in-cluster Pods. Documents and +implements the flow in *External and Off-Cluster Provisioning*. + +**Deliverables:** + +- [ ] `qemu-baremetal.jumpstarter.dev` provisioner (or equivalent off-cluster + stub) using the same binary with `--provisioner=qemu-baremetal.jumpstarter.dev` +- [ ] Remote host selection, SSH/agent deploy, and `Exporter` CR registration from + off-cluster instances +- [ ] Example `VirtualTargetClass` + `ExporterSet` manifests for lab bare-metal + (automotive profile) +- [ ] Integration test or documented manual test plan for off-cluster scale-up + and lease + +### Future phases (deferred) + +The following are **explicitly out of v1** scope. They reuse the same +`VirtualTargetClass` + `ExporterSet` CRDs and nested `parameters` — no typed +claim CRDs. 
+ +**Additional provisioners** + +- [ ] `corellium.jumpstarter.dev` — API-backed cloud virtual devices +- [ ] `android.jumpstarter.dev` — in-cluster Android emulator pools +- [ ] `ec2.jumpstarter.dev` — AWS-backed targets +- [ ] Provisioner authoring guide + +**Cross-cutting lifecycle controller (DD-7)** + +- [ ] Separate controller for periodic flash / maintenance on **physical and + virtual** exporters — not `ExporterSet.spec.ttl` + +## Implementation History + +- 2025-10-30: RFE filed upstream (GitHub #41) +- 2026-06-03: JEP proposed +- 2026-06-18: Revised per review — ExporterSet, VirtualTargetClass, pluggable + provisioner model; added end-to-end flow section +- 2026-06-18: Team review — dictionary `parameters`, removed typed VirtualTarget + CRDs, namespaced `VirtualTargetClass`, deferred TTL (DD-7) + +## References + +- [GitHub Issue #41: RFE: On-Demand Virtual Target Provisioning](https://github.com/jumpstarter-dev/jumpstarter/issues/41) +- [PITCREW-409: jumpstarter JEP: virtual scalable exporters](https://redhat.atlassian.net/browse/PITCREW-409) +- [JEP-0010: Renode Integration](JEP-0010-renode-integration.md) — Related provider +- [JEP-0013: Observability](JEP-0013-observability-telemetry-logs.md) — Integration point + +--- + +This JEP is licensed under the +[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0), diff --git a/python/docs/source/contributing/jeps/index.md b/python/docs/source/contributing/jeps/index.md index 05e205eb8..352e6eb23 100644 --- a/python/docs/source/contributing/jeps/index.md +++ b/python/docs/source/contributing/jeps/index.md @@ -37,6 +37,7 @@ For the full process definition, see [JEP-0000](JEP-0000-jep-process.md). 
| 0010 | [Renode Integration](JEP-0010-renode-integration.md) | Implemented | @vtz (Vinicius Zein) | | 0011 | [Protobuf Introspection and Interface Generation](JEP-0011-protobuf-introspection-interface-generation.md) | Accepted | @kirkbrauer (Kirk Brauer) | | 0013 | [Metrics, Tracing, and Log Observability](JEP-0013-observability-telemetry-logs.md) | Accepted | @mangelajo (Miguel Angel Ajo Pelayo) | +| 0014 | [Virtual Scalable Exporters](JEP-0014-virtual-scalable-exporters.md) | Draft | @mangelajo (Miguel Angel Ajo Pelayo) | ### Informational JEPs @@ -63,4 +64,12 @@ For the full process definition, see [JEP-0000](JEP-0000-jep-process.md). | Active | Living document, actively maintained (Process JEPs only) | | Superseded | Replaced by a newer JEP | +```{toctree} +:hidden: +JEP-0000-jep-process.md +JEP-0010-renode-integration.md +JEP-0011-protobuf-introspection-interface-generation.md +JEP-0013-observability-telemetry-logs.md +JEP-0014-virtual-scalable-exporters.md +```