Nebius provider: boot disk creation fails with BootCpuArchitecture is invalid #111

@RaleighSF

Summary

Creating any Nebius-backed workspace via Brev fails immediately with:

rpc error: code = InvalidArgument desc = BootDisk.BootCpuArchitecture is invalid (type: Error, retryable: true)

Reproduced consistently on 2026-04-20 for SKU gpu-h200-sxm.1gpu-16vcpu-200gb via both direct API POST and the production web console at brev.nvidia.com.

Root cause

v1/providers/nebius/instance.go::buildDiskCreateRequest constructs a compute.DiskSpec with Size, Type, and Source fields but does not set SourceImageCpuArchitecture.

Nebius's proto (nebius/api/nebius/compute/v1/disk.proto) declares:

enum SourceImageCPUArchitecture {
  SOURCE_IMAGE_CPU_UNSPECIFIED = 0;
  AMD64 = 1;
  ARM64 = 2;
}

SourceImageCPUArchitecture source_image_cpu_architecture = 9;

When the field is unset, it defaults to SOURCE_IMAGE_CPU_UNSPECIFIED, which Nebius rejects with the error above. Since buildDiskCreateRequest is shared across every Nebius SKU code path, this affects all Nebius-backed workspaces (H100 SXM, H200 SXM, 1-GPU and 8-GPU variants).
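The default behaviour can be illustrated with a minimal sketch. The types below are local stand-ins that mirror the enum quoted above; they are not the generated Nebius SDK types, and `archIsSet` plus the `Type` value are hypothetical, used only for illustration. As in proto3-generated Go code, an enum field that is never assigned holds its zero value, which here is the unspecified architecture:

```go
package main

import "fmt"

// SourceImageCPUArchitecture is a local stand-in mirroring the enum quoted
// from disk.proto. It is NOT the generated Nebius SDK type.
type SourceImageCPUArchitecture int32

const (
	SourceImageCPUUnspecified SourceImageCPUArchitecture = 0
	AMD64                     SourceImageCPUArchitecture = 1
	ARM64                     SourceImageCPUArchitecture = 2
)

// DiskSpec is a simplified stand-in for compute.DiskSpec (assumed shape).
type DiskSpec struct {
	SizeGibibytes              int64
	Type                       string
	SourceImageCpuArchitecture SourceImageCPUArchitecture
}

// archIsSet is a hypothetical helper: it reports whether the architecture
// field carries a concrete value rather than the zero-value default.
func archIsSet(spec DiskSpec) bool {
	return spec.SourceImageCpuArchitecture != SourceImageCPUUnspecified
}

func main() {
	// Size and Type are populated, but the architecture is never assigned,
	// which matches the situation described for buildDiskCreateRequest.
	spec := DiskSpec{SizeGibibytes: 500, Type: "placeholder-disk-type"}
	fmt.Println(archIsSet(spec)) // prints "false"

	spec.SourceImageCpuArchitecture = AMD64
	fmt.Println(archIsSet(spec)) // prints "true"
}
```

A check of this kind could also serve as a pre-flight guard before the disk request is sent, so that an unset architecture fails locally with a clearer message.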

Reproduction

  1. Create any Nebius workspace via brev.nvidia.com or via a direct POST to /api/organizations/{org}/workspaces.
  2. Workspace lands in status: FAILURE immediately with the error above in statusMessage.
  3. brev reset <workspace> returns rpc error: code = Internal desc = not implemented, so there is no client-side remediation path.

Well-formed payload captured from the web console (it still produces FAILURE because the bug is server-side, in the Brev→Nebius call, not in the payload sent from the client to Brev):

{
  "name": "cosmos-reason-lab",
  "workspaceGroupId": "brev-nebius-prod",
  "workspaceTemplateId": "4nbb4lg2s",
  "instanceType": "gpu-h200-sxm.1gpu-16vcpu-200gb",
  "diskStorage": "500Gi",
  "workspaceVersion": "v1",
  "vmBuild": {"forceJupyterInstall": true}
}

Proposed fix

In buildDiskCreateRequest, set the CPU architecture on the disk spec before attaching the image source:

baseReq.Spec.SourceImageCpuArchitecture = compute.SourceImageCPUArchitecture_AMD64

For ARM images, derive the architecture from the image family name, or from an attrs.Architecture field if one is surfaced.
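One possible shape for that derivation is sketched below. Everything here is an assumption for illustration: `archFor` is a hypothetical helper, the enum is a local stand-in rather than the generated Nebius type, the image family names are made up, and the matched substrings ("arm64", "aarch64", "amd64", "x86_64") are common conventions rather than anything confirmed for Nebius images. An explicit architecture attribute takes precedence, the family name is the fallback, and AMD64 is the default, consistent with the one-line fix above:

```go
package main

import (
	"fmt"
	"strings"
)

// SourceImageCPUArchitecture is a local stand-in mirroring the enum quoted
// from disk.proto. It is NOT the generated Nebius SDK type.
type SourceImageCPUArchitecture int32

const (
	SourceImageCPUUnspecified SourceImageCPUArchitecture = 0
	AMD64                     SourceImageCPUArchitecture = 1
	ARM64                     SourceImageCPUArchitecture = 2
)

// archFor is a hypothetical helper. attrArch stands for an explicit
// architecture attribute (may be empty); imageFamily is the image family name.
func archFor(attrArch, imageFamily string) SourceImageCPUArchitecture {
	// 1. Prefer an explicit attribute when one is provided.
	switch strings.ToLower(strings.TrimSpace(attrArch)) {
	case "arm64", "aarch64":
		return ARM64
	case "amd64", "x86_64":
		return AMD64
	}

	// 2. Otherwise fall back to a substring match on the family name.
	family := strings.ToLower(imageFamily)
	if strings.Contains(family, "arm64") || strings.Contains(family, "aarch64") {
		return ARM64
	}

	// 3. Default to AMD64 so the field is never left unspecified.
	return AMD64
}

func main() {
	// Illustrative family names only.
	fmt.Println(archFor("", "example-image-cuda") == AMD64)
	fmt.Println(archFor("", "example-image-arm64") == ARM64)
	fmt.Println(archFor("aarch64", "example-image-cuda") == ARM64)
}
```

Defaulting to AMD64 keeps the current x86 SKUs working even when no hint is available; whether a silent default or an explicit error is preferable for unknown images is a design choice for the maintainers.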

Impact

Blocks all Nebius-backed workspace creation for affected orgs. Nebius is the only Brev provider surfacing stop/start-capable H100/H200 SKUs in many orgs' catalogs (Shadeform SKUs are stoppable: false), so this effectively blocks modern-GPU demo usage requiring stop/start.

Environment

  • Brev CLI: v0.6.322 (latest)
  • Org IDs affected: confirmed on org-3BzqVpk4eldrvOy47zcCQOhCFHq
  • Failure timestamps: 2026-04-20 / 2026-04-21 UTC
  • SKUs tested: gpu-h200-sxm.1gpu-16vcpu-200gb

Related gaps

  • brev reset returns "not implemented" for the Nebius provider — worth tracking separately.
  • brevdev/brev-cli pkg/instancetypes/instancetypes.go last updated 2024-05-30; brev start --gpu rejects every modern SKU returned by brev search gpu (separate issue in brev-cli).
