148 changes: 148 additions & 0 deletions .claude/skills/functional-test/SKILL.md
---
name: functional-test
description: >
Use this skill when running functional tests to validate PerfSpect code changes,
when the user says "run functional tests", "test my changes", "check for regressions",
or when verifying a code change did not break existing functionality.
---

> **Skill Loaded:** "Using functional-test skill."

# Functional Test Runner

Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify output aligns with the change.

## Test script

`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding.

## Prerequisites

1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). Binary must be at `./perfspect` (or set `PERFSPECT_DIR`).
2. **Remote target.** User must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target.
3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`).

## Workflow

### Step 1 — Analyze the code change

Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify:

- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content.
- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table?
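The changed-path list that feeds the next step can be pulled out directly (a minimal sketch; the `main...HEAD` range is the same base-branch assumption as the diff command above):

```shell
# Sketch: list the unique changed paths for the category mapping in Step 2.
# The default range (main...HEAD) is an assumption; pass another base if needed.
changed_paths() {
  git diff --name-only "${1:-main...HEAD}" | sort -u
}
```

Feed the resulting paths into the code-to-category mapping in Step 2.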

### Step 2 — Identify affected test categories

Use the code-to-category mapping below to determine which `TEST_*` categories are affected.

| Changed path | Categories |
|---|---|
| `cmd/config/` | `TEST_CONFIG` |
| `cmd/flamegraph/` | `TEST_FLAME` |
| `cmd/lock/` | `TEST_LOCK` |
| `cmd/metrics/` | `TEST_METRICS` |
| `cmd/report/` | `TEST_REPORT` |
| `cmd/benchmark/` | `TEST_BENCHMARK` |
| `cmd/telemetry/` | `TEST_TELEMETRY` |
| `cmd/root.go` | All — trace the specific change to narrow |
| `internal/app/` | All — trace the specific change to narrow |
| `internal/workflow/` | All reporting commands — trace to narrow |
| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` |
| `internal/target/` | All — affects SSH/local execution |
| `internal/script/` | All — affects script execution |
| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` |
| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` |
| `internal/cpus/` | All — CPU detection used everywhere |
| `internal/progress/` | All — progress UI used everywhere |
| `internal/util/` | All — trace the specific change to narrow |
| `main.go`, `go.mod`, `go.sum` | All |
| `scripts/`, `tools/` | All — embedded resources |
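The mapping above can be sketched as a small shell helper (a partial sketch: only some rows are shown, and the `ALL` fall-through for shared code still has to be narrowed by hand as the table instructs):

```shell
# Sketch of the path-to-category mapping (subset of the table above).
# Shared-code paths fall through to ALL and must be traced manually to narrow.
categories_for_path() {
  case "$1" in
    cmd/config/*)       echo "TEST_CONFIG" ;;
    cmd/flamegraph/*)   echo "TEST_FLAME" ;;
    cmd/lock/*)         echo "TEST_LOCK" ;;
    cmd/metrics/*)      echo "TEST_METRICS" ;;
    cmd/report/*)       echo "TEST_REPORT" ;;
    cmd/benchmark/*)    echo "TEST_BENCHMARK" ;;
    cmd/telemetry/*)    echo "TEST_TELEMETRY" ;;
    internal/extract/*) echo "TEST_REPORT TEST_TELEMETRY TEST_METRICS" ;;
    *)                  echo "ALL" ;;  # trace the specific change to narrow
  esac
}
```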

### Step 3 — Identify specific affected tests

Read the test catalog for each affected category. Load **only** the doc files for affected categories:

| Category | Test catalog |
|---|---|
| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) |
| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) |
| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) |
| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) |
| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) |
| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) |
| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) |

Within the loaded catalog, find every test whose behavior intersects with the change using these criteria:

1. **Flag changes** — Tests that pass the changed flag in `t_args`.
2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string.
3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`.
4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.).
5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests.
6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies.
7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change.

Build a list of specific test names (`t_name` values) and their category.

### Step 4 — Predict expected test outcomes

For each identified test, determine whether the code change should:

- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns.
- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
- **Make a previously-skipped test runnable** — If the change adds support for something that was previously guarded.

### Step 5 — Run the affected test categories

Disable all categories, then re-enable only those containing affected tests (a later assignment on the command line overrides an earlier one):

```bash
TARGET=<host> USER_NAME=<user> PRIVATE_KEY_PATH=<key> \
PERFSPECT_DIR=. \
TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
<enable affected categories here>=true \
../tools/perfspect/functional_test.sh -q -v
```

Add `NO_ROOT=true` if the remote user does not have password-less sudo.
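Building the block of `TEST_*` assignments can be scripted (a sketch; the category names come from the environment variable reference below, and the helper name is hypothetical):

```shell
# Sketch: emit TEST_*=true/false assignments, enabling only the categories
# passed as arguments (helper name is hypothetical).
build_test_env() {
  local affected=" $* " cat
  for cat in TEST_CONFIG TEST_FLAME TEST_LOCK TEST_METRICS \
             TEST_REPORT TEST_BENCHMARK TEST_TELEMETRY; do
    case "$affected" in
      *" $cat "*) printf '%s=true ' "$cat" ;;
      *)          printf '%s=false ' "$cat" ;;
    esac
  done
}
```

For example, `env $(build_test_env TEST_LOCK) TARGET=<host> USER_NAME=<user> PRIVATE_KEY_PATH=<key> ../tools/perfspect/functional_test.sh -q -v` would run only the lock category.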

### Step 6 — Verify output aligns with the change

Do not stop at PASS/FAIL. For each affected test:

1. **Read the test output.** Examine `test/output/<N>-<test_name>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
- Error message changed → confirm `stderr.txt` contains the new text.
- New output field added → confirm it appears in `stdout.txt` or generated report files.
- Chart/report generation changed → confirm output HTML/JSON/CSV contains expected new content.
- Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
- Collection behavior changed → confirm `stderr.txt` shows expected collection messages and `stdout.txt` shows expected output files.
3. **Check for unintended side effects.** Scan output of non-target tests in the same category for unexpected ERRORs or changed output patterns.
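The side-effect scan in point 3 can start from a simple log sweep (a sketch; the only assumption is the `test/output/<N>-<test_name>/perfspect.log` layout described above):

```shell
# Sketch: report every ERROR log entry across all test output directories.
# Assumes the test/output/<N>-<test_name>/perfspect.log layout described above.
scan_for_errors() {
  local root="${1:-test/output}"
  grep -Hn 'level=ERROR' "$root"/*/perfspect.log 2>/dev/null \
    || echo "no level=ERROR entries found"
}
```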

### Step 7 — Report to user

Provide:
- The list of tests identified as affected and why.
- PASS/FAIL status of each.
- For each affected test: what was verified in the output and whether the change is reflected correctly.
- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
- Any tests that passed but whose output reveals a concern.

## Environment variable reference

| Variable | Default | Purpose |
|---|---|---|
| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
| `USER_NAME` | _(empty)_ | SSH username for remote target |
| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
| `NO_ROOT` | `false` | Set to `true` to run without root |
| `TEST_CONFIG` | `true` | Run config tests |
| `TEST_FLAME` | `true` | Run flame tests |
| `TEST_LOCK` | `true` | Run lock tests |
| `TEST_METRICS` | `true` | Run metrics tests |
| `TEST_REPORT` | `true` | Run report tests |
| `TEST_BENCHMARK` | `true` | Run benchmark tests |
| `TEST_TELEMETRY` | `true` | Run telemetry tests |
30 changes: 30 additions & 0 deletions .claude/skills/functional-test/docs/benchmark-tests.md
# Benchmark Tests (TEST_BENCHMARK)

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `benchmark default` | `benchmark` | Default benchmark (all benchmarks, default format) |
| `benchmark input` | `benchmark --input <prev>` | Reprocessing from `benchmark default` output |
| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |

## Flags exercised

`--input`, `--format`, unknown flags (cobra validation)

Note: The test script does not exercise individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir`, `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).

## Test dependencies

- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).

## Output verification guidance

- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
- **`benchmark input`**: Verify reprocessing produces output without re-running benchmarks.
- **`benchmark invalid benchmark`**: Verifies cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
- **`benchmark invalid format`**: Verify exit code is 1.
- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested. Individual benchmark flags are not exercised. If a benchmark is added/removed/renamed, verify `benchmark default` still passes and its output reflects the change.
- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification required.
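The first two bullets above can be sketched as one helper (assumptions: the output-directory layout and `level=ERROR` log format described in SKILL.md; the helper name is hypothetical):

```shell
# Sketch: benchmark output sanity check (helper name is hypothetical).
# Fails if the output dir is empty or perfspect.log contains ERROR entries.
check_benchmark_output() {
  local dir=$1
  [ -n "$(ls -A "$dir" 2>/dev/null)" ] || { echo "no output files"; return 1; }
  if grep -q 'level=ERROR' "$dir/perfspect.log" 2>/dev/null; then
    echo "ERROR entries found in perfspect.log"; return 1
  fi
  echo "ok"
}
```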
40 changes: 40 additions & 0 deletions .claude/skills/functional-test/docs/config-tests.md
# Config Tests (TEST_CONFIG)

All config tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `config help` | `config --help` | Help text prints `Usage:` |
| `config default` | `config` | No-op prints `No changes requested` and `Configuration` |
| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting |
| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms |
| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses stdout table |
| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` |
| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` |
| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` |
| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` |
| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` |
| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` |
| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` |
| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` |
| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` |
| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` |
| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` |
| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` |
| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` |
| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` |
| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` |
| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` |
| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` |

## Flags exercised

`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, `--c6`, `--c1-demotion`, `--help`

## Output verification guidance

- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `set <flag> to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify exit code is 1.
- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to user with the new boundary value.
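Checking a negative test against the exact message can be sketched as follows (the message format is the one quoted in the catalog above; the helper name is hypothetical):

```shell
# Sketch: assert a config negative test's stderr carries the exact
# invalid-flag message format quoted above (helper name is hypothetical).
expect_invalid_flag() {
  local dir=$1 flag=$2 value=$3
  grep -qF "invalid flag value, $flag $value, valid values are" "$dir/stderr.txt"
}
```

For example, `expect_invalid_flag test/output/10-config_invalid_epb --epb 16` would confirm the `config invalid epb` expectation.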
43 changes: 43 additions & 0 deletions .claude/skills/functional-test/docs/flame-tests.md
# Flame Tests (TEST_FLAME)

All flame tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Runner | Args exercised | Validates |
|---|---|---|---|
| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
| `flame with input` | `run_test` | `flame --input <prev_output>` | Reprocessing from raw data produced by `flame all options` |
| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |

## Flags exercised

`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`

## Custom validation functions

Tests `flame duration java`, `flame all options`, `flame sigint java` use:
```bash
jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
```

Tests `flame duration native`, `flame dual native stacks`, `flame sigint native` use:
```bash
jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
```

## Output verification guidance

- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`, `flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name.
- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously-collected raw data without re-collecting.
- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify exit code is 1.
- **SIGINT tests**: Verify `perfspect.log` last line contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown).
- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed.
- **If JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to user that both code and test `t_expect_func` must be updated.
23 changes: 23 additions & 0 deletions .claude/skills/functional-test/docs/lock-tests.md
# Lock Tests (TEST_LOCK)

All lock tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection |
| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) |
| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) |
| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be from: all, html, txt) |

## Flags exercised

`--duration`, `--frequency`, `--package`, `--no-summary`, `--format`

## Output verification guidance

- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify output directory contains HTML report file. With `--package`, verify raw data package was downloaded.
- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added.
- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid). But if format validation error messages change, verify they still align.
- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to user.