148 changes: 148 additions & 0 deletions .claude/skills/functional-test/SKILL.md
---
name: functional-test
description: >
Use this skill when running functional tests to validate PerfSpect code changes,
when the user says "run functional tests", "test my changes", "check for regressions",
or when verifying a code change did not break existing functionality.
---

> **Skill Loaded:** "Using functional-test skill."

# Functional Test Runner

Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify output aligns with the change.

## Test script

`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding.

## Prerequisites

1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). Binary must be at `./perfspect` (or set `PERFSPECT_DIR`).
2. **Remote target.** User must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target.
3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`).

## Workflow

### Step 1 — Analyze the code change

Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify:

- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content.
- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table?
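The changed-path list that feeds the next step can be pulled out directly (a minimal sketch; the `main...HEAD` range is the same base-branch assumption as the diff command above):

```shell
# Sketch: list the unique changed paths for the category mapping in Step 2.
# The default range (main...HEAD) is an assumption; pass another base if needed.
changed_paths() {
  git diff --name-only "${1:-main...HEAD}" | sort -u
}
```

Feed the resulting paths into the code-to-category mapping in Step 2.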

### Step 2 — Identify affected test categories

Use the code-to-category mapping below to determine which `TEST_*` categories are affected.

| Changed path | Categories |
|---|---|
| `cmd/config/` | `TEST_CONFIG` |
| `cmd/flamegraph/` | `TEST_FLAME` |
| `cmd/lock/` | `TEST_LOCK` |
| `cmd/metrics/` | `TEST_METRICS` |
| `cmd/report/` | `TEST_REPORT` |
| `cmd/benchmark/` | `TEST_BENCHMARK` |
| `cmd/telemetry/` | `TEST_TELEMETRY` |
| `cmd/root.go` | All — trace the specific change to narrow |
| `internal/app/` | All — trace the specific change to narrow |
| `internal/workflow/` | All reporting commands — trace to narrow |
| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` |
| `internal/target/` | All — affects SSH/local execution |
| `internal/script/` | All — affects script execution |
| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` |
| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` |
| `internal/cpus/` | All — CPU detection used everywhere |
| `internal/progress/` | All — progress UI used everywhere |
| `internal/util/` | All — trace the specific change to narrow |
| `main.go`, `go.mod`, `go.sum` | All |
| `scripts/`, `tools/` | All — embedded resources |
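The mapping above can be sketched as a small shell helper (a partial sketch: only some rows are shown, and the `ALL` fall-through for shared code still has to be narrowed by hand as the table instructs):

```shell
# Sketch of the path-to-category mapping (subset of the table above).
# Shared-code paths fall through to ALL and must be traced manually to narrow.
categories_for_path() {
  case "$1" in
    cmd/config/*)       echo "TEST_CONFIG" ;;
    cmd/flamegraph/*)   echo "TEST_FLAME" ;;
    cmd/lock/*)         echo "TEST_LOCK" ;;
    cmd/metrics/*)      echo "TEST_METRICS" ;;
    cmd/report/*)       echo "TEST_REPORT" ;;
    cmd/benchmark/*)    echo "TEST_BENCHMARK" ;;
    cmd/telemetry/*)    echo "TEST_TELEMETRY" ;;
    internal/extract/*) echo "TEST_REPORT TEST_TELEMETRY TEST_METRICS" ;;
    *)                  echo "ALL" ;;  # trace the specific change to narrow
  esac
}
```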

### Step 3 — Identify specific affected tests

Read the test catalog for each affected category. Load **only** the doc files for affected categories:

| Category | Test catalog |
|---|---|
| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) |
| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) |
| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) |
| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) |
| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) |
| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) |
| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) |

Within the loaded catalog, find every test whose behavior intersects with the change using these criteria:

1. **Flag changes** — Tests that pass the changed flag in `t_args`.
2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string.
3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`.
4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.).
5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests.
6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies.
7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change.

Build a list of specific test names (`t_name` values) and their category.

### Step 4 — Predict expected test outcomes

For each identified test, determine whether the code change should:

- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns.
- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
- **Make a previously-skipped test runnable** — If the change adds support for something that was previously guarded.

### Step 5 — Run the affected test categories

Disable all categories, then re-enable only those containing affected tests (a later assignment on the command line overrides an earlier one):

```bash
TARGET=<host> USER_NAME=<user> PRIVATE_KEY_PATH=<key> \
PERFSPECT_DIR=. \
TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
<enable affected categories here>=true \
../tools/perfspect/functional_test.sh -q -v
```

Add `NO_ROOT=true` if the remote user does not have password-less sudo.
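Building the block of `TEST_*` assignments can be scripted (a sketch; the category names come from the environment variable reference below, and the helper name is hypothetical):

```shell
# Sketch: emit TEST_*=true/false assignments, enabling only the categories
# passed as arguments (helper name is hypothetical).
build_test_env() {
  local affected=" $* " cat
  for cat in TEST_CONFIG TEST_FLAME TEST_LOCK TEST_METRICS \
             TEST_REPORT TEST_BENCHMARK TEST_TELEMETRY; do
    case "$affected" in
      *" $cat "*) printf '%s=true ' "$cat" ;;
      *)          printf '%s=false ' "$cat" ;;
    esac
  done
}
```

For example, `env $(build_test_env TEST_LOCK) TARGET=<host> USER_NAME=<user> PRIVATE_KEY_PATH=<key> ../tools/perfspect/functional_test.sh -q -v` would run only the lock category.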

### Step 6 — Verify output aligns with the change

Do not stop at PASS/FAIL. For each affected test:

1. **Read the test output.** Examine `test/output/<N>-<test_name>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
- Error message changed → confirm `stderr.txt` contains the new text.
- New output field added → confirm it appears in `stdout.txt` or generated report files.
- Chart/report generation changed → confirm output HTML/JSON/CSV contains expected new content.
- Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
- Collection behavior changed → confirm `stderr.txt` shows expected collection messages and `stdout.txt` shows expected output files.
3. **Check for unintended side effects.** Scan output of non-target tests in the same category for unexpected ERRORs or changed output patterns.
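The side-effect scan in point 3 can start from a simple log sweep (a sketch; the only assumption is the `test/output/<N>-<test_name>/perfspect.log` layout described above):

```shell
# Sketch: report every ERROR log entry across all test output directories.
# Assumes the test/output/<N>-<test_name>/perfspect.log layout described above.
scan_for_errors() {
  local root="${1:-test/output}"
  grep -Hn 'level=ERROR' "$root"/*/perfspect.log 2>/dev/null \
    || echo "no level=ERROR entries found"
}
```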

### Step 7 — Report to user

Provide:
- The list of tests identified as affected and why.
- PASS/FAIL status of each.
- For each affected test: what was verified in the output and whether the change is reflected correctly.
- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
- Any tests that passed but whose output reveals a concern.

## Environment variable reference

| Variable | Default | Purpose |
|---|---|---|
| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
| `USER_NAME` | _(empty)_ | SSH username for remote target |
| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
| `NO_ROOT` | `false` | Set to `true` to run without root |
| `TEST_CONFIG` | `true` | Run config tests |
| `TEST_FLAME` | `true` | Run flame tests |
| `TEST_LOCK` | `true` | Run lock tests |
| `TEST_METRICS` | `true` | Run metrics tests |
| `TEST_REPORT` | `true` | Run report tests |
| `TEST_BENCHMARK` | `true` | Run benchmark tests |
| `TEST_TELEMETRY` | `true` | Run telemetry tests |
30 changes: 30 additions & 0 deletions .claude/skills/functional-test/docs/benchmark-tests.md
# Benchmark Tests (TEST_BENCHMARK)

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `benchmark default` | `benchmark` | Default benchmark (all benchmarks, default format) |
| `benchmark input` | `benchmark --input <prev>` | Reprocessing from `benchmark default` output |
| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |

## Flags exercised

`--input`, `--format`, unknown flags (cobra validation)

Note: The test script does not exercise individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir`, `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).

## Test dependencies

- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).

## Output verification guidance

- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
- **`benchmark input`**: Verify reprocessing produces output without re-running benchmarks.
- **`benchmark invalid benchmark`**: Verifies cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
- **`benchmark invalid format`**: Verify exit code is 1.
- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested. Individual benchmark flags are not exercised. If a benchmark is added/removed/renamed, verify `benchmark default` still passes and its output reflects the change.
- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification required.
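The first two bullets above can be sketched as one helper (assumptions: the output-directory layout and `level=ERROR` log format described in SKILL.md; the helper name is hypothetical):

```shell
# Sketch: benchmark output sanity check (helper name is hypothetical).
# Fails if the output dir is empty or perfspect.log contains ERROR entries.
check_benchmark_output() {
  local dir=$1
  [ -n "$(ls -A "$dir" 2>/dev/null)" ] || { echo "no output files"; return 1; }
  if grep -q 'level=ERROR' "$dir/perfspect.log" 2>/dev/null; then
    echo "ERROR entries found in perfspect.log"; return 1
  fi
  echo "ok"
}
```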
40 changes: 40 additions & 0 deletions .claude/skills/functional-test/docs/config-tests.md
# Config Tests (TEST_CONFIG)

All config tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `config help` | `config --help` | Help text prints `Usage:` |
| `config default` | `config` | No-op prints `No changes requested` and `Configuration` |
| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting |
| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms |
| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses stdout table |
| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` |
| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` |
| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` |
| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` |
| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` |
| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` |
| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` |
| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` |
| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` |
| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` |
| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` |
| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` |
| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` |
| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` |
| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` |
| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` |
| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` |

## Flags exercised

`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, `--c6`, `--c1-demotion`, `--help`

## Output verification guidance

- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `set <flag> to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify exit code is 1.
- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to user with the new boundary value.
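Checking a negative test against the exact message can be sketched as follows (the message format is the one quoted in the catalog above; the helper name is hypothetical):

```shell
# Sketch: assert a config negative test's stderr carries the exact
# invalid-flag message format quoted above (helper name is hypothetical).
expect_invalid_flag() {
  local dir=$1 flag=$2 value=$3
  grep -qF "invalid flag value, $flag $value, valid values are" "$dir/stderr.txt"
}
```

For example, `expect_invalid_flag test/output/10-config_invalid_epb --epb 16` would confirm the `config invalid epb` expectation.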
43 changes: 43 additions & 0 deletions .claude/skills/functional-test/docs/flame-tests.md
# Flame Tests (TEST_FLAME)

All flame tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Runner | Args exercised | Validates |
|---|---|---|---|
| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
| `flame with input` | `run_test` | `flame --input <prev_output>` | Reprocessing from raw data produced by `flame all options` |
| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |

## Flags exercised

`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`

## Custom validation functions

Tests `flame duration java`, `flame all options`, `flame sigint java` use:
```bash
jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
```

Tests `flame duration native`, `flame dual native stacks`, `flame sigint native` use:
```bash
jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
```

## Output verification guidance

- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`, `flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name.
- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously-collected raw data without re-collecting.
- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify exit code is 1.
- **SIGINT tests**: Verify `perfspect.log` last line contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown).
- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed.
- **If JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to user that both code and test `t_expect_func` must be updated.
23 changes: 23 additions & 0 deletions .claude/skills/functional-test/docs/lock-tests.md
# Lock Tests (TEST_LOCK)

All lock tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection |
| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) |
| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) |
| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be from: all, html, txt) |

## Flags exercised

`--duration`, `--frequency`, `--package`, `--no-summary`, `--format`

## Output verification guidance

- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify output directory contains HTML report file. With `--package`, verify raw data package was downloaded.
- **Negative tests**: Verify exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added.
- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid). But if format validation error messages change, verify they still align.
- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to user.