Skip to content

feat(logging): JSON formatter + structured dispatch fields (COW-1035)#37

Open
brunota20 wants to merge 1 commit into
feat/resource-limit-tests-cow-1036from
feat/structured-logging-audit-cow-1035
Open

feat(logging): JSON formatter + structured dispatch fields (COW-1035)#37
brunota20 wants to merge 1 commit into
feat/resource-limit-tests-cow-1036from
feat/structured-logging-audit-cow-1035

Conversation

@brunota20

Copy link
Copy Markdown
Collaborator

What does this PR do?

Closes the COW-1035 structured-logging audit. The engine now ships JSON-formatted logs by default with consistent field shapes across every dispatch, host call, and order submission. A single `jq` / Loki / Grafana stream reconstructs the full timeline of any module event.

Third M4 issue landed.

Changes

Area Change
Formatter New `--pretty-logs` CLI flag. Default is JSON (`tracing_subscriber::fmt::json().flatten_event(true)`). `--pretty-logs` opts into the 0.1 human-readable format for local dev. `tracing-subscriber` gains the `json` feature.
Dispatch logs Every `dispatch_block` / `dispatch_log` invocation emits a structured `dispatch ok` DEBUG line with `module`, `chain_id`, `event_kind` ("block" / "log"), `block_number`, `latency_ms`. Existing host-error WARN + trap ERROR paths gain the same fields for cross-correlation.
Host backends Already emit structured DEBUG with `chain_id`, `method`, `bytes`, `latency_ms` on `chain::request` / `cow-api::submit-order` / `cow-api::request`. Audit confirmed no changes needed.
Runbook ergonomics `just run-m2` / `just run-m3` pass `--pretty-logs` so the runbook output samples keep matching the operator's local terminal. Production deploys omit the flag and get JSON.

Live Sepolia validation

{"timestamp":"2026-06-18T13:56:48.587125Z","level":"DEBUG","message":"dispatch ok",
 "module":"price-alert","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":134,"target":"nexum_engine::supervisor"}
{"timestamp":"2026-06-18T13:56:48.857911Z","level":"DEBUG","message":"dispatch ok",
 "module":"balance-tracker","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":270,"target":"nexum_engine::supervisor"}
{"timestamp":"2026-06-18T13:56:50.193531Z","level":"DEBUG","message":"dispatch ok",
 "module":"stop-loss","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":1335,"target":"nexum_engine::supervisor"}

Per-module dispatch latency now observable: price-alert ~135ms (1 eth_call), balance-tracker ~270ms (2 eth_getBalance), stop-loss ~1.3s (oracle read + OrderCreation + cow-api submit + retry classify).

Breaking changes

Default log format changed from human-readable to JSON. Existing operators who parse log output by-line will need to pass `--pretty-logs` to restore the old format, OR pipe through `jq` to extract the same information. The runbooks already opt into `--pretty-logs`; only direct `cargo run -p nexum-engine` invocations need to add the flag.

Out of scope (deferred)

  • Per-order timeline aggregation (would need a `uid` field on the stop-loss / twap-monitor submit logs joined with the supervisor dispatch line). Module-side logs already include the UID; downstream correlation via `jq` works today. Worth revisiting after COW-1034 (Prometheus) lands.
  • OpenTelemetry trace IDs / spans across the dispatch -> host-call boundary. Worth a follow-up once metric labels are decided.

Testing

  • `cargo test --workspace` -> 154 host tests + 6 doctests passing.
  • `cargo clippy --all-targets --workspace -- -D warnings` clean.
  • `cargo fmt --all --check` clean.
  • Live Sepolia smoke run captures JSON output; per-module latencies plausible.
  • `--pretty-logs` flag verified to opt back into the 0.1 formatter.

AI assistance disclosure

AI Assistance: this change + description was produced by a Claude Code agent (Claude Opus 4.7 1M context). The agent audited the existing logging call sites, ran the JSON output live against Sepolia for verification, and wrote the runbook ergonomics tweaks. A human (Bruno) reviewed and is accountable for the result.

Linear: COW-1035. Stacks on #36 (COW-1036 resource-limit tests).

Closes the COW-1035 structured-logging audit. The engine now ships
JSON-formatted logs by default, with consistent field shapes across
every dispatch, host call, and order submission. A single `jq` /
Loki / Grafana stream over the engine output reconstructs the full
timeline of any module event.

## Changes

### Formatter

- New `--pretty-logs` CLI flag (parsed in `cli.rs`). When set, the
  engine uses the historical 0.1 human-readable formatter; otherwise
  emits JSON with flattened event fields + no current-span noise.
- `main.rs` `tracing_subscriber::fmt` builder picks the formatter from
  the flag. `EnvFilter` (`RUST_LOG` / `engine.toml::[engine].log_level`)
  applies to both.
- `tracing-subscriber` feature set gains `json`.

### Dispatch logs (supervisor.rs)

Every `dispatch_block` and `dispatch_log` invocation now emits a
structured log line:

- `dispatch ok` (DEBUG) on success with `module`, `chain_id`,
  `event_kind` ("block" / "log"), `block_number`, `latency_ms`.
- Existing host-error WARN + trap ERROR paths gain the same fields
  for cross-correlation.

Live Sepolia validation:

```json
{"timestamp":"2026-06-18T13:56:48.587125Z","level":"DEBUG","message":"dispatch ok",
 "module":"price-alert","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":134,"target":"nexum_engine::supervisor"}
{"timestamp":"2026-06-18T13:56:48.857911Z","level":"DEBUG","message":"dispatch ok",
 "module":"balance-tracker","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":270,"target":"nexum_engine::supervisor"}
{"timestamp":"2026-06-18T13:56:50.193531Z","level":"DEBUG","message":"dispatch ok",
 "module":"stop-loss","chain_id":11155111,"event_kind":"block",
 "block_number":11087508,"latency_ms":1335,"target":"nexum_engine::supervisor"}
```

The latency-per-module distribution is now observable per-block:
price-alert ~135 ms (1 eth_call), balance-tracker ~270 ms (2
eth_getBalance), stop-loss ~1.3 s (oracle read + OrderCreation +
cow-api submit + retry classify).

### Host-call logs (pre-existing, kept as-is)

The M1 host backends already emit structured DEBUG with `chain_id`,
`method`, `bytes`, `latency_ms` on every `chain::request` /
`cow-api::submit-order` / `cow-api::request`. The audit confirmed
they already satisfy the COW-1035 contract; no changes needed.

### Runbook ergonomics

`just run-m2` and `just run-m3` pass `--pretty-logs` so the
runbook output samples (M2 + M3 runbooks) keep matching what the
operator sees locally. Production deploys (`cargo run -p
nexum-engine -- --engine-config engine.toml` directly, e.g. from
a Docker entrypoint) get JSON by default.

## Validation

- `cargo test --workspace` -> 154 host tests + 6 doctests passing.
- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo fmt --all --check` clean.
- Live Sepolia smoke: JSON output captured; per-module dispatch
  latencies plausible against expected work per module.
- Pretty-logs flag verified to opt back into the 0.1 formatter.

## What this still doesn't do

- Per-order timeline aggregation (would need a `uid` field on the
  stop-loss / twap-monitor submit logs). Currently the order UID
  is logged by each module's `host.log` call (module-side) but
  not joined with the supervisor dispatch line. Acceptable today;
  the JSON shape supports adding the field later without breaking
  consumers.
- Trace IDs / spans across the dispatch -> host-call boundary.
  Worth a follow-up once Prometheus (COW-1034) lands and we know
  the metric labels we want to correlate against.

Linear: COW-1035. Third M4 issue landed; stacks on #36 (COW-1036).
@linear-code

linear-code Bot commented Jun 18, 2026

Copy link
Copy Markdown

COW-1035

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant