FE-711: Add Metrics to Experiments by kube · Pull Request #8751 · hashintel/hash

kube · 2026-05-25T23:17:28Z

🌟 What is the purpose of this PR?

🔗 Related links

...

🚫 Blocked by

...

🔍 What does this change?

...

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

does not modify any publishable blocks or libraries, or modifications do not need publishing
modifies an npm-publishable library and I have added a changeset file(s)
modifies a Cargo-publishable library and I have amended the version
modifies a Cargo-publishable library, but it is not yet ready to publish
modifies a block that will need publishing via GitHub action once merged
I am unsure / need advice

📜 Does this require a change to the docs?

The changes in this PR:

are internal and do not require a docs change
are in a state where docs changes are not yet required but will be
- this is tracked in: Insert Link Here
require changes to docs which are made as part of this PR
require changes to docs which are not made in this PR
- Provide more detail here
I am unsure / need advice

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

do not affect the execution graph
affected the execution graph, and the turbo.json's have been updated to reflect this
I am unsure / need advice

⚠️ Known issues

🐾 Next steps

🛡 What tests cover this?

❓ How to test this?

Checkout the branch / view the deployment
Try X
Confirm that Y

📹 Demo

vercel · 2026-05-25T23:17:33Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hash	Ready	Preview, Comment	May 26, 2026 9:11pm
petrinaut	Ready	Preview, Comment	May 26, 2026 9:11pm

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
hashdotdesign-tokens	Ignored	Preview	May 26, 2026 9:11pm

cursor · 2026-05-25T23:17:35Z

PR Summary

Medium Risk
Published core API and worker message shape change, plus user-supplied expression metrics compiled at runtime; well covered by tests but affects experiment execution and streaming behavior.

Overview
Monte Carlo experiments no longer stream a built-in place token count distribution; they use configurable metrics (place means, transition firing counts, or compiled expression code) with scalar or per-run distribution output, optional run/time aggregation, and mergeable accumulators.

petrinaut-core exposes the new metric APIs, a per-run SimulationFrameReader for sampling, worker metricFrames / init metricSpecs, and experiment metrics (replacing distributions), including a local experiment path when metrics are defined without a worker.

Petrinaut UI requires at least one metric when creating an experiment (drawer editor with LSP for expressions), stores metricFrames / latestMetricFramesById, and charts results per metric instead of a single place timeline; the standalone Simulate Metrics tab is hidden from the mode switcher.

Reviewed by Cursor Bugbot for commit 74a64da. Bugbot is set up for automated code reviews on this repo.

augmentcode · 2026-05-25T23:20:30Z

🤖 Augment PR Summary

Summary: This PR introduces first-class Monte Carlo “experiment metrics” for Petrinaut experiments, including runtime/user-defined metric evaluation and UI controls to configure and view metric outputs.

Changes:

Exports new Monte Carlo metric APIs and types from @hashintel/petrinaut-core (metric specs, user-defined metrics, accumulators).
Adds a SimulationFrameReader-based per-run frame reader so metrics can safely inspect place/transition state.
Implements numeric and histogram accumulator utilities plus tests for merge/monoid behavior.
Adds metric spec compilation (including expression metrics via compileMetric) into user-defined metric configs.
Extends createMonteCarloExperiment() with a local execution path when metric callbacks/specs are provided (worker cannot receive executable code).
Updates experiments React state/provider to store metric specs, metric frames, and latest-by-id.
Adds UI for defining metrics when creating an experiment (including LSP diagnostics for expression metrics).
Updates experiment viewing UI to optionally show token-count timeline and a new metrics summary section.
Adds architecture/proposal/usage docs for Monte Carlo metrics direction.
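
The merge/monoid behavior mentioned for the accumulators can be illustrated with a minimal sketch. The names below (`NumericAccumulator`, `emptyNumeric`, `addSample`, `mergeNumeric`, `mean`) are hypothetical and are not taken from the PR. The sketch only shows why an identity element plus an associative merge lets per-run or per-batch results be combined in any grouping.

```typescript
// Hypothetical numeric accumulator: keeps count and sum so a mean can be derived later.
interface NumericAccumulator {
  count: number;
  sum: number;
}

// Identity element: merging it with any accumulator leaves that accumulator unchanged.
const emptyNumeric = (): NumericAccumulator => ({ count: 0, sum: 0 });

// Fold one sample into an accumulator without mutating the input.
const addSample = (
  acc: NumericAccumulator,
  value: number,
): NumericAccumulator => ({
  count: acc.count + 1,
  sum: acc.sum + value,
});

// Associative merge: (a + b) + c gives the same result as a + (b + c).
const mergeNumeric = (
  a: NumericAccumulator,
  b: NumericAccumulator,
): NumericAccumulator => ({
  count: a.count + b.count,
  sum: a.sum + b.sum,
});

// The mean is only computed at read time, so partial results stay mergeable.
const mean = (acc: NumericAccumulator): number =>
  acc.count === 0 ? Number.NaN : acc.sum / acc.count;
```

Storing `count` and `sum` rather than a running mean is what keeps the merge exact: two means cannot be combined without knowing how many samples each one summarizes.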

Technical Notes: Worker-backed experiments remain distribution-only; experiments with expression/aggregated metrics run locally to avoid posting JS code across the worker boundary.
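
The note about not posting JS code across the worker boundary reflects a general constraint: functions cannot be structured-cloned, so `postMessage` can carry only plain data. One common way around this, sketched below under stated assumptions, is to send the metric as a data-only spec and compile it on the receiving side. `ExpressionMetricSpec`, `MetricFn`, and `compileExpression` are hypothetical names and do not describe the PR's `compileMetric`; the JSON round trip is only a stand-in for crossing a message boundary.

```typescript
// Hypothetical data-only description of an expression metric.
interface ExpressionMetricSpec {
  id: string;
  kind: "expression";
  source: string; // expression over a `tokens` record, e.g. "tokens.a + tokens.b"
}

type MetricFn = (tokens: Record<string, number>) => number;

// Hypothetical compile step (NOT the PR's compileMetric). It evaluates
// user-supplied source, so it should only run on trusted or sandboxed input.
function compileExpression(spec: ExpressionMetricSpec): MetricFn {
  const raw = new Function(
    "tokens",
    `"use strict"; return (${spec.source});`,
  ) as unknown as (tokens: Record<string, number>) => unknown;

  return (tokens) => {
    const value = raw(tokens);
    if (typeof value !== "number" || Number.isNaN(value)) {
      throw new Error(`Metric "${spec.id}" did not return a number`);
    }
    return value;
  };
}

// The spec is plain data, so it survives serialization unchanged.
const sentSpec: ExpressionMetricSpec = {
  id: "waiting-share",
  kind: "expression",
  source: "tokens.waiting / (tokens.waiting + tokens.served)",
};
const receivedSpec = JSON.parse(JSON.stringify(sentSpec)) as ExpressionMetricSpec;

// Compilation happens after the boundary, where a function is allowed to exist.
const waitingShare = compileExpression(receivedSpec);
```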


augmentcode

Review completed. 3 suggestions posted.


graphite-app · 2026-05-25T23:49:10Z

+  color: "neutral.s120",
+  cursor: "pointer",
+});
+
+const metricCollapseIconStyle = css({
+  transition: "[transform 200ms ease-in-out]",
+  "&[data-state=open]": {
+    transform: "[rotate(90deg)]",
+  },


Metric ID validation inconsistency allows duplicate IDs with different whitespace. The code trims the ID when checking if empty (line 136) but adds the untrimmed ID to the Set (line 142), and checks for duplicates using the untrimmed ID (line 139). This means metric IDs like " test" and "test" would both pass validation as unique IDs, causing potential conflicts.

for (const metricSpec of input.metricSpecs) {
  const trimmedId = metricSpec.id.trim();
  if (trimmedId === "") {
    throw new Error("Metric id is required");
  }
  if (metricIds.has(trimmedId)) {
    throw new Error(`Metric id "${trimmedId}" is duplicated`);
  }
  metricIds.add(trimmedId);
  // ...
}
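
For trying the suggestion in isolation, the loop can be wrapped in a small function. This is a sketch only: `MetricSpecLike` and `validateMetricIds` are hypothetical names, and returning the normalized IDs is an assumption rather than something the PR does.

```typescript
// Hypothetical minimal shape; the real metric spec has more fields.
interface MetricSpecLike {
  id: string;
}

// Validates IDs on their trimmed form so " test" and "test" count as the same ID.
function validateMetricIds(metricSpecs: readonly MetricSpecLike[]): string[] {
  const metricIds = new Set<string>();

  for (const metricSpec of metricSpecs) {
    const trimmedId = metricSpec.id.trim();

    if (trimmedId === "") {
      throw new Error("Metric id is required");
    }
    if (metricIds.has(trimmedId)) {
      throw new Error(`Metric id "${trimmedId}" is duplicated`);
    }

    // Store the same normalized value that was checked above.
    metricIds.add(trimmedId);
  }

  return Array.from(metricIds);
}
```

The key point is that the emptiness check, the duplicate check, and the stored value all use the same trimmed string.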

Spotted by Graphite


cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit c8d52c6.

cursor · 2026-05-25T23:57:42Z

+    time: firstRunSummary.currentTime,
+    runCount: simulator.runCount,
+  };
+}


Progress reports run 0's frame, not global frame

Medium Severity

getProgressFromResult and progressFromResult derive frameNumber and time from simulator.getRunSummary(0), which returns run 0's individual frame number. If run 0 completes early (e.g., no firable transitions), its frameNumber freezes while other runs continue advancing. The old code sourced these values from the distribution metric's context.frameNumber, which was the global simulator frame number — always reflecting the latest advance across all runs.

Additional Locations (1)

libs/@hashintel/petrinaut-core/src/simulation/monte-carlo/worker/monte-carlo.worker.ts#L36-L47
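
One possible way to avoid depending on run 0 is to derive progress from all run summaries. The sketch below is an illustration under assumptions: `RunSummary` and `progressFromRuns` are hypothetical, and taking the maximum across runs is only an approximation of "the latest advance across all runs"; the simulator may expose its global frame number directly, which would be preferable.

```typescript
// Hypothetical per-run summary, mirroring the fields named in the comment above.
interface RunSummary {
  frameNumber: number;
  currentTime: number;
}

interface Progress {
  frameNumber: number;
  time: number;
  runCount: number;
}

// Reports the furthest point reached by any run, so an early-finishing
// run 0 cannot freeze the reported progress.
function progressFromRuns(runs: readonly RunSummary[]): Progress {
  let frameNumber = 0;
  let time = 0;

  for (const run of runs) {
    frameNumber = Math.max(frameNumber, run.frameNumber);
    time = Math.max(time, run.currentTime);
  }

  return { frameNumber, time, runCount: runs.length };
}
```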

Reviewed by Cursor Bugbot for commit c8d52c6.

cursor · 2026-05-25T23:57:42Z

    case "cancel": {
      isRunning = false;
      simulator = null;
-      distributionMetric = null;


Worker cancel handler doesn't reset userMetrics array

Low Severity

The cancel handler sets simulator = null and isInitialized = false but does not reset userMetrics = []. The old code cleared distributionMetric = null at this point, and the init error handler in the same file correctly resets userMetrics = []. This inconsistency leaves stale metric objects and their accumulated frame data in worker memory after cancellation until the worker is re-initialized or terminated.
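
A way to keep the cancel and init-error paths consistent is to route both through a single reset. The sketch below is hypothetical: `WorkerState`, `createIdleState`, and `reduceWorkerState` are illustrative names, and the worker in the PR uses module-level variables rather than a state object.

```typescript
// Hypothetical snapshot of the worker's mutable state.
interface WorkerState {
  isRunning: boolean;
  isInitialized: boolean;
  simulator: { runCount: number } | null;
  userMetrics: Array<{ id: string; frames: number[] }>;
}

// Single source of truth for the idle state, including an empty metrics array.
function createIdleState(): WorkerState {
  return {
    isRunning: false,
    isInitialized: false,
    simulator: null,
    userMetrics: [],
  };
}

type ResetMessage = { type: "cancel" } | { type: "initError" };

// Both messages return a fresh idle state, so neither path can leave
// stale metric objects or accumulated frames behind.
function reduceWorkerState(
  _state: WorkerState,
  message: ResetMessage,
): WorkerState {
  switch (message.type) {
    case "cancel":
    case "initError":
      return createIdleState();
  }
}
```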

Reviewed by Cursor Bugbot for commit c8d52c6.

kube self-assigned this May 25, 2026

github-actions Bot added the area/infra, area/libs and type/eng > frontend labels May 25, 2026

augmentcode Bot reviewed May 25, 2026


vercel Bot deployed to Preview – petrinaut May 25, 2026 23:20

graphite-app Bot reviewed May 25, 2026


Comment thread libs/@hashintel/petrinaut/src/react/experiments/provider.tsx Outdated

cursor Bot reviewed May 25, 2026


Comment thread ...l/petrinaut/src/ui/views/Editor/panels/SimulateView/experiments/create-experiment-drawer.tsx

Comment thread ...l/petrinaut/src/ui/views/Editor/panels/SimulateView/experiments/create-experiment-drawer.tsx

vercel Bot deployed to Preview – hash May 25, 2026 23:29

vercel Bot deployed to Preview – petrinaut May 25, 2026 23:45

github-actions Bot removed the area/infra label May 25, 2026

graphite-app Bot reviewed May 25, 2026


vercel Bot deployed to Preview – petrinaut May 25, 2026 23:50

vercel Bot deployed to Preview – hash May 25, 2026 23:53

cursor Bot reviewed May 25, 2026


vercel Bot deployed to Preview – petrinaut May 26, 2026 00:25

vercel Bot deployed to Preview – hash May 26, 2026 00:27

Base automatically changed from cm/petrinaut-ai-assistant-mvp to main May 26, 2026 17:11

github-actions Bot added the area/deps, area/infra, type/eng > backend, area/apps and area/apps > hash.design labels May 26, 2026

kube added 4 commits May 26, 2026 23:03

Fix section dropdown stacking

e13db46

Checkpoint Monte Carlo metrics work

35c8035

Wire Monte Carlo metrics through worker

59f9194

Address Monte Carlo metrics review

65ba869

kube added 2 commits May 26, 2026 23:03

Remove local Monte Carlo docs from PR

e726ebb

Show distribution metric bins on chart click

74a64da

kube force-pushed the cf/fe-711-add-metrics-to-monte-carlo-experiments branch from 1e2973a to 74a64da Compare May 26, 2026 21:03

github-actions Bot removed the area/deps, area/infra, type/eng > backend, area/apps and area/apps > hash.design labels May 26, 2026

vercel Bot deployed to Preview – petrinaut May 26, 2026 21:08

vercel Bot deployed to Preview – hash May 26, 2026 21:11

kube wants to merge 6 commits into main from cf/fe-711-add-metrics-to-monte-carlo-experiments
