chore: sync public mirror from internal#781

Open
haasonsaas wants to merge 4 commits into main from sync/public-release-mirror

@haasonsaas haasonsaas commented Jun 11, 2026

Contributor

Summary

  • sync the sanitized public tree from evalops/maestro-internal
  • keep evalops/maestro as a generated public mirror of the private source of truth
  • preserve public-owned CI and trusted-publishing workflows from the public checkout
  • internal source SHA: 9e0c36821d05d08fd2dcfd45b480b06f9c07099e
  • last generated public sync base: e18fb040ace9cad9fbfcb81903938a54cf5825d0
  • previewed public-tree drift: 25 file(s) to copy/update and 0 stale file(s) to delete
  • public-only commits since last generated sync: 0

Source-of-truth status

Public Mirror Drift Audit

  • package: @evalops/maestro
  • private source: https://github.com/evalops/maestro-internal@main (9e0c36821d05)
  • public projection: https://github.com/evalops/maestro@main (e18fb040ace9)
  • files to copy or update: 25
  • stale files to delete: 0
  • result: drift detected
  • invariant: public_projection_has_drift

Sample Changed Paths

  • copy/update packages/core/src/sandbox/daytona-sandbox.ts
  • copy/update src/guardian/runner.ts
  • copy/update src/safety/validators/network-policy-validator.ts
  • copy/update src/sandbox/docker-sandbox.ts
  • copy/update src/sandbox/local-sandbox.ts
  • copy/update src/sandbox/native-sandbox.ts
  • copy/update src/sandbox/output-capture.ts
  • copy/update src/sandbox/types.ts
  • copy/update src/tools/bash.ts
  • copy/update src/tools/gh-helpers.ts
  • copy/update src/tools/gh.ts
  • copy/update src/utils/url-extractor.ts
  • copy/update test/guardian/guardian-runner.test.ts
  • copy/update test/packages/core/daytona-sandbox-edge-cases.test.ts
  • copy/update test/packages/core/daytona-sandbox.test.ts
  • copy/update test/safety/network-policy-validator.test.ts
  • copy/update test/sandbox-integration.test.ts
  • copy/update test/sandbox/docker-sandbox.test.ts
  • copy/update test/sandbox/local-sandbox.test.ts
  • copy/update test/sandbox/native-sandbox-max-buffer.test.ts
  • copy/update test/sandbox/native-sandbox.test.ts
  • copy/update test/tools/bash.test.ts
  • copy/update test/tools/gh-helpers.test.ts
  • copy/update test/tools/gh.test.ts
  • copy/update test/utils/url-extractor.test.ts

Guidance

Let internal main generate and merge the public sync PR before relying on public main.

Public-only commits since last generated sync

  • none detected since last generated sync

Validation

  • generated by the sync-public-release-mirror workflow in public-tree mode

Test Plan

  • generated by the sync-public-release-mirror workflow in public-tree mode
  • public-source-provenance require-internal-pr check confirms internal source PR lineage
  • CI, integration, rust-hosted-conformance, coverage, Socket, and Cursor checks must pass before merge

Staged Rollout

  • Staging is unnecessary for this generated mirror PR: it does not independently promote user-visible behavior. It mirrors already-reviewed internal source from evalops/maestro-internal@9e0c36821d05d08fd2dcfd45b480b06f9c07099e, including existing hidden/evaluation surfaces, and keeps public package parity behind the established public-source-provenance gate.

Supersedes

cursor Bot commented Jun 11, 2026

PR Summary

High Risk
Touches command execution, cancellation, and enterprise network/guardian enforcement—areas where parsing mistakes or abort bugs could block legitimate work or miss bypass attempts.

Overview
This sync brings sandbox execution and enterprise safety changes from the internal tree into the public mirror.

Sandbox exec / execWithArgs now accept an optional AbortSignal, cap streamed output via the shared output-capture module, and are implemented across the local, Docker, native, and Daytona backends. Docker aborts are forwarded to the container process group; Daytona uses session APIs when available for cancellable runs with timeouts and output limits. bash passes abort signals through to sandbox exec.
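
To make the capping and cancellation behavior concrete, here is a minimal sketch of a capped collector. It is not the code in src/sandbox/output-capture.ts: the names `captureOutput`, `AbortFlag`, and `CaptureResult` are hypothetical, and the budget is counted in characters rather than bytes for simplicity.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not Maestro's API.
interface AbortFlag {
  readonly aborted: boolean; // a real AbortSignal satisfies this shape
}

interface CaptureResult {
  output: string;
  truncated: boolean;
  aborted: boolean;
}

// Collect chunks until the character budget is spent or the signal is aborted.
function captureOutput(
  chunks: readonly string[],
  maxChars: number,
  signal?: AbortFlag,
): CaptureResult {
  let output = "";
  let truncated = false;
  for (const chunk of chunks) {
    if (signal?.aborted) {
      // Stop early and report what was collected so far.
      return { output, truncated, aborted: true };
    }
    const remaining = maxChars - output.length;
    if (chunk.length > remaining) {
      // Keep only what still fits, then stop reading.
      output += chunk.slice(0, Math.max(remaining, 0));
      truncated = true;
      break;
    }
    output += chunk;
  }
  return { output, truncated, aborted: false };
}
```

A backend that streams stdout could feed each chunk through a collector like this and stop the child process once `truncated` or `aborted` is set.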

GitHub CLI tools stop shell-quoting full commands: they run gh via argv (with execpolicy, safe-mode checks, timeouts, truncation, and sandbox probes) instead of routing everything through bash.
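
The argv approach can be sketched as follows. This is an illustration only: `buildGhInvocation` and the `ALLOWED_SUBCOMMANDS` list are hypothetical stand-ins, and the real execpolicy and safe-mode rules are not shown in this PR.

```typescript
// Hypothetical allowlist; the actual execpolicy rules are an assumption here.
const ALLOWED_SUBCOMMANDS = ["pr", "issue", "repo", "api"];

interface Invocation {
  file: string;
  args: string[];
}

// Build an argv-style invocation. Arguments are kept as separate array
// entries and are never joined into a shell string, so shell metacharacters
// inside an argument stay literal.
function buildGhInvocation(args: readonly string[]): Invocation {
  if (args.length === 0) {
    throw new Error("gh requires at least one argument");
  }
  const subcommand = args[0];
  if (ALLOWED_SUBCOMMANDS.indexOf(subcommand) === -1) {
    throw new Error(`gh subcommand not allowed: ${subcommand}`);
  }
  return { file: "gh", args: [...args] };
}
```

The result can be handed to `execFile(file, args)` from `node:child_process`, which does not spawn a shell by default, so an argument such as `"1; rm -rf /"` reaches gh as a single literal string.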

Enterprise network policy gains a much deeper shell parser in url-extractor (wrappers, substitutions, git/ssh targets) plus fail-closed handling for opaque network commands. The validator unions recursive URL scans with token-aware extraction and tightens host matching (trailing dots, IPv6).
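
As an illustration of the tightened host matching, the sketch below normalizes a host before comparing it to a rule. The helpers `normalizeHost` and `hostMatches` are hypothetical, and the subdomain-suffix semantics are an assumption rather than the validator's confirmed behavior. Ports (for example `[::1]:8080`) are not handled.

```typescript
// Hypothetical helpers; the validator's real matching rules may differ.

// Lowercase, strip IPv6 brackets, and drop trailing dots so that
// "Evil.COM." and "[::1]" compare equal to "evil.com" and "::1".
function normalizeHost(host: string): string {
  let h = host.trim().toLowerCase();
  if (h.startsWith("[") && h.endsWith("]")) {
    h = h.slice(1, -1);
  }
  while (h.endsWith(".")) {
    h = h.slice(0, -1);
  }
  return h;
}

// Assumed semantics: a rule matches the host itself and any subdomain of it.
function hostMatches(host: string, rule: string): boolean {
  const h = normalizeHost(host);
  const r = normalizeHost(rule);
  return h === r || h.endsWith("." + r);
}
```

Without the trailing-dot step, `evil.com.` would resolve to the same host as `evil.com` while slipping past a plain string comparison against a blocked-host list.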

Guardian shouldGuardCommand replaces regex-only detection with tokenization and nested-shell expansion (sudo, sh -c, ssh, eval, substitutions) so wrapped git commit/push and rm -r triggers are harder to hide; inline MAESTRO_GUARDIAN=0 in the command string no longer disables guarding (env override still applies).
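
A rough sketch of tokenization plus nested-shell expansion is shown below. It is far simpler than what this paragraph describes: `tokenize` and `shouldGuard` are hypothetical names, quoting is handled without escapes, and command separators, ssh, and substitutions are not covered.

```typescript
// Hypothetical sketch; not the real shouldGuardCommand implementation.

// Split on whitespace, honoring single and double quotes (no escape handling).
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inToken = false;
  let quote = "";
  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    if (quote) {
      if (ch === quote) quote = "";
      else current += ch;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      inToken = true;
    } else if (ch === " " || ch === "\t" || ch === "\n") {
      if (inToken) tokens.push(current);
      current = "";
      inToken = false;
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (inToken) tokens.push(current);
  return tokens;
}

const SHELLS = ["sh", "bash", "zsh"];

function shouldGuard(command: string, depth = 0): boolean {
  if (depth > 5) return true; // fail closed on suspiciously deep nesting
  const tokens = tokenize(command);
  // Skip sudo and inline env assignments such as MAESTRO_GUARDIAN=0.
  let i = 0;
  while (
    i < tokens.length &&
    (tokens[i] === "sudo" || /^[A-Za-z_][A-Za-z0-9_]*=/.test(tokens[i]))
  ) {
    i++;
  }
  if (i >= tokens.length) return false;
  const head = tokens[i];
  const rest = tokens.slice(i + 1);
  // Unwrap `sh -c "<command>"` and `eval <command>` and inspect the inner command.
  if (SHELLS.indexOf(head) !== -1 && rest[0] === "-c" && rest.length > 1) {
    return shouldGuard(rest[1], depth + 1);
  }
  if (head === "eval") return shouldGuard(rest.join(" "), depth + 1);
  if (head === "git") return rest[0] === "commit" || rest[0] === "push";
  if (head === "rm") {
    return rest.some((t) => t === "--recursive" || /^-[A-Za-z]*[rR]/.test(t));
  }
  return false;
}
```

With this shape, `sudo sh -c 'git push origin main'` is guarded because the inner command is re-tokenized, and an inline `MAESTRO_GUARDIAN=0` prefix is skipped over rather than treated as an opt-out.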

Reviewed by Cursor Bugbot for commit e36ce74. Bugbot is set up for automated code reviews on this repo.

@haasonsaas haasonsaas force-pushed the sync/public-release-mirror branch from d49c2cc to 2a4c56c Compare June 11, 2026 23:43

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Docker abort ignores container process
    • DockerSandbox now records the in-container child PID, forwards aborts with a second docker exec, and routes abortable exec() calls through the same cancellation path.

Reviewed by Cursor Bugbot for commit 2a4c56c.

Comment thread src/sandbox/docker-sandbox.ts

@haasonsaas haasonsaas left a comment

Contributor Author

🔒 Hermes automated security scan flagged this PR.

🟠 Unsafe patterns (review):

  • src/tools/bash.ts — exec(): const result = await sandbox.exec(interpolatedCommand, cwd, env, signal);
  • test/packages/core/daytona-sandbox.test.ts — exec(): const result = await sandbox.exec(
  • test/sandbox/native-sandbox-max-buffer.test.ts — exec(): const promise = sandbox.exec("gh api");

Automated gitleaks + pattern scan. Dismiss this review if it's a false positive.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 610758ac5a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/safety/validators/network-policy-validator.ts Outdated
… policy

Restore the recursive URL scan over the full bash args (including the
command string) alongside the bash-token aware extractor. The
token-aware path alone misses URLs embedded mid-string in shell
arguments (e.g. curl "see https://... here", echo "https://...",
heredocs), which let enterprise network policy be bypassed. Union both
scans so neither path can be evaded independently.

Addresses Codex P1 finding on PR #781.
@haasonsaas

Contributor Author

Addressed the Codex P1 in cef6e3b: restored the recursive URL scan over the bash args (including the command string) while keeping the bash-token aware extractor. The two scans are now unioned and deduped, so URLs embedded mid-string (curl "see https://...", echo "https://...", heredocs) or referenced from runtimes the token extractor doesn't enumerate (node -e, python -c) can no longer bypass enterprise network policy.
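
The union-and-dedupe posture can be sketched roughly like this. The two scanners are simplified stand-ins, not the code in network-policy-validator.ts or url-extractor.ts: `scanRaw`, `scanTokens`, and `extractAllUrls` are hypothetical names, and the token scanner only understands curl and wget.

```typescript
// Hypothetical sketch of unioning a raw text scan with a token-aware scan.
const URL_PATTERN = /https?:\/\/[^\s"'<>)]+/g;

// Raw scan: finds URLs anywhere in the text, including mid-string.
function scanRaw(text: string): string[] {
  return text.match(URL_PATTERN) ?? [];
}

// Stand-in for the token-aware extractor: only inspects curl/wget arguments.
function scanTokens(command: string): string[] {
  const tokens = command.split(/\s+/).filter((t) => t.length > 0);
  if (tokens[0] !== "curl" && tokens[0] !== "wget") return [];
  return tokens.slice(1).filter((t) => /^https?:\/\//.test(t));
}

// Union both scans and drop duplicates, so evading one path is not enough.
function extractAllUrls(command: string): string[] {
  const seen: string[] = [];
  for (const url of [...scanTokens(command), ...scanRaw(command)]) {
    if (seen.indexOf(url) === -1) seen.push(url);
  }
  return seen;
}
```

In this sketch `echo "see https://evil.com/x here"` yields nothing from the token scanner but is still caught by the raw scan, which is the bypass the fix closes.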

Updated two existing tests that asserted echo https://evil.com was allowed under blockedHosts — under the stricter union posture those URLs are now blocked. Added an explicit regression test covering curl/echo mid-string and heredoc cases. Note: this fix lives only in the public mirror; the same change should be applied upstream in evalops/maestro-internal so the next sync doesn't re-introduce the regression.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: cef6e3b80e

Comment thread src/utils/url-extractor.ts
…ions

OpenSSH lets `-o HostName=evil.com github.com`, `-J evil.com github.com`,
or `-W evil.com:22 jumpbox` override the positional host, so the real
connection goes somewhere policy never sees. Walk these options, extract
the destination host (or recurse into ProxyCommand/RemoteCommand shell),
and return them alongside the positional target so callers re-validate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
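
A minimal sketch of walking these options follows. It is not the url-extractor implementation: `extractSshTargets`, `stripUserAndPort`, and `SshTargets` are hypothetical names, only the `Key=Value` form of `-o` is handled, combined short flags and IPv6 literals are not, and scanning continues past the destination on the assumption that options may follow it.

```typescript
// Hypothetical sketch; real parsing needs to cover many more cases.
interface SshTargets {
  hosts: string[]; // every host the connection may reach
  nestedCommands: string[]; // ProxyCommand/RemoteCommand values to scan further
}

// Short flags that take a value, per the ssh(1) synopsis.
const FLAGS_WITH_VALUE = "BbcDEeFIiJLlmOopQRSWw";

// Reduce "[user@]host[:port]" to the host (no IPv6 support in this sketch).
function stripUserAndPort(spec: string): string {
  const host = spec.slice(spec.lastIndexOf("@") + 1);
  const colon = host.lastIndexOf(":");
  return colon === -1 ? host : host.slice(0, colon);
}

function extractSshTargets(args: readonly string[]): SshTargets {
  const hosts: string[] = [];
  const nestedCommands: string[] = [];
  let sawDestination = false;
  let i = 0;
  while (i < args.length) {
    const arg = args[i];
    if (arg.length < 2 || arg[0] !== "-") {
      if (sawDestination) break; // second positional starts the remote command
      hosts.push(stripUserAndPort(arg));
      sawDestination = true;
      i++;
      continue;
    }
    const flag = arg[1];
    if (FLAGS_WITH_VALUE.indexOf(flag) === -1) {
      i++; // boolean flag such as -v
      continue;
    }
    // The value is either attached (-Jhost) or the next argument (-J host).
    const attached = arg.length > 2;
    const value = attached ? arg.slice(2) : args[i + 1];
    i += attached ? 1 : 2;
    if (value === undefined) break;
    if (flag === "J") {
      for (const hop of value.split(",")) hosts.push(stripUserAndPort(hop));
    } else if (flag === "W") {
      hosts.push(stripUserAndPort(value));
    } else if (flag === "o") {
      const eq = value.indexOf("=");
      if (eq === -1) continue;
      const key = value.slice(0, eq).trim().toLowerCase();
      const optValue = value.slice(eq + 1).trim();
      if (key === "hostname") hosts.push(optValue);
      else if (key === "proxyjump") {
        for (const hop of optValue.split(",")) hosts.push(stripUserAndPort(hop));
      } else if (key === "proxycommand" || key === "remotecommand") {
        nestedCommands.push(optValue);
      }
    }
  }
  return { hosts, nestedCommands };
}
```

For `["-o", "HostName=evil.com", "github.com"]` the sketch reports both `evil.com` and `github.com`, so a caller that validates every returned host sees the real destination as well as the positional one.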

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: e36ce74159

if (lowered === "proxycommand" || lowered === "remotecommand") {
  // Commands are opaque arbitrary shell — recursively extract any hosts
  // they reference so policy enforcement can still see them.
  return extractUrlsFromShellCommand(trimmed);

P1: Reject opaque SSH proxy commands

Fresh evidence after the SSH-option handling change: this branch handles ProxyCommand/RemoteCommand by returning only the URLs extracted from the option value. An opaque proxy such as ssh -o ProxyCommand='nc $TARGET 22' 127.0.0.1 therefore yields no option target, and checkNetworkPolicy(..., {allowedHosts:['127.0.0.1']}) allows it after validating only the positional host. OpenSSH confirms this option is active (ssh -G ... prints proxycommand nc $TARGET 22), so enterprise allowed/blocked-host policy can be bypassed whenever the proxy command's host is variable or otherwise not statically parsed. Make these option values fail closed when extraction finds no validatable target.
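
One possible shape for the fail-closed behavior is sketched below. `checkProxyCommand`, `extractHosts`, and `Decision` are hypothetical names, and the hostname pattern is a crude stand-in for the recursive extractor. The point is only that an empty extraction result is treated as a denial, not as "nothing to check".

```typescript
// Hypothetical sketch of fail-closed handling for opaque proxy commands.
interface Decision {
  allowed: boolean;
  reason: string;
}

// Crude stand-in for the recursive extractor: keeps dotted, hostname-like tokens.
function extractHosts(command: string): string[] {
  const tokens = command.split(/\s+/).filter((t) => t.length > 0);
  return tokens.filter((t) => /^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(t));
}

function checkProxyCommand(
  proxyCommand: string,
  allowedHosts: readonly string[],
): Decision {
  const hosts = extractHosts(proxyCommand);
  if (hosts.length === 0) {
    // Nothing could be validated (for example a variable such as $TARGET): deny.
    return { allowed: false, reason: "opaque proxy command: no validatable target" };
  }
  const blocked = hosts.filter((h) => allowedHosts.indexOf(h.toLowerCase()) === -1);
  if (blocked.length > 0) {
    return { allowed: false, reason: `host not allowed: ${blocked.join(", ")}` };
  }
  return { allowed: true, reason: "all proxy targets allowed" };
}
```

Under this sketch, `nc $TARGET 22` is rejected even when the positional host is on the allowlist, because the proxy target itself cannot be checked.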

2 participants