env0 is the first-party mock-environment runtime for agent testing. It provides stateful, deterministic mock services for local development, plus seed contracts, API-parity checks, dev tooling, and a shared Docker base image.
The repo has two deliberately separate task surfaces:
- example_tasks/ holds small runtime fixtures used to verify env0's service and launcher contracts.
- tasks/ contains selected BenchFlow-native task packages, copied from benchflow-ai/env-0, for reference and downstream evaluation.
Canonical benchmark authoring and scoring policy still belong in downstream benchmark repos, not in env0.
Run commands from the env0 repo root. Prerequisites:
- Python 3.12+
- uv
- Docker daemon (for Docker/base-image smoke checks)
- free local ports 9001-9005 and 9060
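Before starting anything, it can help to confirm those ports are actually free. A minimal sketch, assuming that a successful bind on 127.0.0.1 is a good enough proxy for "free"; REQUIRED_PORTS and busy_ports are hypothetical names, not part of env0:

```python
import socket

# Ports from the prerequisites: 9001-9005 (presumably the mock services) plus 9060 (devhub).
REQUIRED_PORTS = [*range(9001, 9006), 9060]


def busy_ports(ports, host="127.0.0.1"):
    """Return the subset of `ports` that cannot be bound on `host` right now."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # No SO_REUSEADDR on purpose: any existing listener makes bind() fail.
                sock.bind((host, port))
            except OSError:
                busy.append(port)
    return busy
```

busy_ports(REQUIRED_PORTS) returns an empty list when everything is free. A port still held by lingering TIME_WAIT connections may also be reported as busy, which errs on the safe side.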
Run the unit/control smoke checks:
scripts/smoke_dev.sh
Render the devhub once without starting services:
python3 devhub/app.py --render-once
Start every configured mock service plus devhub:
scripts/dev.sh
Stop with Ctrl-C. Local DBs and runtime state live under .data/dev/; delete that directory to start from a clean local-dev state.
Start only the services declared by an example task:
scripts/dev.sh task gdrive-archive-stale-drafts
Open devhub at:
http://127.0.0.1:9060
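Because scripts/dev.sh keeps running until Ctrl-C, a second terminal or script that talks to devhub may need to wait until it answers. A hedged sketch using only the standard library; wait_for_http is a hypothetical helper, and treating any successful HTTP response as "ready" is an assumption, not a documented devhub health contract:

```python
import http.client
import time
import urllib.request

# devhub address from this README; the polling policy below is illustrative.
DEVHUB_URL = "http://127.0.0.1:9060"

# Bypass any HTTP(S)_PROXY environment settings when talking to a local address.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url=DEVHUB_URL, timeout=30.0, interval=0.5):
    """Poll `url` until it returns a successful HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # HTTPError (an OSError subclass) is raised for 4xx/5xx statuses,
            # so reaching the return means the server answered successfully.
            with _opener.open(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            pass  # refused, timed out, or error status: try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a script could call wait_for_http(timeout=60) right after launching scripts/dev.sh in the background and abort if it returns False.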
The shared base image tag is:
ghcr.io/benchflow-ai/env0:0.1.0
VERSION is the source of truth for the semver tag. Example task Dockerfiles
pin FROM ghcr.io/benchflow-ai/env0:<VERSION>.
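The relationship between VERSION and the Dockerfile pin can be checked mechanically. A minimal sketch, assuming VERSION holds a plain MAJOR.MINOR.PATCH string; read_version, base_image_tag, and dockerfile_pins_version are hypothetical helpers, not env0 APIs:

```python
import re
from pathlib import Path

BASE_IMAGE = "ghcr.io/benchflow-ai/env0"
# Simplified: plain MAJOR.MINOR.PATCH only; semver pre-release/build suffixes are rejected.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def read_version(repo_root=Path(".")):
    """Read the VERSION file at the repo root."""
    return (Path(repo_root) / "VERSION").read_text().strip()


def base_image_tag(version):
    """Build the full base image reference for a VERSION string."""
    version = version.strip()
    if not SEMVER.match(version):
        raise ValueError(f"VERSION is not MAJOR.MINOR.PATCH: {version!r}")
    return f"{BASE_IMAGE}:{version}"


def dockerfile_pins_version(dockerfile_text, version):
    """True if some FROM line uses exactly the env0 base image for `version`."""
    expected = base_image_tag(version)
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            # Skip flags such as --platform=...; the first remaining token is the image.
            images = [p for p in parts[1:] if not p.startswith("--")]
            if images and images[0] == expected:
                return True
    return False
```

Pointing dockerfile_pins_version at each example task's Dockerfile with read_version() as the second argument flags any task still pinned to an older base.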
Build locally:
docker/build-base.sh
Validate example task images against the locally built base:
PULL_BASE=0 scripts/smoke_docker_examples.sh
Push release tags only when the GHCR package exists and the maintainer account has package-write permission:
docker/build-base.sh --push
Release checklist:
- Bump VERSION if the base image contract changed.
- Run scripts/smoke_dev.sh.
- Run the tests for each changed environment, for example cd packages/environments/mock-gdrive && uv run --extra dev pytest tests -q.
- Build locally with docker/build-base.sh.
- Run PULL_BASE=0 scripts/smoke_docker_examples.sh.
- Push with docker/build-base.sh --push if package permissions are configured.
- Only after the push succeeds, validate the remote pull with docker pull ghcr.io/benchflow-ai/env0:$(cat VERSION).
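The checklist's ordering matters: the push and the remote pull come last, and only when pushing is possible. A hedged sketch that encodes that ordering as a plan of the same shell commands; release_plan and run_plan are hypothetical, the VERSION bump stays manual, and `version` stands in for $(cat VERSION):

```python
import shlex
import subprocess


def release_plan(version, changed_envs=(), can_push=False):
    """Return the release checklist (minus the manual VERSION bump) as ordered shell commands."""
    steps = ["scripts/smoke_dev.sh"]
    for env in changed_envs:
        # Each command runs in its own shell, so this cd does not leak into later steps.
        steps.append(
            f"cd packages/environments/{shlex.quote(env)} && uv run --extra dev pytest tests -q"
        )
    steps += ["docker/build-base.sh", "PULL_BASE=0 scripts/smoke_docker_examples.sh"]
    if can_push:
        steps.append("docker/build-base.sh --push")
        steps.append(f"docker pull ghcr.io/benchflow-ai/env0:{shlex.quote(version)}")
    return steps


def run_plan(steps, dry_run=True):
    """Print each step; when dry_run is False, also execute it and stop at the first failure."""
    for cmd in steps:
        print("+", cmd)
        if not dry_run:
            subprocess.run(cmd, shell=True, check=True)
```

run_plan defaults to a dry run that only prints. With dry_run=False, check=True makes subprocess.run raise CalledProcessError at the first failing step, so the remote pull never runs after a failed push.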
env0/
├── packages/environments/mock-gmail/
├── packages/environments/mock-gcal/
├── packages/environments/mock-gdoc/
├── packages/environments/mock-gdrive/
├── packages/environments/mock-slack/
├── docker/
├── devhub/
├── docs/
├── example_tasks/
├── scripts/
├── tests/
├── config.toml
└── VERSION
- Service metadata comes from config.toml.
- Service ids and CLIs use the canonical mock-* names.
- Service URLs use the canonical MOCK_*_URL env vars.
- Tasks declare their services in task.toml under [environment] services = [...].
- Task Dockerfiles are thin and inherit from ghcr.io/benchflow-ai/env0:<VERSION>.
- The hidden task payload lives under /var/lib/task.
- Task-aware seeding uses the internal --task-data + --task-name plumbing.
- Dev/user UX stays task-name based: scripts/dev.sh task <name>.
Current implementation note: config.toml is the source of truth for runtime
metadata, but scripts/smoke_docker_examples.sh and docker/gws-wrapper.sh
still contain small service maps and must be kept in sync when adding services.
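One way to catch drift is to compare the mock-* ids mentioned in config.toml with those mentioned in each script. A rough sketch, assuming every service appears in config.toml under its canonical mock-* id; it is a text scan rather than a parser, and a script may legitimately cover only a subset of services, so a non-empty result is a prompt to look, not proof of a bug. service_ids_in, missing_from, and check_sync are hypothetical names:

```python
import re
from pathlib import Path

MOCK_ID = re.compile(r"\bmock-[a-z0-9]+\b")
SYNCED_SCRIPTS = ("scripts/smoke_docker_examples.sh", "docker/gws-wrapper.sh")


def service_ids_in(text):
    """Collect every mock-* token that appears in a file's text."""
    return set(MOCK_ID.findall(text))


def missing_from(script_text, config_service_ids):
    """Service ids known to config.toml but never mentioned in the script."""
    return set(config_service_ids) - service_ids_in(script_text)


def check_sync(repo_root="."):
    """Map each hand-maintained script to the config.toml service ids it does not mention."""
    root = Path(repo_root)
    config_ids = service_ids_in((root / "config.toml").read_text())
    return {script: missing_from((root / script).read_text(), config_ids) for script in SYNCED_SCRIPTS}
```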
- Docs index
- Local dev and devhub
- Good first contributions
- Adding a new environment
- API validation playbook
- Parity audit
- Validated workflows
example_tasks/ currently covers:
- email-confidential-forward
- gdoc-search-keyword-index
- gdrive-archive-stale-drafts
- multi-mail-cal-sync
- multi-misread-approval-scope
These examples are env0 fixtures/templates, not source-of-truth task definitions.
env0 is licensed under the GNU Affero General Public License v3.0 only
(AGPL-3.0-only). See LICENSE.
You may self-host, use, modify, and redistribute env0 under the terms of the AGPL. BenchFlow also offers an official hosted env0 service.