This file provides guidance to AI coding agents when working with code in this repository.
Cortex is a horizontally scalable, highly available, multi-tenant, long-term storage solution for Prometheus metrics. It uses a microservices architecture with components that can run as separate processes or as a single binary.
make # Build all (runs in Docker container by default)
make BUILD_IN_CONTAINER=false # Build locally without Docker
make exes # Build binaries only
make protos # Generate protobuf files
make lint # Run all linters (golangci-lint, misspell, etc.)
make doc # Generate config documentation (run after changing flags/config)
make ./cmd/cortex/.uptodate # Build Cortex Docker image for integration tests
Go modules are vendored in the vendor/ folder. When upgrading a dependency or component:
go get github.com/some/dependency@version # Update go.mod
go mod vendor # Sync vendor folder
go mod tidy # Clean up go.mod/go.sum
Important: Always check the vendor/ folder for upstream library code (e.g., vendor/github.com/prometheus/alertmanager/ for Alertmanager internals). Do not modify vendored code directly.
go test -timeout 2400s -tags "netgo slicelabels" ./... # Run tests with CI configuration
Integration tests require Docker and the Cortex image to be built first:
make ./cmd/cortex/.uptodate # Build Cortex Docker image first
# Run all integration tests
go test -v -tags=integration,requires_docker,integration_alertmanager,integration_memberlist,integration_querier,integration_ruler,integration_query_fuzz ./integration/...
# Run a specific integration test
go test -v -tags=integration,integration_ruler -timeout 2400s -count=1 ./integration/... -run "^TestRulerAPISharding$"
Environment variables for integration tests:
- CORTEX_IMAGE - Docker image to test (default: quay.io/cortexproject/cortex:latest)
- E2E_TEMP_DIR - Directory for temporary test files
Use goimports with Cortex-specific import grouping:
goimports -local github.com/cortexproject/cortex -w ./path/to/file.go
Import order: stdlib, third-party packages, internal Cortex packages (separated by blank lines).
- Distributor (stateless) - Receives samples via remote write, validates, distributes to ingesters using consistent hashing
- Ingester (semi-stateful) - Stores samples in memory, periodically flushes to long-term storage (TSDB blocks)
- Querier (stateless) - Executes PromQL queries across ingesters and long-term storage
- Query Frontend (optional, stateless) - Query caching, splitting, and queueing
- Query Scheduler (optional, stateless) - Moves the query queue out of the query frontend so the two can be scaled independently
- Compactor (stateless) - Compacts TSDB blocks in object storage
- Store Gateway (semi-stateful) - Queries blocks from object storage
- Ruler - Executes recording rules and alerts
- Alertmanager - Multi-tenant alert routing
- Configs API - Configuration management
- Hash Ring - Consistent hashing via Consul, Etcd, or memberlist gossip for data distribution
- Multi-tenancy - Tenant isolation via the X-Scope-OrgID header
- Blocks Storage - TSDB-based storage with 2-hour block ranges, stored in S3/GCS/Azure/Swift
- cmd/cortex/main.go - Main Cortex binary
- pkg/cortex/cortex.go - Service orchestration and configuration
- No global variables - Use dependency injection
- Metrics: Register with promauto.With(reg); never use the global Prometheus registerer
- Config naming: YAML uses snake_case, CLI flags use kebab-case
- Logging: Use github.com/go-kit/log (not github.com/go-kit/kit/log)
- Sign commits with DCO: git commit -s -m "message"
- Run make doc if config/flags changed
- Include a CHANGELOG entry for user-facing changes
When asked to investigate a CI build failure:
- Fetch job details using gh api repos/cortexproject/cortex/actions/runs/<run-id>/jobs to identify failed jobs and steps. Note: gh run view --log only works after the entire run completes, not just individual jobs.
- Get annotations using gh api repos/cortexproject/cortex/check-runs/<job-id>/annotations to surface error messages when full logs are unavailable.
- Fetch full logs once the run completes using gh run view <run-id> --job <job-id> --log.
- Determine root cause: distinguish between infrastructure failures (e.g., Docker Hub rate limits, runner issues) and code-related failures (e.g., test regressions). Use git log and gh pr diff to check whether the failure relates to the PR's changes.
- For flaky/infrastructure issues, use git log and git log -p to trace when the failing code was introduced and which PR added it.
- File GitHub issues with the user's permission, including:
  - The full error output in a <details> block (since job links can expire)
  - Root cause analysis
  - Which PR introduced the issue (but do not assign or tag individuals without the user's approval)
  - Proposed solutions
This file (AGENTS.md) provides technical guidance to AI coding agents working in this repository (build commands, architecture, conventions). For the policy governing human use of AI tools when preparing contributions, see GENAI_POLICY.md.