fix(cli): skip identity stitch in CI to stop ephemeral-env identify spam #5366
pamelachia wants to merge 3 commits into
The OnGotrueID hook in cmd/root.go calls StitchLogin once per process when NeedsIdentityStitch() returns true. SaveState persists distinct_id to ~/.supabase/telemetry.json synchronously, which works fine on a stable machine. In CI runners, Docker, and npx wrappers the home directory is wiped between invocations, so every fresh process sees an empty DistinctID and re-stitches. Daily $identify volume from posthog-go went from ~15K to ~640K/day after the Go CLI's first credentialed deploy and kept growing.

Gate NeedsIdentityStitch on !isCI so the auto-stitch from the X-Gotrue-Id response header is suppressed in CI. canSend() is left alone, so cli_* capture events (cli_command_executed, cli_stack_started, cli_project_linked) still fire from CI, preserving the 31-85% of CLI usage that runs in CI and the dashboards built on it.

login.go calls StitchLogin directly without the guard, so an explicit supabase login still identifies in CI.
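To make the gating concrete, here is a minimal sketch of the behavior described above. It is not the CLI's actual code: the `Service` fields, the `enabled` flag standing in for whatever `canSend()` really checks, the counters, and the `onGotrueID` helper are all assumptions made for illustration. Only the shape of the `NeedsIdentityStitch` condition follows this PR.

```go
package main

import "fmt"

// Service is a simplified, hypothetical stand-in for the telemetry service.
type Service struct {
	distinctID string // empty until a stitch has been recorded
	enabled    bool   // assumption: stands in for the real canSend() checks
	isCI       bool   // true when the process detected a CI environment
	identifies int    // number of identify calls, for illustration only
	captures   int    // number of capture events, for illustration only
}

func (s *Service) canSend() bool { return s.enabled }

// NeedsIdentityStitch gates only the automatic stitch path on !isCI.
func (s *Service) NeedsIdentityStitch() bool {
	return s != nil && s.distinctID == "" && s.canSend() && !s.isCI
}

// StitchLogin identifies unconditionally; an explicit login calls it directly.
func (s *Service) StitchLogin(id string) {
	s.distinctID = id
	s.identifies++
}

// Capture is deliberately not gated on isCI, so CI usage stays visible.
func (s *Service) Capture(event string) {
	if s.canSend() {
		s.captures++
	}
}

// onGotrueID mimics the auto-stitch hook: stitch only when the gate allows it.
func onGotrueID(s *Service, id string) {
	if s.NeedsIdentityStitch() {
		s.StitchLogin(id)
	}
}

func main() {
	ci := &Service{enabled: true, isCI: true}
	onGotrueID(ci, "user-1")           // suppressed by the !isCI gate
	ci.Capture("cli_command_executed") // still recorded
	fmt.Println(ci.identifies, ci.captures) // 0 1

	ci.StitchLogin("user-1") // explicit login path bypasses the gate
	fmt.Println(ci.identifies) // 1

	local := &Service{enabled: true}
	onGotrueID(local, "user-2")
	onGotrueID(local, "user-2") // no-op: distinctID is already set
	fmt.Println(local.identifies) // 1
}
```

Keeping the gate inside `NeedsIdentityStitch` rather than `canSend()` is what lets capture events and the explicit login path behave exactly as before.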
Coverage Report for CI Build 26518644545

Warning: Build has drifted. This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.

Coverage decreased (-0.01%) to 63.747%.

Uncovered changes: no uncovered changes found.
Coverage regressions: 12 previously-covered lines in 2 files lost coverage.

Coveralls
Hey @pamelachia, thanks for that. As we're porting commands, we also have the same telemetry implementation on the TypeScript side. Can you also apply this change to the TypeScript telemetry implementation so it's 1:1 for ported and non-ported commands?
seanoliver left a comment
LGTM! Left one non-blocking question inline!
     func (s *Service) NeedsIdentityStitch() bool {
    -	return s != nil && s.state.DistinctID == "" && s.canSend()
    +	return s != nil && s.state.DistinctID == "" && s.canSend() && !s.isCI
Do we only need to suppress auto-stitch for detected CI here? The PR description also calls out Docker/npx-style ephemeral homes, and those won't necessarily set isCI, so they could keep aliasing/identifying once per invocation.
(Not blocking IMO, this is probably fine if the observed spike is CI-only.)
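For reference, one way the check could be widened beyond CI is sketched below. This is purely hypothetical and not part of the PR: `looksEphemeral` is an invented helper, how the CLI actually computes `isCI` is not shown in this thread, and the sketch covers only two widely used signals (the `CI` environment variable that most CI providers set, and the `/.dockerenv` marker file Docker creates inside containers). It makes no attempt to detect `npx`-style wrappers.

```go
package main

import (
	"fmt"
	"os"
)

// looksEphemeral is a hypothetical heuristic, not the CLI's real detection.
// Lookups are injected so the logic can be exercised without touching the
// real environment or filesystem.
func looksEphemeral(getenv func(string) string, exists func(string) bool) bool {
	// Most CI providers export CI=true; treat explicit "false"/"0" as unset.
	if v := getenv("CI"); v != "" && v != "false" && v != "0" {
		return true
	}
	// Docker creates /.dockerenv at the container's filesystem root.
	if exists("/.dockerenv") {
		return true
	}
	return false
}

// fileExists reports whether a path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	fmt.Println("ephemeral:", looksEphemeral(os.Getenv, fileExists))
}
```

A long-lived development container would also match the Docker check, so a heuristic like this trades some legitimate stitches for less identify noise.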
What

Gate the auto-stitch path on `!isCI` so CI runners and other ephemeral environments stop firing `$identify`/`$create_alias` once per CLI invocation. `canSend()` is unchanged, so `cli_*` capture events still fire from CI. `login.go` calls `StitchLogin` directly without this guard, so an explicit `supabase login` still identifies in CI.

Why
PostHog alert "total identify events increase" fired 2026-05-22 (244.5% day-over-day). I traced it to the Go CLI's first credentialed production deploy (#5329 at 2026-05-21 08:24 UTC, the binary that combined #5054's identity-stitch logic with #5314's credential wiring). Hour-by-hour change-point matches that deploy within minutes; the spike is 100% from `$lib = 'posthog-go'`.

The persistence code is correct. `SaveState` writes `~/.supabase/telemetry.json` synchronously after a successful stitch. The break is environmental: CI runners, Docker containers, and `npx supabase` wrappers wipe the home directory between invocations, so every fresh process re-stitches.

Cohort breakdown over 6 days post-deploy:

Single-day users look like CI runs. The daily, 96-identifies-per-user cohort looks like engineers whose own CI runs many Supabase workflows.
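The environmental failure described above (correct persistence, but a home directory that does not survive between invocations) can be reproduced with a small sketch. The `State` struct, `loadState`, `saveState`, `invoke`, and `newHome` are hypothetical names for illustration; only the file location and the `distinct_id` field come from this PR, and the real file almost certainly holds more than one field.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// State is an assumed, minimal shape for telemetry.json.
type State struct {
	DistinctID string `json:"distinct_id"`
}

func statePath(home string) string {
	return filepath.Join(home, ".supabase", "telemetry.json")
}

// loadState returns a zero State when the file does not exist yet.
func loadState(home string) (State, error) {
	var st State
	data, err := os.ReadFile(statePath(home))
	if errors.Is(err, fs.ErrNotExist) {
		return st, nil
	}
	if err != nil {
		return st, err
	}
	err = json.Unmarshal(data, &st)
	return st, err
}

// saveState writes the state synchronously, creating ~/.supabase if needed.
func saveState(home string, st State) error {
	path := statePath(home)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// invoke simulates one CLI process and reports whether it had to stitch.
func invoke(home, id string) (bool, error) {
	st, err := loadState(home)
	if err != nil || st.DistinctID != "" {
		return false, err
	}
	return true, saveState(home, State{DistinctID: id})
}

// newHome creates a throwaway directory that plays the role of $HOME.
func newHome() string {
	dir, err := os.MkdirTemp("", "home-")
	if err != nil {
		panic(err)
	}
	return dir
}

func main() {
	stable := newHome()
	defer os.RemoveAll(stable)
	first, _ := invoke(stable, "user-1")
	second, _ := invoke(stable, "user-1")
	fmt.Println(first, second) // true false: a stable home stitches once

	stitches := 0
	for i := 0; i < 3; i++ {
		home := newHome() // a wiped home looks like a brand-new directory
		if ok, _ := invoke(home, "user-1"); ok {
			stitches++
		}
		os.RemoveAll(home)
	}
	fmt.Println(stitches) // 3: every ephemeral run re-stitches
}
```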
Daily `posthog-go $identify` volume went from ~15K to 638K/day and was still growing.

Why not gate `canSend()` itself

`cli_*` capture events are heavily used:

- `cli_stack_started`
- `cli_project_linked`
- `cli_command_executed`
- `cli_login_completed`

Six existing PostHog insights consume `cli_command_executed`, including the Agent-Led Growth dashboards. Killing CI capture would blind us to the dominant CLI use case.

Identity stitches in CI have no analytical value because each ephemeral run mints a fresh `device_id` and immediately discards it. Capture events ARE valuable because the `is_ci` property already segments them cleanly in PostHog.

Test plan
- `TestServiceNeedsIdentityStitch` adds a subtest covering `IsCI: true` (passes locally)
- `internal/telemetry/...` package tests pass
- `gofmt -d` clean, `go vet ./internal/telemetry/...` clean
- `posthog-go $identify` daily volume drops back toward the pre-spike ~15K/day baseline within a day of release

Linked

GROWTH-886