[NET-847] [Alert i6JcGq] getblock_solana-mainnet_Hotblocks_Block_Not_Updating#491
Open
elina-chertova wants to merge 2 commits into
Automated fix proposal for alert `i6JcGq`.
open-beta/app/data/investigations/i6JcGq/app/data/investigations/i6JcGq/report.html
Reviewer quick view
FINAL RESPONSE — getblock_solana-mainnet_Hotblocks_Block_Not_Updating (i6JcGq)
Track A — Safe action now
Verdict: accept — the implementer's patch is correct, minimal, and addresses the confirmed root cause.
Root cause
Helm template scope bug in `deployments/charts/hotblocks-service/templates/deployment.yaml` [evidence].
Inside `{{- range $key, $provider := .Values.providers }}`, the two-variable range form does not rebind `.` to the current element; `.` remains bound to root `.Values`. The condition guarding `--geyser-x-access-token` evaluates `.geyser_x_access_token` at root scope → always `nil` → always skipped. The sibling `geyser_x_token` block correctly uses `$provider.geyser_x_token` as its condition, so this is a copy-paste defect [evidence].
Effect chain: `getblock-geyser` sidecar (`yellowstone-geyser-proxy`) starts without `--geyser-x-access-token` → GetBlock Geyser endpoint rejects the unauthenticated Yellowstone subscription → no blocks stream to the hotblocks-service container → `sqd_hotblocks_head_slot` never increments → alert fires after the configured `for:` window.
Evidence table
`up{job="solana-getblock-solana-mainnet-hotblocks-service"}`: 44dae858
`sqd_hotblocks_head_slot{namespace="solana-hotblocks"}`: be4b5af39ba23f58
`getblock.yaml` `geyser_x_access_token`: 0ffeac34c75e4264984011f97de8693c (token IS present)
`deployment.yaml` if-condition: `{{ if .geyser_x_access_token }}` (wrong scope), 24ba2c60
Proposed patch (already staged)
File: `deployments/charts/hotblocks-service/templates/deployment.yaml`
Staged at: `fixes/proposed/deployments/charts/hotblocks-service/templates/deployment.yaml` (confirmed correct at line 94 of the proposed file).
Safety: The change is inert for every provider that does not set `geyser_x_access_token`. Only `getblock` (solana-mainnet) is currently affected. All other providers keep identical rendered output.
Recovery signal
`sqd_hotblocks_head_slot{job="solana-getblock-solana-mainnet-hotblocks-service"}` must show advancing values within 5–10 minutes of the ArgoCD-triggered pod restart. If it is still flat at the 10-minute mark, the token may be expired: verify the `geyser_x_access_token` value against the GetBlock dashboard, since that would not be a code issue.
Rejected alternatives
`if` condition from values; template must be patched.
Track B: Root cause status
Root cause confirmed by: static analysis of `deployment.yaml:94` (wrong scope `if`) + `getblock.yaml` (token present, value confirmed) + Grafana metric (`sqd_hotblocks_head_slot` = 0 data points for 6 h, pod `up`=1) [evidence].
No further investigation needed. The execution plan stop condition (Lane C-equivalent: template analysis confirms arg is absent from rendered spec) is satisfied.
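The stop condition above (the argument is absent from the rendered spec even though the provider sets the token) could also be checked mechanically. The following is a minimal sketch, not the agent's actual tooling: `has_flag` and `missing_token_flag` are hypothetical helpers, the provider values and rendered container args are assumed to be already loaded into plain dictionaries, and both the `--flag value` and `--flag=value` spellings are accepted because the report does not say which one the sidecar uses.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Flag name taken from the report; everything else here is illustrative.
ACCESS_TOKEN_FLAG = "--geyser-x-access-token"


def has_flag(args: Sequence[str], flag: str = ACCESS_TOKEN_FLAG) -> bool:
    """Return True if `flag` appears as `--flag value` or `--flag=value`."""
    return any(a == flag or a.startswith(flag + "=") for a in args)


def missing_token_flag(
    providers: Mapping[str, Mapping[str, Any]],
    rendered_args: Mapping[str, Sequence[str]],
) -> List[str]:
    """List providers that set geyser_x_access_token in values but whose
    rendered sidecar args do not carry the flag."""
    return sorted(
        name
        for name, values in providers.items()
        if values.get("geyser_x_access_token")
        and not has_flag(rendered_args.get(name, []))
    )


if __name__ == "__main__":
    # Hypothetical inputs: "other" stands in for any provider without the token.
    providers: Dict[str, Dict[str, Any]] = {
        "getblock": {"geyser_x_access_token": "example-token"},
        "other": {},
    }
    before_patch = {"getblock": [], "other": []}
    after_patch = {
        "getblock": [ACCESS_TOKEN_FLAG, "example-token"],
        "other": [],
    }
    print(missing_token_flag(providers, before_patch))
    print(missing_token_flag(providers, after_patch))
```

A provider that never sets `geyser_x_access_token` is never reported, which mirrors the safety claim that the patch is inert for every other provider.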
Observability debt (do not PR)
The `getblock-geyser` sidecar has no authentication-failure log extraction in the current evidence pipeline. A single log line from `sqd_yellowstone_geyser_proxy` at `RUST_LOG=debug` would make future auth regressions immediately obvious without needing template archaeology.
Fix metadata
(Generated by the terminal-debate agent — values reflect the agent's self-assessment, not a verified verdict. Use them as a starting point for review.)
Summary
Risk & rollout
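One way to verify the recovery signal during rollout (head slot advancing within 5 to 10 minutes of the pod restart) is sketched below. This is an illustration under stated assumptions, not part of the proposed patch: `head_slot_recovered` is a hypothetical helper, the samples are assumed to have been fetched already from the `sqd_hotblocks_head_slot` series as `(unix_timestamp, value)` pairs, and the default window of 600 seconds corresponds to the 10-minute mark named in the report.

```python
from typing import Iterable, List, Tuple

# (unix timestamp in seconds, sqd_hotblocks_head_slot value)
Sample = Tuple[float, float]


def head_slot_recovered(
    samples: Iterable[Sample],
    restart_ts: float,
    window_s: float = 600.0,
) -> bool:
    """Return True if the head slot increased between two consecutive
    samples inside [restart_ts, restart_ts + window_s]."""
    in_window: List[Sample] = sorted(
        (ts, value)
        for ts, value in samples
        if restart_ts <= ts <= restart_ts + window_s
    )
    # Flat series, a single sample, or no samples at all count as not recovered.
    return any(
        later > earlier
        for (_, earlier), (_, later) in zip(in_window, in_window[1:])
    )


if __name__ == "__main__":
    restart = 1_000.0  # hypothetical restart time
    flat = [(restart + 60 * i, 500.0) for i in range(10)]
    advancing = [(restart + 60 * i, 500.0 + 150 * i) for i in range(10)]
    print(head_slot_recovered(flat, restart))
    print(head_slot_recovered(advancing, restart))
```

If the function still returns `False` at the 10-minute mark, the report's guidance applies: check the token itself rather than the template.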
Alternatives considered
Listed as a record of the agent's debate — re-evaluate these if the current fix does not bring the signal back to steady state.
Reproduction status
Incident behavior was reproduced or corroborated strongly enough for a non-hypothesis fix proposal.
Validation checklist
Changed files
solana/solana-data-service/src/data-source/geyser-setup.ts
solana/solana-data-service/src/data-source/geyser.ts
Notify
cc @tmcgroul (automation opened this PR.)