Skip to content

OCPBUGS-90370: Stabilize telemetry config to prevent continuous console pod rollouts#1173

Open
sg00dwin wants to merge 1 commit into
openshift:mainfrom
sg00dwin:OCPBUGS-90370-configmap-reconcile-loop
Open

OCPBUGS-90370: Stabilize telemetry config to prevent continuous console pod rollouts#1173
sg00dwin wants to merge 1 commit into
openshift:mainfrom
sg00dwin:OCPBUGS-90370-configmap-reconcile-loop

Conversation

@sg00dwin

@sg00dwin sg00dwin commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

Fixes continuous console pod rollouts caused byGetTelemetryConfiguration()producing different ConfigMap content on consecutive sync cycles when telemeter-client availability flickers.

Bug: Users are logged out of the web console every ~5 minutes. The operator creates ~11 replica sets in 50 minutes because the console-config ConfigMap alternates between two states, each triggering a deployment rollout that destroys in-memory sessions.

Root cause: GetTelemetryConfiguration() has an early return when telemeter-client is unavailable that produces a map with a completely different key set than the available path:

  • Unavailable: {CLUSTER_ID, SEGMENT_*, TELEMETER_CLIENT_DISABLED} — skips ORGANIZATION_ID, ACCOUNT_MAIL
  • Available: {CLUSTER_ID, SEGMENT_*, ORGANIZATION_ID, ACCOUNT_MAIL} — skips TELEMETER_CLIENT_DISABLED

When availability flickers (node reboots, monitoring pod restarts), ApplyConfigMap correctly detects the data change, bumps resourceVersion, which changes the deployment annotation, triggering a new ReplicaSet.

Fix:

  • Remove the early return so the function always sets all keys regardless of telemeter-client availability
  • Make GetAccessToken errors non-fatal (log + continue with empty token) so disconnected clusters don't fail the sync

Result: The telemetry config map always contains the same key set. Values may differ (empty vs populated), but key set stability means the ConfigMap only changes when values actually change.

Reviewer note

With this change, when telemeter-client is unavailable the console-config ConfigMap will include ORGANIZATION_ID: "" and ACCOUNT_MAIL: "" (empty strings) rather than omitting those keys entirely. These values are passed through to the console frontend (bridge) as telemetry config. Empty strings should be functionally equivalent to absent keys — the frontend won't send those fields in telemetry events — but I'd appreciate a reviewer familiar with the bridge telemetry consumption confirming that's harmless.

Testing

  • Added 3 test functions (6 cases) covering all combinations of telemeter availability and cloud auth presence
  • make test-unit passes
  • make check (gofmt + govet) clean

Assisted by: Claude (Opus 4.6)

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved telemetry configuration generation so it no longer stops early when the telemetry client isn’t available. The telemetry enable/disable flag is always set appropriately, and configuration continues gracefully even if access-token retrieval fails, while still attempting to populate organization/account fields.
  • Tests

    • Added/expanded telemetry configuration tests to verify stable returned keys when telemetry availability changes and to confirm disconnected/no-auth scenarios complete successfully without errors.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 22, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@sg00dwin: This pull request references Jira Issue OCPBUGS-90370, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

Fixes continuous console pod rollouts caused byGetTelemetryConfiguration()producing different ConfigMap content on consecutive sync cycles when telemeter-client availability flickers.

Bug: Users are logged out of the web console every ~5 minutes. The operator creates ~11 replica sets in 50 minutes because the console-config ConfigMap alternates between two states, each triggering a deployment rollout that destroys in-memory sessions.

Root cause: GetTelemetryConfiguration() has an early return when telemeter-client is unavailable that produces a map with a completely different key set than the available path:

  • Unavailable: {CLUSTER_ID, SEGMENT_*, TELEMETER_CLIENT_DISABLED} — skips ORGANIZATION_ID, ACCOUNT_MAIL
  • Available: {CLUSTER_ID, SEGMENT_*, ORGANIZATION_ID, ACCOUNT_MAIL} — skips TELEMETER_CLIENT_DISABLED

When availability flickers (node reboots, monitoring pod restarts), ApplyConfigMap correctly detects the data change, bumps resourceVersion, which changes the deployment annotation, triggering a new ReplicaSet.

Fix:

  • Remove the early return so the function always sets all keys regardless of telemeter-client availability
  • Make GetAccessToken errors non-fatal (log + continue with empty token) so disconnected clusters don't fail the sync

Result: The telemetry config map always contains the same key set. Values may differ (empty vs populated), but key set stability means the ConfigMap only changes when values actually change.

Reviewer note

With this change, when telemeter-client is unavailable the console-config ConfigMap will include ORGANIZATION_ID: "" and ACCOUNT_MAIL: "" (empty strings) rather than omitting those keys entirely. These values are passed through to the console frontend (bridge) as telemetry config. Empty strings should be functionally equivalent to absent keys — the frontend won't send those fields in telemetry events — but I'd appreciate a reviewer familiar with the bridge telemetry consumption confirming that's harmless.

Testing

  • Added 3 test functions (6 cases) covering all combinations of telemeter availability and cloud auth presence
  • make test-unit passes
  • make check (gofmt + govet) clean

Assisted by: Claude (Opus 4.6)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3c1125fc-e8a2-4372-be25-d386e7967190

📥 Commits

Reviewing files that changed from the base of the PR and between ce92dda and cc61faf.

📒 Files selected for processing (2)
  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • openshift/console (manual)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
📜 Recent review details
🧰 Additional context used
🔀 Multi-repo context openshift/console

Linked repositories findings

Based on my cross-repository exploration of openshift/console, I've identified critical consumer patterns for the telemetry ConfigMap changes:

openshift/console

Frontend telemetry consumption patterns:

  1. Segment Analytics [::openshift/console::] — frontend/packages/console-dynamic-plugin-sdk/src/api/segment-analytics.ts

    • Explicitly checks: window.SERVER_FLAGS.telemetry?.TELEMETER_CLIENT_DISABLED === 'true'
    • This boolean check expects TELEMETER_CLIENT_DISABLED to be a string value
    • The PR ensures this key is always present with string values ("true" or "false"), which is compatible with this check
  2. Telemetry Hook [::openshift/console::] — frontend/packages/console-shared/src/hooks/useTelemetry.ts

    • Safely accesses: window.SERVER_FLAGS.telemetry?.ORGANIZATION_ID
    • Safely accesses: window.SERVER_FLAGS.telemetry?.ACCOUNT_MAIL
    • Both use optional chaining and handle undefined values gracefully
    • The PR's change to always include these keys (with empty string values when unavailable) is fully compatible
  3. Backend Configuration [::openshift/console::] — pkg/serverconfig/config.go and pkg/server/server.go

    • Telemetry is managed as a MultiKeyValue map in server configuration
    • Configuration is passed to frontend via SERVER_FLAGS
    • No dependencies on specific key presence/absence patterns detected

Impact Assessment:

No breaking changes detected. The PR's guarantee to always include TELEMETER_CLIENT_DISABLED, ORGANIZATION_ID, and ACCOUNT_MAIL as string keys (with empty values when services unavailable) is:

  • Safe for the segment-analytics boolean string check
  • Safe for optional property access patterns in telemetry consumption
  • Solves the root cause of ConfigMap instability that triggers pod restarts

Walkthrough

GetTelemetryConfiguration in sync_v400.go no longer returns early when the telemeter client is disabled; it now falls through to attempt access-token retrieval (continuing with an empty token on failure) and then calls GetOrganizationMeta to populate ORGANIZATION_ID and ACCOUNT_MAIL. Three new tests and supporting harness code in sync_v400_test.go validate this behavior across telemeter availability and cloud-auth presence states.

Changes

Telemetry Configuration Resilience

Layer / File(s) Summary
Core behavior change: fallthrough and org-meta fetch
pkg/console/operator/sync_v400.go
Removes the return after setting TELEMETER_CLIENT_DISABLED; adds an error-tolerant GetAccessToken call (logs debug and uses empty string on failure), then unconditionally calls GetOrganizationMeta to populate ORGANIZATION_ID and ACCOUNT_MAIL.
Test infrastructure, helpers, and validation
pkg/console/operator/sync_v400_test.go
Extended imports for cache/indexer, JSON marshalling, and OpenShift listers. Added newIndexer and newTestConsoleOperator helpers constructing a fake consoleOperator with cache-backed listers for telemetry ConfigMap, ClusterVersion (test cluster ID), telemeter Deployment (AvailableReplicas dependent on flag), and pull-secret. Added sortedKeys utility for deterministic key ordering in assertions. Added TestGetTelemetryConfiguration_StableKeySet (validates required keys and TELEMETER_CLIENT_DISABLED across telemeter/cloud-auth scenarios), TestGetTelemetryConfiguration_KeySetStableAcrossAvailabilityChange (compares key sets between telemeter-unavailable and telemeter-available with cloud auth, allowing only TELEMETER_CLIENT_DISABLED value to differ), and TestGetTelemetryConfiguration_DisconnectedClusterNoError (verifies no error on disconnected clusters with TELEMETER_CLIENT_DISABLED, ORGANIZATION_ID, and ACCOUNT_MAIL populated).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically references the Jira issue (OCPBUGS-90370) and describes the main fix: stabilizing telemetry config to prevent continuous console pod rollouts, which directly matches the changeset.
Description check ✅ Passed The description provides detailed analysis of the root cause, solution, test coverage, and testing verification (make test-unit and make check), but lacks explicit testing steps and browser conformance coverage as specified in the template.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names (function titles and sub-tests) are stable, static strings with no dynamic content, generated suffixes, timestamps, UUIDs, or node/namespace names.
Test Structure And Quality ✅ Passed The three new test functions demonstrate strong quality: each tests a focused behavior, all assertions include meaningful diagnostic messages, and tests follow existing table-driven patterns in the...
Microshift Test Compatibility ✅ Passed PR adds only standard Go unit tests (func TestXxx(t *testing.T)), not Ginkgo e2e tests. The custom check applies exclusively to Ginkgo e2e tests, making it not applicable here.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The pull request adds only standard Go unit tests (not Ginkgo e2e tests) in pkg/console/operator/sync_v400_test.go. The SNO compatibility check applies only to Ginkgo e2e tests, which this PR does...
Topology-Aware Scheduling Compatibility ✅ Passed This PR only modifies telemetry configuration logic in GetTelemetryConfiguration() and adds tests; it introduces no deployment manifests, pod scheduling constraints, affinity rules, topology spread...
Ote Binary Stdout Contract ✅ Passed PR modifies operator sync loop code (sync_v400.go and its tests). No process-level code (main, init, TestMain, BeforeSuite, etc.) is present. The sole logging addition is klog.V(4).Infof() within t...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Tests are standard Go unit tests, not Ginkgo e2e tests. They use in-memory test objects, contain no IPv4 assumptions, and require no external connectivity. The check for IPv6-only disconnected CI c...
No-Weak-Crypto ✅ Passed Code review found no weak cryptography usage: no MD5, SHA1, DES, RC4, 3DES, Blowfish, or ECB; no custom crypto implementations; no non-constant-time secret comparisons.
Container-Privileges ✅ Passed PR modifies telemetry config logic (sync_v400.go) and adds tests; no container manifests gain privileged/host settings. Console and downloads deployments maintain secure contexts: runAsNonRoot, all...
No-Sensitive-Data-In-Logs ✅ Passed Logging only captures error messages from GetAccessToken() failures, not the sensitive token itself. Token value never logged; error logs contain generic messages about missing fields or API errors.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from TheRealJon and spadgett June 22, 2026 19:29
@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sg00dwin
Once this PR has been reviewed and has the lgtm label, please assign therealjon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/console/operator/sync_v400_test.go`:
- Around line 476-489: The test currently only verifies that specific shared
keys are present in both configurations, but does not assert that the complete
key sets are stable across availability changes. After the existing loop that
checks shared key presence, add an assertion that compares the full normalized
key lists from configUnavailable and configAvailable (after removing the
intentional TELEMETER_CLIENT_DISABLED exception from the unavailable config).
Use the already-computed keysUnavailable and keysAvailable variables to perform
a comprehensive comparison and fail the test if any unexpected keys are missing
or added, ensuring stronger regression coverage for ConfigMap-churn bugs.
- Around line 318-333: The test is ignoring errors from Indexer.Add calls (such
as cvIndexer.Add(cv)) and json.Marshal operations instead of handling them
explicitly. Create a test helper function marked with t.Helper() that wraps
these operations and uses t.Fatalf() to fail the test immediately if an error
occurs, then replace all instances of ignored Indexer.Add errors (lines 318,
333, 350, 378) and json.Marshal errors (lines 366, 374) with calls to this new
helper function. This ensures test setup failures are caught immediately rather
than silently propagating broken state.

In `@pkg/console/operator/sync_v400.go`:
- Around line 488-490: The TELEMETER_CLIENT_DISABLED key in telemetryConfig is
conditionally added only when telemeterClientIsAvailable is false, causing
non-deterministic ConfigMap updates when telemeter availability fluctuates.
Refactor the logic to always include the TELEMETER_CLIENT_DISABLED key in
telemetryConfig, setting its value to "true" when telemeterClientIsAvailable is
false and "false" when it is true. This ensures a stable, deterministic key-set
that prevents ConfigMap churn from telemeter availability oscillations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4f8c4fbd-88b8-48b6-8b25-dcc2325169c7

📥 Commits

Reviewing files that changed from the base of the PR and between 014243c and c309132.

📒 Files selected for processing (2)
  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • openshift/console (manual)
📜 Review details
🧰 Additional context used
📓 Path-based instructions (8)
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*.go: Follow Go coding standards and patterns documented in CONVENTIONS.md
Organize imports according to conventions documented in CONVENTIONS.md
Use gofmt to format Go code with standard formatting
Run go vet checks on all Go packages

Follow Go coding standards and patterns as documented in CONVENTIONS.md, including proper import organization

Organize Go code following the repository structure: main entry point in cmd/console/main.go, API constants in pkg/api/, operator command setup in pkg/cmd/operator/, and version command in pkg/cmd/version/

**/*.go: Use gofmt for formatting Go code
Follow standard Go naming conventions
Group imports in order: standard lib, 3rd party, kube/openshift, internal (marked with comments)
Use meaningful error messages with context in Go code
Set status conditions using status.Handle* functions with type prefixes (*Degraded, *Progressing, *Available, *Upgradeable)
Use typed errors and wrap errors to preserve stack context

**/*.go: Imports must be grouped with comments in the order: standard lib, 3rd party, kube, openshift, and operator (internal)
Wrap errors with context using fmt.Errorf("failed to X: %w", err) pattern to provide meaningful error messages

**/*.go: Do NOT use deprecated ioutil package functions. Use os.ReadFile() instead of ioutil.ReadFile(), os.WriteFile() instead of ioutil.WriteFile(), and io.ReadAll() instead of ioutil.ReadAll() (Go 1.16+)
Do NOT use net.Dial() directly. Use (&net.Dialer{}).DialContext() to support context cancellation
Use %w verb in fmt.Errorf() to wrap errors and maintain error chains, not %v which loses the error chain
Add descriptive context to errors when wrapping them, including relevant identifiers and state information
Use specific error type checks (e.g., apierrors.IsNotFound()) instead of string matching with strings.Contains() on error.Error()
Always propagate the parent context in function calls instead of using context.Background(), to respect parent cont...

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go

⚙️ CodeRabbit configuration file

**/*.go: Review Go code following OpenShift operator patterns.
See CONVENTIONS.md for coding standards and patterns.

Refer to the following skills based on CODE PATTERNS, not just file paths:

Refer to /controller-review when code contains:

  • Controller struct types (e.g., type *Controller struct)
  • func New*Controller( factory functions
  • factory.New().WithFilteredEventsInformers( pattern
  • .ToController( method calls
  • Sync(ctx context.Context, controllerContext factory.SyncContext) methods
  • operatorConfig.Spec.ManagementState checks
  • status.NewStatusHandler or status.Handle* functions

Refer to /sync-handler-review when code contains:

  • Main operator sync functions (e.g., sync_v400.go content)
  • Sequential resource syncing with early returns
  • Incremental reconciliation loops
  • Multiple resourceapply.Apply*() calls in sequence
  • Dependency ordering of ConfigMaps → Secrets → Service Accounts → RBAC → Services → Deployments → Routes
  • Feature gate conditional logic

Refer to /go-quality-review for all Go code to check:

  • Deprecated imports: ioutil.ReadFile, ioutil.WriteFile, ioutil.ReadAll
  • Deprecated patterns: Dial without DialContext
  • Error handling: missing %w in fmt.Errorf
  • Code smells: deep nesting (4+ levels), functions >100 lines
  • Magic values: unexplained numbers/strings
  • Context propagation: context.Background() instead of passed ctx
  • Missing godoc on exported functions

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
{pkg,cmd}/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Use gofmt for code formatting on pkg and cmd directories

{pkg,cmd}/**/*.go: Format code using gofmt -w ./pkg ./cmd
Run go vet checks on all Go packages in ./pkg and ./cmd

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
**/*sync*.go

📄 CodeRabbit inference engine (CONVENTIONS.md)

Implement sync loops (sync_v400) incrementally: start from zero, create/update missing requirements, and return to continue on next loop

**/*sync*.go: Sync handler implementations must use incremental sync pattern where each reconciliation loop returns early on error rather than collecting errors and continuing
Sync resources in dependency order: ConfigMaps/Secrets → Service Accounts → RBAC (Roles, RoleBindings) → Services → Deployments → Routes
Return early on errors to maintain incremental sync behavior instead of continuing execution or logging and proceeding
Use resourceapply.Apply*() functions to handle both resource creation and update operations
Ensure deleted resources are cleaned up from the cluster when removed from config, with proper handling for NotFound errors
Check feature gates before syncing gated resources to ensure conditional resource reconciliation
Status updates must reflect actual reconciliation state by adding conditions as operations complete using status handlers with HandleProgressingOrDegraded and HandleAvailable
Respect ManagementState configurations and implement cleanup logic for Removed state
Avoid mutating live objects; instead build desired state separately before applying
Do not sync all resources regardless of errors; stop reconciliation on dependency failures

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
**/operator/**/*.go

📄 CodeRabbit inference engine (Custom checks)

When deployment manifests, operator code, or controllers are added/modified, ensure they do not introduce scheduling constraints assuming standard HA topology. Check ControlPlaneTopology for SingleReplica/DualReplica/HighlyAvailableArbiter/External modes before applying constraints. Use required anti-affinity with maxUnavailable >= 1 (not maxUnavailable: 0). Cap replica counts to schedulable nodes. Exclude arbiter nodes on TNA. Avoid master nodeSelectors on HyperShift. Use library-go DeploymentController hooks (WithTopologyAwareReplicasHook, WithTopologyAwareSchedulingHook, WithControlPlaneNodeSelectorHook).

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}

⚙️ CodeRabbit configuration file

**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}: Injection prevention (prodsec-skills):

  • SQL: parameterized queries only; no string concatenation
  • Command: no shell=True, os.system, or backtick exec with user input
  • LDAP/XPath: escape special characters in filters
  • Path traversal: canonicalize paths, reject ../
  • Deserialization: no pickle/yaml.load()/eval on untrusted data
  • Prototype pollution: no recursive merge of untrusted objects
  • Validate at trust boundaries with allow-lists, not deny-lists
  • Normalize Unicode and anchor regexes (^$); watch for ReDoS

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
**

⚙️ CodeRabbit configuration file

**: # OpenShift Console Operator - AI Context Hub

This file serves as the central AI documentation hub for the OpenShift Console Operator project. AI assistants (Claude, Cursor, Copilot, CodeRabbit, etc.) use this and the linked documents to understand project context.

Go Version and Dependencies

Go Version and Dependencies

  • Go version: 1.24.0 (toolchain: go1.24.4)
  • Dependency management: Uses go.mod with vendoring
  • Build flags: Use GOFLAGS="-mod=vendor" for builds and tests to ensure vendored dependencies are used
  • Key dependencies: openshift/api, openshift/library-go, k8s.io client libraries
  • Go version: 1.24.0 (toolchain: go1.24.4)
  • Dependency management: Uses go.mod with vendoring
  • Build flags: Use GOFLAGS="-mod=vendor" for builds and tests to ensure vendored dependencies are used
  • Key dependencies: openshift/api, openshift/library-go, k8s.io client libraries

Quick Reference

This Repository

Document Purpose
ARCHITECTURE.md System architecture, components, repository structure
CONVENTIONS.md Go coding standards, patterns, import organization
TESTING.md Testing patterns, commands, debugging
README.md Project README with setup instructions

Console Repository (openshift/console)

For frontend-related guidelines, see the openshift/console repository:

Document Purpose
STYLEGUIDE.md Frontend code style guidelines
INTERNATIONALIZATION.md i18n patterns and translation guidelines
CONTRIBUTING.md Contribution guidelines for the console project

Project Summary

The **console-operator...

Files:

  • pkg/console/operator/sync_v400.go
  • pkg/console/operator/sync_v400_test.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow testing patterns and commands documented in TESTING.md

Follow testing patterns and commands as documented in TESTING.md, including running unit tests with 'make test-unit' and checks with 'make check'

**/*_test.go: Use table-driven tests for comprehensive coverage
Use httptest for HTTP handler testing in Go
Include proper cleanup functions in tests
Test both success and failure paths

Check and handle all errors in tests using t.Fatalf() or t.Errorf(); do not ignore errors with blank identifiers

Files:

  • pkg/console/operator/sync_v400_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Review test code for quality and patterns.

Refer to /unit-test-review when test is in pkg//*_test.go:**

  • Table-driven test structure with test cases
  • Use of go-test/deep for struct comparisons
  • Test naming conventions (TestFunctionName)
  • Error handling with wantErr pattern
  • Edge case coverage (nil, empty, boundary values)
  • Proper assertions with helpful error messages
  • Test isolation (no shared mutable state)

Refer to /e2e-test-review when test contains:

  • framework.MustNewClientset(t, nil) or similar e2e framework usage
  • wait.Poll or wait.PollImmediate patterns
  • retry.RetryOnConflict for updates
  • Cleanup via defer functions
  • Console/operator CR manipulations
  • Test assertions on cluster state

Suggest to use /e2e-test-review when:

  • PR adds new feature requiring e2e coverage
  • Test file is empty or skeleton
  • Comments indicate "TODO: add test"

Review for common issues:

  • Missing cleanup (defer statements)
  • Using time.Sleep instead of wait.Poll
  • Missing context timeouts
  • Vague error messages in assertions
  • Tests without table-driven structure when testing multiple cases
  • Ignoring errors with _
  • Tests without assertions

Files:

  • pkg/console/operator/sync_v400_test.go
pkg/**/*_test.go

📄 CodeRabbit inference engine (.claude/skills/unit-test-review.md)

pkg/**/*_test.go: Go unit tests in pkg/**/*_test.go files should use table-driven test pattern for clarity and completeness with struct containing test cases
Test function names should be descriptive and clearly indicate what is being tested (e.g., TestGetNodeComputeEnvironments, TestNewRouteConfig)
Test case names within table-driven tests should explain the scenario being tested (e.g., 'Custom hostname and TLS secret set'), not be vague (e.g., 'test1')
Use github.com/go-test/deep package for struct comparisons to show exact differences instead of simple equality checks or manual field-by-field comparisons
Test both success and failure paths, including edge cases such as empty inputs (nil, "", empty slices/maps), boundary values (0, -1, max int), missing labels/fields, duplicate values, and large inputs
Organize test code using Arrange-Act-Assert pattern: setup test data in Arrange phase, call the function in Act phase, verify results in Assert phase
Check error presence using pattern (err != nil) != tt.wantErr instead of ignoring errors with underscore or missing error checks
When checking specific error messages, use strings.Contains to verify error content instead of just checking error presence
Extract common test setup into helper functions rather than duplicating setup code across multiple tests
Write specific assertion error messages with context (e.g., 'expected %d nodes, got %d') rather than vague messages (e.g., 'failed')
Inline simple test data in test structs; extract complex fixtures into testdata files or helper functions
Avoid red flags in tests: no table-driven tests for multi-scenario functions, tests without subtests, tests depending on execution order, global mutable state, hardcoded sleeps, tests without assertions, ignoring errors with underscore, testing implementation details instead of behavior, or overly complex test setup

Files:

  • pkg/console/operator/sync_v400_test.go
🔇 Additional comments (1)
pkg/console/operator/sync_v400_test.go (1)

4-22: LGTM!

Comment thread pkg/console/operator/sync_v400_test.go Outdated
Comment thread pkg/console/operator/sync_v400_test.go
Comment thread pkg/console/operator/sync_v400.go Outdated
@sg00dwin

Copy link
Copy Markdown
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 22, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@sg00dwin: This pull request references Jira Issue OCPBUGS-90370, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sg00dwin sg00dwin force-pushed the OCPBUGS-90370-configmap-reconcile-loop branch from c309132 to 53adcf0 Compare June 22, 2026 20:14
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@sg00dwin: This pull request references Jira Issue OCPBUGS-90370, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Summary

Fixes continuous console pod rollouts caused byGetTelemetryConfiguration()producing different ConfigMap content on consecutive sync cycles when telemeter-client availability flickers.

Bug: Users are logged out of the web console every ~5 minutes. The operator creates ~11 replica sets in 50 minutes because the console-config ConfigMap alternates between two states, each triggering a deployment rollout that destroys in-memory sessions.

Root cause: GetTelemetryConfiguration() has an early return when telemeter-client is unavailable that produces a map with a completely different key set than the available path:

  • Unavailable: {CLUSTER_ID, SEGMENT_*, TELEMETER_CLIENT_DISABLED} — skips ORGANIZATION_ID, ACCOUNT_MAIL
  • Available: {CLUSTER_ID, SEGMENT_*, ORGANIZATION_ID, ACCOUNT_MAIL} — skips TELEMETER_CLIENT_DISABLED

When availability flickers (node reboots, monitoring pod restarts), ApplyConfigMap correctly detects the data change, bumps resourceVersion, which changes the deployment annotation, triggering a new ReplicaSet.

Fix:

  • Remove the early return so the function always sets all keys regardless of telemeter-client availability
  • Make GetAccessToken errors non-fatal (log + continue with empty token) so disconnected clusters don't fail the sync

Result: The telemetry config map always contains the same key set. Values may differ (empty vs populated), but key set stability means the ConfigMap only changes when values actually change.

Reviewer note

With this change, when telemeter-client is unavailable the console-config ConfigMap will include ORGANIZATION_ID: "" and ACCOUNT_MAIL: "" (empty strings) rather than omitting those keys entirely. These values are passed through to the console frontend (bridge) as telemetry config. Empty strings should be functionally equivalent to absent keys — the frontend won't send those fields in telemetry events — but I'd appreciate a reviewer familiar with the bridge telemetry consumption confirming that's harmless.

Testing

  • Added 3 test functions (6 cases) covering all combinations of telemeter availability and cloud auth presence
  • make test-unit passes
  • make check (gofmt + govet) clean

Assisted by: Claude (Opus 4.6)

Summary by CodeRabbit

Release Notes

  • Bug Fixes

  • Updated telemetry configuration generation to be more resilient: it now always sets the telemeter enabled/disabled flag based on availability, continues even if access-token retrieval fails, and still attempts to populate organization and account details in degraded states.

  • Tests

  • Expanded telemetry configuration tests to verify a stable set of configuration keys across availability changes and to confirm disconnected/absent-auth scenarios don’t fail while still producing expected telemetry fields.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sg00dwin sg00dwin force-pushed the OCPBUGS-90370-configmap-reconcile-loop branch from 53adcf0 to ce92dda Compare June 22, 2026 20:45
…le pod rollouts

Remove early return in GetTelemetryConfiguration() so the function
always produces the same key set regardless of telemeter-client
availability. Make GetAccessToken errors non-fatal for disconnected
clusters. Add key set stability tests.

Assisted by: Claude (Opus 4.6)
@sg00dwin sg00dwin force-pushed the OCPBUGS-90370-configmap-reconcile-loop branch from ce92dda to cc61faf Compare June 22, 2026 20:54
@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@sg00dwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-operator cc61faf link true /test e2e-aws-operator
ci/prow/e2e-gcp-ovn cc61faf link true /test e2e-gcp-ovn
ci/prow/e2e-aws-console cc61faf link true /test e2e-aws-console

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants