OCPBUGS-77344: enable dcm on route changes by jcmoraisjr · Pull Request #778 · openshift/router

jcmoraisjr · 2026-05-25T17:59:29Z

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Summary by CodeRabbit

New Features
- Added error-detection helpers for HAProxy server management.
- Added in-place update vs replace flow for backend endpoints to minimize disruptive changes.
Bug Fixes
- Improved health-check handling: disabling checks explicitly marks servers healthy and supports reconfiguration when checks are missing.
- More robust dynamic endpoint reconciliation with better handling of protocol/port/weight changes.
Refactor
- Simplified server add/update/delete workflows and centralized endpoint table management for dynamic route updates.

openshift-ci-robot · 2026-05-25T17:59:35Z

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-05-25T17:59:42Z

Walkthrough

Router and config-manager now track per-endpoint Weight and VerifyHostname. Backend APIs use endpoint pointers and implement update/replace semantics, health-check enable/disable and explicit health-setting commands. Router rebuilds endpoint tables and performs dynamic in-place updates when compatible; ReplaceRouteEndpoints diffs endpoints by ID and applies add→update→delete.

Changes

Endpoint-Driven Server Management Refactor

Layer / File(s)	Summary
Type system and ConfigManager interface updates `pkg/router/template/types.go`	`Endpoint` gains `Weight` and `VerifyHostname`. `ConfigManager.ReplaceRouteEndpoints` final parameter changes from `weight int32` to `activeEndpoints int`.
Router endpoint table and dynamic route logic `pkg/router/template/router.go`	Adds `updateEndpointTable`; refactors dynamic add/update/replace/remove to use stored EndpointTable, introduces `dynamicallyUpdateRoute` for in-place updates, and persists endpoint-table changes during service-weight redistribution.
Config manager reconciliation and deletion `pkg/router/template/configmanager/haproxy/manager.go`	`ReplaceRouteEndpoints` now diffs endpoints by ID and deep-equality, applies add→update→delete order, aggregates errors, and performs scale-out health-check enablement with fallback replace when HAProxy reports checks unconfigured. Deletion calls pass endpoint pointers.
HAProxy backend server management API refactor `pkg/router/template/configmanager/haproxy/backend.go`	`AddServer`, `UpdateServer`, `ReplaceServer`, `DeleteServer`, and health-check APIs accept `*Endpoint`. Adds error classifiers, derives drain from `Endpoint.Weight`, adds `innerSetServerHealth` and `apiSetServerHealth`, and updates execCommand validation.
Backend dynamic-update tests and fixtures `pkg/router/template/configmanager/haproxy/backend_test.go`, `pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go`	`TestBackendDynamicUpdate` expanded with old/new endpoint pairs, replace command coverage, h2c variants, and failure-mode expectations. Test harness builds ServiceAliasConfig/ServiceUnit and uses new backend signatures; fakeConfigManager signature updated.

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

openshift/router#763: Related prior work on HAProxy dynamic backend add/delete and backend update/replace flows.

Suggested reviewers

gcs278

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	❓ Inconclusive	The custom check is for Ginkgo test code (BeforeEach, It blocks), but this PR uses standard Go testing with testify assertions, not Ginkgo.	Clarify whether check applies to all test code or only Ginkgo tests. This PR uses Go testing.T, not Ginkgo.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title references a Jira issue (OCPBUGS-77344) and mentions enabling DCM on route changes, which aligns with the PR's primary objective to enable dynamic configuration management for route updates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	The codebase uses the standard Go testing framework (testing.T), not Ginkgo. The check is specifically for Ginkgo test names, making it inapplicable to this PR.
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests are added in this PR—all 32 new test files are standard Go unit tests, so the MicroShift test compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR contains only unit tests using Go's standard testing package; no Ginkgo e2e tests (It(), Describe(), etc.) are present that would require SNO compatibility checks.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies router HAProxy backend and endpoint handling. No deployment manifests, controllers, or scheduling constraints introduced.
Ote Binary Stdout Contract	✅ Passed	No OTE Binary Stdout violations found. PR modifies internal utility packages with no process-level functions or stdout writes that would corrupt JSON output.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	This PR contains no new Ginkgo e2e tests. It modifies existing standard Go unit tests and implementation files only, so the IPv6/disconnected network compatibility check does not apply.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jcmoraisjr · 2026-05-25T18:00:19Z

/jira refresh

openshift-ci-robot · 2026-05-25T18:00:26Z

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-25T18:01:35Z

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Summary by CodeRabbit

Release Notes

New Features

Added error detection helpers for HAProxy server management operations.

Introduced improved server endpoint replacement logic with better fallback handling.

Bug Fixes

Enhanced health check state management with explicit server health updates.

Improved dynamic endpoint updates with more robust comparison and fallback behavior.

Refactor

Refined server addition, update, and deletion workflows for greater reliability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jcmoraisjr · 2026-05-25T18:02:32Z

/assign @gcs278

coderabbitai

🧹 Nitpick comments (2)

pkg/router/template/router.go (2)

886-896: 💤 Low value

Consider using oldConfig.EndpointTable[serviceKey] for removed service endpoints.

When removing a service from a route, using service.EndpointTable relies on the invariant that the ServiceUnit's endpoint table matches what was previously synced to this backend. Using oldConfig.EndpointTable[serviceKey] would be more explicit about removing exactly what was added to this specific backend.

While the state should normally be consistent due to locking, using the old config's endpoint table would be more defensive.

♻️ Suggested change

 	for serviceKey := range oldConfig.ServiceUnits {
-		if service, ok := r.findMatchingServiceUnit(serviceKey); ok {
-			if _, newFound := newConfig.ServiceUnits[serviceKey]; !newFound {
-				// Service was removed, remove the endpoints.
-				if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {
+		if _, newFound := newConfig.ServiceUnits[serviceKey]; !newFound {
+			if service, ok := r.findMatchingServiceUnit(serviceKey); ok {
+				// Service was removed from route, remove the endpoints that were in the old config.
+				if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, oldConfig.EndpointTable[serviceKey]); err != nil {
 					log.Info("router will reload due to error removing endpoints", "backendKey", backendKey, "serviceKey", serviceKey, "error", err)
 					failed = true
 				}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 886 - 896, The code currently
passes service.EndpointTable when removing endpoints for services present in
oldConfig.ServiceUnits but no longer in newConfig; instead, use the exact
endpoint table that was part of the previous route config to be defensive.
Update the removal call in the loop that iterates oldConfig.ServiceUnits (around
findMatchingServiceUnit and the RemoveRouteEndpoints call) to use
oldConfig.EndpointTable[serviceKey] (or the appropriate map/key in oldConfig)
rather than service.EndpointTable, preserving backendKey and serviceKey when
calling r.dynamicConfigManager.RemoveRouteEndpoints.

957-972: 💤 Low value

Misleading weight comparison in log message.

The log at line 966 compares oldWeight (from cfg.ServiceUnitNames, the calculated per-endpoint weight) with newWeight (from cfg.ServiceUnits, the route-level service weight). These values have different meanings and scales, making the log confusing for debugging.

The actual replacement logic is correct since it compares the full endpoint structs including their Weight fields.

♻️ Suggested change for clearer logging

-			log.V(4).Info("updating weight for sibling service due to endpoint change", "service", otherServiceKey, "backendKey", backendKey, "oldWeight", oldWeight, "newWeight", newWeight)
+			log.V(4).Info("updating sibling service endpoints due to weight redistribution", "service", otherServiceKey, "backendKey", backendKey)

Alternatively, compare the actual endpoint weights:

oldEpWeight := int32(0)
newEpWeight := int32(0)
if len(oldOtherEndpoints) > 0 {
    oldEpWeight = oldOtherEndpoints[0].Weight
}
if len(newOtherEndpoints) > 0 {
    newEpWeight = newOtherEndpoints[0].Weight
}
log.V(4).Info("updating sibling service endpoints", "service", otherServiceKey, "backendKey", backendKey, "oldEpWeight", oldEpWeight, "newEpWeight", newEpWeight)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 957 - 972, The log message in the
loop that updates sibling services is misleading because it compares oldWeight
(from oldWeights/cfg.ServiceUnitNames per-endpoint weight) to newWeight
(cfg.ServiceUnits route-level weight); change the log to report endpoint-level
weights instead: read the first endpoint Weight from oldOtherEndpoints and
newOtherEndpoints (guarding for empty slices) and log those as oldEpWeight and
newEpWeight in the log.V(4).Info call inside the loop that references
otherServiceKey, backendKey, oldOtherEndpoints, newOtherEndpoints and the
ReplaceRouteEndpoints error path so the message reflects actual endpoint weight
changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/router/template/router.go`:
- Around line 886-896: The code currently passes service.EndpointTable when
removing endpoints for services present in oldConfig.ServiceUnits but no longer
in newConfig; instead, use the exact endpoint table that was part of the
previous route config to be defensive. Update the removal call in the loop that
iterates oldConfig.ServiceUnits (around findMatchingServiceUnit and the
RemoveRouteEndpoints call) to use oldConfig.EndpointTable[serviceKey] (or the
appropriate map/key in oldConfig) rather than service.EndpointTable, preserving
backendKey and serviceKey when calling
r.dynamicConfigManager.RemoveRouteEndpoints.
- Around line 957-972: The log message in the loop that updates sibling services
is misleading because it compares oldWeight (from
oldWeights/cfg.ServiceUnitNames per-endpoint weight) to newWeight
(cfg.ServiceUnits route-level weight); change the log to report endpoint-level
weights instead: read the first endpoint Weight from oldOtherEndpoints and
newOtherEndpoints (guarding for empty slices) and log those as oldEpWeight and
newEpWeight in the log.V(4).Info call inside the loop that references
otherServiceKey, backendKey, oldOtherEndpoints, newOtherEndpoints and the
ReplaceRouteEndpoints error path so the message reflects actual endpoint weight
changes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1a3caf78-e491-44da-b467-71d20f596926

📥 Commits

Reviewing files that changed from the base of the PR and between 6761134 and a5ac103.

📒 Files selected for processing (6)

pkg/router/template/configmanager/haproxy/backend.go
pkg/router/template/configmanager/haproxy/backend_test.go
pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
pkg/router/template/configmanager/haproxy/manager.go
pkg/router/template/router.go
pkg/router/template/types.go

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time. This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before. Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields. https://redhat.atlassian.net/browse/OCPBUGS-77344

openshift-ci · 2026-05-26T19:44:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gcs278. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/router/template/router.go (1)

982-996: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove endpoints per route, not from the full ServiceUnit table.

This helper now sends service.EndpointTable to every associated backend. If two routes select different target ports on the same service, Line 995 will attempt to delete endpoints that were never configured for some of those backends, which turns a normal endpoint deletion into a forced reload.

Proposed fix

-func (r *templateRouter) dynamicallyRemoveEndpoints(service ServiceUnit) bool {
+func (r *templateRouter) dynamicallyRemoveEndpoints(id ServiceUnitKey, service ServiceUnit) bool {
 	if r.dynamicConfigManager == nil || !r.synced {
 		return false
 	}
@@
 	for backendKey := range service.ServiceAliasAssociations {
-		if _, ok := r.state[backendKey]; !ok {
+		cfg, ok := r.state[backendKey]
+		if !ok {
 			continue
 		}
@@
-		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {
+		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, cfg.EndpointTable[id]); err != nil {
 			// Error dynamically modifying the config, so return false to cause a reload to happen.
 			log.Info("router will reload as the ConfigManager could not dynamically remove endpoints for backend", "backendKey", backendKey, "error", err)
 			return false
 		}
 	}

You'll also need the matching call-site update at Line 1014.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 982 - 996, The
dynamicallyRemoveEndpoints helper is passing the entire
ServiceUnit.EndpointTable to every associated backend, causing
RemoveRouteEndpoints(backendKey, service.EndpointTable) to delete endpoints that
don't belong to specific backends; change dynamicallyRemoveEndpoints (and its
call-site) to pass only the route-specific endpoints for each backend (i.e., the
endpoints that correspond to that backendKey) instead of service.EndpointTable,
updating the use of ServiceAliasAssociations and the call-site that invokes
dynamicallyRemoveEndpoints so each backend removal receives the correct
per-route endpoint slice/map.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 370-375: When innerDeleteServer(oldEP) fails after you already
called innerSetServerState(oldEP, false), restore the original server state to
avoid leaving the fallback endpoint drained; change the code around the
innerDeleteServer call to capture the delete error, call
b.innerSetServerState(oldEP, true) to revert the change, and return a combined
or wrapped error if the restore also fails (reference innerSetServerState and
innerDeleteServer to locate the logic).
- Around line 361-363: The code only updates HAProxy weight when oldEP.Weight !=
newEP.Weight via b.innerUpdateServerWeight, but fails to toggle the server's
state between "ready" and "drain" when the weight crosses the zero boundary (<=0
vs >0), causing drained endpoints to still serve or recovered ones to remain
drained (and passthrough always sends 100%). Modify the branch handling in the
function that compares oldEP and newEP so that when weight changes you detect a
crossing of the drain threshold (oldEP.Weight > 0 && newEP.Weight <= 0 =>
transition to "drain"; oldEP.Weight <= 0 && newEP.Weight > 0 => transition to
"ready") and invoke the appropriate HAProxy state-change routine in addition to
calling b.innerUpdateServerWeight(newEP, isPassthrough); ensure isPassthrough
still results in correct weight payload but do the state flip via the same
server-state API used elsewhere.

In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 524-545: The code must only attempt to re-enable the singleton
health check if that exact old endpoint still exists; before calling
backend.EnableHealthCheck on oldEndpoints[0] (and before calling
backend.ReplaceServer/adding to newEPs), verify that the endpoint is still
present in the current service/backend endpoints (e.g. check svc.Endpoints or
the current backend server list) and skip this whole branch if the endpoint no
longer exists so a prior ReplaceServer/delete isn't turned into a forced reload.

In `@pkg/router/template/router.go`:
- Around line 886-890: When removing a service from a route, the code currently
passes the raw service list (service.EndpointTable) to RemoveRouteEndpoints;
instead it should pass the route-scoped endpoint slice that was actually
attached to this backend (the EndpointTable stored on the service unit from the
old route config). Update the call in the loop over oldConfig.ServiceUnits so
that, after locating the removed service via r.findMatchingServiceUnit, you
retrieve the service unit from oldConfig.ServiceUnits[serviceKey] (or the local
variable representing that unit) and pass that unit's EndpointTable (the
PreferPort-filtered slice) to
r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, ... ) instead of
service.EndpointTable, keeping the same backendKey and error handling.

---

Outside diff comments:
In `@pkg/router/template/router.go`:
- Around line 982-996: The dynamicallyRemoveEndpoints helper is passing the
entire ServiceUnit.EndpointTable to every associated backend, causing
RemoveRouteEndpoints(backendKey, service.EndpointTable) to delete endpoints that
don't belong to specific backends; change dynamicallyRemoveEndpoints (and its
call-site) to pass only the route-specific endpoints for each backend (i.e., the
endpoints that correspond to that backendKey) instead of service.EndpointTable,
updating the use of ServiceAliasAssociations and the call-site that invokes
dynamicallyRemoveEndpoints so each backend removal receives the correct
per-route endpoint slice/map.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aa126d80-4b4b-44bb-841c-1121e049641b

📥 Commits

Reviewing files that changed from the base of the PR and between a5ac103 and 5dbcbdf.

📒 Files selected for processing (6)

pkg/router/template/configmanager/haproxy/backend.go
pkg/router/template/configmanager/haproxy/backend_test.go
pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
pkg/router/template/configmanager/haproxy/manager.go
pkg/router/template/router.go
pkg/router/template/types.go

coderabbitai · 2026-05-26T19:58:13Z

+	if oldEP.Weight != newEP.Weight {
+		return false, b.innerUpdateServerWeight(newEP, isPassthrough)
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle weight changes that cross the drain boundary.

Line 361 only updates HAProxy's weight. If the endpoint moves from > 0 to <= 0 or back, the server never flips between ready and drain, so a drained endpoint can keep serving or a recovered endpoint can stay drained. Passthrough is especially affected here because innerUpdateServerWeight() always sends 100%.

Suggested fix

if oldEP.Weight != newEP.Weight { - return false, b.innerUpdateServerWeight(newEP, isPassthrough) + if err := b.innerUpdateServerWeight(newEP, isPassthrough); err != nil { + return false, err + } + if (oldEP.Weight <= 0) != (newEP.Weight <= 0) { + if err := b.innerSetServerState(newEP, true); err != nil { + return false, err + } + } } return false, nil

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 361 - 363, The code only updates HAProxy weight when oldEP.Weight != newEP.Weight via b.innerUpdateServerWeight, but fails to toggle the server's state between "ready" and "drain" when the weight crosses the zero boundary (<=0 vs >0), causing drained endpoints to still serve or recovered ones to remain drained (and passthrough always sends 100%). Modify the branch handling in the function that compares oldEP and newEP so that when weight changes you detect a crossing of the drain threshold (oldEP.Weight > 0 && newEP.Weight <= 0 => transition to "drain"; oldEP.Weight <= 0 && newEP.Weight > 0 => transition to "ready") and invoke the appropriate HAProxy state-change routine in addition to calling b.innerUpdateServerWeight(newEP, isPassthrough); ensure isPassthrough still results in correct weight payload but do the state flip via the same server-state API used elsewhere.

coderabbitai · 2026-05-26T19:58:14Z

+	if err := b.innerSetServerState(oldEP, false); err != nil {
+		return err
+	}
+	if err := b.innerDeleteServer(oldEP); err != nil {
+		return err
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the old server state if replace aborts before the add.

After Line 370 puts the old server in maint, a delete failure on Line 373 returns an error but leaves that endpoint drained until the reload path completes. That reintroduces a transient availability drop on the fallback path.

Suggested fix

if err := b.innerSetServerState(oldEP, false); err != nil { return err } if err := b.innerDeleteServer(oldEP); err != nil { + if rollbackErr := b.innerSetServerState(oldEP, true); rollbackErr != nil { + return fmt.Errorf("deleting old server: %v; restoring old server state: %v", err, rollbackErr) + } return err }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/backend.go` around lines 370 - 375, When innerDeleteServer(oldEP) fails after you already called innerSetServerState(oldEP, false), restore the original server state to avoid leaving the fallback endpoint drained; change the code around the innerDeleteServer call to capture the delete error, call b.innerSetServerState(oldEP, true) to revert the change, and return a combined or wrapped error if the restore also fails (reference innerSetServerState and innerDeleteServer to locate the logic).

coderabbitai · 2026-05-26T19:58:14Z

 		if len(oldEndpoints) == 1 {
-			// enabling also for the former single endpoint as well
-			newEPs = append(newEPs, oldEndpoints[0])
+			ep := &oldEndpoints[0]
+			if !ep.NoHealthCheck {
+				// The backend was previously in the single server scenario, so health check should be enabled.
+				// Dynamically enabling health check only works if health check is configured, and we only
+				// configure health check upfront in dynamically added servers.
+				// So, we are trying to enable health check first, and if HAProxy responds that it is not
+				// configured, we'll need to remove and add it again.
+				err := backend.EnableHealthCheck(ep)
+				if backend.IsHealthCheckNotConfiguredError(err) {
+					// Health check not configured on this server. Replace it dynamically to reconfigure with health check.
+					err = backend.ReplaceServer(entry.backend, svc, ep, ep, cm.workingDir, cm.defaultDestinationCA)
+					if err == nil {
+						// Server replaced successfully, mark to enable health check later.
+						newEPs = append(newEPs, ep)
+					}
+				}
+				if err != nil {
+					// Failed either enabling health check or replacing backend server.
+					errs = append(errs, err)
+				}
+			}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only carry forward the old singleton health check if that server still exists.

This branch runs after the delete phase, but it still calls EnableHealthCheck on oldEndpoints[0] whenever len(oldEndpoints) == 1. If the old singleton was replaced outright, that endpoint is already gone and this turns a valid dynamic replace into a forced reload.

Proposed fix

- if len(oldEndpoints) == 1 { + if len(oldEndpoints) == 1 { + if _, removed := deletedEndpoints[oldEndpoints[0].ID]; removed { + goto enableNewHealthChecks + } ep := &oldEndpoints[0] if !ep.NoHealthCheck { // The backend was previously in the single server scenario, so health check should be enabled. // Dynamically enabling health check only works if health check is configured, and we only // configure health check upfront in dynamically added servers. @@ } } } + enableNewHealthChecks: for _, ep := range newEPs { if !ep.NoHealthCheck { if err := backend.EnableHealthCheck(ep); err != nil { errs = append(errs, err)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/configmanager/haproxy/manager.go` around lines 524 - 545, The code must only attempt to re-enable the singleton health check if that exact old endpoint still exists; before calling backend.EnableHealthCheck on oldEndpoints[0] (and before calling backend.ReplaceServer/adding to newEPs), verify that the endpoint is still present in the current service/backend endpoints (e.g. check svc.Endpoints or the current backend server list) and skip this whole branch if the endpoint no longer exists so a prior ReplaceServer/delete isn't turned into a forced reload.

coderabbitai · 2026-05-26T19:58:14Z

+	for serviceKey := range oldConfig.ServiceUnits {
+		if _, newFound := newConfig.ServiceUnits[serviceKey]; !newFound {
+			if service, ok := r.findMatchingServiceUnit(serviceKey); ok {
+				// Service was removed from route, remove the endpoints that were in the old config.
+				if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the route-scoped endpoint slice when a service is removed from a route.

service.EndpointTable is the raw service list, not the PreferPort-filtered set that was actually attached to this backend. On a port-specific route, Line 890 can try to delete servers that never existed in the route backend and unnecessarily drop back to reload.

Proposed fix

- if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil { + if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, oldConfig.EndpointTable[serviceKey]); err != nil {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/router/template/router.go` around lines 886 - 890, When removing a service from a route, the code currently passes the raw service list (service.EndpointTable) to RemoveRouteEndpoints; instead it should pass the route-scoped endpoint slice that was actually attached to this backend (the EndpointTable stored on the service unit from the old route config). Update the call in the loop over oldConfig.ServiceUnits so that, after locating the removed service via r.findMatchingServiceUnit, you retrieve the service unit from oldConfig.ServiceUnits[serviceKey] (or the local variable representing that unit) and pass that unit's EndpointTable (the PreferPort-filtered slice) to r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, ... ) instead of service.EndpointTable, keeping the same backendKey and error handling.

openshift-ci · 2026-05-26T23:36:58Z

@jcmoraisjr: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 25, 2026

openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 25, 2026

openshift-ci Bot requested review from bentito and ironcladlou May 25, 2026 18:00

openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 25, 2026

openshift-ci Bot assigned gcs278 May 25, 2026

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from a5ac103 to 5dbcbdf Compare May 26, 2026 19:44

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Conversation

jcmoraisjr commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

jcmoraisjr commented May 25, 2026

Uh oh!

openshift-ci-robot commented May 25, 2026

Uh oh!

openshift-ci-robot commented May 25, 2026

Summary by CodeRabbit

Release Notes

Uh oh!

jcmoraisjr commented May 25, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jcmoraisjr commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading