WIP: tooling: Update verify-feature-promotion logic for installer-focused features#2903
Conversation
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughThe PR marks Suggested reviewers
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (12 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented Comment |
|
/test verify-feature-promotion |
|
Hello @everettraven! Some important instructions when contributing to openshift/api: |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@everettraven: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/codegen/cmd/featuregate-test-analyzer.go`:
- Around line 1092-1104: The hardcoded stub in getJobsForFeatureGateFromSippy is
bypassing the real Sippy lookup and ignoring the release, featureGate, and
variant inputs. Remove the temporary out slice and early return so the function
proceeds to the actual query logic, and keep the existing
getJobsForFeatureGateFromSippy path using the provided parameters to fetch the
correct jobs.
- Around line 1046-1078: The triage handling in verifyJobPassRate is too coarse
and is using all job runs in the pass-rate calculation, which causes triaged
failures to suppress unrelated variants and still count against the denominator.
Update the flow around getTriagedTestFailuresFromSippy and verifyJobPassRate so
triage is keyed by the full variant identity returned by Sippy, not just test
name, and filter out triaged failures before setting
TestResults.TotalRuns/SuccessfulRuns/FailedRuns. Make sure the loop over jobRuns
only increments failures for non-triaged failing runs and that pass-rate math
reflects the filtered set consistently in both affected call sites.
- Around line 1051-1052: The triage lookup result from
getTriagedTestFailuresFromSippy in featuregate-test-analyzer.go is using the
returned err without handling it, which can make failures look untriaged. Update
the logic around triagedTestFailures, err to check and handle the error
immediately: propagate it or stop processing rather than continuing with an
empty triaged set. Keep the fix localized to the code path that consumes
getTriagedTestFailuresFromSippy so the triage counting logic only runs when the
lookup succeeds.
- Around line 999-1008: The defaultTransport setup in featuregate-test-analyzer
should not disable TLS verification for Sippy requests. Remove the
InsecureSkipVerify setting from the tls.Config used by defaultTransport, and
rely on normal certificate validation so the HTTP client still enforces HTTPS
trust checks while preserving the existing transport configuration.
- Around line 993-1016: `verifyInstallFeatureGatePromotion` and the Sippy helper
chain are not propagating cancellation, so requests can outlive `Run(ctx)`.
Update `verifyInstallFeatureGatePromotion` to pass `ctx` into
`getJobsForFeatureGateFromSippy`, then thread it through `getJobRunsFromSippy`
and `getTriagedTestFailuresFromSippy`, and use it when building the HTTP
requests instead of calling `client.Get(...)` directly. Keep the existing
symbols in place, only add context plumbing so request lifetimes follow the
caller’s deadline/cancelation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 382c96e4-2e33-4b88-b0fe-749c605219d5
📒 Files selected for processing (7)
features.mdfeatures/features.gopayload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yamlpayload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yamltools/codegen/cmd/featuregate-test-analyzer.go
| defaultTransport := &http.Transport{ | ||
| Proxy: http.ProxyFromEnvironment, | ||
| ForceAttemptHTTP2: true, | ||
| MaxIdleConns: 100, | ||
| IdleConnTimeout: 90 * time.Second, | ||
| TLSHandshakeTimeout: 10 * time.Second, | ||
| ExpectContinueTimeout: 1 * time.Second, | ||
| TLSClientConfig: &tls.Config{ | ||
| InsecureSkipVerify: true, | ||
| }, |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="tools/codegen/cmd/featuregate-test-analyzer.go"
echo "== line count =="
wc -l "$file"
echo "== around reported lines =="
sed -n '960,1035p' "$file"
echo "== later helper context =="
sed -n '1035,1125p' "$file"Repository: openshift/api
Length of output: 5480
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="tools/codegen/cmd/featuregate-test-analyzer.go"
echo "== line count =="
wc -l "$file"
echo "== around reported lines =="
sed -n '960,1035p' "$file"
echo "== later helper context =="
sed -n '1035,1125p' "$file"Repository: openshift/api
Length of output: 5480
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="tools/codegen/cmd/featuregate-test-analyzer.go"
echo "== line count =="
wc -l "$file"
echo "== around reported lines =="
sed -n '960,1035p' "$file"
echo "== later helper context =="
sed -n '1035,1125p' "$file"Repository: openshift/api
Length of output: 5480
Don't disable TLS verification for Sippy requests.
InsecureSkipVerify: true allows a MITM to tamper with the job/triage data used here, which can change promotion decisions.
Suggested fix
defaultTransport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
- TLSClientConfig: &tls.Config{
- InsecureSkipVerify: true,
- },
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| defaultTransport := &http.Transport{ | |
| Proxy: http.ProxyFromEnvironment, | |
| ForceAttemptHTTP2: true, | |
| MaxIdleConns: 100, | |
| IdleConnTimeout: 90 * time.Second, | |
| TLSHandshakeTimeout: 10 * time.Second, | |
| ExpectContinueTimeout: 1 * time.Second, | |
| TLSClientConfig: &tls.Config{ | |
| InsecureSkipVerify: true, | |
| }, | |
| defaultTransport := &http.Transport{ | |
| Proxy: http.ProxyFromEnvironment, | |
| ForceAttemptHTTP2: true, | |
| MaxIdleConns: 100, | |
| IdleConnTimeout: 90 * time.Second, | |
| TLSHandshakeTimeout: 10 * time.Second, | |
| ExpectContinueTimeout: 1 * time.Second, | |
| } |
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 1005-1007: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
InsecureSkipVerify: true,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures
(missing-ssl-minversion-go)
🪛 OpenGrep (1.23.0)
[ERROR] 1006-1008: TLS certificate verification is disabled via InsecureSkipVerify. This allows man-in-the-middle attacks. Remove InsecureSkipVerify or set it to false.
(coderabbit.tls.go-insecure-skip-verify)
[ERROR] 1006-1008: TLS certificate verification is disabled via InsecureSkipVerify. This allows man-in-the-middle attacks. Remove InsecureSkipVerify or set it to false.
(coderabbit.tls.go-insecure-skip-verify)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 999 - 1008, The
defaultTransport setup in featuregate-test-analyzer should not disable TLS
verification for Sippy requests. Remove the InsecureSkipVerify setting from the
tls.Config used by defaultTransport, and rely on normal certificate validation
so the HTTP client still enforces HTTPS trust checks while preserving the
existing transport configuration.
Source: Linters/SAST tools
There was a problem hiding this comment.
We already do this everywhere else and this hasn't been a problem.
There was a problem hiding this comment.
@everettraven Fair point on consistency. I agree this isn’t unique to this PR, but I still think the concern is valid: InsecureSkipVerify: true disables certificate validation for data that feeds promotion decisions, so “it hasn’t been a problem yet” is more luck than guarantee.
That said, if you’d rather keep this PR scoped to the installer-logic change, I’m okay deferring this as broader cleanup instead of blocking here. If you want, I can help turn it into a follow-up issue to remove the insecure TLS config across the related Sippy clients.
| func getJobsForFeatureGateFromSippy(client *http.Client, release, featureGate string, variant JobVariant) ([]sippyJob, error) { | ||
| // TODO: Remove this once done testing the other logic. | ||
|
|
||
| out := []sippyJob{ | ||
| { | ||
| Name: "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-installer-dualstack-ipv4-primary-techpreview", | ||
| }, | ||
| { | ||
| Name: "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-installer-dualstack-ipv6-primary-techpreview", | ||
| }, | ||
| } | ||
|
|
||
| return out, nil |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Remove the hardcoded job stub before merging.
This early return makes the real Sippy query below unreachable and ignores release, featureGate, and variant. Every install gate now gets evaluated against the same two AWS dualstack techpreview jobs, which will misreport coverage for other platforms and topologies.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 1092 - 1104, The
hardcoded stub in getJobsForFeatureGateFromSippy is bypassing the real Sippy
lookup and ignoring the release, featureGate, and variant inputs. Remove the
temporary out slice and early return so the function proceeds to the actual
query logic, and keep the existing getJobsForFeatureGateFromSippy path using the
provided parameters to fetch the correct jobs.
| return false | ||
| } | ||
|
|
||
| func verifyInstallFeatureGatePromotion(featureGate string, jobVariant JobVariant) (*TestingResults, error) { |
There was a problem hiding this comment.
suggest going generic, there may be other components that have to prove their fg is ready via a job overall result, not tests within the job.
| ColumnField: "variants", | ||
| OperatorValue: "contains", | ||
| Value: fmt.Sprintf("Topology:%s", variant.Topology), | ||
| }, |
There was a problem hiding this comment.
Nice! So in a couple days we could make this a query for variant Capability:FGName (assuming we name this correctly to match the fg)
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/codegen/cmd/featuregate-test-analyzer.go (1)
1105-1129: 🎯 Functional Correctness | 🟠 MajorCount empty failed-run cases as failures
Failed job runs with no reported
FailedTestNamesstill fall through toSuccessfulRuns++, so installer or early infra failures get counted as passes. Treat those runs as failures unless they’re explicitly known/triaged.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codegen/cmd/featuregate-test-analyzer.go` around lines 1105 - 1129, The job-run counting logic in the loop over jobRuns is treating failed runs with no FailedTestNames as successes because only untriagedTestFailures increments FailedRuns. Update the featuregate test analyzer to count any jobRun with OverallResult == "F" and not KnownFailure as a failed run, even when FailedTestNames is empty, while still preserving the existing triage filtering for reporting untriaged failures in the jobRun/triagedTestFailures path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tools/codegen/cmd/featuregate-test-analyzer.go`:
- Around line 1105-1129: The job-run counting logic in the loop over jobRuns is
treating failed runs with no FailedTestNames as successes because only
untriagedTestFailures increments FailedRuns. Update the featuregate test
analyzer to count any jobRun with OverallResult == "F" and not KnownFailure as a
failed run, even when FailedTestNames is empty, while still preserving the
existing triage filtering for reporting untriaged failures in the
jobRun/triagedTestFailures path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 39488995-d772-40a5-9c03-1230580f496e
📒 Files selected for processing (2)
tools/codegen/cmd/featuregate-test-analyzer.gotools/codegen/pkg/sippy/json_types.go
No description provided.