TRT-2564: Cache Risk Analysis responses #3409
sosiouxme wants to merge 16 commits into openshift:main from
Conversation
🤖 Assisted by Claude Code
🤖 Assisted by Claude Code
also minor refactoring to prep for caching 🤖 Assisted by Claude Code
... which replaces caching the tests at the api response
Pipeline controller notification: For optional jobs, comment. This repository is configured in automatic mode.
@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest-required
Scheduling required tests:
@coderabbitai help
Chat: There are 3 ways to chat with CodeRabbit:
- CodeRabbit commands
- Other keywords and placeholders
- Status, support, documentation and community
@coderabbitai review
✅ Actions performed: Review triggered.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
Walkthrough
Threads a cache client through commands, server, processors, and API flows; adds generic GetDataFromCacheOrMatview and RefreshMatviewKey helpers; records matview refresh timestamps in cache; adapts function signatures and command flags to accept and propagate a cache client.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Server
participant Cache
participant DB
participant BigQuery
Client->>Server: Request data (e.g., /api/tests)
Server->>Cache: GET RefreshMatviewKey(matview)
Cache-->>Server: refresh_time or nil
alt refresh_time missing OR cache_timestamp >= refresh_time
Server->>Cache: GET cached data key
Cache-->>Server: cached_value + cache_timestamp or MISS
alt cache hit and valid
Server->>Client: Return cached_value
else
Server->>DB: Query matview (or BigQuery fallback)
DB-->>Server: query_result
Server->>Cache: SET cached data key = {Val, Timestamp=now} (with cacheDuration)
Cache-->>Server: set OK
Server->>Client: Return generated data
end
else refresh_time > cache_timestamp
Server->>DB: Query matview (or BigQuery)
DB-->>Server: query_result
Server->>Cache: SET RefreshMatviewKey(matview) = now (TTL 24h)
Cache-->>Server: set OK
Server->>Cache: SET cached data key = {Val, Timestamp=now} (with cacheDuration)
Cache-->>Server: set OK
Server->>Client: Return generated data
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (11 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.11.4)
Error: can't load config: unsupported version of the configuration: ""
See https://golangci-lint.run/docs/product/migration-guide for migration instructions
@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/api/tests.go (2)
686-690: ⚠️ Potential issue | 🟡 Minor
Fix the copied completion log label.
This is the BigQuery path, but the message still says "buildTestsResultsFromPostgres completed". That makes latency/debug logs ambiguous when you're comparing the BQ and Postgres builders.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/tests.go` around lines 686 - 690, The completion log message is wrong for the BigQuery path: update the Debug message in the log.WithFields block that currently says "buildTestsResultsFromPostgres completed" to the correct label (e.g., "buildTestsResultsFromBigQuery completed") so the elapsed/testReports metrics are clearly attributed; locate the logging call that uses elapsed and testReports (the log.WithFields(...) followed by .Debug(...)) and replace the string to match the BigQuery builder function name.
322-335: ⚠️ Potential issue | 🟠 Major
Normalize equivalent `period` values before they reach the cache key.
`buildTestsResultsFromPostgres` only treats "twoDay" specially, so "", "default", and "current" all hit the same 7-day matview. Right now those values are serialized into different `TestResultsSpec` cache keys, which fragments the cache and cuts the hit rate for equivalent requests.
Suggested fix

    period := req.URL.Query().Get("period")
    if period != "" && period != "default" && period != "current" && period != "twoDay" {
        RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{"code": http.StatusBadRequest, "message": "Unknown period"})
        return
    }
    + if period == "" || period == "default" {
    +     period = "current"
    + }
    spec := TestResultsSpec{
        Release: release,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/tests.go` around lines 322 - 335, Normalize the incoming period value to a canonical token before constructing TestResultsSpec so equivalent periods share the same cache key: map ""/"default"/"current" to the same normalized value (e.g., ""), leaving "twoDay" untouched, then use that normalized variable when building the TestResultsSpec and calling spec.buildTestsResultsFromPostgres(dbc, cacheClient); update the validation logic that currently checks period to still allow the accepted inputs but perform the normalization immediately prior to creating the TestResultsSpec.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sippy/load.go`:
- Around line 148-152: In the cache initialization block around GetCacheClient,
fix the error variable mismatch by wrapping and returning cacheErr (not err)
when cacheErr != nil; update references in the component-readiness-cache path
and any early-return logic so that errors.Wrap or the returned error uses
cacheErr with a message like "couldn't get cache client", and keep cacheClient
nil when cacheErr is present to preserve existing error hygiene.
In `@cmd/sippy/refresh.go`:
- Around line 4-6: The current cache setup turns a best-effort write into a hard
failure; instead of returning the error when initializing the cache client, log
the error (using logrus) and set cacheClient = nil so RefreshData can proceed
(RefreshData already handles a nil cacheClient). Update the cache initialization
code that currently returns the error (around the cacheClient
setup/WriteTimestamp call) to log the failure and continue, ensuring calls to
RefreshData use the possibly-nil cacheClient.
In `@pkg/api/cache_test.go`:
- Around line 637-640: The test is supposed to hit the equality branch but loses
it because seedRefreshTimestamp serializes times with time.RFC3339 (no
sub-second precision) while cachedAt usually has nanoseconds; fix by normalizing
cachedAt to the same precision before seeding so the parsed refresh time equals
cachedAt — e.g., truncate or round cachedAt to seconds (or format+parse with
time.RFC3339) so the value passed to seedMatviewCache/cachedAt matches what
seedRefreshTimestamp writes for testMatview in mc and the equality branch is
exercised.
In `@pkg/api/cache.go`:
- Around line 212-215: The current comparison in the cache invalidation logic
uses lastRefresh.Before(cacheVal.Timestamp) which treats equal timestamps as
stale; change the condition to allow equality (i.e., treat refresh newer than
cache as miss, but equal or older as a hit). Locate the block around the cache
check (where cacheVal.Timestamp and lastRefresh are compared) and replace the
strict Before check with a comparison that returns the cached value when
lastRefresh is equal or before cacheVal.Timestamp (for example, use a
!lastRefresh.After(cacheVal.Timestamp) style check) so equal timestamps reuse
the cache.
In `@pkg/api/tests.go`:
- Around line 444-458: Change buildTestsResultsFromPostgres to accept and use a
request context and pass it into GetDataFromCacheOrMatview instead of
context.TODO(); update the local generator closure to call
spec.buildTestsResults with that ctx (so the closure uses the same context) and
ensure the GetDataFromCacheOrMatview invocation uses the incoming ctx.
Specifically, modify the signature of buildTestsResultsFromPostgres to take ctx
context.Context, change generator := func(ctx context.Context) ... to
capture/forward the outer ctx to spec.buildTestsResults, and replace the
context.TODO() argument in the GetDataFromCacheOrMatview call with the incoming
ctx; update any callers of buildTestsResultsFromPostgres accordingly.
In `@scripts/updatesuites.go`:
- Around line 153-156: The call to sippyserver.RefreshData(&db.DB{DB: dbc}, nil,
false) passes nil for the cache client so the new matview refresh timestamps are
never written; change RefreshData usage so it either receives a real cache
client or makes the script explicitly offline-only: locate
sippyserver.RefreshData and the call site using db.DB and dbc, then either (A)
construct and pass the appropriate cache client (Redis client/adapter used
elsewhere in the app) instead of nil so timestamps are persisted, or (B) add a
guard in this script to refuse to run against a live deployment (e.g., require
an --offline flag or check an env var) and return an error if no offline mode is
set, documenting that RefreshData must not be invoked with a nil cache in
production.
---
Outside diff comments:
In `@pkg/api/tests.go`:
- Around line 686-690: The completion log message is wrong for the BigQuery
path: update the Debug message in the log.WithFields block that currently says
"buildTestsResultsFromPostgres completed" to the correct label (e.g.,
"buildTestsResultsFromBigQuery completed") so the elapsed/testReports metrics
are clearly attributed; locate the logging call that uses elapsed and
testReports (the log.WithFields(...) followed by .Debug(...)) and replace the
string to match the BigQuery builder function name.
- Around line 322-335: Normalize the incoming period value to a canonical token
before constructing TestResultsSpec so equivalent periods share the same cache
key: map ""/"default"/"current" to the same normalized value (e.g., ""), leaving
"twoDay" untouched, then use that normalized variable when building the
TestResultsSpec and calling spec.buildTestsResultsFromPostgres(dbc,
cacheClient); update the validation logic that currently checks period to still
allow the accepted inputs but perform the normalization immediately prior to
creating the TestResultsSpec.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 57ca2b8b-6b2a-4111-a4fa-a6ccd9c7f29a
📒 Files selected for processing (11)
cmd/sippy-daemon/main.go
cmd/sippy/load.go
cmd/sippy/refresh.go
cmd/sippy/seed_data.go
pkg/api/cache.go
pkg/api/cache_test.go
pkg/api/job_runs.go
pkg/api/tests.go
pkg/sippyserver/pr_commenting_processor.go
pkg/sippyserver/server.go
scripts/updatesuites.go
    cachedAt := time.Now().UTC().Add(-5 * time.Minute)
    seedMatviewCache(t, mc, spec, testResult{Value: "cached"}, cachedAt)
    // Refresh at the exact same time as the cache entry
    seedRefreshTimestamp(mc, testMatview, cachedAt)
Make this test hit the equality branch.
seedRefreshTimestamp serializes with time.RFC3339, so the refresh value loses sub-second precision. As written, cachedAt usually still carries nanos, which means the parsed refresh time is before the cached time, not equal to it. This test can pass without ever exercising the equality case.
Suggested fix
- cachedAt := time.Now().UTC().Add(-5 * time.Minute)
+ cachedAt := time.Now().UTC().Add(-5 * time.Minute).Truncate(time.Second)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    cachedAt := time.Now().UTC().Add(-5 * time.Minute).Truncate(time.Second)
    seedMatviewCache(t, mc, spec, testResult{Value: "cached"}, cachedAt)
    // Refresh at the exact same time as the cache entry
    seedRefreshTimestamp(mc, testMatview, cachedAt)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/api/cache_test.go` around lines 637 - 640, The test is supposed to hit
the equality branch but loses it because seedRefreshTimestamp serializes times
with time.RFC3339 (no sub-second precision) while cachedAt usually has
nanoseconds; fix by normalizing cachedAt to the same precision before seeding so
the parsed refresh time equals cachedAt — e.g., truncate or round cachedAt to
seconds (or format+parse with time.RFC3339) so the value passed to
seedMatviewCache/cachedAt matches what seedRefreshTimestamp writes for
testMatview in mc and the equality branch is exercised.
    if lastRefresh.Before(cacheVal.Timestamp) {
        // not invalidated by a newer refresh, so use it (if we don't know the last refresh, still use it)
        return cacheVal.Val, nil
    }
Allow equal timestamps to reuse the cache.
Line 212 uses a strict Before check, so a refresh timestamp equal to the cached timestamp is treated as stale. The invalidation rule described in this PR is "refresh newer than cache => miss", so equality should still return the cached value.
Suggested fix
- if lastRefresh.Before(cacheVal.Timestamp) {
+ if !lastRefresh.After(cacheVal.Timestamp) {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if !lastRefresh.After(cacheVal.Timestamp) {
        // not invalidated by a newer refresh, so use it (if we don't know the last refresh, still use it)
        return cacheVal.Val, nil
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/api/cache.go` around lines 212 - 215, The current comparison in the cache
invalidation logic uses lastRefresh.Before(cacheVal.Timestamp) which treats
equal timestamps as stale; change the condition to allow equality (i.e., treat
refresh newer than cache as miss, but equal or older as a hit). Locate the block
around the cache check (where cacheVal.Timestamp and lastRefresh are compared)
and replace the strict Before check with a comparison that returns the cached
value when lastRefresh is equal or before cacheVal.Timestamp (for example, use a
!lastRefresh.After(cacheVal.Timestamp) style check) so equal timestamps reuse
the cache.
it will make no difference in practice and i believe the PR code is more legible.
@sosiouxme, fair point — exact timestamp equality between a matview refresh and a cache write is effectively impossible in practice, and lastRefresh.Before(cacheVal.Timestamp) does read more naturally than the double-negation alternative. Happy to defer to your judgement here.
    // NOTE: does not update timestamps to invalidate cached matview data; not clear if the use case for this script requires that.
    sippyserver.RefreshData(&db.DB{
        DB: dbc,
    }, nil, false)
This still skips cache invalidation after the migration runs.
RefreshData(..., nil, ...) means the matview refresh timestamps introduced in this PR are never written, so Redis-backed RA/test caches can keep serving pre-migration data until TTL expiry. If this script can be run against a live Sippy deployment, it should either plumb a real cache client or be explicitly limited to offline use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/updatesuites.go` around lines 153 - 156, The call to
sippyserver.RefreshData(&db.DB{DB: dbc}, nil, false) passes nil for the cache
client so the new matview refresh timestamps are never written; change
RefreshData usage so it either receives a real cache client or makes the script
explicitly offline-only: locate sippyserver.RefreshData and the call site using
db.DB and dbc, then either (A) construct and pass the appropriate cache client
(Redis client/adapter used elsewhere in the app) instead of nil so timestamps
are persisted, or (B) add a guard in this script to refuse to run against a live
deployment (e.g., require an --offline flag or check an env var) and return an
error if no offline mode is set, documenting that RefreshData must not be
invoked with a nil cache in production.
that is fine with me. i doubt anyone uses this script against production. if we do and anyone cares, we'll have to add the ability to specify the cache.
@sosiouxme, understood — the inline NOTE comment already documents the limitation clearly, so anyone who needs to add cache support later has the context. That's a reasonable call.
📝 Learnings were identified and not saved (knowledge base disabled).
petr-muller left a comment
/hold
/lgtm
/approve
/meow
Some comments inline, nothing serious. Feel free to unhold or address, your call.
    }
    logrus.Debugf("matview %q refreshed at %v, will not use earlier cache entry from %v", matview, lastRefresh, cacheVal.Timestamp)
    } else if strings.Contains(err.Error(), "connection refused") {
        logrus.WithError(err).Fatalf("redis URL specified but got connection refused; exiting due to cost issues in this configuration")
Fatal in a library is brutal 😁
could be a networking blip but I guess we can robustify if we run into this too often
i do not know the history of why it was done this way and it did not seem likely to be a common problem that a blip would kill the server (but maybe it's been happening all the time and we just didn't notice because they're deployed redundantly)
    return errors.Wrap(cacheErr, "couldn't get cache client")
an LLM pointed me here, it's not that I read all code
        "elapsed": elapsed,
        "reports": len(testReports),
-   }).Debug("BuildTestsResults completed")
+   }).Debug("buildTestsResultsFromPostgres completed")
- }).Debug("buildTestsResultsFromPostgres completed")
+ }).Debug("buildTestsResultsFromBigQuery completed")
    generator := func(ctx context.Context) (testResults, []error) {
        return spec.buildTestsResults(dbc, matview)
    }
    result, errs := GetDataFromCacheOrMatview(context.TODO(), cacheClient,
there should be a request somewhere upstream, which means there should be a ctx somewhere too, would it be hard to plumb it here to avoid the context.TODO()?
agreed, i guess i was being lazy
        "elapsed": elapsed,
        "reports": len(testReports),
-   }).Debug("BuildTestsResults completed")
+   }).Debug("buildTestsResultsFromPostgres completed")
- }).Debug("buildTestsResultsFromPostgres completed")
+ }).Debug("buildTestsResults completed")
yeah... actually it needs a name change anyway.
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: petr-muller, sosiouxme
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
fix silly error found by coderabbit
as coderabbit pointed out, we probably don't really want a caching problem to prevent running the matview refreshes.
practically speaking, it doesn't matter
…uery gives a chance to cancel the query if the server request is canceled.
New changes are detected. LGTM label has been removed.
@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead.
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/api/cache.go (1)
220-221: ⚠️ Potential issue | 🟠 Major
Don't `Fatal` from this cache helper.
`GetDataFromCacheOrMatview` is used on request paths; a transient Redis outage here should degrade to regeneration, not terminate the process.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/cache.go` around lines 220 - 221, In GetDataFromCacheOrMatview, do not call logrus.Fatalf (which exits the process) on a transient Redis "connection refused" error; instead log non-fatally (e.g., logrus.WithError(err).Warnf or Errorf) and treat it as a cache miss by returning control to the caller so the data can be regenerated (e.g., return a nil cache result / an error that the caller interprets as "no cache"). Replace the Fatalf call in the connection-refused branch with a non-fatal log and adjust the function return path so it does not terminate the process.
🧹 Nitpick comments (1)
cmd/sippy/load.go (1)
148-152: Log the degraded cache mode here.
Swallowing `cacheErr` is fine for keeping the load moving, but it also means later matview refreshes run with `cacheClient == nil`, so refresh timestamps are never recorded and cached RA/test responses can stay stale until TTL expiry. Please surface this path with at least a warning/metric so a successful load does not hide disabled invalidation.
Possible tweak

    // likewise get a cache client if possible, though some things operate without it.
    cacheClient, cacheErr := f.CacheFlags.GetCacheClient()
    if cacheErr != nil {
    +   log.WithError(cacheErr).Warn("cache unavailable; matview refreshes will skip invalidation timestamp writes")
        cacheClient = nil // error hygiene, since we pass this down to quite a few functions
    }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sippy/load.go` around lines 148 - 152, The code swallows cacheErr after calling f.CacheFlags.GetCacheClient(), which hides degraded-cache mode; update the block around f.CacheFlags.GetCacheClient() so that when cacheErr != nil you log a warning (including cacheErr.Error()) and/or emit a metric indicating cache invalidation is disabled before setting cacheClient = nil; reference the cacheClient and cacheErr variables and the GetCacheClient call so the log/metric is placed right where cacheErr is handled to surface the degraded cache path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/api/cache.go`:
- Around line 201-202: The json.Unmarshal error handling in the cache retrieval
path should treat corrupt cache entries as cache-misses instead of returning an
error; modify the block that unmarshals into cacheVal to log or warn about the
unmarshal failure (include error and cacheKey), optionally evict the bad entry,
and then fall through to call generateFn to produce the value and continue
normal execution rather than returning defaultVal and an error slice; update the
code that currently references json.Unmarshal, cacheVal, cacheKey, defaultVal,
and generateFn so the function returns the generated value (and no error) on
corrupt cache blobs.
In `@pkg/api/tests.go`:
- Around line 533-537: The cache key currently uses a constant
NewCacheSpec(testResultsBQ{}, "BigQueryTestsResults~", nil), causing all
BigQuery /api/tests requests to share one entry; update the
GetDataFromCacheOrGenerate call to build the cache spec from the actual request
spec (include release, period, filter, collapse or the request
struct/stringified key) so each unique request produces a unique NewCacheSpec
(replace the constant testResultsBQ{} key with the request-derived key) while
keeping the same cache, RequestOptions (testResultsCacheDuration) and the call
to GetDataFromCacheOrGenerate.
---
Duplicate comments:
In `@pkg/api/cache.go`:
- Around line 220-221: In GetDataFromCacheOrMatview, do not call logrus.Fatalf
(which exits the process) on a transient Redis "connection refused" error;
instead log non-fatally (e.g., logrus.WithError(err).Warnf or Errorf) and treat
it as a cache miss by returning control to the caller so the data can be
regenerated (e.g., return a nil cache result / an error that the caller
interprets as "no cache"). Replace the Fatalf call in the connection-refused
branch with a non-fatal log and adjust the function return path so it does not
terminate the process.
---
Nitpick comments:
In `@cmd/sippy/load.go`:
- Around line 148-152: The code swallows cacheErr after calling
f.CacheFlags.GetCacheClient(), which hides degraded-cache mode; update the block
around f.CacheFlags.GetCacheClient() so that when cacheErr != nil you log a
warning (including cacheErr.Error()) and/or emit a metric indicating cache
invalidation is disabled before setting cacheClient = nil; reference the
cacheClient and cacheErr variables and the GetCacheClient call so the log/metric
is placed right where cacheErr is handled to surface the degraded cache path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 093e1c8b-5796-4aa8-958a-4556161ef67a
📒 Files selected for processing (8)
cmd/sippy/load.go
cmd/sippy/refresh.go
pkg/api/cache.go
pkg/api/cache_test.go
pkg/api/job_runs.go
pkg/api/tests.go
pkg/apis/cache/cache.go
pkg/sippyserver/server.go
✅ Files skipped from review due to trivial changes (1)
- pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (3)
- cmd/sippy/refresh.go
- pkg/api/cache_test.go
- pkg/api/job_runs.go
Scheduling required tests:
this is not strictly necessary since the results were already cached at the request level and we don't have a way to find out when the backing tables in BQ are refreshed. but it seemed like a good idea and not too much trouble to have the two forks work the same.
both postgres and BQ forks can use the same request processing code. also the "period" is now standardized for caching purposes.
fix problem coderabbit noted where the cache entry doesn't unmarshal for some reason... better to treat it like a cache miss.
force-pushed from 49e5677 to f81c9a4 (Compare)
@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead.
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/api/cache.go (1)
140-146: ⚠️ Potential issue | 🟡 Minor
Don't let rounding make `Expiry` a no-op.
Once Line 143 runs, the duration is recomputed from `now` and the previously selected `cacheOptions.Expiry` is discarded. Any caller that sets both `Expiry` and `CRTimeRoundingFactor` will get the next rounding boundary from "now", not from `now + Expiry`.
Suggested fix
    func CalculateRoundedCacheDuration(cacheOptions cache.RequestOptions) time.Duration {
        // require cacheDuration for persistence logic
        cacheDuration := defaultCacheDuration
        if cacheOptions.Expiry > 0 {
            cacheDuration = cacheOptions.Expiry
        }
        if cacheOptions.CRTimeRoundingFactor > 0 {
            now := time.Now().UTC()
    +       expiresAt := now.Add(cacheDuration)
            // Only cache until the next rounding duration
    -       cacheDuration = now.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
    +       cacheDuration = expiresAt.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
        }
        return cacheDuration
    }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/cache.go` around lines 140-146: the current logic overwrites cacheDuration set from cacheOptions.Expiry when cacheOptions.CRTimeRoundingFactor is present, causing Expiry to be ignored; update the CRTimeRoundingFactor branch to compute the rounding boundary from now + cacheDuration (use the existing cacheDuration after applying Expiry) instead of from now, so that the final cacheDuration = roundedBoundary.Sub(now). Concretely, after setting cacheDuration from cacheOptions.Expiry, when CRTimeRoundingFactor > 0 compute target := now.Add(cacheDuration), then set cacheDuration = target.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now) (or equivalent) so both cacheOptions.Expiry and cacheOptions.CRTimeRoundingFactor are honored for cacheDuration.

pkg/api/tests.go (1)
251-257: ⚠️ Potential issue | 🟠 Major — Guard `limit` before slicing.

Line 253 and line 286 only reject zero. A request like `?limit=-1` falls through to `tests[:limit]` and panics the handler.

Suggested fix:

```diff
 func (tests TestsAPIResult) limit(req *http.Request) TestsAPIResult {
-	limit, _ := strconv.Atoi(req.URL.Query().Get("limit"))
-	if limit == 0 || len(tests) < limit {
+	limit, err := strconv.Atoi(req.URL.Query().Get("limit"))
+	if err != nil || limit <= 0 || len(tests) < limit {
 		return tests
 	}
 	return tests[:limit]
 }

 func (tests TestsAPIResultBQ) limit(req *http.Request) TestsAPIResultBQ {
-	limit, _ := strconv.Atoi(req.URL.Query().Get("limit"))
-	if limit == 0 || len(tests) < limit {
+	limit, err := strconv.Atoi(req.URL.Query().Get("limit"))
+	if err != nil || limit <= 0 || len(tests) < limit {
 		return tests
 	}
 	return tests[:limit]
 }
```

Also applies to: 284-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/tests.go` around lines 251 - 257, The limit parsing in TestsAPIResult.limit allows negative or parse-failed values through and can cause a panic when slicing (tests[:limit]); update the TestsAPIResult.limit method to validate the parsed limit from strconv.Atoi by checking the Atoi error, treating non-positive (<=0) or failed parses as "no limit", and clamp the positive limit to at most len(tests) before returning tests[:limit]; apply the same guard/clamping logic to the other identical limit-handling block in this file.
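The guarded behavior this comment asks for can be exercised in isolation. `limitResults` below is an illustrative stand-in that takes the raw query value directly rather than an `*http.Request`:

```go
package main

import (
	"fmt"
	"strconv"
)

// limitResults sketches the guarded limit helper: a missing, unparsable,
// or non-positive limit means "no limit", and a valid limit never slices
// past the end of the slice, so tests[:limit] cannot panic.
func limitResults(tests []string, rawLimit string) []string {
	limit, err := strconv.Atoi(rawLimit)
	if err != nil || limit <= 0 || len(tests) < limit {
		return tests
	}
	return tests[:limit]
}

func main() {
	tests := []string{"a", "b", "c"}
	fmt.Println(limitResults(tests, "2"))  // [a b]
	fmt.Println(limitResults(tests, "-1")) // [a b c] — no panic
	fmt.Println(limitResults(tests, ""))   // [a b c]
}
```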
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/api/tests.go`:
- Around line 443-445: The fmt.Errorf calls that format the variable errs (a
[]error) use the "%s" verb; change both occurrences to use "%v" so the error
slice prints its elements correctly (update the fmt.Errorf invocations that
reference errs in pkg/api/tests.go to use "%v" instead of "%s").
---
Outside diff comments:
In `@pkg/api/cache.go`:
- Around line 140-146: The current logic overwrites cacheDuration set from
cacheOptions.Expiry when cacheOptions.CRTimeRoundingFactor is present, causing
Expiry to be ignored; update the CRTimeRoundingFactor branch to compute the
rounding boundary from now + cacheDuration (use the existing cacheDuration after
applying Expiry) instead of from now so that the final cacheDuration =
roundedBoundary.Sub(now). Concretely, after setting cacheDuration from
cacheOptions.Expiry, when CRTimeRoundingFactor > 0 compute target :=
now.Add(cacheDuration), then set cacheDuration =
target.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
(or equivalent) so both cacheOptions.Expiry and
cacheOptions.CRTimeRoundingFactor are honored for cacheDuration.
In `@pkg/api/tests.go`:
- Around line 251-257: The limit parsing in TestsAPIResult.limit allows negative
or parse-failed values through and can cause a panic when slicing
(tests[:limit]); update the TestsAPIResult.limit method to validate the parsed
limit from strconv.Atoi by checking the Atoi error, treating non-positive (<=0)
or failed parses as "no limit", and clamp the positive limit to at most
len(tests) before returning tests[:limit]; apply the same guard/clamping logic
to the other identical limit-handling block in this file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 80e6632c-b39c-4fe5-8870-31f91c66ec69
📒 Files selected for processing (4)
- pkg/api/cache.go
- pkg/api/tests.go
- pkg/apis/cache/cache.go
- pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/apis/cache/cache.go
- pkg/sippyserver/server.go
Scheduling required tests:
@sosiouxme: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Risk Analysis queries are fairly heavy and can impact Sippy DB performance, and during regression periods we are likely to see an increase in repetitive calls for the same job/test failures, so this adds a caching mechanism.
Both RA and Sippy classic test pages use the same underlying query mechanism, so the same cache is introduced for both (replacing the existing request-level cache for the test API). This caches the test queries for an hour by default, but cache entries are invalidated if the underlying matview is refreshed, so new data is surfaced faster.
Individual commits separate the concepts that built this change, so reviewing them serially may be easiest.
openshift/origin#30965 proposes one simple way to benchmark risk analysis requests.