feat: make executor event handling timeout configurable#6278
feat: make executor event handling timeout configurable#6278sachin9058 wants to merge 34 commits intomindersec:mainfrom
Conversation
|
Thanks, nice cleanup to stop using a package-level constant directly inside the goroutine. However this only adds the field, it does not expose configuration to callers or to the application's config. Please:
I can re-review after those changes. |
|
@krrish175-byte Thanks for the detailed feedback! I've addressed the requested changes:
Also fixed lint issues and ensured all tests pass. Would appreciate a re-review. |
|
CI failure was due to parallel execution in a DB test causing unique constraint conflicts. Disabled parallel execution for the affected test to ensure proper isolation. All tests now pass locally with fresh runs. |
|
Addressed the lint issues by adding proper documentation for the exported method and handling the paralleltest rule appropriately for the DB test (which relies on shared state and cannot run safely in parallel). |
evankanderson
left a comment
There was a problem hiding this comment.
I'm particularly concerned about the change on line 133 of handler.go. Have you tested that this actually affects the timeout for executions?
| randomEntities := createTestRandomEntities(t) | ||
|
|
||
| for _, tt := range tests { | ||
| tt := tt // capture range variable |
There was a problem hiding this comment.
We don't need to do this since Go 1.22 or so.
| } | ||
| } | ||
|
|
||
| //nolint:paralleltest // DB test uses shared state, cannot run in parallel |
There was a problem hiding this comment.
Why can't this run in parallel? It seems like it has been doing so in the past without flakes, because each one operates on its own randomly-named profile.
There was a problem hiding this comment.
@evankanderson I enabled parallel execution for the subtests as suggested.
Adding t.Parallel() at the top-level caused unrelated tests (particularly those using shared gomock controllers and suite state) to fail due to increased concurrency.
To keep tests stable while still benefiting from parallel subtests, I disabled the top-level parallel lint checks with justification and kept subtests parallelized.
All tests and lints are now passing.
| zerolog.Ctx(ctx).Info(). | ||
| Dur("execution_timeout", executionTimeout). | ||
| Msg("executor event handler initialized") |
There was a problem hiding this comment.
Was this for debugging? It seems like it may be "noise" in normal operation -- maybe move it to the Debug level?
| nil, | ||
| nil, | ||
| nil, | ||
| 10*time.Second, |
There was a problem hiding this comment.
Rather than exposing a getter for the sole purpose of testing, I'd rather see a test which actually checks that the timeout happens earlier.
| ctx, cancel := context.WithTimeout(msgCtx, DefaultExecutionTimeout) | ||
| ctx := msgCtx | ||
| cancel := func() {} |
There was a problem hiding this comment.
What's going on here? It looks like this is removing the timeout on the execution context, rather than allowing it to be customized.
* Add VEX / ignore mitigations for CVE-2026-34040 * Update Trivy action to work with code scanning
mindersec#6211) * feat(inventory): add evaluation_outputs table for persisting structured eval data Implements Phase 1 of the Inventory Gathering feature (Issues mindersec#6192 and mindersec#6105). Changes: - Add evaluation_outputs table via migration 000116 - Add sqlc queries: InsertEvaluationOutput, GetEvaluationOutput, DeleteEvaluationOutputsByEvaluationIDs - Extend StoreEvaluationStatus to accept and persist structured output - Regenerate sqlc models, querier interface, and mocks Signed-off-by: daksh.pathak.ug24@nsut.ac.in * Update database/query/eval_outputs.sql Co-authored-by: Evan Anderson <evan.k.anderson@gmail.com> * chore(gen): run make gen for eval-outputs * Address PR review feedback: fix eval outputs error handling and sql param --------- Signed-off-by: daksh.pathak.ug24@nsut.ac.in Co-authored-by: Evan Anderson <evan.k.anderson@gmail.com>
…indersec#6259) Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.1.3 to 4.1.4. - [Release notes](https://github.com/go-jose/go-jose/releases) - [Commits](go-jose/go-jose@v4.1.3...v4.1.4) --- updated-dependencies: - dependency-name: github.com/go-jose/go-jose/v4 dependency-version: 4.1.4 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ec#6264) fix(test): resolve flaky TestGetGrpcConnection and upgrade dependencies This commit fixes the flakiness in TestGetGrpcConnection by replacing the hardcoded port 9 with a dynamically assigned closed port. It also upgrades the Go version and BuildKit to resolve several high-severity security vulnerabilities.
…#6230) Bumps the tools group in /tools with 4 updates: [github.com/bufbuild/buf](https://github.com/bufbuild/buf), [github.com/golangci/golangci-lint/v2](https://github.com/golangci/golangci-lint), [github.com/openfga/cli](https://github.com/openfga/cli) and [golang.org/x/tools](https://github.com/golang/tools). Updates `github.com/bufbuild/buf` from 1.66.0 to 1.66.1 - [Release notes](https://github.com/bufbuild/buf/releases) - [Changelog](https://github.com/bufbuild/buf/blob/main/CHANGELOG.md) - [Commits](bufbuild/buf@v1.66.0...v1.66.1) Updates `github.com/golangci/golangci-lint/v2` from 2.11.2 to 2.11.4 - [Release notes](https://github.com/golangci/golangci-lint/releases) - [Changelog](https://github.com/golangci/golangci-lint/blob/main/CHANGELOG.md) - [Commits](golangci/golangci-lint@v2.11.2...v2.11.4) Updates `github.com/openfga/cli` from 0.7.10 to 0.7.12 - [Release notes](https://github.com/openfga/cli/releases) - [Changelog](https://github.com/openfga/cli/blob/main/CHANGELOG.md) - [Commits](openfga/cli@v0.7.10...v0.7.12) Updates `golang.org/x/tools` from 0.42.0 to 0.43.0 - [Release notes](https://github.com/golang/tools/releases) - [Commits](golang/tools@v0.42.0...v0.43.0) --- updated-dependencies: - dependency-name: github.com/bufbuild/buf dependency-version: 1.66.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tools - dependency-name: github.com/golangci/golangci-lint/v2 dependency-version: 2.11.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tools - dependency-name: github.com/openfga/cli dependency-version: 0.7.12 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tools - dependency-name: golang.org/x/tools dependency-version: 0.43.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: tools ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…uth flow (mindersec#6249) * ✨ Improve CLI help for repo command with examples Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * fix: correct repo command from add to register in CLI examples Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * ✨ Add pre-validation for repo register to validate flags before auth - Validate required inputs (--provider, --name/--all) - Prevent unnecessary authentication flow when input is incomplete - Improve CLI UX with clear error messages and examples Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * ✨ Fix validation logic for repo register command - Remove duplicate validation from RegisterCmd - Allow --all without requiring --provider - Use Cobra flags for validation consistency Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> --------- Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com>
…indersec#6252) * Move internal/engine/errors to pkg/engine/errors to reduce coupling Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * fix: resolve lint issues --------- Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com>
* removed git_pr_diffs feature flag Signed-off-by: DharunMR <maddharun56@gmail.com> * removed the flag from testcase Signed-off-by: DharunMR <maddharun56@gmail.com> --------- Signed-off-by: DharunMR <maddharun56@gmail.com>
) Bumps [lodash](https://github.com/lodash/lodash) from 4.17.23 to 4.18.1. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.23...4.18.1) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.18.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…sec#6267) Update documentation Co-authored-by: evankanderson <evankanderson@users.noreply.github.com>
mindersec#6270) fix(metrics): swap incorrectly assigned entity and profile duration metrics Signed-off-by: theycallmeaabie <theycallmeaabie@gmail.com>
* removed dependency_extract feature flag Signed-off-by: DharunMR <maddharun56@gmail.com> * solved lint issues Signed-off-by: DharunMR <maddharun56@gmail.com> --------- Signed-off-by: DharunMR <maddharun56@gmail.com>
…ersec#6266) * Move internal/engine/errors to pkg/engine/errors to reduce coupling Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * fix: resolve lint issues * refactor: decouple engine errors from internal/db via adapter layer * fix: add package comment and correct license header * fix: cleanup imports and resolve lint issues * fix: update generated protobuf files --------- Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com>
…constraint conflicts
3433b54 to
d662ebb
Compare
…indersec#6252) * Move internal/engine/errors to pkg/engine/errors to reduce coupling Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * fix: resolve lint issues --------- Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com>
* removed git_pr_diffs feature flag Signed-off-by: DharunMR <maddharun56@gmail.com> * removed the flag from testcase Signed-off-by: DharunMR <maddharun56@gmail.com> --------- Signed-off-by: DharunMR <maddharun56@gmail.com>
* removed dependency_extract feature flag Signed-off-by: DharunMR <maddharun56@gmail.com> * solved lint issues Signed-off-by: DharunMR <maddharun56@gmail.com> --------- Signed-off-by: DharunMR <maddharun56@gmail.com>
…ersec#6266) * Move internal/engine/errors to pkg/engine/errors to reduce coupling Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com> * fix: resolve lint issues * refactor: decouple engine errors from internal/db via adapter layer * fix: add package comment and correct license header * fix: cleanup imports and resolve lint issues * fix: update generated protobuf files --------- Signed-off-by: Sachin Kumar <sachinkumar905846@gmail.com>
test: add unit tests for shouldRemediate and shouldAlert
* added additional info for deleting ruletype Signed-off-by: DharunMR <maddharun56@gmail.com> * solved linting Signed-off-by: DharunMR <maddharun56@gmail.com> * added struct approach Signed-off-by: DharunMR <maddharun56@gmail.com> --------- Signed-off-by: DharunMR <maddharun56@gmail.com>
faf0913 to
9ecafed
Compare
|
Addressed all review comments:
All CI checks are now passing. Please take another look 🙏 |
evankanderson
left a comment
There was a problem hiding this comment.
Thanks for adding the configuration for the execution timeout. It seems like there's a lot of other (spurious) code movement and comment removal in this beyond what's needed for the PR. Could you revert those changes, as they mostly seem to be removing comments or altering formatting which is already accepted by gofmt and gci?
| // This allows us to cancel rule evaluation directly when terminationContext | ||
| // is cancelled. |
There was a problem hiding this comment.
Still wondering why we're removing this comment (and rewriting a bunch of other comments in this file)
There was a problem hiding this comment.
We still seem to be removing comments on lines 40-42, 99, 178-179, and 186 (at least). There's also some other comment "movement" (small edits / movement) that doesn't look intentional.
| // NOTE: we're _deliberately_ "escaping" from the parent context's Cancel/Done | ||
| // completion, because the default watermill behavior for both Go channels and | ||
| // SQL is to process messages sequentially, but we need additional parallelism | ||
| // beyond that. When we switch to a different message processing system, we | ||
| // should aim to remove this goroutine altogether and have the messaging system | ||
| // provide the parallelism. | ||
| // We _do_ still want to cancel on shutdown, however. | ||
| // TODO: Make this timeout configurable | ||
| // Escape parent cancellation but still support shutdown cancellation |
There was a problem hiding this comment.
It's fine to remove the TODO, but this code is subtle enough that I've broken it at least once and Stacklok had an outage around it, so I'd prefer to keep the full comment and context.
| } | ||
|
|
||
| ctx, cancel := context.WithTimeout(msgCtx, DefaultExecutionTimeout) | ||
| // ✅ FIX: Proper configurable timeout |
There was a problem hiding this comment.
Prefer not to include Emoji in comments. In particular, ✅ FIX does not really help future readers understand what was broken before, and why this part is important.
| Project: engcontext.Project{ID: inf.ProjectID}, | ||
| // TODO: extract Provider name from ProviderID? | ||
| Provider: engcontext.Provider{ | ||
| Name: inf.ProviderID.String(), |
There was a problem hiding this comment.
Provider names are meangingful strings like "dockerhub-something" or "githubapp-something". ProviderIDs are UUIDs that are mostly meaningless. Let's not pollute the meaningful namespace with other types of data.
There was a problem hiding this comment.
The provider ID is still not a provider name, and we still shouldn't be putting the ID in the name field.
| // record telemetry regardless of error. We explicitly record telemetry | ||
| // here even though we also record it in the middleware because the evaluation | ||
| // is done in a separate goroutine which usually still runs after the middleware | ||
| // had already recorded the telemetry. |
There was a problem hiding this comment.
Please keep this comment, as it will help prevent a future refactor which deletes useful information.
|
|
||
| // expect flush | ||
| for i := 0; i < parallelOps; i++ { | ||
| t.Log("waiting for flush") |
There was a problem hiding this comment.
This log message is helpful when debugging tests where a message was not received, as otherwise the test simply stalls with no output.
There was a problem hiding this comment.
See lines 88 and 54 for why we have these Log messages.
| elapsed := time.Since(start) | ||
|
|
||
| //Ensure execution timed out early | ||
| require.Less(t, elapsed, 3*time.Second, "execution did not timeout as expected") |
There was a problem hiding this comment.
You can also use a shared value in this function and the closure passed to executor.EXPECT() to verify that the "abort" branch got called.
|
|
||
| //nolint:paralleltest,tparallel // top-level parallel causes interference with other tests (gomock/shared state) | ||
| func TestCreateProfileStatusSingleRuleTransitions(t *testing.T) { | ||
| t.Parallel() |
There was a problem hiding this comment.
I didn't see this with:
$ go test -count 1 ./internal/db
ok github.com/mindersec/minder/internal/db 16.139s|
@evankanderson Thanks for the feedback! I’ve reverted unrelated changes and restored the original structure and comments. The PR now focuses only on making the execution timeout configurable, and applies the configured value without altering existing behavior. Please take another look. |
evankanderson
left a comment
There was a problem hiding this comment.
You should coordinate with #6285 -- in particular, that PR:
- Does not gratuitously rewrite existing comments and remove comments about subtle behavior like
msg.Copybeing needed to avoid memory sharing. - Adds a server config setting, so that the timeout is actually configurable.
This PR:
- Tests the actual timeout behavior, rather than simply reading configuration and hoping the timeout works.
| // This allows us to cancel rule evaluation directly when terminationContext | ||
| // is cancelled. |
There was a problem hiding this comment.
Still wondering why we're removing this comment (and rewriting a bunch of other comments in this file)
Wire executor timeout from server event config into the handler and keep the default fallback behavior for non-positive values. Add behavior-focused tests to validate timeout cancellation and configured timeout propagation, and preserve subtle handler comments for maintainability.
|
@evankanderson Can u review it again i have fixed the things according to what u asked me to do.. |
evankanderson
left a comment
There was a problem hiding this comment.
Sorry, a lot of PRs built up while I was out last week.
This PR still has a bunch of extraneous ("type 1") changes, along with a few type 2 changes that look to be accidental.
| Project: engcontext.Project{ID: inf.ProjectID}, | ||
| // TODO: extract Provider name from ProviderID? | ||
| Provider: engcontext.Provider{ | ||
| Name: inf.ProviderID.String(), |
There was a problem hiding this comment.
The provider ID is still not a provider name, and we still shouldn't be putting the ID in the name field.
| // This allows us to cancel rule evaluation directly when terminationContext | ||
| // is cancelled. |
There was a problem hiding this comment.
We still seem to be removing comments on lines 40-42, 99, 178-179, and 186 (at least). There's also some other comment "movement" (small edits / movement) that doesn't look intentional.
|
|
||
| if err := inf.WithExecutionIDFromMessage(msg); err != nil { | ||
| logger.Info(). | ||
| logger.Debug(). |
There was a problem hiding this comment.
Is this logging change intentional? If so, why?
| queued := pq.GetQueue() | ||
|
|
||
| go func() { | ||
| t.Log("Running eventer") |
There was a problem hiding this comment.
This log message can be helpful when debugging the asynchronous event sending/receiving nature and dropped / missed messages.
| engine.DefaultExecutionTimeout, | ||
| ) | ||
|
|
||
| t.Log("waiting for eventer to start") |
|
|
||
| // expect flush | ||
| for i := 0; i < parallelOps; i++ { | ||
| t.Log("waiting for flush") |
There was a problem hiding this comment.
See lines 88 and 54 for why we have these Log messages.
| func TestExecutorEventHandler_UsesConfiguredTimeout(t *testing.T) { | ||
| t.Parallel() | ||
|
|
||
| ctrl := gomock.NewController(t) | ||
| defer ctrl.Finish() | ||
|
|
||
| projectID := uuid.New() | ||
| providerID := uuid.New() | ||
| repositoryID := uuid.New() | ||
| executionID := uuid.New() | ||
|
|
||
| executor := mockengine.NewMockExecutor(ctrl) | ||
| executor.EXPECT(). | ||
| EvalEntityEvent(gomock.Any(), gomock.Any()). | ||
| DoAndReturn(func(ctx context.Context, _ *entities.EntityInfoWrapper) error { | ||
| deadline, ok := ctx.Deadline() | ||
| require.True(t, ok, "expected execution context deadline") | ||
| remaining := time.Until(deadline) | ||
| require.Greater(t, remaining, 40*time.Second) | ||
| require.Less(t, remaining, 80*time.Second) | ||
| return nil | ||
| }) | ||
|
|
||
| evt, err := eventer.New(context.Background(), nil, &serverconfig.EventConfig{Driver: "go-channel"}) | ||
| require.NoError(t, err) | ||
|
|
||
| handler := engine.NewExecutorEventHandler( | ||
| context.Background(), | ||
| evt, | ||
| nil, | ||
| executor, | ||
| 1*time.Minute, | ||
| ) | ||
|
|
||
| eiw := entities.NewEntityInfoWrapper(). | ||
| WithProviderID(providerID). | ||
| WithProjectID(projectID). | ||
| WithRepository(&minderv1.Repository{Name: "test"}). | ||
| WithID(repositoryID). | ||
| WithExecutionID(executionID) | ||
|
|
||
| msg, err := eiw.BuildMessage() | ||
| require.NoError(t, err) | ||
|
|
||
| require.NoError(t, handler.HandleEntityEvent(msg)) | ||
| handler.Wait() | ||
| } |
There was a problem hiding this comment.
This feels like it's testing the same thing as TestExecutorEventHandler_RespectsTimeout in a slightly different way. Do we need both? How does coverage change if we remove one of these tests?
Summary
This PR makes the execution timeout in
ExecutorEventHandlerconfigurable instead of hardcoded.Previously, the timeout was fixed using
DefaultExecutionTimeout. This change introduces a configurable field (executionTimeout) in the handler, allowing flexibility for different environments and workloads while preserving the existing default behavior.The existing TODO related to configurability has been addressed and removed.
No functional behavior changes are introduced.
Fixes #NONE
Testing
make build-minder-clisuccessfullymake lint-fix(no issues)go test ./...(all tests passing)Steps: