
chore(CI): Debug artifact download #110

Open
mtodor wants to merge 1 commit into main from mtodor/ROX-32888-fix-docs-eval-update-try2


mtodor (Collaborator) commented Apr 10, 2026

Description

Fixes the download-artifact path issue in the model-evaluation workflow and adds debug tooling. When it downloads a single artifact, download-artifact@v8 extracts it directly into the target path without creating a per-artifact subdirectory, which breaks the expected path eval-results/eval-results-<model>/.... The fix checks both possible file locations: the subdirectory path (multiple artifacts) and the flat path (single artifact).
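The lookup order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow's actual code: the helper name resolve_results_file is hypothetical, and the result file name is assumed from the validation listing below (mcpchecker-stackrox-mcp-e2e-out.json).

```python
from pathlib import Path
from typing import Optional

# Assumption: result file name taken from the validation listing in this PR.
RESULTS_FILE = "mcpchecker-stackrox-mcp-e2e-out.json"


def resolve_results_file(base_dir: Path, model: str,
                         filename: str = RESULTS_FILE) -> Optional[Path]:
    """Hypothetical helper: return the results file for `model`, or None.

    Mirrors the described fix (subdirectory first, flat path as fallback);
    it is not the code used in the workflow.
    """
    # Layout 1: several artifacts downloaded, one subdirectory per artifact.
    nested = base_dir / f"eval-results-{model}" / filename
    if nested.is_file():
        return nested
    # Layout 2: a single artifact downloaded, extracted flat into base_dir.
    flat = base_dir / filename
    if flat.is_file():
        return flat
    # Neither layout contains the file for this model.
    return None
```

The subdirectory path is checked first because each artifact lands in its own eval-results-<model> directory when several are downloaded. The flat fallback cannot tell which model produced the file, so it is only meaningful when exactly one artifact was downloaded.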

Validation

=== eval-results directory structure ===
eval-results:
mcpchecker-stackrox-mcp-e2e-out.json

AI-assisted development prompts

This PR was developed with AI assistance. Below are the prompts used:

Prompt: "This still does not work. I get: 'Warning: No results found for model gpt-5-mini'. But everything is downloaded properly. We need to test this somehow on a PR. Let's temporarily disable creation of PR. List files and trigger it on PR. Research problem and prepare ticket to implement solution."
Response: Added a temporary pull_request trigger and a debug ls -R eval-results step, and commented out the PR creation step. Created a beads task for the investigation.

Prompt: "This is what we got: eval-results: mcpchecker-stackrox-mcp-e2e-out.json"
Response: Confirmed that download-artifact@v8 places a single artifact flat at the target path with no subdirectory. Created fix ticket stackrox-mcp-ovo to handle both path layouts.

Prompt: "do it" (implement the fix)
Response: Applied the fix: checks subdirectory path first, falls back to flat path. Reverted all debug changes (removed pull_request trigger, debug step, restored PR creation).

Prompt: "Let's also put pattern matching back"
Response: Restored the pattern: eval-results-* filter in the download-artifact step.

mtodor requested a review from janisz as a code owner on April 10, 2026 15:31
codecov-commenter commented Apr 10, 2026

❌ 2 Tests Failed:

Tests completed: 361 | Failed: 2 | Passed: 359 | Skipped: 12
Flaky tests (2):
::policy 1

Flake rate in main: 100.00% (Passed 0 times, Failed 12 times)

Stack traces (0s run time):
- test violation 1
- test violation 2
- test violation 3
::policy 4

Flake rate in main: 100.00% (Passed 0 times, Failed 12 times)

Stack traces (0s run time):
- testing multiple alert violation messages 1
- testing multiple alert violation messages 2
- testing multiple alert violation messages 3


github-actions bot commented Apr 10, 2026

E2E Test Results

Commit: 935f0a9
Workflow Run: View Details
Artifacts: Download test results & logs

=== Evaluation Summary ===

  ✓ list-clusters (assertions: 3/3)
  ✓ cve-detected-workloads (assertions: 3/3)
  ✓ cve-detected-clusters (assertions: 3/3)
  ✓ cve-nonexistent (assertions: 3/3)
  ✓ cve-cluster-does-exist (assertions: 3/3)
  ~ cve-cluster-does-not-exist (assertions: 2/3)
      - ToolsUsed: Required tool not called: server=stackrox-mcp, tool=, pattern=list_clusters
  ✓ cve-clusters-general (assertions: 3/3)
  ✓ cve-cluster-list (assertions: 3/3)
  ✓ cve-log4shell (assertions: 3/3)
  ✓ cve-multiple (assertions: 3/3)
  ✓ rhsa-not-supported (assertions: 2/2)

Tasks:      11/11 passed (100.00%)
Assertions: 31/32 passed (96.88%)
Tokens:     ~54953 (estimate - excludes system prompt & cache)
MCP schemas: ~12738 (included in token total)
Agent used tokens:
  Input:  18203 tokens
  Output: 21390 tokens
Judge used tokens:
  Input:  50449 tokens
  Output: 34291 tokens

mtodor force-pushed the mtodor/ROX-32888-fix-docs-eval-update-try2 branch from 8b02455 to 935f0a9 on April 10, 2026 15:46

Labels

None yet

Projects

None yet

3 participants