feat: add certification levels for Compass integration #37
Open
GuyZivRH wants to merge 14 commits into
Implement hierarchical certification levels that aggregate gate results into standardized checks for Compass integration. Each level has specific requirements that must pass for certification.
- Add abevalflow/certification.py with 16 check IDs across 3 levels
- Extend Scorecard with certification and highest_certification fields
- Add certification fact pushing to Compass (4 facts per evaluation)
- Update aggregate_scorecard.py to compute and push certification
- Add comprehensive documentation to compass_facts_integration.md
- Add 24 unit tests for certification logic
Ref: APPENG-5306
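A minimal sketch of how individual check outcomes might roll up into levels and a highest certification. The level names come from this PR; the level-to-check mapping and the helper names (evaluate_levels, highest_certification) are hypothetical and much smaller than the real 16-check set in abevalflow/certification.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Level(str, Enum):
    FOUNDATIONAL = "foundational"
    TRUSTED = "trusted"
    CERTIFIED = "certified"


# Hypothetical level -> required check IDs mapping, listed lowest level first.
LEVEL_REQUIREMENTS: dict[Level, list[str]] = {
    Level.FOUNDATIONAL: ["validation", "schema_compliance"],
    Level.TRUSTED: ["security_scan"],
    Level.CERTIFIED: ["enterprise_security_review"],
}


@dataclass
class LevelResult:
    level: Level
    checks: dict[str, bool] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Guard against all([]) == True: a level with no checks does not pass.
        return bool(self.checks) and all(self.checks.values())


def evaluate_levels(check_results: dict[str, bool]) -> list[LevelResult]:
    """Group individual check outcomes into one result per level."""
    results = []
    for level, required in LEVEL_REQUIREMENTS.items():
        # A required check with no recorded outcome counts as failed.
        checks = {cid: check_results.get(cid, False) for cid in required}
        results.append(LevelResult(level, checks))
    return results


def highest_certification(results: list[LevelResult]) -> Optional[Level]:
    """Return the highest level reached without skipping a failed level."""
    highest = None
    for result in results:  # ordered lowest to highest
        if not result.passed:
            break
        highest = result.level
    return highest
```

Treating a missing check result as a failure keeps the rollup fail-closed: a level can only pass on evidence, never on absence of data.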
The validate task now writes validation.json to the workspace so downstream tasks can read the actual validation results instead of hardcoding validation_passed=True.
- Update validate.yaml to write validation.json to the reports dir
- Update aggregate_scorecard.py to read validation results
- Certification checks now reflect actual validation status
Ref: APPENG-5306
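A minimal sketch of the reading side in the aggregation step, assuming validation.json carries a top-level boolean "passed" key (the real schema is not shown in this PR). The function name load_validation_passed is hypothetical. Any missing, unreadable, or malformed file is treated as a failed validation.

```python
import json
from pathlib import Path


def load_validation_passed(reports_dir: Path) -> bool:
    """Return whether validation passed, failing closed on any problem.

    Assumes a hypothetical schema: {"passed": true | false, ...}.
    """
    path = reports_dir / "validation.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing or unparseable report: do not assume success.
        return False
    if not isinstance(data, dict):
        return False
    # Only an explicit boolean True counts as a pass.
    return data.get("passed") is True
```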
Add CertificationPolicy schema that allows users to customize which checks are required for each certification level (foundational, trusted, certified) and override score thresholds via metadata.yaml.
Key changes:
- Add CertificationPolicy and CertificationLevelPolicy to schemas.py
- Add certification_policy field to SubmissionMetadata
- Update compute_certification() to accept optional policy
- Support custom check lists and threshold overrides per level
- Load and pass certification_policy in aggregate_scorecard.py
- Add 14 new tests for policy configuration
- Document new YAML options in compass_facts_integration.md
Backward compatible: if no certification_policy is provided, the existing defaults are used.
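A sketch of how a per-level policy could be merged with built-in defaults. The class names CertificationPolicy and CertificationLevelPolicy come from this PR, but their fields here (levels, checks, thresholds), the default values, and the helper effective_level_config are assumptions; the real schemas.py may use a validation library rather than plain dataclasses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical built-in defaults; the real lists live in certification.py.
DEFAULT_CHECKS: dict[str, list[str]] = {
    "foundational": ["validation", "schema_compliance"],
    "trusted": ["security_scan"],
    "certified": ["enterprise_security_review"],
}
DEFAULT_THRESHOLDS: dict[str, float] = {"enterprise_security_review": 0.8}


@dataclass
class CertificationLevelPolicy:
    # None keeps the default check list for this level.
    checks: Optional[list[str]] = None
    # Score threshold overrides keyed by check ID.
    thresholds: dict[str, float] = field(default_factory=dict)


@dataclass
class CertificationPolicy:
    levels: dict[str, CertificationLevelPolicy] = field(default_factory=dict)


def effective_level_config(
    level: str, policy: Optional[CertificationPolicy]
) -> tuple[list[str], dict[str, float]]:
    """Return (required checks, thresholds) for a level, applying any policy."""
    level_policy = policy.levels.get(level) if policy is not None else None
    if level_policy is None:
        # No policy for this level: backward-compatible defaults.
        return list(DEFAULT_CHECKS[level]), dict(DEFAULT_THRESHOLDS)
    checks = level_policy.checks if level_policy.checks is not None else DEFAULT_CHECKS[level]
    # Policy thresholds override defaults key by key.
    thresholds = {**DEFAULT_THRESHOLDS, **level_policy.thresholds}
    return list(checks), thresholds
```

For example, a submission could re-add a deferred check such as registry_governance to the trusted level while leaving foundational and certified on their defaults.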
Trusted and Certified levels now only include implemented checks:
- Trusted (was 6, now 4):
  - Removed: registry_governance, operational_policy_compliance
- Certified (was 5, now 3):
  - Removed: enterprise_behavioral_testing, continuous_optimization
These checks remain in the CheckId enum for future implementation and can be re-added via certification_policy in metadata.yaml.
Ref: APPENG-5306
- Add 4 new check IDs to enum for future implementation:
  - efficiency_cost_profiling
  - data_privacy_pii_handling
  - safety_toxicity_bias_guardrails
  - resilience_chaos_testing
- Create Docs/certification_and_checks.md covering:
  - Overview of gates, checks, levels, scorecards, facts
  - All 20 check IDs with implementation status
  - What each check does and what's missing
  - Artifact applicability matrix
  - Configuration options
  - Implementation roadmap
  - File locations
Ref: APPENG-5306
- Fix check merge logic to be conservative (a failing result for a check is kept over a passing one)
- Fix: an empty check list now fails the level (all([]) returns True)
- Fix: a missing validation.json now fails validation (fail-closed)
- Add source_implementation tracking for 'both' mode
- Add tests for all fixed scenarios
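A sketch of conservative merging when the same check is reported more than once, as can happen in 'both' mode. It assumes each result records which implementation produced it; the CheckResult shape, the merge_check_results name, and the "legacy"/"new" source labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    passed: bool
    # Which implementation produced this result (hypothetical labels).
    source_implementation: str


def merge_check_results(*result_sets: list[CheckResult]) -> dict[str, CheckResult]:
    """Merge results for the same check ID conservatively: a failure always wins."""
    merged: dict[str, CheckResult] = {}
    for results in result_sets:
        for result in results:
            existing = merged.get(result.check_id)
            # Keep the first result seen, but replace a pass with a later failure.
            if existing is None or (existing.passed and not result.passed):
                merged[result.check_id] = result
    return merged
```

The merge is order-independent with respect to pass/fail: whichever input reports a failure, the merged result fails, and source_implementation shows where that failure came from.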
…findings
- Compass facts now use hierarchy-enforced `passed` values:
- trusted.passed requires foundational.passed
- certified.passed requires trusted.passed and foundational.passed
- Add validation for threshold keys against CheckId enum
- Detect unresolved ${VAR} placeholders before sending requests
- Fix ENTERPRISE_SECURITY_REVIEW message when score < threshold but 0 findings
- Extract _push_raw_fact() helper to eliminate duplicate HTTP code
- Fix get_threshold() to use last-wins semantics (matching _collect_threshold_overrides)
- Add 22 new tests for hierarchy enforcement and validation
Ref: APPENG-5306
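A sketch of detecting unresolved ${VAR} placeholders before a request is sent. UnresolvedEnvVarError is named in this PR; the regex, the ensure_resolved helper, and the message format are assumptions for illustration.

```python
import re

# Matches ${NAME} where NAME looks like a shell-style environment variable.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class UnresolvedEnvVarError(ValueError):
    """Raised when a config value still contains a ${VAR} placeholder."""


def ensure_resolved(name: str, value: str) -> str:
    """Return value unchanged, or raise if it still contains ${VAR}."""
    match = _PLACEHOLDER.search(value)
    if match:
        raise UnresolvedEnvVarError(
            f"{name} contains unresolved placeholder ${{{match.group(1)}}}"
        )
    return value
```

Checking before the HTTP call turns a confusing authentication failure (for example, sending the literal string "Bearer ${COMPASS_TOKEN}") into an explicit configuration error.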
- Enforce certification hierarchy in compute_certification(): if foundational fails, trusted and certified automatically fail
- Add UnresolvedEnvVarError handling in aggregate_scorecard.py to prevent pipeline crashes from misconfigured tokens
- Apply unresolved env var check to push_gate_fact_from_config
- Refactor push_certification_level_fact to use _push_raw_fact helper
- Update _build_certification_summary_payload to use hierarchy-enforced passed values for consistency with level payloads
- Update test for hierarchy enforcement behavior
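The cascade rule can be expressed as a single pass over the levels in order. This sketch assumes the input maps each level name to whether that level's own checks passed; enforce_hierarchy is a hypothetical helper, not the actual compute_certification().

```python
from __future__ import annotations

LEVEL_ORDER = ("foundational", "trusted", "certified")


def enforce_hierarchy(own_checks_passed: dict[str, bool]) -> dict[str, bool]:
    """Cascade failures upward: a level passes only if every lower level passed.

    own_checks_passed maps level name -> whether that level's own checks passed.
    """
    enforced: dict[str, bool] = {}
    prerequisites_ok = True
    for level in LEVEL_ORDER:
        # A missing level is treated as failed.
        enforced[level] = prerequisites_ok and own_checks_passed.get(level, False)
        prerequisites_ok = enforced[level]
    return enforced
```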
- Remove redundant hierarchy_passed_map computation in push_certification_facts (hierarchy is now enforced at the compute_certification level, so .passed values are already correct)
- Remove hierarchy_passed parameter from _build_certification_level_payload and push_certification_level_fact; use level_result.passed directly
- Add failure_reason field to Compass payload when hierarchy forces failure (all checks passed but level failed due to prerequisite level failing)
- Fix empty fact_ref in UnresolvedEnvVarError handling by computing the intended fact_ref before the push call
- Add dedicated TestHierarchyEnforcement test class with isolation tests for:
  - Trusted fails -> certified cascades
  - Foundational fails -> trusted cascades
  - Check results preserved despite hierarchy failure
  - All levels pass when requirements met
- Update TestBuildCertificationLevelPayload and TestPushCertificationFactsHierarchy to reflect the new simpler API
Addresses review findings: redundant hierarchy_passed_map, empty fact_ref, missing failure_reason field, and insufficient hierarchy cascade tests.
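A sketch of when a failure_reason could be attached: only when a level failed even though all of its own checks passed, which means the hierarchy forced the failure. The payload keys other than passed and failure_reason, the message text, and the build_level_payload name are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def build_level_payload(
    level: str,
    checks: dict[str, bool],
    passed: bool,
    failed_prerequisite: Optional[str] = None,
) -> dict[str, Any]:
    """Build a hypothetical Compass fact payload for one certification level."""
    payload: dict[str, Any] = {"level": level, "passed": passed, "checks": dict(checks)}
    own_checks_passed = bool(checks) and all(checks.values())
    if not passed and own_checks_passed:
        # Every check for this level passed, so the failure came from the hierarchy.
        payload["failure_reason"] = (
            f"prerequisite level {failed_prerequisite!r} did not pass"
            if failed_prerequisite
            else "prerequisite level did not pass"
        )
    return payload
```

Keeping the individual check results in the payload alongside failure_reason lets a Compass user see that the level's own requirements were met.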
Change GateType.QUALITY to GateType.SECURITY for gates named "security" in hierarchy enforcement tests. The wrong type caused the tests to exercise quality checks instead of security checks, which was misleading even though the tests still passed (they test hierarchy behavior, not security-specific mapping).
7a38c3c to 3856865
Add artifact-type-specific certification profiles that can be selected at pipeline deployment level, allowing different check configurations for skills, agents, MCP servers, and plugins without per-submission configuration.
Profiles:
- skill: A/B testing focus (default)
- agent: Reasoning and safety focus
- mcp_server: API contracts and resilience
- plugin: OpenAPI and auth validation
- full: All checks enabled
Implementation:
- config/certification_profiles.yaml: Profile definitions (PM-owned)
- certification.py: load_profile(), get_available_profiles()
- aggregate_scorecard.py: --certification-profile argument
- analyze-and-check-degradation.yaml: certification-profile parameter
Priority order: submission metadata.yaml > pipeline profile > defaults
Includes 8 new tests and documentation for extending the system with new scanners/checks.
If --certification-profile is explicitly provided but cannot be loaded (typo, missing file), raise the error instead of silently falling back to defaults. This prevents deployment misconfiguration from going unnoticed.
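A sketch of the priority order combined with strict loading of an explicitly requested profile. To stay stdlib-only, profiles are passed in as an already-parsed dict rather than read from config/certification_profiles.yaml. select_profile, resolve_policy, and ProfileNotFoundError are hypothetical and are not the PR's load_profile(); the sketch also assumes a submission policy replaces the profile wholesale rather than merging with it.

```python
from __future__ import annotations

from typing import Any, Optional


class ProfileNotFoundError(LookupError):
    """Raised when an explicitly requested profile does not exist."""


def select_profile(name: str, profiles: dict[str, Any]) -> Any:
    """Look up a profile by name, failing loudly on typos."""
    if name not in profiles:
        available = ", ".join(sorted(profiles))
        raise ProfileNotFoundError(
            f"unknown certification profile {name!r}; available: {available}"
        )
    return profiles[name]


def resolve_policy(
    submission_policy: Optional[Any],
    profile_name: Optional[str],
    profiles: dict[str, Any],
    defaults: Any,
) -> Any:
    """Apply the priority order: submission metadata > pipeline profile > defaults."""
    # Validate an explicit profile up front so a bad deployment is caught
    # even when a submission happens to carry its own policy.
    profile = select_profile(profile_name, profiles) if profile_name else None
    if submission_policy is not None:
        return submission_policy
    if profile is not None:
        return profile
    return defaults
```

Falling back to defaults only when no profile was requested means a typo such as "agnet" surfaces on the first run instead of silently certifying against the wrong check set.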
Quay sends digest tags as 'sha256-...' (dash) not 'sha256:...' (colon). The CEL filter was only checking for colon format, allowing digest tags to trigger monitoring runs. This caused 49+ spurious pending runs.
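The fix itself lives in a CEL filter on the trigger; the Python predicate below only mirrors that logic for illustration, and the function names are hypothetical. It treats both the dash form Quay sends and the colon form as digest tags.

```python
def is_digest_tag(tag: str) -> bool:
    """Return True for digest-style tags that should not trigger monitoring runs.

    Quay reports digest tags as "sha256-<hex>"; "sha256:<hex>" is also accepted.
    """
    return tag.startswith(("sha256:", "sha256-"))


def should_trigger_monitoring(tag: str) -> bool:
    """Only human-readable tags (e.g. versions) start a monitoring run."""
    return not is_digest_tag(tag)
```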
The prepare task was only writing validation results to Tekton results,
not to the workspace. This caused the scorecard aggregation to fail to
find validation.json, defaulting certification checks to failed.
Now writes validation.json to reports/{submission-dir}/ for downstream
tasks to consume.
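A sketch of the writing side, placing the report at reports/<submission-dir>/validation.json under the workspace. The real prepare task is defined in Tekton YAML; the write_validation_report name and the result dict shape are assumptions. Writing to a temporary file and renaming avoids a downstream task reading a half-written report.

```python
import json
from pathlib import Path
from typing import Any


def write_validation_report(workspace: Path, submission_dir: str, result: dict[str, Any]) -> Path:
    """Write validation results to <workspace>/reports/<submission_dir>/validation.json."""
    out_dir = workspace / "reports" / submission_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "validation.json"
    # Write to a sibling temp file, then atomically replace the final path.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(result, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return path
```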
Summary
Implements hierarchical certification levels (Foundational, Trusted, Certified) that aggregate gate results into standardized checks for Compass integration.
Changes
- Add `abevalflow/certification.py` with 20 check IDs and computation logic
- Extend `Scorecard` with `certification` and `highest_certification` fields
- Add `certification_policy` for customizing checks and thresholds per level
- Add documentation (`Docs/certification_and_checks.md`)
Configuration Example
Test Plan
Ref: APPENG-5306
Made with Cursor