feat: add certification levels for Compass integration#37

Open
GuyZivRH wants to merge 14 commits into main from APPENG-5306/certification-levels

@GuyZivRH

Collaborator

Summary

Implements hierarchical certification levels (Foundational, Trusted, Certified) that aggregate gate results into standardized checks for Compass integration.

  • Foundational (5 checks): Basic validation - structure, security, execution, quality, metadata
  • Trusted (4 checks): Production-ready - eval assets, advanced security, functional validation, instruction quality
  • Certified (3 checks): Enterprise-grade - enterprise structure, enterprise security, advanced agent validation

Changes

  • Add abevalflow/certification.py with 20 check IDs and computation logic
  • Extend Scorecard with certification and highest_certification fields
  • Add certification fact pushing to Compass (4 facts: foundational, trusted, certified, summary)
  • Add YAML-level certification_policy for customizing checks and thresholds per level
  • Plumb validation results from validate task to certification computation
  • Add comprehensive documentation (Docs/certification_and_checks.md)
  • Add 38 unit tests

Configuration Example

certification_policy:
  foundational:
    checks:
      - valid_skill_structure
      - basic_security_validation
    thresholds:
      basic_execution_validation: 0.5
  trusted:
    checks:
      - evaluation_assets
      - functional_validation
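A typo in a check name or threshold key under certification_policy would otherwise be ignored silently. The sketch below shows one way such keys could be validated against an enum. It is illustrative only: the CheckId subset here lists just the names used in the example above, and unknown_policy_keys is a hypothetical helper, not the function in abevalflow/certification.py.

```python
from enum import Enum


class CheckId(str, Enum):
    # Illustrative subset only: the names used in the configuration example.
    VALID_SKILL_STRUCTURE = "valid_skill_structure"
    BASIC_SECURITY_VALIDATION = "basic_security_validation"
    BASIC_EXECUTION_VALIDATION = "basic_execution_validation"
    EVALUATION_ASSETS = "evaluation_assets"
    FUNCTIONAL_VALIDATION = "functional_validation"


def unknown_policy_keys(policy: dict) -> list[str]:
    """Return 'level.name' for every check or threshold key not in CheckId."""
    known = {check.value for check in CheckId}
    unknown: list[str] = []
    for level, cfg in policy.items():
        # Collect names from both the check list and the threshold mapping.
        names = list(cfg.get("checks", [])) + list(cfg.get("thresholds", {}))
        unknown += [f"{level}.{name}" for name in names if name not in known]
    return unknown


# The policy from the example above, written as an already-parsed mapping.
example_policy = {
    "foundational": {
        "checks": ["valid_skill_structure", "basic_security_validation"],
        "thresholds": {"basic_execution_validation": 0.5},
    },
    "trusted": {"checks": ["evaluation_assets", "functional_validation"]},
}
```

Calling unknown_policy_keys(example_policy) returns an empty list, while a misspelled key is reported together with the level it appears under.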

Test Plan

  • 38 certification tests pass
  • Existing scorecard/compass tests pass (118 total)
  • Manual: Trigger pipeline and verify certification in scorecard.json
  • Manual: Verify facts pushed to Compass (if endpoint configured)

Ref: APPENG-5306

Made with Cursor

GuyZivRH added 10 commits June 22, 2026 12:12
Implement hierarchical certification levels that aggregate gate results
into standardized checks for Compass integration. Each level has specific
requirements that must pass for certification.

- Add abevalflow/certification.py with 16 check IDs across 3 levels
- Extend Scorecard with certification and highest_certification fields
- Add certification fact pushing to Compass (4 facts per evaluation)
- Update aggregate_scorecard.py to compute and push certification
- Add comprehensive documentation to compass_facts_integration.md
- Add 24 unit tests for certification logic

Ref: APPENG-5306

The validate task now writes validation.json to the workspace so
downstream tasks can read the actual validation results instead of
hardcoding validation_passed=True.

- Update validate.yaml to write validation.json to reports dir
- Update aggregate_scorecard.py to read validation results
- Certification checks now reflect actual validation status

Ref: APPENG-5306

Add CertificationPolicy schema that allows users to customize which checks
are required for each certification level (foundational, trusted, certified)
and override score thresholds via metadata.yaml.

Key changes:
- Add CertificationPolicy and CertificationLevelPolicy to schemas.py
- Add certification_policy field to SubmissionMetadata
- Update compute_certification() to accept optional policy
- Support custom check lists and threshold overrides per level
- Load and pass certification_policy in aggregate_scorecard.py
- Add 14 new tests for policy configuration
- Document new YAML options in compass_facts_integration.md

Backward compatible: if no certification_policy, uses existing defaults.

Trusted and Certified levels now only include implemented checks:

Trusted (was 6, now 4):
- Removed: registry_governance, operational_policy_compliance

Certified (was 5, now 3):
- Removed: enterprise_behavioral_testing, continuous_optimization

These checks remain in the CheckId enum for future implementation
and can be re-added via certification_policy in metadata.yaml.

Ref: APPENG-5306

- Add 4 new check IDs to enum for future implementation:
  - efficiency_cost_profiling
  - data_privacy_pii_handling
  - safety_toxicity_bias_guardrails
  - resilience_chaos_testing

- Create Docs/certification_and_checks.md covering:
  - Overview of gates, checks, levels, scorecards, facts
  - All 20 check IDs with implementation status
  - What each check does and what's missing
  - Artifact applicability matrix
  - Configuration options
  - Implementation roadmap
  - File locations

Ref: APPENG-5306

- Fix check merge logic to be conservative (keep failing checks)
- Fix empty check list so that it now fails the level (all([]) returns True)
- Fix missing validation.json now fails validation (fail-closed)
- Add source_implementation tracking for 'both' mode
- Add tests for all fixed scenarios
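
The first two fixes above can be illustrated with a minimal sketch. It assumes a simplified CheckResult type and two hypothetical helpers (level_passed, merge_conservative); the real names and shapes in certification.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    # Hypothetical, simplified result type for illustration.
    check_id: str
    passed: bool


def level_passed(results: list[CheckResult]) -> bool:
    # all([]) is True in Python, so an empty list must be rejected explicitly;
    # otherwise a level with no checks would pass vacuously.
    return bool(results) and all(r.passed for r in results)


def merge_conservative(a: CheckResult, b: CheckResult) -> CheckResult:
    # When two sources report the same check, keep the failing result.
    if a.check_id != b.check_id:
        raise ValueError("cannot merge results for different checks")
    return a if not a.passed else b
```

With this shape, a merged check passes only when both inputs pass, and a level with zero checks is treated as failed.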
…findings

- Compass facts now use hierarchy-enforced `passed` values:
  - trusted.passed requires foundational.passed
  - certified.passed requires trusted.passed and foundational.passed
- Add validation for threshold keys against CheckId enum
- Detect unresolved ${VAR} placeholders before sending requests
- Fix ENTERPRISE_SECURITY_REVIEW message when score < threshold but 0 findings
- Extract _push_raw_fact() helper to eliminate duplicate HTTP code
- Fix get_threshold() to use last-wins semantics (matching _collect_threshold_overrides)
- Add 22 new tests for hierarchy enforcement and validation

Ref: APPENG-5306
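
A minimal sketch of detecting unresolved ${VAR} placeholders before a request is sent. The regular expression, the find_unresolved and ensure_resolved helpers, and the definition of UnresolvedEnvVarError as a ValueError subclass are all assumptions for illustration; only the exception name comes from this PR.

```python
import re

# Matches shell-style placeholders such as ${COMPASS_TOKEN}.
_PLACEHOLDER = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


class UnresolvedEnvVarError(ValueError):
    """Raised when a config value still contains a ${VAR} placeholder."""


def find_unresolved(values: dict[str, str]) -> dict[str, list[str]]:
    """Map each config key to the placeholders left in its value."""
    found = {key: _PLACEHOLDER.findall(value) for key, value in values.items()}
    return {key: names for key, names in found.items() if names}


def ensure_resolved(values: dict[str, str]) -> None:
    """Raise before any HTTP call if a placeholder was never substituted."""
    unresolved = find_unresolved(values)
    if unresolved:
        raise UnresolvedEnvVarError(f"unresolved placeholders: {unresolved}")
```

Checking up front means a misconfigured token produces one clear error instead of a request carrying a literal "${...}" string.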
- Enforce certification hierarchy in compute_certification() - if
  foundational fails, trusted and certified automatically fail
- Add UnresolvedEnvVarError handling in aggregate_scorecard.py to
  prevent pipeline crashes from misconfigured tokens
- Apply unresolved env var check to push_gate_fact_from_config
- Refactor push_certification_level_fact to use _push_raw_fact helper
- Update _build_certification_summary_payload to use hierarchy-enforced
  passed values for consistency with level payloads
- Update test for hierarchy enforcement behavior

- Remove redundant hierarchy_passed_map computation in push_certification_facts
  (hierarchy is now enforced at compute_certification level, so .passed values
  are already correct)
- Remove hierarchy_passed parameter from _build_certification_level_payload
  and push_certification_level_fact - use level_result.passed directly
- Add failure_reason field to Compass payload when hierarchy forces failure
  (all checks passed but level failed due to prerequisite level failing)
- Fix empty fact_ref in UnresolvedEnvVarError handling by computing the
  intended fact_ref before the push call
- Add dedicated TestHierarchyEnforcement test class with isolation tests for:
  - Trusted fails -> certified cascades
  - Foundational fails -> trusted cascades
  - Check results preserved despite hierarchy failure
  - All levels pass when requirements met
- Update TestBuildCertificationLevelPayload and TestPushCertificationFactsHierarchy
  to reflect the new simpler API

Addresses review findings: redundant hierarchy_passed_map, empty fact_ref,
missing failure_reason field, and insufficient hierarchy cascade tests.
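
The cascade described above can be sketched as follows. This is an illustrative reduction, not the code in compute_certification(): LevelResult here is a simplified stand-in, and enforce_hierarchy and highest_certification are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

LEVEL_ORDER = ("foundational", "trusted", "certified")


@dataclass(frozen=True)
class LevelResult:
    # Simplified stand-in for the real level result type.
    level: str
    checks_passed: bool  # outcome of the level's own checks, always preserved
    passed: bool         # hierarchy-enforced outcome
    failure_reason: Optional[str] = None


def enforce_hierarchy(checks_passed: dict[str, bool]) -> dict[str, LevelResult]:
    results: dict[str, LevelResult] = {}
    failed_prerequisite: Optional[str] = None  # first lower level that failed
    for level in LEVEL_ORDER:
        own = checks_passed.get(level, False)
        if failed_prerequisite is not None:
            # A reason is recorded only when the level's own checks passed
            # and the failure is forced purely by a prerequisite level.
            reason = (
                f"prerequisite level '{failed_prerequisite}' failed" if own else None
            )
            results[level] = LevelResult(level, own, False, reason)
        else:
            results[level] = LevelResult(level, own, own)
            if not own:
                failed_prerequisite = level
    return results


def highest_certification(results: dict[str, LevelResult]) -> Optional[str]:
    highest = None
    for level in LEVEL_ORDER:
        if results[level].passed:
            highest = level
    return highest
```

Because the cascade is applied once at computation time, downstream payload builders can read passed directly without recomputing the hierarchy.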
Change GateType.QUALITY to GateType.SECURITY for gates named "security"
in hierarchy enforcement tests. The wrong type caused the tests to
exercise quality checks instead of security checks, which was misleading
even though the tests still passed (they test hierarchy behavior, not
security-specific mapping).
GuyZivRH force-pushed the APPENG-5306/certification-levels branch from 7a38c3c to 3856865 on June 22, 2026 11:36
GuyZivRH added 4 commits June 22, 2026 15:07
Add artifact-type-specific certification profiles that can be selected
at pipeline deployment level, allowing different check configurations
for skills, agents, MCP servers, and plugins without per-submission
configuration.

Profiles:
- skill: A/B testing focus (default)
- agent: Reasoning and safety focus
- mcp_server: API contracts and resilience
- plugin: OpenAPI and auth validation
- full: All checks enabled

Implementation:
- config/certification_profiles.yaml: Profile definitions (PM-owned)
- certification.py: load_profile(), get_available_profiles()
- aggregate_scorecard.py: --certification-profile argument
- analyze-and-check-degradation.yaml: certification-profile parameter

Priority order: submission metadata.yaml > pipeline profile > defaults
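
The priority order can be read as a simple first-match selection. The sketch below assumes whole-policy selection (no per-key merging between sources), which this PR does not confirm; resolve_policy is a hypothetical helper.

```python
from typing import Mapping, Optional

Policy = Mapping[str, object]


def resolve_policy(
    submission_policy: Optional[Policy],
    profile_policy: Optional[Policy],
    defaults: Policy,
) -> Policy:
    # Assumption: the first source that is present wins outright.
    # An explicitly empty submission policy still counts as present.
    if submission_policy is not None:
        return submission_policy
    if profile_policy is not None:
        return profile_policy
    return defaults
```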

Includes 8 new tests and documentation for extending the system
with new scanners/checks.

If --certification-profile is explicitly provided but cannot be loaded
(typo, missing file), raise the error instead of silently falling back
to defaults. This prevents deployment misconfiguration from going
unnoticed.
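
A sketch of the intended behavior: fall back to defaults only when no profile was requested, and raise when an explicit request cannot be satisfied. ProfileLoadError and select_profile are hypothetical names, and profiles are modeled as an in-memory mapping rather than the YAML file the real load_profile() reads.

```python
from typing import Mapping, Optional


class ProfileLoadError(Exception):
    """Hypothetical error for an explicitly requested profile that is missing."""


def select_profile(
    requested: Optional[str],
    profiles: Mapping[str, dict],
    default_name: str = "skill",
) -> dict:
    if requested is None:
        # Nothing was requested, so a silent fallback is acceptable.
        return profiles.get(default_name, {})
    try:
        return profiles[requested]
    except KeyError:
        # Explicit request that cannot be honored: surface the error.
        raise ProfileLoadError(
            f"certification profile {requested!r} not found; "
            f"available: {sorted(profiles)}"
        ) from None
```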
Quay sends digest tags as 'sha256-...' (dash) not 'sha256:...' (colon).
The CEL filter was only checking for colon format, allowing digest tags
to trigger monitoring runs. This caused 49+ spurious pending runs.
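
The actual fix lives in a CEL filter, which is not shown here. The same predicate, expressed in Python purely for illustration (is_digest_tag is a hypothetical name), accepts both separators:

```python
def is_digest_tag(tag: str) -> bool:
    # Treat both 'sha256:...' and 'sha256-...' as digest tags, so that
    # neither form is mistaken for a human-assigned tag.
    return tag.startswith(("sha256:", "sha256-"))
```

A filter built on this predicate would skip digest tags in either format and let only regular tags trigger monitoring runs.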
The prepare task was only writing validation results to Tekton results,
not to the workspace. This caused the scorecard aggregation to fail to
find validation.json, defaulting certification checks to failed.

Now writes validation.json to reports/{submission-dir}/ for downstream
tasks to consume.
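
On the consuming side, a fail-closed read could look like the sketch below. The "passed" field name and the read_validation_passed helper are assumptions; the PR does not show the schema of validation.json.

```python
import json
from pathlib import Path


def read_validation_passed(reports_dir: Path) -> bool:
    """Return True only if validation.json exists, parses, and reports success."""
    path = reports_dir / "validation.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: fail closed.
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    # Assumed schema: a top-level boolean field named "passed".
    return isinstance(data, dict) and data.get("passed") is True
```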
