examples: add RAIL Score responsible AI custom task template#1203

Open
SumitVermakgp wants to merge 1 commit into huggingface:main from
SumitVermakgp:feat/rail-score-custom-task

Summary

Add a custom task template that evaluates model outputs across 8 responsible AI dimensions using RAIL Score as a MetricGrouping metric.

Changes

  • examples/custom_tasks_templates/custom_rail_score_task.py -- Complete self-contained file with RAILScoreComputation(SampleLevelComputation), MetricGrouping registration, prompt function, and task config

How it works

  • RAILScoreComputation calls the RAIL Score API on each generated response
  • Returns 9 named metrics (8 dimensions + overall), each normalized to 0-1
  • Uses MetricGrouping so all dimensions appear as separate columns in results
  • Reads domain context from doc.specific metadata so scoring can be domain-specific
  • Lazy client initialization (no API key required at import time)
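
The lazy-initialization and normalization points above can be illustrated with a minimal sketch. This is not the code from the PR: `LazyRailScorer`, the `client_factory` callable, the `rail_`-prefixed metric names, and the 0-10 raw scale are all assumptions made for illustration, and no real `rail-score-sdk` or lighteval call is used. Only the eight dimension names and the `RAIL_API_KEY` variable come from the description.

```python
import os
from typing import Callable, Dict, Optional

# The eight dimensions named in the PR description.
DIMENSIONS = (
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
)

# ASSUMPTION: raw scores arrive on a 0-10 scale; the PR only says they are
# normalized to 0-1, not what the raw range is.
RAW_SCALE_MAX = 10.0

# Hypothetical client shape: (response_text, domain) -> raw scores per dimension.
ScoreFn = Callable[[str, Optional[str]], Dict[str, float]]


def _normalize(raw: float) -> float:
    """Scale a raw score to 0-1 and clamp out-of-range values."""
    return max(0.0, min(1.0, raw / RAW_SCALE_MAX))


class LazyRailScorer:
    """Hypothetical scorer that builds its client on first use, not at import."""

    def __init__(self, client_factory: Callable[[str], ScoreFn]):
        self._client_factory = client_factory
        self._client: Optional[ScoreFn] = None  # nothing created yet

    def _get_client(self) -> ScoreFn:
        # The API key is read only when a score is first requested.
        if self._client is None:
            api_key = os.environ.get("RAIL_API_KEY")
            if not api_key:
                raise RuntimeError("RAIL_API_KEY is not set")
            self._client = self._client_factory(api_key)
        return self._client

    def compute(self, response_text: str, domain: Optional[str] = None) -> Dict[str, float]:
        """Return 9 named metrics (8 dimensions + overall), each in 0-1."""
        raw = self._get_client()(response_text, domain)
        metrics = {f"rail_{d}": _normalize(raw.get(d, 0.0)) for d in DIMENSIONS}
        metrics["rail_overall"] = _normalize(raw.get("overall", 0.0))
        return metrics
```

Because the client is created inside `_get_client`, importing the task file does not require an API key; a missing key only surfaces when the first sample is scored.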

Usage

pip install rail-score-sdk
export RAIL_API_KEY="rail_..."

lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "rail_score:default|0" \
    --custom-tasks examples/custom_tasks_templates/custom_rail_score_task.py

Testing

  • Tested RAILScoreComputation.compute() with mock Doc and ModelResponse objects
  • All 9 metrics returned in correct 0-1 range
  • extend_enum registration verified
  • Edge cases (short output, empty response) handled
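
The checks listed above could look roughly like the following sketch. It uses hypothetical stand-ins (`FakeDoc`, `FakeResponse`, `toy_compute`) rather than lighteval's real `Doc` and `ModelResponse` classes or the PR's `RAILScoreComputation`, and the metric names are assumed. Returning zeros for an empty response is also an assumption; the description says the case is handled but not how.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# ASSUMPTION: metric names are illustrative; the PR does not list them.
EXPECTED_KEYS = {
    "rail_fairness", "rail_safety", "rail_reliability", "rail_transparency",
    "rail_privacy", "rail_accountability", "rail_inclusivity",
    "rail_user_impact", "rail_overall",
}


@dataclass
class FakeDoc:
    """Stand-in for a task document; `specific` carries optional domain metadata."""
    query: str
    specific: Optional[dict] = None


@dataclass
class FakeResponse:
    """Stand-in for a model response holding generated text."""
    text: List[str] = field(default_factory=list)


def toy_compute(doc: FakeDoc, response: FakeResponse) -> Dict[str, float]:
    """Placeholder for the metric under test (no API call is made).

    ASSUMPTION: an empty or whitespace-only response scores 0.0 everywhere.
    """
    answer = response.text[0].strip() if response.text else ""
    value = 0.0 if not answer else 0.8
    return {key: value for key in EXPECTED_KEYS}


def check_metrics(metrics: Dict[str, float]) -> None:
    """Assert that all 9 metrics are present and inside the 0-1 range."""
    assert set(metrics) == EXPECTED_KEYS
    for name, value in metrics.items():
        assert 0.0 <= value <= 1.0, name


doc = FakeDoc(query="Is this loan policy fair?", specific={"domain": "finance"})
check_metrics(toy_compute(doc, FakeResponse(text=["Yes."])))  # short output
check_metrics(toy_compute(doc, FakeResponse(text=[""])))      # empty string
check_metrics(toy_compute(doc, FakeResponse()))               # no generations
```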
  • Formatted with ruff

Related

Add a custom task that evaluates model outputs across 8 responsible AI
dimensions (fairness, safety, reliability, transparency, privacy,
accountability, inclusivity, user_impact) using the RAIL Score API.

Uses MetricGrouping with SampleLevelComputation to return all 8
dimensions plus an overall score as separate named metrics.