examples: add RAIL Score responsible AI custom task template by SumitVermakgp · Pull Request #1203 · huggingface/lighteval

SumitVermakgp · 2026-04-02T09:47:32Z

Summary

Add a custom task template that evaluates model outputs across 8 responsible AI dimensions using RAIL Score as a MetricGrouping metric.

Changes

examples/custom_tasks_templates/custom_rail_score_task.py: a complete, self-contained file containing RAILScoreComputation (a SampleLevelComputation subclass), the MetricGrouping registration, the prompt function, and the task config

How it works

RAILScoreComputation calls the RAIL Score API on each generated response
Returns 9 named metrics (8 dimensions + overall), each normalized to the 0-1 range
Uses MetricGrouping so each dimension appears as a separate column in the results
Reads domain context from doc.specific metadata for domain-specific scoring
Initializes the API client lazily, so no API key is required at import time
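A minimal, dependency-free sketch of the per-sample flow described above. It deliberately imports neither lighteval nor rail-score-sdk. `FakeDoc` stands in for lighteval's `Doc`, and `client_factory` and `client.score(text, domain)` are hypothetical placeholders for the SDK client. The 0-10 raw score scale and the `"domain"` key in `doc.specific` are also assumptions; the PR does not specify either.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# The 8 dimensions named in this PR, plus the overall score (9 metrics total).
RAIL_DIMENSIONS = (
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
)
METRIC_NAMES = RAIL_DIMENSIONS + ("overall",)

# ASSUMPTION: raw API scores lie on a 0-10 scale; check the RAIL Score docs.
RAW_SCALE_MAX = 10.0


def normalize(raw: float) -> float:
    """Map a raw 0-10 score to 0-1, clamping out-of-range values."""
    return min(max(raw / RAW_SCALE_MAX, 0.0), 1.0)


@dataclass
class FakeDoc:
    """Hypothetical stand-in for lighteval's Doc; only `specific` is modeled."""
    specific: Optional[Dict[str, Any]] = None


class RailScoreSketch:
    """Hypothetical stand-in for RAILScoreComputation (no lighteval import)."""

    def __init__(self, client_factory: Callable[[str], Any]):
        # client_factory is a hypothetical hook; the real file would build
        # the rail-score-sdk client here instead.
        self._client_factory = client_factory
        self._client = None  # nothing is created at import or construction time

    @property
    def client(self) -> Any:
        # Lazy initialization: the API key is read only on first use.
        if self._client is None:
            api_key = os.environ.get("RAIL_API_KEY")
            if not api_key:
                raise RuntimeError("RAIL_API_KEY is not set")
            self._client = self._client_factory(api_key)
        return self._client

    def compute(self, doc: FakeDoc, text: str) -> Dict[str, float]:
        # ASSUMPTION: the domain lives under doc.specific["domain"].
        domain = (doc.specific or {}).get("domain", "general")
        # ASSUMPTION: client.score(text, domain) returns {metric_name: raw_score}.
        raw = self.client.score(text, domain)
        # One named value per metric, matching the MetricGrouping columns.
        return {name: normalize(raw[name]) for name in METRIC_NAMES}
```

Keeping the client behind a property means that importing the task file, or constructing the metric, never reads the environment or opens a connection. Only the first `compute` call does.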

Usage

pip install rail-score-sdk
export RAIL_API_KEY="rail_..."

lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "rail_score:default|0" \
    --custom-tasks examples/custom_tasks_templates/custom_rail_score_task.py

Testing

Tested RAILScoreComputation.compute() with mock Doc and ModelResponse objects
Verified that all 9 metrics are returned within the 0-1 range
Verified the extend_enum registration
Checked edge cases (short output, empty response)
Formatted with ruff
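One way to reproduce checks like the ones in the Testing list without network access or an API key. `score_response` and `check_metrics` are hypothetical helpers, and the API is replaced by `unittest.mock.Mock`. Returning 0.0 for every metric on empty output is an assumption, because the PR says only that the empty-response case is handled, not how.

```python
from typing import Callable, Dict
from unittest.mock import Mock

METRIC_NAMES = (
    "fairness", "safety", "reliability", "transparency", "privacy",
    "accountability", "inclusivity", "user_impact", "overall",
)


def score_response(text: str, call_api: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical guard: skip the API for empty output."""
    if not text or not text.strip():
        # ASSUMPTION: empty or whitespace-only output scores 0.0 on every metric.
        return {name: 0.0 for name in METRIC_NAMES}
    return call_api(text)


def check_metrics(metrics: Dict[str, float]) -> None:
    """Check for exactly 9 named metrics, each within 0-1."""
    assert set(metrics) == set(METRIC_NAMES), f"unexpected keys: {sorted(metrics)}"
    for name, value in metrics.items():
        assert 0.0 <= value <= 1.0, f"{name}={value} is outside 0-1"


# Mocked API: returns fixed, already-normalized scores.
api = Mock(return_value={name: 0.8 for name in METRIC_NAMES})

check_metrics(score_response("Short.", api))  # short output goes through the API
check_metrics(score_response("   ", api))      # whitespace-only output is guarded
assert api.call_count == 1  # only the non-empty response reached the API
```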

Add a custom task that evaluates model outputs across 8 responsible AI dimensions (fairness, safety, reliability, transparency, privacy, accountability, inclusivity, user_impact) using the RAIL Score API. Uses MetricGrouping with SampleLevelComputation to return all 8 dimensions plus an overall score as separate named metrics.

SumitVermakgp wants to merge 1 commit into huggingface:main from SumitVermakgp:feat/rail-score-custom-task