Add LICA-Bench: graphic design VLM evaluation (39 tasks, 7 domains) by purvanshi · Pull Request #1212 · huggingface/lighteval

purvanshi · 2026-04-15T01:24:26Z

Summary

Adds LICA-Bench, a structured evaluation suite for vision-language models (VLMs) on graphic design artifacts, to lighteval. The suite comprises 39 tasks across 7 domains: layout, typography, SVG, templates, temporal, Lottie, and category.

Benchmark code: https://github.com/purvanshi/lica-bench
Dataset: https://github.com/purvanshi/lica-dataset

Changes

src/lighteval/tasks/tasks/lica_bench.py: 39 LightevalTaskConfig entries (one per task) and a shared lica_bench_prompt function that builds Doc objects, with optional images, for VLM evaluation. The tasks use purvanshi/lica-bench-eval as the HuggingFace dataset.
src/lighteval/tasks/tasks/lica_bench_prepare_hf_dataset.py: utility script that converts the lica-bench dataset into HuggingFace format and pushes it to the Hub.
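
To make the role of the shared prompt function concrete, here is a minimal sketch of how a dataset row might be turned into a document with optional images. It is not the code from this PR: SketchDoc is a hypothetical stand-in for lighteval's Doc, and the column names (prompt, answer, images) are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchDoc:
    """Hypothetical stand-in for lighteval's Doc; field names are assumptions."""

    query: str
    choices: list[str]
    gold_index: int
    task_name: str
    images: Optional[list[Any]] = None


def sketch_lica_prompt(line: dict, task_name: str = "") -> SketchDoc:
    """Build one document from a dataset row (column names are hypothetical)."""
    # Text-only rows carry no images; normalize an empty list to None.
    images = line.get("images") or None
    answer = line["answer"]
    # Structured answers (dicts, lists) are serialized so the gold is always a string.
    gold = answer if isinstance(answer, str) else json.dumps(answer, sort_keys=True)
    return SketchDoc(
        query=line["prompt"],
        choices=[gold],
        gold_index=0,
        task_name=task_name,
        images=images,
    )
```

A single function of this shape can serve all 39 configs because the per-task differences live in the dataset rows rather than in the prompt code.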

Domains

Domain	Tasks	Description
Category	2	Design category classification
Layout	8	Spatial arrangement understanding & generation
SVG	8	SVG graphic comprehension & generation
Template	5	Design template understanding & generation
Temporal	6	Temporal/animation understanding & generation
Typography	8	Text/font understanding & generation
Lottie	2	Lottie animation generation
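
As a quick sanity check on the table, the per-domain counts can be summed and expanded into task names. The naming pattern lica_bench:<domain>-<n> is an assumption extrapolated from the single example lica_bench:category-1 in this description; the actual names in lica_bench.py may differ.

```python
# Per-domain task counts copied from the table above.
DOMAIN_TASK_COUNTS = {
    "category": 2,
    "layout": 8,
    "svg": 8,
    "template": 5,
    "temporal": 6,
    "typography": 8,
    "lottie": 2,
}


def task_names(counts: dict) -> list:
    """Expand counts into names; the '<domain>-<n>' pattern is an assumption."""
    return [
        f"lica_bench:{domain}-{i}"
        for domain, n in counts.items()
        for i in range(1, n + 1)
    ]


names = task_names(DOMAIN_TASK_COUNTS)
assert len(DOMAIN_TASK_COUNTS) == 7  # 7 domains
assert len(names) == 39  # 2 + 8 + 8 + 5 + 6 + 8 + 2
```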

Usage

# Run a single task
lighteval accelerate "model_name=<model>" "lica_bench:category-1|0"

# Run all lica_bench tasks (superset expansion)
lighteval accelerate "model_name=<model>" "lica_bench|0"
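
The task strings above end in |0. Assuming that trailing field is the few-shot count (an assumption based on lighteval's usual task-spec convention, not something stated in this PR), a small hypothetical helper can split a spec into its name and shot count:

```python
def parse_task_spec(spec: str) -> tuple:
    """Split 'name|shots' on the last '|' (hypothetical helper, not part of lighteval)."""
    name, _, shots = spec.rpartition("|")
    if not name:
        raise ValueError(f"expected 'name|shots', got {spec!r}")
    return name, int(shots)


single = parse_task_spec("lica_bench:category-1|0")
whole_suite = parse_task_spec("lica_bench|0")
```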

Dataset preparation

The HuggingFace dataset (purvanshi/lica-bench-eval) can be built from the source data:

pip install "lica-bench @ git+https://github.com/purvanshi/lica-bench.git"
python src/lighteval/tasks/tasks/lica_bench_prepare_hf_dataset.py \
    --dataset-root /path/to/lica-benchmarks-dataset

Test plan

All 39 LightevalTaskConfig entries created with correct names, subsets, and splits
lica_bench_prompt correctly builds Doc objects (text-only, with images, structured answers)
lighteval Registry auto-discovers all 39 tasks
End-to-end run with a VLM after HF dataset is published

Made with Cursor

Adds 39 tasks across 7 domains (layout, typography, SVG, templates, temporal, Lottie, category) from the LICA-Bench suite for evaluating vision-language models on graphic design artifacts.

- Benchmark code: https://github.com/purvanshi/lica-bench
- Dataset: https://github.com/purvanshi/lica-dataset

Made-with: Cursor

purvanshi · 2026-04-15T01:24:41Z

Hi @NathanHB @clefourrier — this PR adds LICA-Bench, a graphic design evaluation suite (39 tasks across 7 domains) for VLMs. Would appreciate a review when you get a chance. Thanks!

Dataset: https://github.com/purvanshi/lica-dataset
Benchmark code: https://github.com/purvanshi/lica-bench

purvanshi wants to merge 1 commit into huggingface:main from purvanshi:feature/lica-bench
