Add LICA-Bench: graphic design VLM evaluation (39 tasks, 7 domains)#1212

Open
purvanshi wants to merge 1 commit into huggingface:main from purvanshi:feature/lica-bench

@purvanshi

Summary

Adds LICA-Bench, a structured evaluation suite for vision-language models (VLMs) on graphic design artifacts, to lighteval. The suite comprises 39 tasks across 7 domains: layout, typography, SVG, templates, temporal, Lottie, and category.

Changes

  • src/lighteval/tasks/tasks/lica_bench.py: defines 39 LightevalTaskConfig entries (one per task) and a shared lica_bench_prompt function that builds Doc objects, with optional images, for VLM evaluation. All tasks load from the purvanshi/lica-bench-eval dataset on the Hugging Face Hub.
  • src/lighteval/tasks/tasks/lica_bench_prepare_hf_dataset.py: utility script that converts the lica-bench source data into Hugging Face dataset format and pushes it to the Hub.
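
As a rough illustration of what a shared prompt function like lica_bench_prompt might do, here is a minimal sketch. It is not the code in this PR: the Doc class below is a stand-in dataclass rather than lighteval's own, and the row keys (prompt, options, answer, image) are assumed names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Doc:
    """Stand-in for lighteval's Doc; holds only the fields this sketch needs."""

    query: str
    choices: list[str]
    gold_index: int
    task_name: str = ""
    images: Optional[list[Any]] = None


def lica_bench_prompt(line: dict, task_name: str = "") -> Doc:
    """Build a Doc from one dataset row (all row keys are assumed names)."""
    answer = str(line["answer"])
    options = line.get("options") or []
    if options:
        # Multiple-choice row: the gold index points into the option list.
        choices = [str(option) for option in options]
        gold_index = choices.index(answer)
    else:
        # Free-form or structured answer: the only choice is the reference.
        choices = [answer]
        gold_index = 0
    image = line.get("image")
    return Doc(
        query=line["prompt"],
        choices=choices,
        gold_index=gold_index,
        task_name=task_name,
        # Text-only rows carry no image, so images stays None.
        images=[image] if image is not None else None,
    )
```

Handling both row shapes in one function is what would let a single prompt function serve every task; whether the PR does it this way is not shown on this page.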

Domains

| Domain | Tasks | Description |
|---|---|---|
| Category | 2 | Design category classification |
| Layout | 8 | Spatial arrangement understanding & generation |
| SVG | 8 | SVG graphic comprehension & generation |
| Template | 5 | Design template understanding & generation |
| Temporal | 6 | Temporal/animation understanding & generation |
| Typography | 8 | Text/font understanding & generation |
| Lottie | 2 | Lottie animation generation |
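
A quick sanity check that the per-domain counts add up to the 39 tasks claimed, sketched in Python. The counts are copied from the table; the naming pattern lica_bench:<domain>-<n> is an assumption extrapolated from the single name lica_bench:category-1 and may not match the names actually registered by the PR.

```python
# Task counts per domain, copied from the Domains table.
DOMAIN_TASK_COUNTS = {
    "category": 2,
    "layout": 8,
    "svg": 8,
    "template": 5,
    "temporal": 6,
    "typography": 8,
    "lottie": 2,
}


def task_names(counts: dict) -> list:
    """Generate task names using the ASSUMED pattern lica_bench:<domain>-<n>."""
    return [
        f"lica_bench:{domain}-{index}"
        for domain, count in counts.items()
        for index in range(1, count + 1)
    ]


names = task_names(DOMAIN_TASK_COUNTS)
print(len(DOMAIN_TASK_COUNTS), "domains,", len(names), "tasks")
# prints "7 domains, 39 tasks"
```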

Usage

# Run a single task
lighteval accelerate "model_name=<model>" "lica_bench:category-1|0"

# Run all lica_bench tasks (superset expansion)
lighteval accelerate "model_name=<model>" "lica_bench|0"
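
The "superset expansion" in the second command can be pictured as prefix matching over registered task names. The sketch below illustrates that idea only; it is not lighteval's registry code, expand_task_spec is a hypothetical helper, and apart from lica_bench:category-1 the task names in the usage lines are made up.

```python
def expand_task_spec(spec: str, registered: list) -> list:
    """Expand 'name|fewshot' into one spec per matching registered task."""
    name, _, fewshot = spec.partition("|")
    fewshot = fewshot or "0"
    if name in registered:
        # Exact task name: nothing to expand.
        return [f"{name}|{fewshot}"]
    # Superset name: match every task registered as '<name>:<subtask>'.
    prefix = name + ":"
    matches = [task for task in registered if task.startswith(prefix)]
    if not matches:
        raise ValueError(f"no registered task matches {name!r}")
    return [f"{task}|{fewshot}" for task in matches]


# Hypothetical registry contents for illustration.
registered = ["lica_bench:category-1", "lica_bench:category-2", "other_suite:task"]
print(expand_task_spec("lica_bench|0", registered))
# prints ['lica_bench:category-1|0', 'lica_bench:category-2|0']
```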

Dataset preparation

The HuggingFace dataset (purvanshi/lica-bench-eval) can be built from the source data:

pip install "lica-bench @ git+https://github.com/purvanshi/lica-bench.git"
python src/lighteval/tasks/tasks/lica_bench_prepare_hf_dataset.py \
    --dataset-root /path/to/lica-benchmarks-dataset
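
For orientation, a preparation script of this kind would typically group samples by task and push each group as its own subset. The sketch below shows that shape under stated assumptions: the "task" key, the helper names, and the "test" split are all hypothetical, and the actual script in this PR may be organised differently. The upload step uses Dataset.from_list and Dataset.push_to_hub from the datasets library and requires Hub credentials.

```python
from collections import defaultdict


def group_by_task(samples: list) -> dict:
    """Group flat sample dicts into one row list per task (assumed 'task' key)."""
    subsets = defaultdict(list)
    for sample in samples:
        row = {key: value for key, value in sample.items() if key != "task"}
        subsets[sample["task"]].append(row)
    return dict(subsets)


def push_subsets(subsets: dict, repo_id: str = "purvanshi/lica-bench-eval", split: str = "test") -> None:
    """Push each task as a separate subset; the split name is an assumption."""
    # Imported lazily so group_by_task runs without the datasets package.
    from datasets import Dataset

    for config_name, rows in subsets.items():
        Dataset.from_list(rows).push_to_hub(repo_id, config_name=config_name, split=split)


# Hypothetical samples for illustration.
samples = [
    {"task": "category-1", "prompt": "Which category?", "answer": "flyer"},
    {"task": "category-1", "prompt": "Which category?", "answer": "poster"},
    {"task": "layout-1", "prompt": "Describe the layout.", "answer": "grid"},
]
subsets = group_by_task(samples)
print({name: len(rows) for name, rows in subsets.items()})
# prints {'category-1': 2, 'layout-1': 1}
```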

Test plan

  • All 39 LightevalTaskConfig entries created with correct names, subsets, and splits
  • lica_bench_prompt correctly builds Doc objects (text-only, with images, structured answers)
  • lighteval Registry auto-discovers all 39 tasks
  • End-to-end run with a VLM after HF dataset is published

Made with Cursor

Adds 39 tasks across 7 domains (layout, typography, SVG, templates,
temporal, Lottie, category) from the LICA-Bench suite for evaluating
vision-language models on graphic design artifacts.

- Benchmark code: https://github.com/purvanshi/lica-bench
- Dataset: https://github.com/purvanshi/lica-dataset

Made-with: Cursor
@purvanshi

Hi @NathanHB @clefourrier — this PR adds LICA-Bench, a graphic design evaluation suite (39 tasks across 7 domains) for VLMs. Would appreciate a review when you get a chance. Thanks!
