fix: Add AfriMTEB and AfriE5 #4124
Kosei1227 wants to merge 38 commits into embeddings-benchmark:main
Conversation
Why do you change the existing dataset?
@Kosei1227 This is still unresolved. Why do you change the existing dataset?
@Samoed This is because AfriSenti under the mteb library does not support Oromo. This change explicitly supports this language for better coverage.
Yes, but to include them they need to exist in the dataset, and they don't exist in the mteb repo. If you want to include them we need to update the dataset repo.
I discussed the language coverage of mteb/AfriSentiClassification with my team. For AfriSentiClassification, we use the original dataset: https://huggingface.co/datasets/shmuhammad/AfriSenti-twitter-sentiment. I'm not sure why mteb/AfriSentiClassification does not support Oromo, but since I cannot modify the existing dataset, let's revert the changes in this file.
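For reference, a minimal sketch of how one might verify which language configs a hub dataset actually exposes (e.g. to check whether Oromo is present). The `has_config` / `dataset_supports` helpers and the `"orm"` config name are illustrative assumptions, not part of mteb:

```python
def has_config(configs: list[str], lang_code: str) -> bool:
    """Pure check, kept separate so the logic is testable without network access."""
    return lang_code in configs


def dataset_supports(repo_id: str, lang_code: str) -> bool:
    """Query the hub for a dataset's config names (network call; needs `datasets`)."""
    from datasets import get_dataset_config_names  # deferred: optional dependency

    return has_config(get_dataset_config_names(repo_id), lang_code)
```

For example, `dataset_supports("shmuhammad/AfriSenti-twitter-sentiment", "orm")` would report whether the original source exposes an Oromo subset that the mteb-hosted copy lacks.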
KennethEnevoldsen
left a comment
Hi @Kosei1227, great to see this addition!!
There are some issues that we need to address before the merge; these are mainly caused by v1-v2 changes, but generally the PR's metadata annotations look good.
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
…sses.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Hi @Samoed, @KennethEnevoldsen, Thank you for the detailed reviews! I have addressed the requested changes across 9 commits. Here is a summary of the updates: 1. Model Refinement (
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
@Samoed
Thanks for flagging this. We would prefer to keep SIB200_14Classes. Although it is related to the original SIB200 task, this 14-class variant is substantially more challenging and is a key contribution of our paper. In AfriMTEB, it plays an important role in evaluating fine-grained topic classification for multilingual settings, which is not covered by the original setup.
Hi @Kosei1227 thanks for the changes and sorry about the late response, can I ask you to please answer this question regarding SIB200:
Currently I am a bit unsure about the dataset; it would be nice to know how it is different.
@KennethEnevoldsen and @Samoed, the original SIB-200 release also has a 14-class version (section 2.3 of the paper); a comparison of both is in the Appendix: https://aclanthology.org/2024.eacl-long.14.pdf#page=15.08. The original SIB-200 removed infrequent classes since the task was already challenging enough when it was created in 2023, but that is no longer the case. Please let me know if you have questions. Quote from the paper: "While the SIB-200 dataset only includes seven labels, we are also releasing another version of the dataset that is more challenging with all the 14 labels (excluding "uncategorized"). We compared the performance of English dataset using both seven and 14 labels in Appendix C."
KennethEnevoldsen
left a comment
Thanks for the clarification @dadelani
Based on that I would change it to SIB200Classification.v2 instead.
The next step is to re-upload the datasets under the mteb organization (to ensure that we can maintain them going forward). If one of you has a Hugging Face ID, I can add you to the organization. You can see how to push to the hub in the documentation, but do ask if there are issues. Once the dataset is uploaded you also have to remove the dataset_transform (as it is applied before the upload).
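The re-upload step can be sketched roughly as below, assuming the `datasets` library. `target_repo`, `reupload`, and the repo names are illustrative, not the actual script used in this PR:

```python
def target_repo(org: str, task_name: str) -> str:
    """Hub repo id the task metadata should point to after the transfer."""
    return f"{org}/{task_name}"


def reupload(source_repo: str, task_name: str, org: str = "mteb") -> str:
    """Load the source dataset, apply the old dataset_transform logic
    locally, and push the result under the target org."""
    from datasets import load_dataset  # deferred: network + `datasets` lib needed

    repo_id = target_repo(org, task_name)
    ds = load_dataset(source_repo)
    # ... apply whatever dataset_transform used to do here, once, locally ...
    ds.push_to_hub(repo_id)  # requires `huggingface-cli login` with org write access
    return repo_id
```

Because the transform is baked in before the push, the task class can then drop its `dataset_transform` and point straight at the `mteb/...` repo.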
I would also ask that you go over the comments above and resolve those that you believe are resolved
Once those are done, I think we are pretty much there - thanks for taking the time so far!
This pull request has been automatically marked as stale due to inactivity.
@Kosei1227 should we get this PR finalized? I would love to have it merged into MTEB.
@KennethEnevoldsen Thanks for following up! I'll finalize the PR by this weekend!
Great to hear - looking forward to it!
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Align the 14-label SIB200 task with reviewer feedback by renaming it to `SIB200Classification.v2` and marking `SIB200Classification` as superseded. Update the task registry, benchmark entries, and quality-test references so the renamed task resolves consistently. Made-with: Cursor
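The supersession described in the commit above can be illustrated with a minimal sketch. The `SUPERSEDED` registry and `resolve` function are hypothetical, not mteb's actual internals:

```python
# Old task name -> replacement (illustrative entry from this PR's rename).
SUPERSEDED = {"SIB200Classification": "SIB200Classification.v2"}


def resolve(task_name: str, allow_superseded: bool = False) -> str:
    """Follow supersession links unless the caller opts into old task versions."""
    if allow_superseded or task_name not in SUPERSEDED:
        return task_name
    return resolve(SUPERSEDED[task_name])
```

With this pattern a benchmark entry referencing the old name still resolves to the v2 task, while the old task remains loadable for reproducing earlier results.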
Hi @KennethEnevoldsen @Samoed, quick update on the Hugging Face rehost step: I authenticated on my side, but the failure happens at repo creation, before upload. So if repo creation is restricted while push access to existing repos is allowed, pre-creating the target repos would also unblock me.
I don't think that you have access to the mteb org. You can keep the datasets under your username.
@Kosei1227 I have added you as a contributor on mteb; you can now upload new datasets and edit your own datasets (but not edit existing datasets).
Point the AfriMTEB-related tasks and SIB200 v2 task to the new datasets hosted under the mteb namespace so task loading uses the transferred repositories going forward. Made-with: Cursor
@KennethEnevoldsen Thanks for adding me as a contributor! I have now transferred all new datasets under mteb.
@Kosei1227 Can you look into the comments? There are some unresolved ones.
@Samoed All comments should be resolved now! Thanks for flagging this point.
Restore the original Swahili subset key and remove the unnecessary fast-loading flag so the task only keeps the Oromo support changes. Made-with: Cursor
Remove the Oromo-specific AfriSenti task changes so the file matches the earlier branch version until the dataset can be updated safely. Made-with: Cursor
….v2 and remove old SIB200-14Classes stats Made-with: Cursor
Made-with: Cursor
KennethEnevoldsen
left a comment
I believe this PR is good to merge - @Samoed, do you have anything to add?
Add AfriMTEB tasks and AfriE5 model
Description
This PR registers the AfriMTEB benchmark and adds several new datasets and the AfriE5 model focusing on African languages.
For more details, please see our paper: AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages.
New Benchmark
`MTEB(Africa, v1)` (alias `AfriMTEB`). It includes a comprehensive set of tasks across classification, clustering, retrieval, bitext mining, and STS.
New Datasets
- (PairClassification)
- (MultiLabelClassification)
- (MultiLabelClassification)
- (Classification)
- (Classification)
- (Classification)
- (Classification)
New Models
Citation
If you use this benchmark or the AfriE5 model, please cite:
Checklists
Dataset Checklist
- Added to the `mteb/tasks` directory
- Implements the `AbsTask*` interface
- `dataset_transform` method implemented if necessary
- Registered in `__init__.py`
Model Checklist
- Added to the `mteb/models` directory