Scandinavian Embedding Benchmark
This is the documentation for the Scandinavian Embedding Benchmark. This benchmark is intended to evaluate the sentence/document embeddings of language models for mainland Scandinavian languages.
Info
The Scandinavian Embedding Benchmark has moved to MTEB. You can find the Scandinavian Leaderboard under the MTEB Leaderboard. To run the benchmark, add results, etc., please refer to the MTEB documentation. The change was made to 1) encourage others to evaluate on Scandinavian tasks, 2) avoid duplication of effort, and 3) make it easier for users to compare models across languages. My hope is that this will lead to better models for Scandinavian languages.
Missing a model or information? Great, we would love to add it to MTEB. Please file an issue on MTEB and we will help get it added.
Intended uses for this benchmark:
- Evaluating document embeddings of Scandinavian language models
- Evaluating document embeddings of multilingual models for Scandinavian languages
- Ranking competing Scandinavian and multilingual models using no more compute than a consumer laptop can provide
Comparison to other benchmarks
If you use this benchmark for a relative ranking of language models that you plan to fine-tune, I would recommend looking at ScandEval, which benchmarks models using cross-validated fine-tuning. It also includes structured prediction tasks such as named entity recognition. Many of the tasks in this embedding benchmark are also included in ScandEval, and an attempt has been made to use the same versions. A few tasks (ScandiQA) are included in ScandEval but not in this benchmark, as they are human translations of an English dataset.
The tasks in this benchmark are also included in the MTEB leaderboard, where they were added as part of this project, though the aggregation methods differ slightly. MTEB is primarily an English embedding benchmark, with a few multilingual tasks and additional languages.
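To illustrate why a difference in aggregation can matter when comparing leaderboards, here is a minimal sketch. It does not reproduce the aggregation used by this benchmark or by MTEB, since neither is specified here. The model names, task names, task types, scores, and both aggregation functions are hypothetical and chosen only to show that two reasonable averaging schemes can rank the same models differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scores for illustration only: (task name, task type) -> score.
# None of these names or numbers come from the benchmark or from MTEB.
scores = {
    "model_a": {
        ("cls_1", "classification"): 0.60,
        ("cls_2", "classification"): 0.60,
        ("cls_3", "classification"): 0.60,
        ("ret_1", "retrieval"): 0.30,
    },
    "model_b": {
        ("cls_1", "classification"): 0.50,
        ("cls_2", "classification"): 0.50,
        ("cls_3", "classification"): 0.50,
        ("ret_1", "retrieval"): 0.50,
    },
}


def mean_over_tasks(task_scores):
    """Average all task scores directly, so every task has equal weight."""
    return mean(task_scores.values())


def mean_over_task_types(task_scores):
    """Average within each task type first, then average the type means,
    so every task type has equal weight regardless of how many tasks it has."""
    by_type = defaultdict(list)
    for (_, task_type), score in task_scores.items():
        by_type[task_type].append(score)
    return mean(mean(values) for values in by_type.values())


def rank(aggregate):
    """Return model names sorted from best to worst under `aggregate`."""
    return sorted(scores, key=lambda m: aggregate(scores[m]), reverse=True)


if __name__ == "__main__":
    print("mean over tasks:     ", rank(mean_over_tasks))
    print("mean over task types:", rank(mean_over_task_types))
```

With these made-up numbers, averaging over tasks favours `model_a` because classification tasks dominate the count, while averaging over task types favours `model_b` because its retrieval score carries half the weight. When comparing a model's position across two leaderboards, it is worth checking which scheme each one uses.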