October 15, 2024 · 5 minute read
Top embedding models on the MTEB leaderboard
Author: Yiren Lu (@YirenLu), Solutions Engineer

What is the MTEB leaderboard?

The MTEB (Massive Text Embedding Benchmark) leaderboard, hosted on Hugging Face, ranks embedding models by their performance across a wide range of tasks. It provides a standardized way to evaluate and compare different models.

The leaderboard encompasses various tasks, including:

  1. Classification
  2. Clustering
  3. Pair classification
  4. Reranking
  5. Retrieval
  6. Semantic textual similarity (STS)
  7. Summarization

By evaluating models across these diverse tasks, the MTEB leaderboard offers a holistic view of embedding model capabilities.

Beyond the rankings: choosing the right model for your use case

While the MTEB leaderboard provides valuable information about model performance, it’s essential to understand that a high ranking doesn’t necessarily mean a model is the best fit for your specific use case. Several factors should be considered when selecting an embedding model:

  1. Task-specific performance: Some models may excel in certain tasks but underperform in others. Analyze the breakdown of scores across different tasks to identify models that perform well in areas relevant to your project.

  2. Computational requirements: Larger embedding models generally need more memory and compute, and they are slower at inference. Weigh your hardware, budget, and latency requirements when choosing a model.

  3. Domain relevance: The MTEB benchmark uses general-purpose datasets. If your application focuses on a specific domain (e.g., medical, legal, or financial), a domain-specific model might outperform general models.
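
To act on the first point, one simple approach is to re-rank candidates using only the task categories that matter to you. The sketch below assumes you have copied each model's per-task averages from the leaderboard into a dictionary by hand; `task_subset_score` and `rank_models` are hypothetical helpers, not part of any MTEB library. It takes an unweighted mean over the chosen task categories, which may differ from how the leaderboard computes its overall average.

```python
from __future__ import annotations

from statistics import mean


def task_subset_score(scores_by_task: dict[str, float], relevant_tasks: list[str]) -> float:
    """Average a model's per-task scores over only the tasks you care about."""
    missing = [t for t in relevant_tasks if t not in scores_by_task]
    if missing:
        # Fail loudly rather than silently averaging over fewer tasks.
        raise KeyError(f"no score for tasks: {missing}")
    return mean(scores_by_task[t] for t in relevant_tasks)


def rank_models(
    leaderboard: dict[str, dict[str, float]], relevant_tasks: list[str]
) -> list[tuple[str, float]]:
    """Return (model_name, subset_score) pairs, best first."""
    ranked = [
        (name, task_subset_score(scores, relevant_tasks))
        for name, scores in leaderboard.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For a search or RAG application you might call `rank_models(leaderboard, ["Retrieval", "Reranking"])`; the numbers in `leaderboard` should come from the leaderboard's own task tabs and are not reproduced here.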

To make an informed decision, evaluate candidate models on datasets and tasks that closely resemble your own use case. This helps ensure that you select the embedding model best suited to your project’s requirements.
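
Here is a minimal sketch of such an evaluation for a retrieval use case, assuming you already have a small labelled set: queries, documents, and the IDs of the documents relevant to each query. Embeddings are passed in as plain lists of floats so the metric code stays model-agnostic; in practice you might produce them with a library such as sentence-transformers (`model.encode(...)`), but that step is left out here. `evaluate_retrieval` is a hypothetical helper, and recall@k plus MRR (mean reciprocal rank) are one reasonable choice of metrics, not the only one.

```python
import math
from typing import Dict, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def evaluate_retrieval(
    query_vecs: Dict[str, Sequence[float]],
    doc_vecs: Dict[str, Sequence[float]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Compute mean recall@k and MRR over a labelled set of queries."""
    recalls, reciprocal_ranks = [], []
    for qid, qvec in query_vecs.items():
        # Rank every document by cosine similarity to the query (brute force).
        ranked = sorted(doc_vecs, key=lambda did: cosine(qvec, doc_vecs[did]), reverse=True)
        gold = relevant[qid]
        hits = sum(1 for did in ranked[:k] if did in gold)
        recalls.append(hits / len(gold))
        # Reciprocal rank of the first relevant document in the full ranking.
        rr = next((1.0 / (i + 1) for i, did in enumerate(ranked) if did in gold), 0.0)
        reciprocal_ranks.append(rr)
    n = len(query_vecs)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(reciprocal_ranks) / n}
```

Run it once per candidate model on the same labelled set and compare the results. Brute-force scoring is fine for a few thousand documents; for larger corpora a vector index is more practical.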

Top 5 models on the MTEB leaderboard

As of fall 2024, here are some of the top models on the MTEB leaderboard and their backgrounds:

  1. NV-Embed-v2: Developed by NVIDIA, NV-Embed-v2 is a generalist embedding model that fine-tunes a base LLM (Mistral 7B) to provide text embeddings.

  2. bge-en-icl: This model is developed by the Beijing Academy of Artificial Intelligence (BAAI). It’s part of the BAAI General Embedding (BGE) family, which includes a range of embedding models for both English and Chinese.

  3. stella_en_1.5B_v5: This model is built on top of the Alibaba-NLP/gte-large-en-v1.5 and Alibaba-NLP/gte-Qwen2-1.5B-instruct models. At 1.5B parameters, it is roughly 5x smaller than most of the other models in the top 5, which are around 7B parameters.

  4. SFR-Embedding-2_R: The Salesforce/SFR-Embedding-2_R model is developed by the Salesforce AI Research team. It builds upon their previous work on the SFR-Embedding-Mistral model, which was trained on large datasets to improve text retrieval and semantic search capabilities.

  5. gte-Qwen2-7B-instruct: Developed by Alibaba, gte-Qwen2-7B-instruct is the latest model in the gte (General Text Embedding) family.
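
A back-of-the-envelope calculation makes the size gap between a ~7B model and stella_en_1.5B_v5 concrete. The sketch below estimates only the memory needed to hold the weights (parameter count × bytes per parameter) and assumes 16-bit (fp16/bf16) weights by default; real usage is higher once activations and framework overhead are included. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory to hold the model weights, in GB (1e9 bytes).

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for 8-bit quantization.
    Ignores activations, attention buffers, and framework overhead.
    """
    return num_params * bytes_per_param / 1e9


for name, params in [("~7B model", 7e9), ("1.5B model", 1.5e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of fp16 weights")
```

By this estimate a ~7B model needs roughly 14 GB just for fp16 weights, versus roughly 3 GB for a 1.5B model, which can decide whether a model fits on a single smaller GPU.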

Domain-specific embedding models

While general-purpose models dominate the MTEB leaderboard, domain-specific embedding models can offer superior performance for specialized applications. Here are some examples of embedding models fine-tuned for specific domains:

  1. Medicine: PubMedBERT is fine-tuned on medical literature and clinical notes, making it well-suited for tasks in healthcare and biomedical research. BioLORD is another model tailored for similar applications.

  2. Finance: Finance Embeddings from Investopedia, Voyage Finance, and BGE Base Financial Matryoshka are examples of models fine-tuned on financial datasets, offering improved performance for tasks such as sentiment analysis of financial news or SEC filings.

  3. Law: For legal applications, see the article “Domain-Specific Embeddings and Retrieval: Legal Edition”, which discusses models fine-tuned on legal documents for legal research, contract analysis, and other law-related NLP tasks.

  4. Code: CodeBERT and GraphCodeBERT are designed specifically for programming language understanding, making them useful for code search, code completion, and bug detection tasks.

  5. Math: Math Similarity Model is tailored for mathematical tasks.

  6. Other languages:

These domain-specific models demonstrate the potential for tailored embedding solutions in specialized fields. When working with domain-specific tasks, it’s worth exploring these models alongside the top performers on the MTEB leaderboard to find the best fit for your particular use case.
