The crux of this test is the embedding model itself: its effectiveness can vary considerably across languages, so choosing one calls for a tailored, language-aware evaluation. Text embeddings fall into two main types: static and dynamic.
In our analysis, we'll benchmark the top five multilingual embedding models from the MTEB Leaderboard against a baseline model.
Cohere Embed Multilingual v3.0 & Light v3.0: Cohere provides a proprietary embedding model, accessible via API at $0.10/1M tokens, matching ada-002's price. The v3 models rank documents at retrieval time by how well a document's topic matches the query, and they use compression-aware training for storage-efficient vectors. The light version has an embedding dimension of 384.
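For reference, here is a minimal sketch of calling the Cohere embedding endpoint through its Python SDK; the example sentences are illustrative, and the exact client interface may differ between SDK versions, so check Cohere's docs.

```python
import os

import cohere  # pip install cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# Documents and queries are embedded with different input_type values,
# which is how the v3 models support query-document matching at retrieval time.
docs = [
    "La France a soumis son plan de relance.",
    "L'Italia ha presentato il suo piano di ripresa.",
]
doc_emb = co.embed(
    texts=docs,
    model="embed-multilingual-v3.0",  # or "embed-multilingual-light-v3.0" (384 dims)
    input_type="search_document",
).embeddings

query_emb = co.embed(
    texts=["Quel pays a soumis un plan de relance ?"],
    model="embed-multilingual-v3.0",
    input_type="search_query",
).embeddings
```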
intfloat/multilingual-e5-large: Trained contrastively with weak supervision on the CCPairs dataset, which is sourced from diverse platforms. The model allows reducing embedding dimensions for memory and storage efficiency.
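A minimal usage sketch with the sentence-transformers library; the "query: "/"passage: " prefixes follow the E5 model card's convention, and the example sentences are made up:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("intfloat/multilingual-e5-large")

# E5 models are trained with instruction prefixes: "query: " for questions,
# "passage: " for documents. Omitting them degrades retrieval quality.
passages = [
    "passage: Il rapporto per l'Italia evidenzia la necessità di riforme.",
    "passage: Le rapport pour la France souligne le déficit budgétaire.",
]
queries = ["query: Quali riforme raccomanda il rapporto?"]

passage_emb = model.encode(passages, normalize_embeddings=True)
query_emb = model.encode(queries, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = query_emb @ passage_emb.T
```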
text-embedding-3-large: OpenAI's best-performing embedding model, with multilingual performance that clearly exceeds ada-002. Its embedding dimensions are adjustable, so vectors can be shortened to trade a little accuracy for memory and storage savings.
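A sketch of requesting shortened vectors through the OpenAI Python client's `dimensions` parameter; the example text and the chosen dimension of 1024 are illustrative:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# text-embedding-3-large returns 3072 dimensions by default; the `dimensions`
# parameter shortens the vector, trading some accuracy for storage savings.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=["Quel est le déficit budgétaire prévu pour 2024 ?"],
    dimensions=1024,
)
vector = resp.data[0].embedding
print(len(vector))  # 1024
```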
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2: Based on the SBERT architecture, this model is trained with a triplet network structure: anchor, positive, and negative examples pull paraphrases together in the embedding space and push unrelated text apart (see the training sketch below).
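To make the triplet setup concrete, here is a minimal training sketch using the classic sentence-transformers fit API; the three example sentences and the single-example DataLoader are purely illustrative:

```python
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Each training example pairs an anchor with a positive (a paraphrase, possibly
# in another language) and a negative (unrelated text); the triplet loss pushes
# the anchor closer to the positive than to the negative by a margin.
train_examples = [
    InputExample(texts=[
        "The report recommends fiscal reform.",        # anchor
        "Le rapport recommande une réforme fiscale.",  # positive (paraphrase)
        "The weather in Rome was sunny.",              # negative
    ]),
]
loader = DataLoader(train_examples, batch_size=1, shuffle=True)
loss = losses.TripletLoss(model=model)
model.fit(train_objectives=[(loader, loss)], epochs=1)
```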
Generating QA Pairs: European Semester country reports for France and Italy serve as neutral source documents for question generation. GPT-3.5-turbo generates a question from each context chunk, and the questions are then translated into Italian and French so that every language is tested on parallel content.
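A sketch of the question-generation step, assuming the OpenAI chat completions client; the prompt wording and the `generate_question` helper are our own illustration, not the exact prompt used here:

```python
from openai import OpenAI

client = OpenAI()

def generate_question(context: str, language: str) -> str:
    """Ask GPT-3.5-turbo for one question answerable from `context`, in `language`."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You write evaluation questions for retrieval tests."},
            {"role": "user", "content": (
                f"Write one question in {language} that is answered by this passage:\n\n{context}"
            )},
        ],
        temperature=0.3,
    )
    return resp.choices[0].message.content.strip()

# One question per chunk, in each target language, keeps the test set parallel.
chunk = "..."  # a chunk of a European Semester country report
questions = {lang: generate_question(chunk, lang) for lang in ("English", "French", "Italian")}
```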
Evaluation Metrics:
Top-5 document retrieval is assessed using Hit Rate and MRR. The newest OpenAI model leads, followed by Cohere's proprietary models. Even with reduced dimensions, text-embedding-3-large performs impressively. ada-002 trails the latest OpenAI model noticeably, which shows how much the new generation improves. Among open-source models, intfloat/multilingual-e5-large comes out on top.
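Both metrics are straightforward to compute from ranked retrieval results; this self-contained helper (our own illustration of the standard definitions) shows how:

```python
def hit_rate_and_mrr(ranked_ids: list[list[str]], gold_ids: list[str], k: int = 5):
    """Hit Rate@k: fraction of queries whose gold document appears in the top k.
    MRR: mean of 1/rank of the gold document (0 when it is not retrieved)."""
    hits, reciprocal_ranks = 0, 0.0
    for retrieved, gold in zip(ranked_ids, gold_ids):
        top_k = retrieved[:k]
        if gold in top_k:
            hits += 1
            reciprocal_ranks += 1.0 / (top_k.index(gold) + 1)
    n = len(gold_ids)
    return hits / n, reciprocal_ranks / n

# Example: two queries; the gold doc is ranked 1st for the first, 3rd for the second.
hr, mrr = hit_rate_and_mrr([["d1", "d9"], ["d4", "d7", "d2"]], ["d1", "d2"])
print(hr, mrr)  # 1.0 0.6666...
```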
Conclusion:
Model performance varies across languages, so a customized evaluation materially improves retrieval quality. Running that evaluation on your own documents, in your own languages, is crucial for choosing the right model.