Rerank 3 delivers high-precision semantic reranking. This makes it well suited to working alongside RAG-capable generative AI models with longer context windows, increasing response accuracy while keeping latency and costs down. The model achieves cost reductions in retrieval-augmented generation (RAG) systems by ensuring that only the most relevant information is fed to the generative model, which matters most when the candidate pool is large, ranging from thousands to millions of documents. Rerank 3 features state-of-the-art capabilities including a 4K context length, multi-aspect and semi-structured data search (emails, invoices, JSON documents, code, tables), support for 100+ languages, improved latency, and lower total cost of ownership.
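To make the cost argument concrete, here is a minimal sketch of the rerank-then-generate pattern: score every candidate document against the query, keep only the top few, and build the prompt from those. It does not call Rerank 3. The `toy_relevance` function is a hypothetical word-overlap stand-in for a learned reranker, and the names `rerank`, `build_prompt`, and `top_n` are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple


def toy_relevance(query: str, document: str) -> float:
    """Hypothetical stand-in for a reranker score.

    Returns the fraction of query words that also appear in the document.
    A real reranker would use a learned semantic model instead.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    documents: List[str],
    top_n: int,
    score_fn: Callable[[str, str], float] = toy_relevance,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_n best matches."""
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(documents)]
    # sorted() is stable, so tied documents keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


def build_prompt(query: str, documents: List[str], top_n: int = 2) -> str:
    """Build a generation prompt from only the top_n reranked documents."""
    top = rerank(query, documents, top_n)
    context = "\n".join(f"- {documents[i]}" for i, _ in top)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    candidates = [
        "The invoice total is due in 30 days",
        "Our office is closed on public holidays",
        "Payment of the invoice is due on receipt",
        "The cafeteria menu changes weekly",
    ]
    print(build_prompt("when is the invoice due", candidates, top_n=2))
```

Because the generative model only sees the `top_n` documents, prompt size (and therefore cost and latency) stays roughly constant however many candidates the first-stage retriever returns.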
In-depth benchmarking results and examples highlighting Rerank 3's state-of-the-art capabilities can be found in the official announcement. The model is now available on Cohere's hosted API, on Amazon SageMaker, and through the inference API in Elasticsearch, where it can perform semantic reranking directly on existing Elasticsearch indexes.