Mistral AI launched the compact 'les Ministraux' models for edge use cases

One year after the release of Mistral 7B, Mistral AI has launched two small models, Ministral 3B and 8B, trained and optimized to deliver state-of-the-art performance in on-device and edge use cases, including on-device translation, offline assistants, and even robotics. Notably, les Ministraux not only surpass leading models in the sub-10B category; Ministral 3B even outperforms Mistral 7B, a significant improvement achieved in just one year.

Ministral 3B and 8B accept up to 128K tokens of input (32K on vLLM). Ministral 8B also features an interleaved sliding-window attention pattern, which makes its inference faster and more memory-efficient. Beyond standalone use, les Ministraux can serve as intermediaries in multi-step agentic workflows: their compact size and strong performance make them well suited to function-calling tasks such as input parsing, task routing, and calling APIs.
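As a concrete illustration of the task-routing role described above, the sketch below builds a function-calling request for a small router model. The model name ("ministral-8b-latest"), the `route_task` tool, and its destinations are hypothetical; the request shape follows the OpenAI-style tools format that chat APIs such as Mistral's use, but no claim is made about exact field names in Mistral's own client.

```python
# Sketch: a task-routing function-call payload for a compact edge model.
# Model name and tool schema are illustrative assumptions, not from the article.

def build_routing_request(user_message: str) -> dict:
    """Build a chat request asking a small model to route a task to a backend."""
    route_tool = {
        "type": "function",
        "function": {
            "name": "route_task",  # hypothetical routing function
            "description": "Parse the user's input and dispatch it to a backend.",
            "parameters": {
                "type": "object",
                "properties": {
                    "destination": {
                        "type": "string",
                        "enum": ["translation", "offline_assistant", "robotics"],
                    },
                    "parsed_input": {"type": "string"},
                },
                "required": ["destination", "parsed_input"],
            },
        },
    }
    return {
        "model": "ministral-8b-latest",  # assumed model identifier
        "messages": [{"role": "user", "content": user_message}],
        "tools": [route_tool],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

request = build_routing_request("Translate this menu into French, please.")
```

The small model's job here is only to parse the input and pick a destination; the heavy lifting happens in whatever backend the call is routed to.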

Benchmarked against comparable models using Mistral AI's internal framework, Ministral 3B outperforms Gemma 2 2B and Llama 3.2 3B on popular benchmarks, including MMLU, AGIEval, Winogrande, HumanEval, and GSM8K. On the same set of tests, Ministral 8B surpasses the likes of Llama 3.1 8B, Gemma 2 9B, and Mistral 7B. The instruction-tuned versions of the models correspondingly beat their competitors on evaluations such as MTBench, ArenaHard, and MATH.

Ministral 8B is priced at $0.1 per million tokens, while Ministral 3B costs $0.04 per million tokens. Both are accessible via Mistral's API and will soon be available from Mistral's cloud partners. While Ministral 3B is only available under the Mistral Commercial License, Ministral 8B is covered by Mistral's commercial and research licenses. The Ministral 8B Instruct model weights are available for download for research purposes only.
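To make the pricing above concrete, here is a minimal cost calculator at the announced rates ($0.1 per million tokens for Ministral 8B, $0.04 for Ministral 3B). The model keys are illustrative shorthand, not official API identifiers.

```python
# Token-cost estimate at the announced per-million-token rates.
# Keys are informal shorthand for the two models, not official API names.
PRICE_PER_MILLION_USD = {
    "ministral-8b": 0.10,
    "ministral-3b": 0.04,
}

def cost_usd(model: str, tokens: int) -> float:
    """Return the dollar cost of processing `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]

# Processing 5 million tokens:
print(cost_usd("ministral-8b", 5_000_000))  # $0.50
print(cost_usd("ministral-3b", 5_000_000))  # $0.20
```

At these rates, even a full 128K-token context window costs well under two cents on either model.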