Mistral AI's newest model, Mistral Small 3, is a 24-billion-parameter LLM optimized for efficiency and low latency and suited to most language and instruction-following tasks. The startup claims Mistral Small 3's performance is on par with larger models, such as Llama 3.3 70B or Qwen 32B, and recommends it as a replacement for closed-source models such as GPT 4o-mini. Mistral AI has released Mistral Small 3 under the permissive Apache 2.0 license, following its commitment to make its general-purpose models openly available. Besides Mistral AI's La Plateforme, Mistral Small 3 is now available through Hugging Face, Ollama, Kaggle, Together AI, and Fireworks AI. Additional releases are planned for NVIDIA NIM, Amazon SageMaker, Groq, Databricks, and Snowflake.
Mistral Small 3 underwent human preference evaluations and benchmark testing before release. The human evaluations reveal a strong preference for Mistral Small 3 over Gemma 2 27B for generalist tasks and over Qwen 2.5 32B for generalist and coding tasks. It is a tighter race with GPT 4o-mini, as evaluators preferred the latter slightly over 40% of the time in the generalist evaluation. Although the model does not achieve state-of-the-art performance in any benchmark, the benchmark comparisons provide evidence for Mistral Small 3's performance claims. They show the model consistently outperforming Gemma 2 and holding its own against Llama 3.3 70B and Qwen 32B. Perhaps the most notable finding is that Mistral Small 3 scores higher than GPT 4o-mini in the MMLU Pro and GPQA (main) benchmarks.
The model is particularly suited to tasks that benefit from its latency optimization, including conversational assistance, function calling, and building subject-matter experts for specific domains. Quantized versions of Mistral Small 3 can run on an RTX 4090 GPU, a MacBook with 32GB RAM, or comparable hardware, making it accessible to users with limited resources or privacy concerns. The released checkpoints were not trained with reinforcement learning or synthetic data, but Mistral AI suggests they can serve as a strong base model. Early adopters are evaluating Mistral Small 3 in diverse areas including finance (fraud detection), healthcare (patient triage), robotics (on-device control), and customer service.
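The claim that quantized versions fit on a single RTX 4090 (which has 24 GB of memory) or a 32GB MacBook can be sanity-checked with rough arithmetic. The sketch below is an illustration, not a figure from Mistral AI: it estimates the memory needed for the weights alone at a given precision, and the helper name `weight_memory_gb` is hypothetical. It ignores the KV cache, activations, runtime overhead, and the extra metadata that real quantization formats store, so actual requirements will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter count as stated in the article.
PARAMS = 24e9

# Assumed precisions for illustration: 16-bit (unquantized), 8-bit, and 4-bit.
estimates = {bits: weight_memory_gb(PARAMS, bits) for bits in (16, 8, 4)}

for bits, gb in estimates.items():
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions, 16-bit weights need roughly 48 GB and 4-bit weights roughly 12 GB, which is consistent with the article's point that quantization is what brings the model within reach of consumer hardware.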