Mistral AI Releases Mistral Small 3: A Fast, Efficient 24B Parameter Model

Mistral AI has released Mistral Small 3, a latency-optimized and efficient 24B-parameter open-source language model that matches the performance of models three times its size and can run locally on consumer hardware. Mistral Small 3 is available under an Apache 2.0 license.

Mistral AI's newest model, Mistral Small 3, is a 24-billion-parameter LLM focused on efficiency and low latency, suitable for most language and instruction-following tasks. The startup claims Mistral Small 3's performance is on par with larger models, like Llama 3.3 70B or Qwen 32B, and recommends it as a suitable replacement for closed-source models such as GPT 4o-mini. Mistral AI has released Mistral Small 3 under a permissive Apache 2.0 license, following its commitment to make its general-purpose models openly available. Besides Mistral AI's La Plateforme, Mistral Small 3 is now available on Hugging Face, Ollama, Kaggle, Together AI, and Fireworks AI. Additional releases are planned for NVIDIA NIM, Amazon SageMaker, Groq, Databricks, and Snowflake.

Mistral Small 3 underwent human preference evaluations and benchmark testing before release. The findings reveal a strong preference for Mistral Small 3 over Gemma 2 27B for generalist tasks and over Qwen 2.5 32B for generalist and coding tasks. It is a tighter race with GPT 4o-mini, as evaluators preferred the latter slightly over 40% of the time in the generalist evaluation. Although the model does not achieve state-of-the-art performance in any benchmark, the comparison provides evidence for Mistral Small 3's performance claims. The benchmark results show the model consistently outperforming Gemma 2 and holding its own against Llama 3.3 70B and Qwen 32B. Perhaps the most notable finding is that Mistral Small 3 scores higher than 4o-mini in the MMLU Pro and GPQA (main) benchmarks.

The model is particularly suited to tasks that benefit from its low latency, including conversational assistance, function calling, and fine-tuning into subject-matter experts. Quantized versions of Mistral Small 3 can run on an RTX 4090 GPU, a MacBook with 32GB RAM, or comparable hardware, making it accessible for users with limited resources or privacy concerns. The released checkpoints were trained without reinforcement learning or synthetic data, but Mistral AI suggests they can make a great base model. Early adopters are evaluating Mistral Small 3 in diverse areas including finance (fraud detection), healthcare (patient triage), robotics (on-device control), and customer service.
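
To see why quantization matters for the hardware mentioned above, a rough back-of-the-envelope estimate helps. The sketch below is not from Mistral AI: it assumes weight memory is simply parameter count times bits per parameter, and it ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The function name `weight_memory_gb` is illustrative only.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 24e9  # 24B parameters, as stated in the article

# Precisions chosen for illustration; the article does not say which
# quantization formats are used.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")
```

Under these assumptions, unquantized 16-bit weights alone come to roughly 48 GB, which exceeds the 24 GB of VRAM on an RTX 4090, while a 4-bit quantization brings the weights down to roughly 12 GB and leaves headroom for context and runtime overhead.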
