Credit: Generated using Microsoft Designer

Mistral AI unveiled two new purpose-specific models, Codestral Mamba and Mathstral

Codestral Mamba, a coding model leveraging the Mamba architecture for speed and performance, and Mathstral, a state-of-the-art model for advanced mathematical reasoning, highlight the benefits of fine-tuning on high-quality data rather than scaling up model size.

by Ellie Ramirez-Camara

Mistral AI recently released Mathstral, a STEM-specialized model based on Mistral 7B, and Codestral Mamba, a coding-focused model built on the Mamba architecture.

Codestral Mamba is a small model that performs on par with larger, state-of-the-art coding models based on the traditional Transformer architecture, including Codestral (22B). The Mamba architecture gives Codestral Mamba linear-time inference and, in theory, the ability to model sequences of unbounded length. These properties give Codestral Mamba the responsiveness expected from a coding assistant regardless of input length. Codestral Mamba is a 7B instructed model that handles in-context retrieval over contexts of up to 256k tokens.

Codestral Mamba is available under the Apache 2.0 license and can be deployed using the mistral-inference SDK or TensorRT-LLM. It is also accessible for testing on Mistral AI's "la Plateforme."
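For readers who want to try the hosted version first, here is a minimal sketch of querying the model through la Plateforme's chat completions endpoint over plain HTTP. The model identifier `open-codestral-mamba` and the exact response layout are assumptions based on Mistral's usual API conventions, not details taken from this announcement.

```python
# Minimal sketch: call Codestral Mamba via Mistral's hosted chat completions API.
# Assumptions (not from the announcement): the model id "open-codestral-mamba"
# and the standard /v1/chat/completions request/response shape.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]  # key issued on la Plateforme

payload = {
    "model": "open-codestral-mamba",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
    "temperature": 0.2,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

For self-hosted deployments, the same prompt can instead be served locally with the mistral-inference SDK or TensorRT-LLM using the openly released weights.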

Mathstral contributes to the broader effort to develop models with advanced mathematical capabilities, namely solving complex problems that require multistep logical reasoning. Mathstral also furthers Mistral AI's commitment to supporting academic projects, as the model was produced in collaboration with Project Numina. Mathstral is an instruction-tuned model trained on STEM subjects and built on Mistral 7B, achieving state-of-the-art performance on industry-standard benchmarks testing advanced STEM knowledge.

Notably, Mathstral achieves 56.6% on MATH and 63.47% on MMLU, and it can improve on those scores with additional inference-time computation: Mistral reports that the model scores 68.37% on MATH with majority voting and 74.59% with a strong reward model over 64 candidates. Like Codestral Mamba, Mathstral was released under an Apache 2.0 license. The weights are available on Hugging Face, and the model can be tried out with mistral-inference and customized using mistral-finetune.
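To make the majority-voting idea concrete, here is a minimal sketch of voting over sampled candidates. The sampler passed in is a hypothetical stand-in for a call to the model, not part of Mistral's tooling, and the toy example at the bottom only illustrates the mechanism.

```python
# Minimal sketch of majority voting ("self-consistency") over N sampled answers.
# The `sample_answer` callable is a hypothetical placeholder: it should sample one
# candidate solution from the model and return that solution's final answer.
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[str], str], problem: str, n_candidates: int = 64) -> str:
    """Draw n_candidates answers for `problem` and return the most frequent one."""
    answers = [sample_answer(problem) for _ in range(n_candidates)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer


if __name__ == "__main__":
    import random

    # Toy sampler standing in for the model: answers are noisy but mostly correct.
    toy_sampler = lambda problem: random.choice(["42", "42", "42", "41"])
    print(majority_vote(toy_sampler, "What is 6 * 7?", n_candidates=64))  # usually "42"
```

A reward-model setup replaces the frequency count with a learned scorer that ranks the 64 candidates and keeps the highest-scoring one.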



