Mistral AI and NVIDIA announced Mistral NeMo, a 12B language model that can run anywhere

Mistral AI and NVIDIA released Mistral NeMo 12B, a state-of-the-art 12B-parameter language model for enterprise applications, featuring a 128K context length, FP8 data format, and optimized performance across various tasks, packaged as an NVIDIA NIM inference microservice for easy deployment.

by Ellie Ramirez-Camara
Credit: NVIDIA

Mistral AI and NVIDIA recently announced the availability of Mistral NeMo, a small (12B-parameter) language model the two companies developed in collaboration. Mistral NeMo ships with a sizable 128K-token context window and excels at reasoning, world knowledge, and coding, achieving state-of-the-art performance in those categories among comparably sized models, including Gemma 2 9B and Llama 3 8B. The model is also trained for function calling and is designed to support multilingual tasks, with solid performance in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Since Mistral NeMo was built on a standard transformer architecture, Mistral AI suggests it can serve as a drop-in replacement for Mistral 7B wherever the latter has been deployed, giving users improved instruction following, reasoning, multi-turn conversation, and code generation. Because Mistral NeMo was trained with quantization awareness, it supports FP8 inference, which reduces the model's memory footprint and speeds up deployment without degrading accuracy or performance. All of these features make Mistral NeMo well suited to enterprise-grade applications.
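To see why FP8 halves the memory footprint with little accuracy loss, it helps to look at what the format can represent. The following is a simplified numerical sketch (not NVIDIA's actual FP8 kernels, and the rounding emulation is an assumption for illustration): it scales a weight tensor into the E4M3 dynamic range and rounds to a 3-bit mantissa, showing that the worst-case relative rounding error stays around 6%.

```python
import numpy as np

# Illustrative emulation of FP8 (E4M3) quantization for inference.
# E4M3's largest representable magnitude is 448; weights are scaled
# per-tensor so their largest value maps into that range.
FP8_E4M3_MAX = 448.0

def quantize_fp8_e4m3(weights: np.ndarray):
    """Scale weights into the FP8 range and round to a 3-bit mantissa."""
    scale = FP8_E4M3_MAX / np.abs(weights).max()
    scaled = weights * scale
    # Emulate 4 significant bits (implicit leading 1 + 3 mantissa bits):
    # values in [2^e, 2^(e+1)) are spaced 2^(e-3) apart.
    exp = np.floor(np.log2(np.abs(scaled) + 1e-30))
    step = 2.0 ** (exp - 3)
    quantized = np.round(scaled / step) * step
    return quantized, scale

def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    return quantized / scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, s = quantize_fp8_e4m3(w)
w_hat = dequantize(q, s)
# Rounding to a 4-bit significand bounds relative error by 2^-4 = 6.25%.
max_rel_err = np.max(np.abs(w - w_hat) / (np.abs(w) + 1e-6))
```

Quantization-aware training matters here because the model sees this coarse rounding during training and learns weights that tolerate it, which is why inference accuracy is preserved.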

Mistral NeMo was trained on NVIDIA's DGX Cloud AI platform and optimized with NVIDIA TensorRT-LLM to achieve competitive performance within a small memory footprint. Designed to fit in the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090, or NVIDIA RTX 4500 GPU, the model can be deployed on workstations as a high-efficiency, low-compute-cost solution with increased security and privacy. It can also be deployed in data centers and the cloud.
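A back-of-the-envelope calculation shows why FP8 is what makes the single-GPU claim work. These are illustrative figures, not official ones: weights alone for 12B parameters at 2 bytes each (FP16) already consume the full 24 GB of an RTX 4090, while 1 byte each (FP8) leaves headroom for activations and the KV cache.

```python
# Rough weight-memory estimate (assumed figures, decimal GB):
# 12B parameters on a 24 GB consumer GPU such as an RTX 4090.
PARAMS = 12_000_000_000

def weight_memory_gb(bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return PARAMS * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(2)  # FP16: 2 bytes per parameter
fp8_gb = weight_memory_gb(1)   # FP8: 1 byte per parameter
```

At FP16 the weights alone would saturate a 24 GB card, so the FP8 format is effectively what moves this model from data-center GPUs onto workstations.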

Released under the Apache 2.0 license, the weights for the Mistral NeMo base and instruct models are available for download on Hugging Face. The model is also available on Mistral AI's la Plateforme as open-mistral-nemo-2407, and it comes packaged as an NVIDIA NIM inference microservice at ai.nvidia.com, with a downloadable NIM container in the works.
