Mistral AI recently made quite the entrance into the generative AI community, releasing both its mission statement, in which it vows to become a leader of the open generative AI community, and Mistral 7B, an open-source pre-trained model with 7.3 billion parameters. According to the current HuggingFace Open LLM Leaderboard, Mistral 7B outperforms every open-source pre-trained model of up to 13 billion parameters. It is released under the Apache 2.0 license, which makes it available without restriction.
Two themes have recently recurred in discussions about the future of generative AI. The first is that one of the best ways forward when developing AI-powered solutions, whether for an enterprise setting or the general public, is to use specialized models that perform well on a specific set of tasks, are compressed as much as possible, and can be customized to a particular workflow. For example, we recently covered the story of Nexusflow, which based its NexusRaven-13B on several models of the Llama family to achieve over 90% accuracy in function calling for cybersecurity tools.
The second theme is the near certainty that open-source models are better suited to this task than proprietary closed models, thanks to their cost-effectiveness, the freedom to enhance and fine-tune them as desired, and important privacy and data ownership considerations. Nexusflow is a prime example of this, as is the development of open-source tools that allow safe, scalable deployment of LLMs, such as Anyscale's Ray (Anyscale is also working with Meta to advance the Llama ecosystem).
Both themes underpin Mistral AI's mission "to spearhead the revolution of open models." The plan is to progressively release open-source models that bridge the gap between black-box and open solutions, strengthening the position of open source as the best choice for a wide range of use cases. It begins with Mistral 7B, a 7.3 billion-parameter model whose performance is comparable to that of its 13 billion-parameter competitors (on the leaderboard, Mistral ranks right next to Llama-13b-chat and Llama-13b).
In addition to its commendable performance, Mistral 7B can easily be fine-tuned for a wide range of tasks. To showcase this, Mistral fine-tuned the model for chat, producing the unmoderated Mistral 7B Instruct, which outperforms not only other 7B models but also some of its 13B counterparts, such as Llama-2-13b-chat. Mistral says it will work with the community to make Mistral 7B Instruct finely follow guardrails, allowing its deployment in contexts that require output moderation. The full technical details of Mistral 7B can be found here.
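For readers who want to try the chat model, Mistral 7B Instruct expects prompts wrapped in `[INST]`/`[/INST]` instruction markers. The sketch below is a minimal illustration of that prompt format; the helper function name and the exact turn-joining conventions are our own simplified assumptions, so consult Mistral's model documentation before relying on it.

```python
def build_instruct_prompt(messages):
    """Assemble a prompt in the [INST] ... [/INST] style used by
    Mistral 7B Instruct.

    `messages` is a list of (role, text) tuples alternating between
    "user" and "assistant", starting with a user turn.
    """
    prompt = "<s>"  # beginning-of-sequence token
    for role, text in messages:
        if role == "user":
            # User turns are wrapped in instruction markers
            prompt += f"[INST] {text} [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token
            prompt += f" {text}</s>"
    return prompt

# A single-turn request:
print(build_instruct_prompt([("user", "Name the largest planet.")]))
# <s>[INST] Name the largest planet. [/INST]
```

In practice, the tokenizer shipped with the model handles this formatting for you; the point here is simply that the instruction markers, not any server-side moderation, are what steer the unmoderated Instruct variant.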
Mistral seems to have taken its first steps in the right direction and has already promised more exciting developments. The team has stated that it is training larger models and shifting toward novel architectures. These will be reflected in future releases, which will arrive alongside Mistral's commercial offerings: optimized proprietary models for on-premise or private cloud deployment. The models will be marketed as white-box solutions, meaning their source code and weights will be publicly released.