Snowflake has introduced Snowflake Arctic, a new large language model (LLM) focused on enterprise use cases. According to the company, Arctic sets a new baseline for cost-efficient training, and its weights and code are openly available along with its data recipes and research insights. After identifying conversational SQL data copilots, code copilots, and RAG chatbots as essential enterprise use cases, Snowflake set out to deliver a model that excels at SQL, code, and complex instruction following, in addition to producing grounded answers. The team at Snowflake combined these abilities into a single "enterprise intelligence" metric by averaging the HumanEval+, MBPP+, Spider, and IFEval benchmarks. Reportedly, Arctic offers enterprise intelligence comparable to that of models trained with substantially larger compute budgets. For instance, Arctic is on par with or better than Llama 3 8B and Llama 2 70B on this metric despite using about half of their training compute budget.

Arctic's training efficiency is due to its dense-MoE (mixture of experts) hybrid transformer architecture. It combines a 10B dense transformer with a 128x3.66B MoE MLP, for a total of about 480B parameters, of which about 17B are active at any time, chosen by top-2 expert gating. Snowflake designed and trained Arctic with these three insights in mind:

  • Model quality depends on the total number of parameters, the number of experts, and the ways in which the experts can be combined. Snowflake therefore gave Arctic a large total parameter count and a large number of experts to increase its capacity, while engaging only a small fraction of those parameters at a time to keep training and inference cost-efficient.
  • To avoid the inefficiencies of training vanilla MoE architectures, where communication between experts adds significant overhead, Snowflake combined a dense transformer with a residual MoE component. This design lets the training system overlap communication with computation, which hides a large part of the communication overhead and improves training efficiency.
  • Arctic was trained with a dynamic three-phase curriculum that focuses on general skills first (1 trillion tokens) and then on enterprise-specific skills such as coding and SQL in the two following phases (1.5 trillion and 1 trillion tokens).
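
As a rough sanity check on the figures above, the following Python sketch reproduces the parameter and token arithmetic. It is only an illustration: the constant and function names are made up for this example, it assumes that "128x3.66B" means 128 experts of about 3.66B parameters each, and it assumes the dense 10B parameters are always active. The top_k_experts helper is hypothetical and shows only the idea of picking the two highest-scoring experts, not Arctic's actual router.

```python
# Figures quoted in the article (billions of parameters).
DENSE_PARAMS_B = 10.0        # dense transformer
NUM_EXPERTS = 128            # experts in the MoE MLP
PARAMS_PER_EXPERT_B = 3.66   # assumed size of one expert
TOP_K = 2                    # top-2 expert gating

# Curriculum phases quoted in the article (trillions of tokens).
PHASE_TOKENS_T = [1.0, 1.5, 1.0]


def total_params_b() -> float:
    """Parameters stored in the model: dense part plus every expert."""
    return DENSE_PARAMS_B + NUM_EXPERTS * PARAMS_PER_EXPERT_B


def active_params_b() -> float:
    """Parameters engaged at a time: dense part plus the TOP_K chosen experts."""
    return DENSE_PARAMS_B + TOP_K * PARAMS_PER_EXPERT_B


def top_k_experts(router_scores, k=TOP_K):
    """Hypothetical helper: indices of the k highest router scores, ascending."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(f"total parameters  ~ {total_params_b():.2f}B")
    print(f"active parameters ~ {active_params_b():.2f}B")
    print(f"training tokens   ~ {sum(PHASE_TOKENS_T):.1f}T")
    # Toy router scores for four experts; experts 1 and 3 score highest.
    print("selected experts:", top_k_experts([0.1, 0.7, 0.05, 0.9]))
```

Under these assumptions the totals come to about 478.48B stored and 17.32B active parameters, consistent with the rounded 480B and 17B figures, and the three curriculum phases add up to 3.5 trillion tokens.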

Since efficient training is only half the story, Snowflake is collaborating with NVIDIA to optimize Arctic inference for NIM microservices powered by TensorRT-LLM. Snowflake is also working with the vLLM community, and its in-house development team plans to enable efficient inference for several use cases in the coming weeks.

In parallel with the model release, the Snowflake development team is publishing a 'cookbook' containing a variety of recipes on topics including pre-training, fine-tuning, inference, evaluation, modeling, data, systems, and infrastructure. This release aims to accelerate learning for those interested in building with Snowflake. In addition, Snowflake is releasing model checkpoints under an Apache 2.0 license, along with its LoRA-based fine-tuning pipeline. Although the release focuses on enterprise intelligence as the set of relevant capabilities for the identified use cases, Snowflake has also published results on a series of 'academic benchmarks' (including GSM8K and MMLU) in the official announcement.

There are multiple ways to get started with Arctic. In addition to the serverless experience at Snowflake Cortex, the model is available in most model gardens and catalogs, including Amazon Web Services (AWS), Lamini, Microsoft Azure, NVIDIA API catalog, Perplexity, Replicate, and Together AI. Arctic can be downloaded directly from Hugging Face, with inference and fine-tuning recipes available on GitHub. The model can also be taken for a spin on Streamlit Community Cloud or Hugging Face Streamlit Spaces. Finally, Snowflake is organizing an Arctic-themed hackathon, where developers can access mentorship and credits to build Arctic-powered applications.