xAI is releasing the base model weights and network architecture of Grok-1
In the wake of Elon Musk's latest dispute with OpenAI, lawsuit included, xAI has followed through on its promise to deliver an open release of Grok-1. The weights and architecture have been released under an Apache 2.0 license, and the inference code is available on xAI's GitHub.
Amid Elon Musk's criticism of and eventual lawsuit against OpenAI, OpenAI published a scathing response revealing emails in which Musk appears to agree both to OpenAI operating a for-profit subsidiary and to not open-sourcing the totality of its research. Musk promptly proclaimed that xAI would open-source Grok the following week. The company has partly lived up to that promise: it recently released the weights and architecture of Grok-1, xAI's 314-billion-parameter Mixture-of-Experts model. More precisely, xAI is releasing the base model checkpoint from Grok-1's pre-training phase, which means that, unlike the model powering the chatbot experience for X Premium+ subscribers, the released base model is neither fine-tuned nor instruction-tuned.
Consequently, anyone wanting to operate the model as a chatbot, or to evaluate its performance on dialogue and other conversation-based tasks, will need to undertake the additional work of fine-tuning Grok-1 for those tasks. This is hardly the only caveat to Grok-1's release. Its sheer size makes it impossible to run on anything short of datacenter-class hardware with enough processing power and memory. And although Grok-1 is by far the largest open-weights model to date, a model's parameter count is only a rough guide to its expected performance. It is commendable that xAI has decided to release the weights and network architecture under the Apache 2.0 license. The inference code is available on xAI's GitHub, and the weights can be downloaded via torrent or from a Hugging Face repository. However, the company is releasing neither the training data nor any way to connect the model to real-time information from the social media platform X.
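For those who do have the hardware, fetching the checkpoint is straightforward. The sketch below uses the huggingface_hub Python client to pull the weights to local disk; the repository id (xai-org/grok-1) and the checkpoint folder pattern are assumptions based on xAI's release notes, so verify them against the GitHub README before running.

```python
# Minimal sketch of downloading the Grok-1 checkpoint with huggingface_hub.
# The repo id "xai-org/grok-1" and the "ckpt-0/*" folder layout are assumptions
# taken from xAI's release notes; adjust them if the repository differs.
from huggingface_hub import snapshot_download

checkpoint_dir = snapshot_download(
    repo_id="xai-org/grok-1",       # assumed repository id
    repo_type="model",
    allow_patterns=["ckpt-0/*"],    # assumed checkpoint folder (~hundreds of GB)
    local_dir="./checkpoints",      # where the files land on disk
)
print(f"Checkpoint downloaded to {checkpoint_dir}")
```

After the download completes, xAI's inference code from the GitHub repository can be pointed at the local checkpoint directory; as noted above, actually loading the model still requires a machine with datacenter-class GPU memory.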
The lack of a training data corpus does not affect the model's usability, but withholding this information is an unexpected decision from someone who accuses OpenAI of not being transparent enough. Without the training data, it is impossible to know what information Grok-1 was pre-trained on. It also makes the release technically less open-source than models shipped with more accompanying data, such as Pythia, Bloom, and OLMo. Moreover, in announcing Grok's launch, Musk touted the model's real-time access to the social media platform as "a massive advantage over other models." It has become clear that, at least for now, this is an advantage Musk is cleverly keeping to himself.
Unquestionably, the open-weights release is a good thing for the community, and it may even succeed in putting OpenAI and other relevant players on the spot (at least temporarily). The model is also usable, although, as stated above, it is not broadly accessible, given that most of us lack the computing resources required to run it. xAI's move also fans the flames of the debate over the uses and abuses of terms such as 'open' and 'open-source' in AI. For instance, what is the point of releasing everything but the most game-changing features of your model? Is it genuine interest in open-source development or a bid for the moral high ground? If the former, this will hopefully be the first of many releases, ones that continue to shape xAI's stance toward open development. If the latter, it is hard to believe that Grok-1 will amount to more than the latest curiosity in an already overhyped domain.
As a point of comparison with Grok-1, the Allen Institute for AI (AI2) recently released OLMo-7B and the OLMo framework. As part of a more convincing commitment to open-source development, the framework includes the pre-training data, training code, model weights, training metrics, and other artifacts from the OLMo models' development.