MosaicML Introduces MPT-30B
MosaicML is excited to introduce MPT-30B, the latest addition to the Foundation Series of open-source models. This new model, trained with an 8k context length on NVIDIA H100 GPUs, is significantly more powerful than MPT-7B and outperforms the original GPT-3. The ML community has enthusiastically embraced the MosaicML Foundation Series models, with over 3 million downloads of the MPT-7B base, -Instruct, -Chat, and -StoryWriter models.
What the community has built with MPT-7B has amazed us, including impressive projects like LLaVA-MPT, GGML, and GPT4All. In light of this success, we are thrilled to expand the Foundation Series with MPT-30B, an open-source model licensed for commercial use that offers even greater power and performance. We are also releasing two fine-tuned variants, MPT-30B-Instruct and MPT-30B-Chat, which excel at single-turn instruction following and multi-turn conversations, respectively.
The MPT-30B models come with special features that set them apart from other language models. They were trained with an 8k-token context window, support even longer contexts at inference time via ALiBi, and achieve efficient inference and training performance through FlashAttention. The MPT-30B family also exhibits strong coding abilities thanks to its pretraining data mixture. To our knowledge, MPT-30B is the first language model trained on H100s, which are now available to MosaicML customers.
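To illustrate why ALiBi allows contexts longer than the training window, here is a minimal sketch of the bias it adds to attention scores. It follows the formulation in the ALiBi paper rather than MosaicML's own code; the function names and the restriction to power-of-two head counts are assumptions made for this example.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: num_heads is a power of two (the simple case in the ALiBi paper).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal bias matrix for one head.

    Query position i attending to key position j <= i is penalized by
    slope * (i - j); future positions (j > i) are masked with -inf.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    # The bias is added to the query-key scores before the softmax.
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

Because the penalty depends only on the distance between query and key, and not on a learned embedding for each absolute position, the same function can build a bias matrix for any `seq_len`, including lengths beyond the 8k tokens seen during training.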
MPT-30B was sized specifically so that it can be deployed on a single GPU: either one A100-80GB in 16-bit precision or one A100-40GB in 8-bit precision. Comparable models with larger parameter counts, such as Falcon-40B, cannot be served on a single datacenter GPU and need two or more, which raises the minimum inference system cost.
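The single-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts only the memory needed to hold the weights and assumes nominal parameter counts of 30 billion and 40 billion; it ignores activations, the key-value cache, and framework overhead, so real deployments need additional headroom. The helper names are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def fits_on_gpu(num_params: float, bits_per_param: int, gpu_memory_gb: float) -> bool:
    """True if the weights alone leave at least some room on the GPU."""
    return weight_memory_gb(num_params, bits_per_param) < gpu_memory_gb


if __name__ == "__main__":
    # Nominal parameter counts (assumption: rounded from the model names).
    cases = [
        ("MPT-30B, 16-bit, A100-80GB", 30e9, 16, 80),
        ("MPT-30B, 8-bit, A100-40GB", 30e9, 8, 40),
        ("Falcon-40B, 16-bit, A100-80GB", 40e9, 16, 80),
    ]
    for name, params, bits, gpu_gb in cases:
        needed = weight_memory_gb(params, bits)
        print(f"{name}: {needed:.0f} GB of weights, fits={fits_on_gpu(params, bits, gpu_gb)}")
```

Under these assumptions, a 30-billion-parameter model needs about 60 GB in 16-bit precision and about 30 GB in 8-bit precision, which leaves room on an 80 GB or 40 GB card respectively, while a 40-billion-parameter model in 16-bit precision already consumes the full 80 GB before any working memory is allocated.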
There are several ways to customize MPT-30B and deploy it in production using the MosaicML Platform. You can customize MPT-30B through finetuning, domain-specific pretraining, or training from scratch on your private data. For deployment, MosaicML Inference offers both Starter and Enterprise editions. The Starter edition lets you make API requests to MosaicML-hosted endpoints for MPT-30B-Instruct and other text generation and embedding models. The Enterprise edition gives you the flexibility to deploy custom MPT-30B models in your own private VPC for maximum model accuracy, cost efficiency, and data privacy.
MosaicML is excited to see the amazing projects the community and customers will build with MPT-30B. The model's combination of powerful text generation and strong programming capabilities makes it an appealing choice for a wide range of applications. To learn more about MPT-30B and how to customize and deploy it using the MosaicML platform, visit our website and explore the possibilities.