MosaicML, a platform for end-to-end development of AI and machine learning models, has released MosaicML Inference, a fully managed inference service that provides secure, private, and affordable deployment for large models. The service aims to make deploying machine learning models as easy as possible while minimizing costs and meeting strict data privacy requirements.
The service has two tiers: Starter and Enterprise. The Starter tier lets users query off-the-shelf models hosted by MosaicML through a public API, which makes it well suited to prototyping AI use cases. The Enterprise tier instead gives users the flexibility, security, and control of deploying their own models within their own virtual private cloud (VPC).
MosaicML Inference is optimized for low latency and high hardware utilization, and it can serve models too large to fit in a single GPU's memory. MosaicML says the service has been extensively profiled and can be several times cheaper than alternatives for a given query load. The service is designed to meet the strict security, privacy, and DevOps requirements of enterprise customers, and it can be deployed on multiple cloud platforms, which reduces vendor lock-in.
The Starter tier features models for text embedding and text completion. The text embedding models include open-source models like Instructor-Large and Instructor-XL from HKUNLP. The text completion models range in size from 1 to 20 billion parameters, and include open-source models like GPT2-XL from OpenAI, MPT-7B-Instruct from MosaicML, Dolly-12B from Databricks, and GPT-NeoX-20B from EleutherAI.
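To make the point about models that do not fit in a single GPU's memory concrete, the following is a rough back-of-the-envelope sketch, not a description of how MosaicML Inference sizes deployments. It assumes weights are stored at 2 bytes per parameter (16-bit precision) and counts weight memory only, so the result is a lower bound: activations, attention caches, and framework overhead add to it. The function names and the 24 GB GPU size used in the note below are illustrative assumptions, not figures from the announcement.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumes every parameter is stored at the same precision
    (2 bytes for 16-bit, 4 bytes for 32-bit).
    """
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float, gpu_memory_gb: float, bytes_per_param: int = 2
) -> int:
    """Lower bound on the number of GPUs needed just to hold the weights.

    Ignores activations, caches, and runtime overhead, so a real
    deployment may need more devices than this returns.
    """
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Hypothetical GPU with 24 GB of memory, chosen only for illustration.
    gpu_gb = 24
    for label, params in [("1B", 1e9), ("7B", 7e9), ("20B", 20e9)]:
        print(label, weight_memory_gb(params), "GB,", min_gpus_for_weights(params, gpu_gb), "GPU(s)")
```

Under these assumptions, a 20-billion-parameter model needs about 40 GB for its weights in 16-bit precision, which already exceeds the hypothetical 24 GB device and would have to be split across at least two of them, while a 7-billion-parameter model (about 14 GB) would fit on one.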
The Enterprise tier allows users to deploy any model they want, including models trained on their internal data for maximum prediction quality. Additionally, data never has to leave a user's secure environment, which lets them provide the AI features their organization needs while remaining compliant with standards and regulations such as SOC 2 and HIPAA.
In summary, MosaicML Inference aims to make deploying machine learning models easy and cost-effective while meeting strict data privacy requirements. With its Starter and Enterprise tiers and its support for multiple cloud platforms, it provides a flexible and secure way to deploy large machine learning models.