MosaicML Inference: Secure, Private, and Affordable Deployment for Large Models

MosaicML, a platform for end-to-end development of AI and machine learning models, has released MosaicML Inference, a fully managed inference service that provides secure, private, and affordable deployment for large models. The service aims to make deploying machine learning models as easy as possible while minimizing costs and meeting strict data privacy requirements.

According to MosaicML, hosting a model with MosaicML Inference is far cheaper than using an OpenAI API for a model of similar size. The company says this holds for text and code generation models, text embedding models, and image generation models, and that the APIs in MosaicML's Starter tier are also cheaper than comparable OpenAI APIs. All MosaicML measurements were taken on 40GB NVIDIA A100s with standard 512-token input sequences or 512x512 images.

The service has two tiers: Starter and Enterprise. The Starter tier allows users to query off-the-shelf models hosted by MosaicML via a public API, which is great for prototyping AI use cases. On the other hand, the Enterprise tier provides users with the flexibility, security, and control of deploying their own models within their virtual private cloud (VPC).

MosaicML Inference is optimized for low latency and high hardware utilization, and it can serve even huge models that don't fit in a single GPU's memory. According to MosaicML, it has been extensively profiled and can be several times cheaper than alternatives for a given query load. The service is designed to meet the strict security, privacy, and DevOps requirements of enterprise customers, and it can be deployed on multiple cloud platforms, which reduces vendor lock-in.

The Starter tier features open-source models for text embedding and text completion. The text embedding models are Instructor-Large and Instructor-XL from HKUNLP. The text completion models range in size from roughly 1 to 20 billion parameters and include GPT2-XL from OpenAI, MPT-7B-Instruct from MosaicML, Dolly-12B from Databricks, and GPT-NeoX-20B from EleutherAI.
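To put the claim about models that don't fit in a single GPU's memory into numbers, here is a rough back-of-the-envelope sketch in Python. It is not MosaicML's sizing method. It assumes nominal parameter counts read off the model names, 2 bytes per parameter for fp16 weights, and a hypothetical 20% of GPU memory held back for activations and caches; the function names are illustrative only.

```python
# Rough weight-memory estimate for the Starter tier text completion models.
# Assumptions (not from MosaicML): nominal parameter counts, fp16/fp32 weights,
# and a hypothetical 20% memory headroom for activations and caches.

GPU_MEMORY_GB = 40  # 40GB NVIDIA A100, the GPU cited for the measurements
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Nominal sizes in billions of parameters (approximate).
NOMINAL_PARAMS_BILLIONS = {
    "GPT2-XL": 1.5,
    "MPT-7B-Instruct": 7,
    "Dolly-12B": 12,
    "GPT-NeoX-20B": 20,
}


def weight_memory_gb(params_billions, precision="fp16"):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes per param = billions of bytes = GB
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions, precision="fp16", headroom_fraction=0.2):
    """True if the weights fit while leaving the assumed headroom free."""
    usable_gb = GPU_MEMORY_GB * (1 - headroom_fraction)
    return weight_memory_gb(params_billions, precision) <= usable_gb


if __name__ == "__main__":
    for name, size in NOMINAL_PARAMS_BILLIONS.items():
        needed = weight_memory_gb(size)
        verdict = "fits" if fits_on_one_gpu(size) else "does not fit"
        print(f"{name}: ~{needed:.0f} GB of fp16 weights, {verdict} on one GPU")
```

Under these assumptions, the 20-billion-parameter model needs about 40 GB for fp16 weights alone, which already fills a 40GB A100 before any activations are stored. Models at that scale therefore typically have to be split across several GPUs or served at lower precision.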

The Enterprise tier allows users to deploy any model they want, including models trained on their internal data for maximum prediction quality. Because data never has to leave a user's secure environment, organizations can provide the AI features they need while remaining compliant with standards and regulations such as SOC 2 and HIPAA.

In summary, MosaicML Inference aims to make deploying machine learning models easy and cost-effective while meeting strict data privacy requirements. With its Starter and Enterprise tiers and its support for multiple cloud platforms, it provides a flexible and secure option for deploying large machine learning models.
