Reducing NLP inference costs through model specialisation
This talk will discuss ways to reduce costs for NLP inference through a better choice of model, hardware, and model compression techniques.
NLP inference can be very expensive, often requiring access to powerful GPUs. In this talk, Meryem discusses ways to reduce this cost by over 90% through a better choice of model, hardware, and model compression techniques. The talk is aimed at anyone looking to put NLP models into production.
Meryem Arik
Meryem is the co-founder of TitanML, an NLP development platform that focuses on the deployability of LLMs, allowing businesses to build smaller and cheaper deployments of language models. The TitanML platform automates much of the difficult MLOps and inference optimisation science so that businesses can build and deploy state-of-the-art language models with ease.
Data Phoenix is a live media platform for AI and Data professionals, covering technologies under the hood, best practices, and live demos from the builders shaping the industry, via original shows.
Copyright © 2026 Data Phoenix. Published with Ghost and Data Phoenix.