Reducing NLP inference costs through model specialisation
This talk will discuss ways to reduce costs for NLP inference through a better choice of model, hardware, and model compression techniques.
NLP inference can be very expensive, often requiring access to powerful GPUs. In this talk, Meryem discusses ways to reduce this cost by over 90% through a better choice of model, hardware, and model compression techniques. The talk is essential for anyone looking to put NLP models into production.
Meryem Arik
Meryem is the co-founder of TitanML, an NLP development platform that focuses on the deployability of LLMs, allowing businesses to build smaller and cheaper deployments of language models. The TitanML platform automates much of the difficult MLOps and inference optimisation science so that businesses can build and deploy state-of-the-art language models with ease.
Data Phoenix is a live media platform for AI and Data professionals, covering technologies under the hood, best practices, and live demos from the builders shaping the industry, via original shows.
Copyright © 2026 Data Phoenix. Published with Ghost and Data Phoenix.