Reducing NLP inference costs through model specialisation
This talk will discuss ways to reduce costs for NLP inference through a better choice of model, hardware, and model compression techniques.
NLP inference can be very expensive, requiring access to powerful GPUs. In this talk, Meryem discusses ways to reduce this cost by over 90% through a better choice of model and hardware, and through model compression techniques. The talk is aimed at anyone looking to put NLP models into production.
Meryem Arik
Meryem is the co-founder of TitanML, an NLP development platform focused on the deployability of LLMs that allows businesses to build smaller and cheaper deployments of language models. The TitanML platform automates much of the difficult MLOps and inference optimisation science, so that businesses can build and deploy state-of-the-art language models with ease.
SF Bay Area media and education platform focused on AI and Data. As a voice of AI industry, Data Phoenix delivers news, practical knowledge, and helps companies be heard in the community.
Copyright © 2026 Data Phoenix. Published with Ghost and Data Phoenix.