Reducing NLP inference costs through model specialisation
This talk will discuss ways to reduce costs for NLP inference through a better choice of model, hardware, and model compression techniques.
NLP inference can be very expensive, requiring access to powerful GPUs. In this talk, Meryem discusses ways to reduce this cost by over 90% through a better choice of model, hardware, and model compression techniques. The talk is essential for anyone looking to put NLP models into production.
Meryem Arik
Meryem is the co-founder of TitanML, an NLP development platform that focuses on the deployability of LLMs, allowing businesses to build smaller and cheaper deployments of language models. The TitanML platform automates much of the difficult MLOps and inference optimisation science so that businesses can build and deploy state-of-the-art language models with ease.
Data Phoenix is a live media platform for AI and Data professionals, covering technologies under the hood, best practices, and live demos from the builders shaping the industry, via original shows.
Copyright © 2026 Data Phoenix. Published with Ghost and Data Phoenix.