Reducing NLP inference costs through model specialisation
This talk will discuss ways to reduce costs for NLP inference through a better choice of model, hardware, and model compression techniques.
NLP inference can be very expensive because it requires access to powerful GPUs. In this talk, Meryem discusses ways to cut this cost by over 90% through a better choice of model, hardware, and model compression techniques. It is essential viewing for anyone looking to put NLP models into production.
Meryem Arik
Meryem is the co-founder of TitanML, an NLP development platform that focuses on the deployability of LLMs, allowing businesses to build smaller and cheaper deployments of language models. The platform automates much of the difficult MLOps and inference-optimisation work, so that businesses can build and deploy state-of-the-art language models with ease.
Copyright © 2026 Data Phoenix. Published with Ghost and Data Phoenix.