A Whirlwind Tour of ML Model Serving Strategies (Including LLMs)

There are many recipes for serving machine learning models to end users today, and even though new ones keep appearing, some questions remain: How do we pick the right serving recipe from the menu available to us, and how can we execute it as quickly and efficiently as possible? In this talk, we'll take a whirlwind tour of the machine learning deployment strategies available today for both traditional ML systems and Large Language Models, and we'll touch on a few dos and don'ts along the way. This session will be jargon-free, but not buzzword- or meme-free.

Speaker:
Ramon is a data scientist, researcher, and educator currently working in the Developer Relations team at Seldon in London. Prior to joining Seldon, he worked as a freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Before freelancing, Ramon wore different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.