GPT on a Leash: Evaluating LLM-based Apps & Mitigating Their Risks

Dec 15, 2023

Dmitry Spodarets

Testing and evaluating AI systems is extremely challenging, especially when they work with text and other unstructured data. For LLM-based applications, the challenge is magnified because there is no single correct answer and because external constraints apply, such as topics that the application should not discuss.

Speaker:
Philip is the co-founder and CEO of Deepchecks. An experienced data scientist, he previously led a top-tier ML research group that tackled difficult problems across disciplines (NLP, computer vision, signal processing, etc.). Philip holds a B.Sc. in Physics from the Hebrew University, which he obtained as part of the Talpiot excellence program, and an M.Sc. in Electrical Engineering from Tel Aviv University (thesis in ML, accepted to IJCAI 2019). He was selected as a featured honoree in the Israeli Forbes 30 Under 30 list, class of 2021.
