GPT on a Leash: Evaluating LLM-based Apps & Mitigating Their Risks

Dmitry Spodarets

Dec 15, 2023

Testing and evaluating AI systems is extremely challenging, especially when they work with text and other unstructured data. For LLM-based applications, the challenge is magnified because there is no single correct answer to compare against, and because of external constraints such as topics that shouldn't be discussed.

Speaker:
Philip is the co-founder and CEO of Deepchecks. An experienced data scientist, he previously led a top-tier ML research group that tackled difficult problems across several disciplines (NLP, computer vision, signal processing, etc.). Philip has a B.Sc. in Physics from the Hebrew University, which he obtained as part of the Talpiot excellence program, and an M.Sc. in Electrical Engineering from Tel Aviv University (thesis in ML, accepted to IJCAI 2019). He was selected as a featured honoree in the Israeli Forbes 30 Under 30 list, class of 2021.
