Testing and evaluating AI systems is extremely challenging, especially when they work with text and other unstructured data. For LLM-based applications, the challenge is magnified because there is no single correct answer and because outputs must also respect external constraints, such as topics that shouldn't be discussed.
Speaker:
Philip is the co-founder and CEO of Deepchecks. An experienced data scientist, he previously led a top-tier ML research group that tackled difficult problems from various disciplines (NLP, Computer Vision, Signal Processing, etc.). Philip has a B.Sc. in Physics from the Hebrew University, which he obtained as part of the Talpiot excellence program, and an M.Sc. in Electrical Engineering from Tel Aviv University (thesis in ML, accepted to IJCAI 2019). He was selected as a featured honoree in the Israeli Forbes 30 Under 30 list, class of 2021.