Aside from its consumer products, some of Microsoft's most widely noticed AI advancements are the Phi small language models, which use novel synthetic data generation techniques to deliver strong performance in a small package. Microsoft has invested heavily in this model family: Phi-4 was introduced in December 2024, roughly seven months after Phi-3, which was unveiled in May.
Now, a research group from Microsoft Research Asia has published a pre-print detailing the rStar-Math technique. When applied to small language models (SLMs), including Microsoft's Phi-3 and Qwen's 1.5B and 7B models, rStar-Math significantly improved those models' math problem-solving capabilities. The difference was striking: in evaluations, these small models outperformed OpenAI's o1-preview and o1-mini on the MATH benchmark (word problem solving). On the AIME 2024 benchmark, the enhanced Qwen-7B did not trail far behind o1-mini, earning a score that placed it among the top 20% of high school students in the US.
At the heart of this innovation in AI math problem-solving is Monte Carlo Tree Search (MCTS), which, broadly, explores and ranks the intermediate steps toward a problem's solution so that the final answer is built from high-quality intermediate steps. As an additional quality filter, the system generates each reasoning step as both natural language and Python code, with the natural-language explanation embedded as code comments. Only the steps whose Python code actually executes without error are preserved.
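To make the code-verification filter concrete, here is a minimal sketch of the idea: candidate reasoning steps arrive as Python snippets with the natural-language reasoning as comments, and only steps that parse and run cleanly (in the environment accumulated from earlier kept steps) survive. The step texts, function names, and the simple `exec`-based checker are illustrative assumptions, not the paper's actual implementation.

```python
import ast

# Hypothetical candidate steps (invented for illustration): natural-language
# reasoning lives in comments, the computation is executable Python.
candidate_steps = [
    # Parses and runs cleanly: kept.
    "# Add the two known side lengths\ntotal = 3 + 4",
    # Refers to an undefined name `factor`, fails at run time: discarded.
    "# Multiply by the unknown factor\nresult = total * factor",
    # Not valid Python syntax at all: discarded.
    "# Take the square root\nsqrt 25",
]

def verify_step(code: str, env: dict) -> bool:
    """Keep a step only if its code parses and executes without error."""
    try:
        ast.parse(code)   # syntactic check
        exec(code, env)   # execution check; mutates env on success
        return True
    except Exception:
        return False

def filter_steps(steps: list[str]) -> list[str]:
    env: dict = {}
    kept = []
    for step in steps:
        trial = dict(env)  # run against a copy so a failing step
        if verify_step(step, trial):  # cannot corrupt the environment
            env = trial
            kept.append(step)
    return kept

kept = filter_steps(candidate_steps)
```

In the full system, this filter would sit inside the MCTS loop: each rollout proposes steps, unverifiable ones are pruned immediately, and the surviving trajectories are scored to guide the search toward high-quality solution paths.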
Although rStar-Math is an early-stage research project, it has already delivered promising results. Not only did it strengthen the math problem-solving capabilities of SLMs, but the researchers point out that it overcomes several challenges faced by alternative techniques for improving AI math skills, such as model distillation, reward models, and some varieties of the now-popular test-time compute scaling approach. The research team recently open-sourced the code for rStar-Math, which is available at https://github.com/microsoft/rStar.