"Reasoning" models have positioned themselves as the new era for foundation models largely due to their ability to handle complex problems, devise strategies for solving them, and even correct themselves when things start going wrong. However, for all their benefits, these models still have an important drawback: they still require hefty hardware to run. On the other hand, plenty on development has been made on small language models, which can deliver outstanding performance while running on resource-constrained hardware but without enhanced reasoning capabilities.

A new launch by Microsoft aims to bridge this gap. The company recently marked one year since Phi-3's introduction by unveiling three small models with enhanced reasoning capabilities: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These capabilities result from careful supervised fine-tuning on a curated set of reasoning chains distilled from OpenAI's o3-mini.

According to Microsoft, "Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute," and the model's performance proves that data curation and high-quality synthetic data can lead to smaller models with performance rivaling that of their larger and more powerful counterparts.

The 14-billion parameter Phi-4-reasoning and its enhanced version, Phi-4-reasoning-plus, outperform significantly larger competitors on numerous benchmarks. Most notably, they achieve better results than OpenAI's o1-mini and a DeepSeek R1 distill on Llama 70B (DeepSeek-R1-Distill-Llama-70B) on mathematical reasoning (AIME 25, HMMT Feb 25, and OmniMath) and PhD-level science questions (GPQA). Additionally, Phi-4-reasoning-plus surpasses the massive 671-billion parameter DeepSeek-R1 model on the AIME and HMMT evaluations.

The Phi-4-mini-reasoning model offers impressive mathematical capabilities in a highly compact format, making it ideal for environments with tighter computational constraints. At just 3.8 billion parameters, it outperforms models more than twice its size on popular STEM evaluations (AIME 24, MATH-500, and GPQA Diamond), including distillations of DeepSeek R1 on Qwen-7B and Llama 8B. Because of Phi-4-mini-reasoning's high-quality, step-by-step problem-solving abilities, Microsoft recommends this model as ideal for "educational applications, embedded tutoring, and lightweight deployment on edge or mobile systems."

Now available via Azure AI Foundry and Hugging Face, these models represent Microsoft's commitment to expanding AI accessibility across diverse hardware environments. In particular, Microsoft has announced that the Phi-4-reasoning and Phi-4-mini-reasoning models will eventually be available to run on Copilot+ PC NPUs.
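The Hugging Face release makes the models straightforward to try locally. Below is a minimal sketch, assuming the Hub repo id `microsoft/Phi-4-mini-reasoning` and a recent `transformers` version whose `text-generation` pipeline accepts chat-style message lists; the `solve` helper and the sample question are illustrative, not part of Microsoft's documentation.

```python
MODEL_ID = "microsoft/Phi-4-mini-reasoning"  # assumed Hub repo id; verify on Hugging Face


def solve(question: str, generator=None) -> str:
    """Send a question to the reasoning model and return its reply text."""
    messages = [{"role": "user", "content": question}]
    if generator is None:
        # Heavy import deferred so the helper stays testable without the model.
        from transformers import pipeline
        generator = pipeline("text-generation", model=MODEL_ID)
    # Recent transformers pipelines accept chat messages and return the full
    # conversation, with the model's reply appended as the last message.
    out = generator(messages, max_new_tokens=1024)
    return out[0]["generated_text"][-1]["content"]
```

For example, `solve("If 3x + 5 = 20, what is x?")` would download the 3.8-billion-parameter model on first use and return its step-by-step answer; the `generator` parameter lets callers inject an already-loaded pipeline instead.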