Phi-4 – small models, big results

Editorial Team


The Phi-4 family is Microsoft’s latest advancement in small language models (SLMs), designed to excel at complex reasoning tasks while remaining efficient. The series comprises three key models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. The newly released models are built with a clear focus: deliver advanced reasoning performance without the infrastructure demands of trillion-parameter models. They strike a balance between size and performance using techniques such as distillation, reinforcement learning, and carefully curated data.

Phi-4-reasoning is a 14-billion-parameter model with a 32k-token context window, trained on high-quality web data and OpenAI o3-mini prompts. It excels at tasks requiring detailed, multi-step reasoning, such as mathematics, coding, and algorithmic problem solving.

Phi-4-reasoning-plus builds on this with additional fine-tuning using 1.5x more tokens and reinforcement learning, delivering even higher accuracy and better inference-time performance.

Phi-4-mini-reasoning, with just 3.8 billion parameters, was trained on one million synthetic math problems generated by DeepSeek-R1. It targets use cases such as educational tools and mobile apps, proving capable of step-by-step problem solving in resource-constrained environments.

What sets Phi-4 apart isn’t just efficiency, but sheer capability. On benchmarks like HumanEval+ and MATH-500:

  • Phi-4-reasoning-plus outperforms DeepSeek-R1 (671B parameters) on some tasks, demonstrating that smarter training can beat brute force.
  • It also rivals OpenAI’s o3-mini and exceeds DeepSeek-R1-Distill-Llama-70B on complex reasoning and planning tasks.
  • Phi-4-mini-reasoning performs competitively with much larger models and even tops some on math-specific benchmarks.

True to Microsoft’s Responsible AI framework, all Phi-4 models are trained with robust safety protocols. Post-training involves supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). Microsoft uses public datasets focused on safety, helpfulness, and fairness, ensuring broad usability while minimizing risks.

All three models are freely available via Hugging Face and Azure AI Foundry, allowing researchers, startups, and educators to integrate high-performance reasoning into their own applications.
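As a rough illustration of that integration path, the sketch below shows one way to query a Phi-4 model through the Hugging Face `transformers` library. The model ID (`microsoft/Phi-4-reasoning`), the system prompt, and the generation settings are assumptions for illustration, not details from this article; check the model card on Hugging Face before use.

```python
# Minimal sketch: querying a Phi-4 reasoning model via Hugging Face transformers.
# The model ID and prompt wording below are illustrative assumptions.

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat format that instruct-tuned models expect."""
    return [
        {"role": "system", "content": "You are a careful, step-by-step reasoner."},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Requires `pip install transformers torch` and enough memory for a 14B model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="microsoft/Phi-4-reasoning")
    result = generator(build_messages("What is 17 * 24? Show your steps."),
                       max_new_tokens=256)
    print(result[0]["generated_text"])
```

The chat-message helper is split out so the prompt format can be reused with Azure AI Foundry's chat completions API as well, which accepts the same role/content message structure.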
