OpenAI releases GPT-5.2 after "code red" Google threat alert

Editorial Team


In trying to keep up with (or stay ahead of) the competition, model releases continue at a steady clip: GPT-5.2 represents OpenAI's third major model launch since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November's GPT-5.1 update added eight preset "persona" options and focused on making the system more conversational.

Numbers go up

Oddly, although the GPT-5.2 model launch is ostensibly a response to Gemini 3's performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2's improvements over its predecessors and its performance on OpenAI's new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.

During the press briefing, OpenAI did share some competitor comparison benchmarks that included Gemini 3 Pro and Claude Opus 4.5 but pushed back on the narrative that GPT-5.2 was rushed to market in response to Google. "It is important to note this has been in the works for many, many months," Simo told reporters, although choosing when to launch it, we'll note, is a strategic decision.

According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro's 91.9 percent.

GPT-5.2 benchmarks that OpenAI shared with the press. Credit: OpenAI / VentureBeat


OpenAI says GPT-5.2 Thinking beats or ties "human professionals" on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims the model completes these tasks at more than 11 times the speed and less than 1 percent of the cost of human experts.

GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, OpenAI's post-training lead, who told VentureBeat that the model "hallucinates significantly less" than its predecessor.

Still, we always take benchmarks with a grain of salt because it's easy to present them in a way that's flattering to a company, especially when the science of measuring AI performance objectively hasn't quite caught up with corporate sales pitches for humanlike AI capabilities.

Independent benchmark results from researchers outside OpenAI will take time to arrive. In the meantime, if you use ChatGPT for work tasks, expect competent models with incremental improvements and some better coding performance thrown in for good measure.
