OpenAI releases GPT-5.2 after "code red" Google threat alert

Editorial Team


In trying to keep up with (or stay ahead of) the competition, model releases continue at a steady clip: GPT-5.2 represents OpenAI's third major model launch since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November's GPT-5.1 update added eight preset "persona" options and focused on making the system more conversational.

Numbers go up

Oddly, although the GPT-5.2 model launch is ostensibly a response to Gemini 3's performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2's improvements over its predecessors and its performance on OpenAI's new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.

During the press briefing, OpenAI did share some competitor comparison benchmarks that included Gemini 3 Pro and Claude Opus 4.5 but pushed back on the narrative that GPT-5.2 was rushed to market in response to Google. "It is important to note this has been in the works for many, many months," Simo told reporters, although choosing when to launch it, we'll note, is a strategic decision.

According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro's 91.9 percent.

GPT-5.2 benchmarks that OpenAI shared with the press. Credit: OpenAI / VentureBeat


OpenAI says GPT-5.2 Thinking beats or ties "human professionals" on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims the model completes these tasks at more than 11 times the speed and less than 1 percent of the cost of human experts.

GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, OpenAI's post-training lead, who told VentureBeat that the model "hallucinates significantly less" than its predecessor.

Still, we always take benchmarks with a grain of salt because it's easy to present them in a way that's flattering to a company, especially when the science of measuring AI performance objectively hasn't quite caught up with corporate sales pitches for humanlike AI capabilities.

Independent benchmark results from researchers outside OpenAI will take time to arrive. In the meantime, if you use ChatGPT for work tasks, expect competent models with incremental improvements and some better coding performance thrown in for good measure.
