The rumors were true: OpenAI on Thursday announced the release of its new frontier large language model (LLM) family, GPT-5.2.
It comes at a pivotal moment for the AI pioneer, which has faced intensifying pressure since rival Google’s Gemini 3 LLM seized the top spot on major third-party performance leaderboards and many key benchmarks last month. OpenAI leaders stressed in a press briefing, however, that the timing of this release had been discussed and worked on well in advance of the release of Gemini 3.
OpenAI describes GPT-5.2 as its "most capable model series yet for professional knowledge work," aiming to reclaim the performance crown with significant gains in reasoning, coding, and agentic workflows.
"It’s our most advanced frontier model and the strongest yet in the market for professional use," Fidji Simo, OpenAI’s CEO of Applications, said during a press briefing today. "We designed 5.2 to unlock even more economic value for people. It's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools, and handling complex, multi-step projects."
GPT-5.2 features a massive 400,000-token context window, allowing it to ingest hundreds of documents or large code repositories at once, and a 128,000-token maximum output limit, enabling it to generate extensive reports or full applications in a single go.
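To make those limits concrete, here is a minimal Python sketch of a token-budget check. It assumes, which the article does not confirm, that the reply is drawn from the same 400,000-token window as the input; the function names and the sample document sizes are hypothetical.

```python
# Figures reported in the article for GPT-5.2.
CONTEXT_WINDOW_TOKENS = 400_000
MAX_OUTPUT_TOKENS = 128_000


def input_budget(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for input after reserving room for the reply.

    Assumption: output tokens come out of the same 400,000-token window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and the max output limit")
    return CONTEXT_WINDOW_TOKENS - reserved_output


def fits_in_context(doc_token_counts: list[int],
                    reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if the documents fit alongside the reserved output budget."""
    return sum(doc_token_counts) <= input_budget(reserved_output)


if __name__ == "__main__":
    docs = [1_500] * 200  # 200 hypothetical documents of 1,500 tokens each
    print(input_budget())                                 # tokens available for input
    print(fits_in_context(docs))                          # full output reserve
    print(fits_in_context(docs, reserved_output=20_000))  # smaller output reserve
```

Under that assumption, reserving the full output limit leaves 272,000 tokens for input, so the same batch of documents may or may not fit depending on how long a reply is expected.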
The model also features a knowledge cutoff of August 31, 2025, ensuring it’s up-to-date with relatively recent world events and technical documentation. It explicitly includes "Reasoning token support," confirming the underlying architecture uses the chain-of-thought processing popularized by the "o1" series.
The 'Code Red' Reality Check
The release arrives following The Information's report of an emergency "Code Red" directive to OpenAI staff from CEO Sam Altman to improve ChatGPT, a move reportedly designed to mobilize resources following the "quality gap" exposed by Gemini 3. The Verge similarly reported on the timing of GPT-5.2's release ahead of the official announcement.
During the briefing, OpenAI executives acknowledged the directive but pushed back on the narrative that the model was rushed solely to answer Google.
"It is important to note this has been in the works for many, many months," Simo told reporters. She clarified that while the "Code Red" helped focus the company, it wasn't the sole driver of the timeline.
"We announced this Code Red to really signal to the company that we want to marshal resources in one particular area… but that's not the reason it's coming out this week in particular."
Max Schwarzer, lead of OpenAI's post-training team, echoed this sentiment to dispel the idea of a panic launch. "We've been planning for this release since a very long time ago… this specific week we talked about many months ago."
A spokesperson from OpenAI further clarified that the "Code Red" call applied to ChatGPT as a product, not solely underlying model development or the release of new models.
Under the Hood: Instant, Thinking, and Pro
OpenAI is segmenting the GPT-5.2 release into three distinct tiers within ChatGPT, a strategy likely designed to balance the massive compute costs of "reasoning" models with user demand for speed:
- GPT-5.2 Instant: Optimized for speed and daily tasks like writing, translation, and information seeking.
- GPT-5.2 Thinking: Designed for "complex, structured work" and long-running agents, this model leverages deeper reasoning chains to handle coding, math, and multi-step projects.
- GPT-5.2 Pro: The new heavyweight champion. OpenAI describes this as its "smartest and most trustworthy option," delivering the highest accuracy for difficult questions where quality outweighs latency.
For developers, the models are available immediately in the application programming interface (API) as gpt-5.2, gpt-5.2-chat-latest (Instant), and gpt-5.2-pro.
The Numbers: Beating the Benchmarks
The GPT-5.2 release includes leading metrics across most domains, especially those that target the "professional knowledge work" gap where competitors have recently gained ground.
OpenAI highlighted a new benchmark called GDPval, which measures performance on "well-specified knowledge work tasks" across 44 occupations.
"GPT-5.2 Thinking is now state-of-the-art on that benchmark… and beats or ties top industry professionals on 70.9% of well-specified professional tasks like spreadsheets, presentations, and document creation, according to expert human judges," Simo said.
In the critical domain of coding, OpenAI is claiming a decisive lead. Schwarzer noted that on SWE-bench Pro, a rigorous evaluation of real-world software engineering, GPT-5.2 Thinking sets a new state-of-the-art score of 55.6%.
He emphasized that this benchmark is "more contamination resistant, challenging, diverse, and industrially relevant than previous benchmarks like SWE-bench Verified."
Other key benchmark results include:
- GPQA Diamond (Science): GPT-5.2 Pro scored 93.2%, edging out GPT-5.2 Thinking (92.4%) and surpassing GPT-5.1 Thinking (88.1%).
- FrontierMath: On Tier 1-3 problems, GPT-5.2 Thinking solved 40.3%, a significant jump from the 31.0% achieved by its predecessor.
- ARC-AGI-1: GPT-5.2 Pro is reportedly the first model to cross the 90% threshold on this general reasoning benchmark, scoring 90.5%.
The Price of Intelligence
Performance comes at a premium. While ChatGPT subscription pricing remains unchanged for now, the API costs for the new flagship models are steep compared to previous generations, reflecting the high compute demands of "thinking" mode. They are also at the upper end of API costs for the industry.
- GPT-5.2 Thinking: Priced at $1.75 per 1 million input tokens and $14 per 1 million output tokens.
- GPT-5.2 Pro: The costs jump significantly to $21 per 1 million input tokens and $168 per 1 million output tokens.
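For a sense of what those rates mean per request, the following Python sketch prices a single call from its token counts. The mapping of the "Thinking" rate to the `gpt-5.2` API name is an assumption based on the model names listed earlier, and the 100,000-input / 10,000-output request is a hypothetical workload, not a figure from OpenAI.

```python
from decimal import Decimal

# (input, output) price in USD per 1 million tokens, as quoted in the article.
# Assumption: the "Thinking" rate corresponds to the gpt-5.2 API name.
PRICES_PER_MILLION = {
    "gpt-5.2": (Decimal("1.75"), Decimal("14")),
    "gpt-5.2-pro": (Decimal("21"), Decimal("168")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request at the listed per-token rates."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # Hypothetical request: 100,000 tokens in, 10,000 tokens out.
    for name in PRICES_PER_MILLION:
        print(name, request_cost(name, 100_000, 10_000))
```

Decimal arithmetic is used here so the dollar amounts stay exact rather than picking up floating-point rounding.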
GPT-5.2 Thinking is priced 40% higher in the API than the standard GPT-5.1 ($1.25/$10), signaling that OpenAI views the new reasoning capabilities as a tangible value-add rather than a mere efficiency update.
The high-end GPT-5.2 Pro follows the same pattern, costing 40% more than the previous GPT-5 Pro ($15/$120). While expensive, it still undercuts OpenAI’s most specialized reasoning model, o1-pro, which remains the most expensive offering on the menu at a staggering $150 per million input tokens and $600 per million output tokens.
OpenAI argues that despite the higher per-token cost, the model’s "greater token efficiency" and ability to solve tasks in fewer turns make it economically viable for high-value enterprise workflows.
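The 40% figure and the efficiency argument can be checked with a little arithmetic. The Python sketch below recomputes the increase from the quoted prices and then estimates how many fewer tokens a task would need for total spend to stay flat. It assumes the input/output token mix is unchanged between generations, which is an illustrative simplification rather than anything OpenAI has stated; the function names are hypothetical.

```python
from fractions import Fraction


def price_increase(old_price: str, new_price: str) -> Fraction:
    """Relative price increase, e.g. Fraction(2, 5) for a 40% rise."""
    return Fraction(new_price) / Fraction(old_price) - 1


def break_even_token_reduction(increase: Fraction) -> Fraction:
    """Share of tokens that must be saved for total spend to stay flat.

    Assumption: input and output prices rise by the same proportion and
    the mix of input to output tokens does not change.
    """
    return 1 - 1 / (1 + increase)


if __name__ == "__main__":
    rise = price_increase("1.25", "1.75")  # GPT-5.1 -> GPT-5.2 input price
    print(f"price increase: {float(rise):.0%}")
    print(f"break-even token reduction: {float(break_even_token_reduction(rise)):.1%}")
```

Under these assumptions, a task would have to consume roughly 29% fewer tokens on the new model before the higher rate pays for itself.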
Here's how it compares to the current API costs for other competing models across the LLM field:
| Model | Input (/1M) | Output (/1M) | Total Cost |
| --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| Qwen 3 Plus | $0.40 | $1.20 | $1.60 |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen-Max | $1.60 | $6.40 | $8.00 |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Claude Opus 4.5 | $5.00 | $25.00 | $30.00 |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 |
Image Generation: Nothing New Yet…But 'More to Come'
During the briefing, VentureBeat asked the OpenAI participants if the new release included any boost to image generation capabilities, noting the excitement around similar features in recent competitor launches like Google's Gemini 3 Image aka Nano Banana Pro.
Unfortunately for those hoping for comparable text- and information-heavy graphics and image editing capabilities, OpenAI executives clarified that GPT-5.2 comes with no current image improvements over the prior GPT-5.1 and OpenAI's built-in DALL-E 3 and gpt-4o native image generation models.
"On image Gen, nothing to announce today, but more to come," Simo said. She acknowledged the popularity of the feature, adding, "We know this is a very important use case that people love, that we introduced [to] the market, and so definitely more to come there."
Aidan Clark, OpenAI's lead of training, also declined to comment on visual generation specifics, stating simply, "I can't really speak to image Gen myself."
The 'Mega-Agent' Era
Beyond raw scores, OpenAI is positioning GPT-5.2 as the engine for a new generation of "long-running agents" capable of executing multi-step workflows without human hand-holding.
"Box found that 5.2 can extract information from long, complex documents about 40% faster, and also saw a 40% increase in reasoning accuracy for Life Sciences and healthcare," Simo said.
She also noted that Notion reported the model "outperforms 5.1 across every dimension… and it excels at the kind of really ambiguous, longer horizon tasks that define real knowledge work."
Schwarzer added that coding startups like Augment Code found the model "delivered considerably stronger deep code capabilities than any prior model," which is why it was selected to power their new code review agent.
OpenAI's launch blog post shows an example where "a traveler reports a delayed flight, a missed connection, an overnight stay in New York, and a medical seating requirement."
The result? "GPT‑5.2 manages the entire chain of tasks (rebooking, special-assistance seating, and compensation), delivering a more complete outcome than GPT‑5.1."
Visual capabilities have also seen an upgrade. A new evaluation called ScreenSpot-Pro, which tests a model's ability to understand GUI screenshots, shows GPT-5.2 Thinking achieving 86.3% accuracy, compared to just 64.2% for GPT-5.1.
Science and Reliability
OpenAI leaders also stressed the model's utility for scientific research, attempting to move the conversation beyond simple chatbots to research assistants.
Aidan Clark, lead of the training team, shared an example of a senior immunology researcher testing the model.
"They tested it by asking it to generate the most important unanswered questions about the immune system," Clark said. "That immunology researcher reported that GPT-5.2 produced sharper questions and stronger explanations for why those questions… matter compared to any previous pro model."
Reliability was another key focus. Schwarzer claimed the new model "hallucinates considerably less than GPT-5.1," noting that on a set of de-identified queries, "responses contained errors 38% less often."
The 'Vibe' Shift
Interestingly, OpenAI acknowledged that not every user might immediately prefer the new models.
When asked why legacy models like GPT-5.1 would remain available, Schwarzer admitted that "models change a little bit every time."
"Some users may find that they like the vibes of the previous model, even though we think the latest one is across the board generally much better," Schwarzer said. He also noted that for some enterprise customers who’ve "really fine-tuned a prompt for a specific model," there might be "small regressions," necessitating access to the older versions.
Safety, 'Adult Mode,' and Future Roadmap
Addressing safety concerns, Simo confirmed that the company is preparing to roll out an "Adult Mode" in the first quarter of next year, following the implementation of a new age prediction system.
"We're in the process of improving that," Simo said regarding the age prediction technology.
"We want to do that ahead of launching adult mode."
Looking further ahead, industry reports suggest OpenAI is working on a more fundamental architectural shift under the codename "Project Garlic," targeting a flagship release in early 2026.
While executives did not comment on specific future roadmaps during the briefing, Simo remained optimistic about the economics of their current trajectory.
"If you look at historical trends, compute has increased about 3x every year for the last three years," she explained. "Revenue has also increased at the same pace… creating this virtuous cycle."
Clark added that efficiency is improving rapidly: "The model we're releasing today achieves an even better score [on ARC-AGI] with almost 400 times less cost and less compute associated with it" compared to models from a year ago.
GPT-5.2 Instant, Thinking, and Pro begin rolling out in ChatGPT today to paid users (Plus, Pro, Team, and Enterprise). The company notes the rollout will be gradual to maintain stability.