OpenAI Releases GPT-5.4 With Pro and Thinking Versions

By alex2404

OpenAI released GPT-5.4 on Thursday, positioning it as its most capable and efficient frontier model for professional work. The release comes in three forms: a standard version, a reasoning-focused variant called GPT-5.4 Thinking, and a high-performance tier called GPT-5.4 Pro.

The API version supports context windows of up to 1 million tokens, the largest OpenAI has offered to date. The company also pointed to improved token efficiency, saying the model solves the same problems with significantly fewer tokens than its predecessor.

Benchmark Performance

GPT-5.4 set record scores on two computer-use benchmarks, OSWorld-Verified and WebArena Verified, and scored 83% on OpenAI’s GDPval test for knowledge work tasks. It also topped Mercor’s APEX-Agents benchmark, which evaluates professional skills in law and finance.

Brendan Foody, CEO of Mercor, said the model “excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis,” adding that it delivers “top performance while running faster and at a lower cost than competitive frontier models.”

Fewer Hallucinations

OpenAI said GPT-5.4 is 33% less likely to make errors in individual claims compared to GPT-5.2, and overall responses are 18% less likely to contain errors. Reducing hallucinations has been a persistent challenge for large language models, and these figures represent a measurable step in that direction.

A New Approach to Tool Calling

One of the more technical changes in the release concerns how the API handles tool calling. The previous system required prompts to define all available tools upfront, a process that consumed a growing number of tokens as tool libraries expanded.

The new system, called Tool Search, allows the model to look up tool definitions only when needed. The result is faster and cheaper requests in environments where many tools are available, which matters particularly for developers building agent-based applications.
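The token trade-off can be illustrated with a small sketch. Everything below is hypothetical: the tool names, schemas, and the rough 4-characters-per-token estimate are assumptions for illustration, not the actual OpenAI API or Tool Search implementation.

```python
import json

# Hypothetical tool library; names and schemas are illustrative only.
TOOL_LIBRARY = {
    "get_weather": {
        "description": "Return current weather for a city.",
        "parameters": {"city": "string"},
    },
    "send_email": {
        "description": "Send an email to a recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "query_database": {
        "description": "Run a read-only SQL query.",
        "parameters": {"sql": "string"},
    },
}

def estimate_tokens(obj) -> int:
    """Crude stand-in for a tokenizer: ~1 token per 4 JSON characters."""
    return max(1, len(json.dumps(obj)) // 4)

def upfront_cost() -> int:
    """Previous approach: every tool definition ships with every request."""
    return sum(estimate_tokens(d) for d in TOOL_LIBRARY.values())

def on_demand_cost(needed) -> int:
    """Tool-Search-style approach: only definitions the model actually
    looks up are expanded into the context."""
    return sum(estimate_tokens(TOOL_LIBRARY[name]) for name in needed)

if __name__ == "__main__":
    # With a three-tool library and one tool needed, the on-demand
    # request carries roughly a third of the definition tokens.
    print(upfront_cost(), on_demand_cost(["get_weather"]))
```

The point of the sketch is the scaling behavior: upfront cost grows with the size of the whole library, while on-demand cost grows only with the tools a given request actually uses.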

Chain-of-Thought Safety

OpenAI also introduced a new safety evaluation focused on chain-of-thought reasoning, the step-by-step commentary models produce when working through multi-step tasks. AI safety researchers have raised concerns that reasoning models might misrepresent this internal process, obscuring what they are actually doing.

OpenAI’s testing found that deception in the chain-of-thought is less likely in GPT-5.4 Thinking, with the company stating the results suggest “the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.” The evaluation does not eliminate the concern entirely, but it provides a formalized method for tracking it going forward.

The launch arrives at a sensitive moment for OpenAI. The company recently faced a surge in ChatGPT uninstalls after announcing a deal with the Department of Defense, and competitors including Anthropic have publicly challenged its messaging. GPT-5.4 is, in part, a product answer to that pressure.


This article is a curated summary based on third-party sources.
