OpenAI has officially launched GPT-5, an AI system that pairs a non-reasoning GPT-style model, which answers most questions, with a more powerful reasoning model meant for tougher and more complex problems. The system also includes a real-time router that decides which model receives each query, using cues such as the type of conversation, the prompt's complexity, whether tools are needed to solve it, and explicit user intent. In a press briefing, CEO Sam Altman called GPT-5 "the best model in the world" and a "significant step" toward artificial general intelligence (AGI).
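To make the routing idea concrete, here is a toy sketch in Python. OpenAI has not published how its router works, so everything below is an assumption made for illustration: the model labels, the `Query` fields, the cue words, the weights, and the threshold are all hypothetical, and a production router would more plausibly be a learned model than a set of hand-written rules.

```python
from dataclasses import dataclass

# Hypothetical labels for the two models behind the router (not real API names).
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Assumed conversation types that tend to need deeper reasoning.
HARD_CONVERSATION_TYPES = {"coding", "analysis"}

# Assumed cue words hinting at a multi-step problem.
CUE_WORDS = {"prove", "derive", "debug", "optimize"}


@dataclass
class Query:
    text: str
    conversation_type: str = "chat"
    needs_tools: bool = False
    asks_for_reasoning: bool = False  # explicit intent, e.g. "think hard about this"


def complexity_score(text: str) -> float:
    """Crude stand-in for prompt complexity: prompt length plus cue-word hits."""
    words = [w.strip(".,?!:;").lower() for w in text.split()]
    cue_hits = sum(1 for w in words if w in CUE_WORDS)
    return min(1.0, len(words) / 200) + 0.5 * cue_hits


def route(query: Query, threshold: float = 0.5) -> str:
    """Pick a model label from the four cues named in the article."""
    if query.asks_for_reasoning:
        return REASONING_MODEL  # explicit intent overrides the other cues
    score = complexity_score(query.text)
    if query.needs_tools:
        score += 0.3
    if query.conversation_type in HARD_CONVERSATION_TYPES:
        score += 0.2
    return REASONING_MODEL if score >= threshold else FAST_MODEL
```

In this sketch, a short factual question goes to the fast model, while an explicit request to think harder, or a coding task that needs tools, goes to the reasoning model. The additive score is only one possible design; it is used here because it keeps each cue's contribution easy to inspect.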
As expected, GPT-5 has been introduced to the general public as part of the recent and much-discussed transition from chatbots to AI agents. According to OpenAI, GPT-5 can complete complex tasks, including generating entire software applications, managing calendars, and creating research briefs. The model's function calling, browsing, and instruction-following capabilities were extensively tested; in these areas, GPT-5's performance was found to exceed that of OpenAI's o3 and 4o models.
OpenAI also highlighted that GPT-5 was developed to excel in ChatGPT's most common use cases: health, coding, and writing. More generally, based on its benchmark scores, GPT-5 is state-of-the-art in several domains. In many cases, the lead is slim, at less than a percentage point (on SWE-bench Verified, for example, 74.9% vs. Claude Opus 4.1's 74.5%), and the model significantly underperforms on some benchmarks. Most notably, GPT-5 scores 9.9% on the famed ARC-AGI-2 benchmark, 6 percentage points below the state-of-the-art score of 15.9% set by Grok 4 (Thinking). Still, early testers find it a capable and performant model.
OpenAI appears to have made a significant effort to ensure that GPT-5 is more trustworthy, helpful, and reliable than its predecessors. The company claims that GPT-5 is 45% less likely to include factual errors in its responses than the 4o model, and 80% less likely than the o3 model on the LongFact and FActScore benchmarks. The model was also trained to recognize and communicate that it cannot complete a task, rather than lie about it or display excessive confidence in an incorrect solution. When tested with "anonymized prompts representative of ChatGPT production traffic," GPT-5 displays lower hallucination and deception rates than the 4o and o3 models.
GPT-5 also ships with a new safety-training technique called safe completions, which reduces the number of false refusals: when user intent is unclear, the model answers only partially or gives only a high-level answer instead of declining outright. OpenAI also stated that although it has no evidence that GPT-5 could help a non-expert cause harm in the biological and chemical domains, the company has decided to treat the model as having high capability in these domains and has put additional safeguards in place against those risks.
GPT-5 is coming to all free and paid ChatGPT users, with the main difference being the usage limits allocated to each tier. Once users reach their GPT-5 usage limit, the platform will switch to GPT-5 mini, which is smaller and faster than the flagship. Relatedly, Pro subscribers will have exclusive access to GPT-5 Pro, a version of the model with extended reasoning capabilities. Developers using the API get three model sizes: GPT-5, GPT-5 mini, and GPT-5 nano, each supporting a reasoning effort parameter with four levels (minimal, low, medium, and high) alongside a separate verbosity parameter that controls response length. The GPT-5 base model is very competitively priced at $1.25 per million input tokens and $10 per million output tokens.
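To put the per-token prices in perspective, the following Python sketch estimates the cost of a single request to the GPT-5 base model. It uses only the two prices quoted above; the function name and the example token counts are hypothetical, and the calculation ignores any discounts, cached-input pricing, or price changes that may apply in practice.

```python
from decimal import Decimal

# Prices quoted in the article for the GPT-5 base model, in USD per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.25")
OUTPUT_PRICE_PER_MILLION = Decimal("10")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# Hypothetical request: a 2,000-token prompt that yields a 500-token answer.
print(estimate_cost_usd(2_000, 500))  # 0.0075
```

At these rates, output tokens cost eight times as much as input tokens, so long responses dominate the bill far more than long prompts do.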