OpenAI has launched OpenAI o1, a model capable of 'thinking' through complex problems

OpenAI recently announced a preview of OpenAI o1, its next-generation model family. Models in the o1 family are distinguished by running a long internal chain of thought before generating a response to a query, something akin to 'thinking' before answering. OpenAI reports strong results on complex reasoning tasks, especially competitive math and coding problems and PhD-level science questions. The preview version of the model, OpenAI o1-preview, is now available to ChatGPT Plus and Team subscribers and to trusted API users.

OpenAI showcases o1's math and coding capabilities by testing the model against GPT-4o in competitive math and coding evaluations. For math, both models were given a qualifying exam for the USA Math Olympiad (AIME 2024); o1 scored 83.3% (compared to GPT-4o's 13.4%), which would place it among the top 500 students taking the exam. The model also ranks in the 89th percentile in competitive coding (Codeforces), while GPT-4o places in the 11th percentile. Finally, the models' chemistry, physics, and biology expertise was tested using GPQA Diamond, a difficult benchmark of PhD-level knowledge in those subjects. Per OpenAI's report, o1 not only surpasses GPT-4o on this benchmark but also scores better than the human experts asked to take the evaluation.
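
To put the AIME percentages in perspective, the short Python sketch below converts them into an approximate number of correctly answered questions. It assumes that each reported score is an average over the 15 questions of an AIME exam; the helper name `avg_questions_correct` is illustrative and not taken from OpenAI's report. Under that assumption, the scores work out to roughly 12.5 questions for o1 and about 2 for GPT-4o.

```python
# Assumption: an AIME exam has 15 questions, and each reported
# percentage is the average share of questions answered correctly.
AIME_QUESTIONS = 15

# Scores as quoted in the article (percent of questions solved).
reported_scores = {"o1": 83.3, "GPT-4o": 13.4}


def avg_questions_correct(percent: float, total: int = AIME_QUESTIONS) -> float:
    """Convert a percentage score into an average count of correct answers."""
    return percent / 100 * total


for model, percent in reported_scores.items():
    correct = avg_questions_correct(percent)
    print(f"{model}: about {correct:.1f} of {AIME_QUESTIONS} questions")
```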

o1 does have drawbacks and limitations. In human preference evaluations, o1 responses were favored for math, coding, and other tasks requiring complex reasoning, as its improved performance in those areas would suggest. However, in some areas involving natural language processing, evaluators chose GPT-4o as the better model, suggesting that o1 is not yet ready to become an all-purpose model. Furthermore, the internal chain of thought requires time and additional compute resources, so o1 is slower and significantly more expensive than GPT-4o.

Notably, OpenAI keeps o1's chains of thought hidden for reasons including "competitive advantage", a telling sign that giving models more time to think through their reasoning is a promising avenue of research (currently also being pursued by Google DeepMind). OpenAI has stated that it plans to iterate on o1 to deliver upgraded versions of the model and that it will experiment with further increasing the time a model can take to reason through complex p