Now that the long-anticipated era of AI 'agents' is underway, OpenAI is redoubling its efforts to remain center stage. Following its launch of Operator, an agent that can use its own web browser to complete tasks like making restaurant reservations, booking accommodations, and even shopping on its users' behalf, OpenAI recently started rolling out 'deep research', an AI agent based on a version of the o3 model that can collect and synthesize information from several internet sources "at the level of a research analyst".
Deep research can search the internet for appropriate text, images, and PDF document sources. It can also process attachments like text files and spreadsheets for additional context, plot graphs using a Python tool, and embed those graphs and other images from the internet in its final research report. According to OpenAI, deep research queries can take anywhere from 5 to 30 minutes. Moreover, the agent summarizes its actions and the sources it has consulted in the application's sidebar as it progresses through a task. The company states deep research aims to assist people who perform "intensive knowledge work in areas like finance, science, policy, and engineering."
More broadly, OpenAI considers information synthesis a necessary step towards artificial general intelligence (AGI). Despite several accusations of moving the goalposts to fit the definition of AGI to its own interests, the company now states that the ability to do novel scientific research has long been part of its vision for AGI. As with many other advances, the deep research agent's performance is mainly measured with benchmark scores. OpenAI boasts, likely according to internal research, that deep research achieves a 26.6% score on Humanity's Last Exam, a fairly recent benchmark developed by the Center for AI Safety (CAIS) and Scale AI.
The comparison is somewhat unfair because, unlike the other scoreboard entries, deep research has access to the internet and its Python tool. However, the scoreboard does help put into perspective just how much o3's capabilities can improve when given access to extra tools. The next highest score on the list belongs to OpenAI o3-mini (high), with 13% accuracy, less than half of deep research's accuracy.
The feature is initially available to ChatGPT Pro users on the web app, with plans to roll out the agent on the desktop and mobile apps within a month. Current availability is also restricted to users outside the United Kingdom, Switzerland, and the European Economic Area, with no clarity on when users in those regions will get access. In all remaining geographic locations where deep research has been rolled out, its availability will soon expand to Plus and Team users. The current deep research version requires substantial computational resources, leading to a 100-query monthly limit for paid users. OpenAI plans to address this by releasing a more efficient version of the agent.