Yesterday, OpenAI unveiled ChatGPT agent, a new product that can autonomously handle multi-step tasks from start to finish, using its own virtual computer. The new capability combines Operator's ability to browse the web and interact with websites, deep research's analytical and deliverable generation capabilities, and ChatGPT's conversational intelligence into a single powerful tool.

According to OpenAI, users can request a variety of tasks, such as creating slide decks that analyze competitors or planning shopping lists and buying the necessary ingredients for specific dishes. Once ChatGPT's 'agent mode' is prompted with requests of this sort, it will navigate websites, conduct research, run code, and deliver editable presentations or spreadsheets featuring its findings.

To help ChatGPT agent carry out the tasks it is asked to do, the platform incorporates multiple tools, including visual and text-based browsers, a terminal, and direct access to APIs. The system can integrate with apps like Gmail and GitHub through ChatGPT connectors, which enable ChatGPT to find relevant information in the user's sources to include in its responses.

OpenAI states that users are always in control of the system: ChatGPT agent asks for permission before tasks with "real world consequences" (for instance, making purchases), requires active supervision for certain critical tasks, and refuses high-risk transactions like bank transfers. More generally, according to OpenAI, users can interrupt, take over, or stop any action taking place in ChatGPT agent's virtual computer.
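
The tiers OpenAI describes (refuse, supervise, ask first, otherwise proceed) can be pictured as a simple lookup. The sketch below is purely illustrative and is not OpenAI's implementation: the `Decision` enum, the `decide` function, and every action name in the sets are hypothetical placeholders, with only "purchase" and "bank_transfer" taken from the examples in the announcement.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ASK_PERMISSION = "ask_permission"
    REQUIRE_SUPERVISION = "require_supervision"
    REFUSE = "refuse"


# Hypothetical categories; the real criteria are not public in this article.
HIGH_RISK = {"bank_transfer"}            # refused outright
SUPERVISED = {"example_critical_task"}   # placeholder for "certain critical tasks"
CONSEQUENTIAL = {"purchase"}             # "real world consequences"


def decide(action: str) -> Decision:
    """Return how an agent might treat an action, checking the strictest tier first."""
    if action in HIGH_RISK:
        return Decision.REFUSE
    if action in SUPERVISED:
        return Decision.REQUIRE_SUPERVISION
    if action in CONSEQUENTIAL:
        return Decision.ASK_PERMISSION
    return Decision.PROCEED
```

Checking the strictest tier first means an action that somehow landed in two categories would always get the more cautious treatment.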

On benchmarks, the agent scored 41.6% on Humanity's Last Exam and 27.4% on FrontierMath, significantly outperforming previous OpenAI models. According to an internal evaluation of real-world professional task performance, ChatGPT agent matches or exceeds human performance in roughly half the cases.

The feature launches today for Pro, Plus, and Team users, with Pro users getting 400 messages per month and other paid tiers receiving 40 messages per month. OpenAI acknowledges that this is a higher-risk deployment given the agent's ability to take real-world actions, and says it has implemented enhanced safety measures, including prompt injection protections and biological risk safeguards.