News

Anthropic's three major announcements: new models, computer use API, and an analysis tool

Anthropic announced several major updates this week, including the launch of Claude 3.5 Sonnet and Haiku models, improvements in agentic capabilities, a new computer use API public beta, and a data analysis tool for Claude.ai that lets the assistant write and run code in a sandbox.

by Ellie Ramirez-Camara

Updated October 25, 2024

Anthropic's three major announcements: new models, computer use API, and an analysis tool — Credit: Anthropic

Anthropic made several important announcements this week, starting with the launch of the upgraded Claude 3.5 Sonnet and the new Claude 3.5 Haiku. Interestingly, Anthropic notes that Claude 3.5 Haiku, the smallest in the current family of Claude models, has a performance matching Claude 3 Opus, the largest model from Claude's previous generation. Claude 3.5 Sonnet is available immediately, while Claude 3.5 Haiku will be released shortly.

The released results show that the upgraded Claude 3.5 Sonnet scores higher than its predecessor in all evaluated benchmarks. Remarkably, the two areas of most improvement are agentic coding (SWE-bench Verified), where the new Sonnet displays an improvement from 33.4% to 49.0%, and agentic tool use as tested by the TAU-bench. In the TAU-bench airline domain, Claude 3.5 Sonnet upped its predecessor's score by an impressive 10%, from 36% to 46%. These scores represent a milestone in providing Claude with agentic capabilities and are also quite relevant for the debuted public beta of the computer use API.

Anthropic's computer use API lets Claude 3.5 Sonnet analyze screenshots to interact with applications, moving a cursor around screens, clicking on appropriate elements, and inputting text through a virtual keyboard. According to Anthropic, since so much human work happens using computers, providing Claude with computer use will unlock many novel use cases. In OSWorld, an evaluation that tests models' computer use capabilities, Claude 3.5 Sonnet scored 14.9%, nearly doubling the score of the next best system, 7.7% but still quite far from the human baseline (70-75%).

Although Claude is still far from using a computer like a human, the company noted its mistakes were sometimes amusingly human-like. Reportedly, the model accidentally stopped a screen recording, which caused the loss of the recorded materials. On another occasion, Claude "took a break" from its coding demo and started going over pictures of Yellowstone National Park. Anthropic reports it found no indication that computer use required stronger safety and security measures than those in place for Claude 3.5 Sonnet, which Anthropic places at ASL-2 according to its Responsible Scaling Policy.

Finally, the company announced Thursday it has launched a data analysis tool for Claude.ai, which allows the AI-powered assistant to write and run JavaScript code in a coding sandbox within Claude.ai. The new feature lets Claude assist users with new tasks, including complex math, data processing, and analysis. This means that Claude's answers leveraging the analysis tool will be reproducible and accurate, lending an additional measure of trustworthiness to any insights unearthed from asking Claude to go over large quantities of data. The analysis tool is already available to all Claude.ai users as a feature preview.

by Ellie Ramirez-Camara

Updated October 25, 2024

Subscribe to Our Newsletter

Anthropic's three major announcements: new models, computer use API, and an analysis tool

The massive DeepSeek V3 rivals many of the latest openly available models

Google is reportedly testing Gemini against Anthropic's Claude

Coralogix acquired Aporia to offer a comprehensive observability platform to users

Mindgard, a Lancaster University spinoff, raised $8M to expand its presence in the US

Weekly AI Highlights Review: December 17–25

Data Phoenix Digest

Read More

The massive DeepSeek V3 rivals many of the latest openly available models

Google is reportedly testing Gemini against Anthropic's Claude

Coralogix acquired Aporia to offer a comprehensive observability platform to users

Mindgard, a Lancaster University spinoff, raised $8M to expand its presence in the US