Anthropic announces the new Claude 3 family of models
The Claude 3 model family includes three state-of-the-art text generation models: Opus, Sonnet, and Haiku. Opus is the most capable of the models, while Haiku is the fastest and most cost-effective. Opus and Sonnet are generally available in the 159 countries where the Claude API and claude.ai are available. Sonnet currently powers the free tier at claude.ai, with Opus available for Claude Pro users. Haiku will join the other two models soon.
The Claude 3 model family outperforms its predecessors on several text manipulation tasks in non-English languages. Additionally, Opus achieves state-of-the-art scores on many popular benchmarks, surpassing most of its peers available to date, including GPT-4 and Gemini 1.0. In particular, Opus obtained a 90.7% (0-shot) score on the multilingual math benchmark MGSM, a vast improvement over GPT-4's 74.5% (8-shot) score. Opus also achieved impressive improvements on the HumanEval and GPQA (Diamond) benchmarks. Finally, Sonnet is twice as fast as Claude 2 and Claude 2.1 in most workloads, while Opus delivers speeds similar to those models but is considerably more capable at complex tasks.
The Claude 3 family also features improved and competitive vision capabilities, an indispensable feature given that many users have at least part of their knowledge bases encoded in non-text formats such as PDFs, charts, diagrams, and slides. The Claude 3 models are particularly adept at reading science diagrams and answering questions about charts, as measured on the AI2D test set and a chart Q&A benchmark. Another notable improvement is that Claude 3 models are less likely to refuse sufficiently contextualized prompts that border on the models' guardrails. They also boast improved accuracy, reflected in an increase in correct answers and, most importantly, in a reduction of incorrect answers (hallucinations). Finally, Claude 3 models launch with a 200K-token context window, even though they can process inputs exceeding a million tokens.
Further details on the Claude 3 family's benchmarking and security evaluations, specifications, and prompt methodology can be found in the Claude 3 model card. Planned updates for the Claude 3 family include function calling and interactive coding support. In addition to its availability via the API and the claude.ai web experience, Sonnet is generally available through Amazon Bedrock and in private preview on Google Cloud's Vertex AI Model Garden, with support for Opus and Haiku coming shortly.
It is worth noting that the announcement compares Claude 3 only to models that are currently generally available, such as GPT-4 and Gemini 1.0. More recent models, such as Gemini 1.5, are included in the model card but not in the announcement. Higher scores have also been reported for GPT-4 Turbo, as Anthropic acknowledges, but they are not part of the published comparison.