Anthropic launches an initiative to fund third-party AI evaluations
Anthropic has launched an initiative to fund third-party evaluations of advanced AI models, focusing on safety assessments, capability metrics, and evaluation infrastructure to enhance transparency and safety across the AI industry.
Benchmarks and evaluations for AI have become a hot topic amid worries that they may be too easy (because of design flaws or training-data contamination), may not reflect how the technology is used in the real world, may miss areas that should be targeted, or may be constructed in ways that do not scale efficiently. To address this, Anthropic has announced a new initiative to fund third-party evaluations of advanced AI models. The company is accepting evaluation proposals in three key areas: AI Safety Level (ASL) assessments, advanced capability and safety metrics, and infrastructure for developing evaluations.
In parallel with the announcement, Anthropic has published a guide detailing its motivations and the topics it is most interested in supporting. The ASL scale is Anthropic's in-house risk measure: ASL-1 denotes the lowest risk (exemplified by older models or narrow, goal-specific systems such as a chess-playing AI), while ASL-4 represents catastrophic misuse potential and higher levels of autonomy. The company places Claude and current LLMs at ASL-2 because they show no signs of autonomy and cannot provide reliable information on dangerous topics; even when they do comply with harmful requests, the information they supply can already be found in a textbook or through a search engine.
From this starting point, Anthropic will prioritize ASL evaluation topics including cybersecurity; chemical, biological, radiological, and nuclear (CBRN) risks; model autonomy; national security; social manipulation; and misalignment. In terms of advanced capability and safety metrics, Anthropic is particularly interested in evaluations that can effectively measure graduate-level scientific knowledge, multilingual capabilities, and the societal impacts of AI systems.
The company also seeks to support tools that streamline the creation of high-quality evaluations, including no-code platforms for subject-matter experts, model-graded evaluations, and uplift trials that measure how much access to a model improves a group's ability to accomplish a task.
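To illustrate what a model-graded evaluation involves, the sketch below uses the Anthropic Python SDK to have one model grade another model's answer against a reference. This is a minimal illustration, not Anthropic's implementation; the grader model choice, the prompt wording, and the PASS/FAIL protocol are assumptions made for the example.

```python
# Minimal sketch of a model-graded evaluation using the Anthropic Python SDK
# (pip install anthropic). Assumes ANTHROPIC_API_KEY is set in the environment.
# The grader model, prompt, and PASS/FAIL protocol are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()

GRADER_PROMPT = """You are grading an AI model's answer to a question.
Question: {question}
Reference answer: {reference}
Candidate answer: {answer}
Reply with exactly one word: PASS or FAIL."""


def grade(question: str, reference: str, answer: str) -> bool:
    """Ask a grader model whether a candidate answer matches the reference."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative grader choice
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": GRADER_PROMPT.format(
                question=question, reference=reference, answer=answer
            ),
        }],
    )
    return response.content[0].text.strip().upper().startswith("PASS")


if __name__ == "__main__":
    print(grade(
        question="What is the boiling point of water at sea level?",
        reference="100 degrees Celsius (212 degrees Fahrenheit).",
        answer="At standard atmospheric pressure, water boils at 100 °C.",
    ))
```

The appeal of this approach is scale: a grader model can score thousands of open-ended answers far faster than human raters, though the scores inherit whatever biases and blind spots the grader model has.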
To guide the design of proposed evaluations, Anthropic has outlined several principles, highlighting the need for difficulty, novelty, efficiency, and expert involvement in the development process. The company encourages evaluation designers to depart from the multiple-choice format and to favor high task volumes where possible, while stressing the importance of expert baselines and realistic threat modeling for safety-related assessments.
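As a concrete illustration of these principles, one way to represent an open-ended (non-multiple-choice) task that records an expert baseline is sketched below. Anthropic has not published a task schema; the field names and sample values here are hypothetical.

```python
# Illustrative sketch only: Anthropic has not published a task schema.
# One possible structure for an open-ended evaluation task that records
# an expert baseline, in line with the design principles above.
from dataclasses import dataclass, field


@dataclass
class EvalTask:
    task_id: str
    prompt: str                   # open-ended question, no answer options
    reference_answer: str         # canonical answer used by graders
    expert_baseline_score: float  # mean score achieved by human experts
    domain: str                   # e.g. "cybersecurity" or "graduate-level science"
    rubric: list[str] = field(default_factory=list)  # grading criteria


task = EvalTask(
    task_id="sci-001",
    prompt="Explain why PCR amplification requires a thermostable polymerase.",
    reference_answer=(
        "The denaturation step runs near 95 °C, which would destroy an "
        "ordinary polymerase; a thermostable enzyme such as Taq survives it."
    ),
    expert_baseline_score=0.92,
    domain="graduate-level science",
    rubric=["mentions the ~95 °C denaturation step", "names a thermostable enzyme"],
)
print(task.task_id, task.expert_baseline_score)
```

Recording an expert baseline alongside each task makes it possible to report model performance relative to human experts rather than as a raw score, which is the kind of comparison Anthropic's guide calls for in safety-relevant domains.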
Interested parties can submit proposals through Anthropic's application form, with funding options available depending on each project's needs and stage.