Google is reportedly testing Gemini against Anthropic's Claude
Google contractors have reportedly been asked to compare Gemini's responses against Claude's and score them on criteria including truthfulness and verbosity. A Google DeepMind spokesperson confirmed the company uses Claude only to compare model outputs, a standard industry practice.
According to recent reports, Google contractors are being tasked with comparing Gemini's responses against Claude's and scoring them on criteria including truthfulness and verbosity. The reports are based on internal correspondence and model outputs seen by TechCrunch. In the correspondence, contractors note a growing number of references to Anthropic's AI assistant Claude, and at least one of the model outputs obtained by TechCrunch explicitly states, "I am Claude, created by Anthropic."
Anthropic is known for prioritizing safety, and the startup often positions itself as a more ethical, safety-focused alternative to competitors including OpenAI and Google itself. Contractors note in the correspondence that Claude frequently refuses to answer queries that other assistants would not consider unsafe, such as pretending to be a different AI assistant. In one case, Claude declined to answer at all, while Gemini's response had to be flagged as a severe safety violation for involving nudity and bondage.
TechCrunch could not confirm whether Google asked for Anthropic's authorization before running these tests. Anthropic's commercial terms of service forbid using Claude to train competing products, services, or AI models. Google DeepMind spokesperson Shira McNamara emphasized that Claude is not being used in any capacity to train Gemini; rather, the company uses Claude for evaluation, comparing model outputs as part of standard industry practice.