xAI is beta-testing two new models and adding image-generation capabilities to its chatbot

xAI recently announced the beta release of Grok-2, its latest frontier model, alongside a smaller version called Grok-2 mini. The models are currently available on 𝕏 to Premium and Premium+ subscribers, with plans to expand access via xAI's enterprise API soon. xAI's blog post highlights that an early version of Grok-2, codenamed "sus-column-r", performed remarkably well on the LMSYS Chatbot Arena, reaching the #3 spot overall (tied with GPT-4o, and surpassing GPT-4o mini and Claude 3.5 Sonnet), and placing in the top 5 for Coding (#2), Hard Prompts (#4), and Math (#2).

xAI also noted that its internal evaluation procedure, which closely resembles the one used for the LMSYS Chatbot Arena, focused on testing instruction-following capabilities and the factuality and truthfulness of the chatbot's replies. The company also reports that Grok-2 shows improved reasoning from retrieved content and tool use. As expected, Grok-2 and Grok-2 mini were also evaluated against the industry's most popular benchmarks, with results showing that the models are competitive with other frontier foundation models, like OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Meta's Llama 3 405B.

As has been known for some time now, xAI's models integrate real-time information from 𝕏, which is quite obviously one of the contributing factors to the Grok chatbot's propensity to spread misinformation. Five US Secretaries of State recently penned a letter urging the company to take measures against Grok's tendency to generate misinformation after the assistant claimed that the ballot deadline had passed for some US states, thus casting doubt on the legitimacy of Vice President Kamala Harris's candidacy. It took 𝕏 over a week to address this specific incident, confirming the platform's laissez-faire attitude towards content moderation.

On that same note, the aspect of the Grok-2 models that stole the spotlight has nothing to do with their impressive performance and everything to do with the fact that the AI-powered assistant now leverages Black Forest Labs' FLUX.1 models to offer image generation capabilities with very relaxed guardrails. Grok can, for instance, create very controversial images of current political figures (and then some). As of today, neither xAI nor 𝕏 has commented on the matter, as the platform appears to remain intent on not offering any kind of substantial content moderation.