Grok just got vision: xAI announces Grok-1.5V preview

Grok-1.5V is xAI's first multimodal foundation model. It retains its predecessors' text capabilities and adds strong visual processing, which lets the model extract information from documents, diagrams, charts, screenshots, and photographs. To evaluate Grok-1.5V, the xAI research team developed RealWorldQA, a benchmark that measures real-world spatial understanding by asking questions that involve comparing several objects in a picture, describing an object's position, or determining an object's true size by accounting for perspective. RealWorldQA was released under a CC BY-ND 4.0 license alongside the Grok-1.5V preview announcement. It contains 700 anonymized images taken from various real-world sources, including vehicles, annotated with easily verifiable question-answer pairs. Grok-1.5V will be available to early testers and existing Grok users shortly, and xAI plans to continue its work on multimodal AI as part of its journey toward AGI.

