Grok just got vision: xAI announces Grok-1.5V preview
Grok-1.5V is xAI's first multimodal foundation model. It retains its predecessors' text capabilities and adds strong visual processing, which lets the model extract information from documents, diagrams, charts, screenshots, and photographs. To evaluate Grok-1.5V, the research team at xAI developed the RealWorldQA benchmark, which measures real-world spatial understanding with questions that involve comparing several objects in a picture, describing an object's position, or judging an object's true size by taking perspective into account. The benchmark was released under a CC BY-ND 4.0 license alongside the Grok-1.5V preview announcement. RealWorldQA contains 700 anonymized images taken from various real-world sources, including vehicles, each annotated with an easily verifiable question-answer pair. Grok-1.5V will be available to early testers and existing Grok users shortly, and xAI plans to continue its work on multimodal AI as part of its push toward AGI.