
Google DeepMind's Genie 2 creates minute-long playable 3D environments

Google DeepMind's Genie 2 is an advanced AI world model that can generate interactive 3D environments from a single image, with capabilities for simulating virtual worlds, object interactions, and character animations. Genie 2 is primarily intended to create training environments for AI agents.

by Ellie Ramirez-Camara
Credit: Google DeepMind

Shortly after World Labs showcased the outputs of its AI system that generates interactive 3D scenes from 2D images, Google DeepMind announced Genie 2, a world model capable of generating playable 3D environments from a single image input. Genie 2 is primarily meant as a tool to generate rich and diverse training environments for embodied AI agents, mitigating an important bottleneck slowing down research into AI capabilities. The research team behind Genie 2 also acknowledges the model could eventually become the basis for creative work involving prototyping interactive scenarios.

Genie 2 is a world model designed to simulate virtual worlds and the consequences of taking actions within them. It was trained on video data, and this training gave Genie 2 several additional capabilities, such as modeling object interactions, character animations, and physics, along with a degree of predictive power. To generate the showcased outputs, the research team prompted Genie 2 with images generated using Google DeepMind's Imagen 3. Although the example clips span between 10 and 20 seconds, Genie 2 can maintain consistency for up to a minute.

Some of Genie 2's specific capabilities include responding to keyboard and mouse input, such as moving a character in the right direction; generating multiple movement trajectories from a single starting frame; displaying enough long-horizon memory to re-render parts of the world that have left the main view; and creating diverse perspectives, including first-person, isometric, and third-person driving views. Some of these capabilities were tested using SIMA, Google DeepMind's instructable game-playing AI agent, which was created in collaboration with several game developers so it could be trained on data from multiple video games.
