DeepMind's Genie 3: A real-time world model for training AI agents

Genie 3 is a real-time world model that generates 3D environments that remain coherent for minutes at a time and can be customized using "promptable world events". DeepMind has highlighted Genie 3's potential for creating new training and education opportunities for a wide variety of agents.

Google DeepMind has unveiled Genie 3, a general-purpose world model that can generate 3D environments at 720p resolution and 24 frames per second, and keep those environments consistent for several minutes. Among other potential use cases, DeepMind says that Genie 3 can be used to:

model the physical properties of the world and their interactions;
simulate natural environments and their inhabitants, like animal and plant life;
create coherent fantasy worlds and animated characters; and
explore geographical and historical landmarks.

Consistent generations without the help of reference material

Genie 3's greatest innovation may be its ability to generate consistent environments for several minutes. DeepMind has called this an "emergent capability", but not because consistency is something the model came up with on its own. What DeepMind means is that, unlike other techniques for 3D environment rendering such as Gaussian Splatting or NeRFs, Genie 3 does not require a previous input (often in video format) to use as a reference for its generations. Instead, Genie 3 generates its environments frame by frame, with the contents fully determined by the prompt instructions and the model's visual memory, which can reach back as far as one minute.
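The frame-by-frame process described above can be illustrated with a toy loop. This is a minimal sketch, not DeepMind's implementation: `toy_next_frame` and `generate` are hypothetical names, the "frames" are plain tuples instead of images, and the only figures taken from the article are the 24 frames per second and the roughly one-minute memory. The per-step action input is likewise an assumption, standing in for the navigational commands the model responds to.

```python
from collections import deque

FPS = 24                 # frame rate reported for Genie 3
MEMORY_SECONDS = 60      # "up to a minute" of visual memory
MEMORY_FRAMES = FPS * MEMORY_SECONDS  # 1,440 frames of context


def toy_next_frame(prompt, context, action, t):
    """Hypothetical stand-in for the model.

    Returns a tuple (step index, action, context size) instead of an image,
    so the effect of the bounded memory is easy to inspect.
    """
    return (t, action, len(context))


def generate(prompt, actions, memory_frames=MEMORY_FRAMES):
    """Produce one frame per action, conditioning on a sliding window."""
    memory = deque(maxlen=memory_frames)  # oldest frames fall out automatically
    frames = []
    for t, action in enumerate(actions):
        frame = toy_next_frame(prompt, tuple(memory), action, t)
        memory.append(frame)
        frames.append(frame)
    return frames, memory


if __name__ == "__main__":
    frames, memory = generate("a rocky beach at dusk", ["forward"] * 2000)
    print(len(frames), len(memory), memory[0][0])
```

In this sketch, anything older than the window is simply invisible to the next step, which is one plausible reading of why consistency is bounded by the length of the visual memory.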

Modifying generated worlds on the fly

Genie 2 was largely limited to navigational instructions: its generated environments could respond to keyboard or mouse input and translate it to movement, predict multiple possible trajectories from a single starting point, and provide different viewpoints (first-person, third-person, isometric) for the same setting.

Another important breakthrough for Genie 3 is the introduction of "promptable world events", which enable users to modify an already generated environment by, for instance, changing the weather conditions or introducing new objects. According to DeepMind, promptable world events can be used to create meaningful variations (what-ifs, counterfactuals) on the same scene, potentially enabling agents to be trained on how to handle uncertainty or unexpected events.
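The what-if idea can be sketched with a toy world state. Everything in the block is an assumption made for illustration: Genie 3's events are given as prompts and act on generated video, whereas the hypothetical `WorldState`, `apply_event`, and `rollout` below use structured tuples and a small record so that branching is easy to see.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldState:
    """Hypothetical, heavily simplified description of a scene."""
    weather: str = "clear"
    objects: tuple = ()


def apply_event(state, event):
    """Return a new state with one event applied (the input is not mutated)."""
    kind, value = event
    if kind == "weather":
        return replace(state, weather=value)
    if kind == "add_object":
        return replace(state, objects=state.objects + (value,))
    raise ValueError(f"unknown event kind: {kind}")


def rollout(initial, steps, events):
    """Advance the scene for `steps` steps.

    `events` maps a step index to the event injected at that step.
    """
    state = initial
    history = []
    for t in range(steps):
        if t in events:
            state = apply_event(state, events[t])
        history.append(state)
    return history


if __name__ == "__main__":
    start = WorldState()
    baseline = rollout(start, 5, {})
    storm = rollout(start, 5, {2: ("weather", "storm")})
    deer = rollout(start, 5, {3: ("add_object", "deer")})
    print(baseline[4], storm[4], deer[4], sep="\n")
```

The three rollouts share the same starting scene and diverge only from the step where an event is injected, which is the property that would let an agent be evaluated on the same situation with and without the unexpected change.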

To test some of Genie 3's potential, DeepMind says it instructed a version of SIMA, its generalist AI agent for 3D environments, to pursue a set of goals inside a Genie 3-generated environment using only navigational commands. Since Genie 3 had no information on the agent's goals, it had to adapt its generations in real time to the navigational commands it received.

An encouraging starting point for a critical technology

Genie 3 is currently available only as a research preview for a select group of researchers and creators. DeepMind acknowledges the technology has a long way to go and must still overcome several limitations: the brevity of the current generations, the narrow range of actions that agents can undertake, and the difficulty of modeling multi-agent environments in which agents interact not only with the environment but also with one another. Still, the lab views Genie 3 as an exciting step for a technology it considers essential for the development of artificial general intelligence.
