World foundation models have recently emerged alongside agentic AI as the industry's next big thing. Both the startup World Labs and Google DeepMind have previewed their work in the area: a 3D scene generator in World Labs' case, and Genie 2 in DeepMind's. This Monday, during Nvidia's CES 2025 keynote, CEO Jensen Huang announced that Nvidia would release the Cosmos platform under a permissive license. Cosmos is an end-to-end world foundation model (WFM) platform that includes models, advanced tokenizers, guardrails and an accelerated video processing pipeline.
With the Cosmos platform, Nvidia aims to help physical AI developers reduce the costs of collecting vast amounts of real-world data and running the extensive training that fields such as robotics and autonomous vehicle development require. To do this, Nvidia built the key element of the platform, the Cosmos family of world foundation models, by training it on 9,000 trillion tokens drawn from 20 million hours of data covering autonomous driving, robotics, and other related domains. The resulting Cosmos models can process images, text and video to generate virtual environments for physical AI development and testing.
The Cosmos models use autoregressive and diffusion architectures, and come in three sizes to cover a variety of use cases. Nano models, the smallest, are meant for low-latency, real-time applications; Super models balance size and performance; and Ultra models, the largest, are aimed at delivering the highest output quality. In addition to the model family, the Cosmos platform includes a two-stage guardrail system that first blocks harmful keywords and prompts, and then scans generated content for unsafe frames and blurs any human faces present in the videos. Nvidia also says the outputs of its models are watermarked so they can be identified as AI-generated content.
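The two-stage guardrail described above can be pictured with a minimal Python sketch. It is an illustration only and does not use Nvidia's actual API: `BLOCKED_KEYWORDS`, `Frame`, `pre_guard`, `post_guard` and `guarded_generate` are hypothetical names, the `unsafe` and `has_face` flags stand in for the output of real classifiers and face detectors, and rejecting the whole clip when an unsafe frame is found is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

# Hypothetical blocklist; the real guardrail's terms are not public in the article.
BLOCKED_KEYWORDS = {"blockedterm"}


@dataclass(frozen=True)
class Frame:
    pixels: str             # stand-in for real image data
    unsafe: bool = False    # would come from a content-safety classifier
    has_face: bool = False  # would come from a face detector
    blurred: bool = False   # set by the post-generation stage


def pre_guard(prompt: str) -> bool:
    """Stage 1: return True only if the prompt contains no blocked keyword."""
    words = (w.strip(".,!?") for w in prompt.lower().split())
    return not any(w in BLOCKED_KEYWORDS for w in words)


def post_guard(frames: List[Frame]) -> Optional[List[Frame]]:
    """Stage 2: scan generated frames, then blur frames that contain faces.

    Assumption: a single unsafe frame rejects the whole clip (returns None).
    """
    if any(f.unsafe for f in frames):
        return None
    return [replace(f, blurred=True) if f.has_face else f for f in frames]


def guarded_generate(
    prompt: str, generate: Callable[[str], List[Frame]]
) -> Optional[List[Frame]]:
    """Run the prompt check, the (caller-supplied) generator, then the frame check."""
    if not pre_guard(prompt):
        return None
    return post_guard(generate(prompt))
```

Passing the generator in as a function keeps the sketch independent of any particular model: the same two checks wrap whatever produces the frames.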
The platform also includes NeMo Curator, which accelerates the data processing and curation pipeline, and the Cosmos Tokenizer, which Nvidia claims delivers significantly more compression and faster processing than the alternatives. The Cosmos world foundation models can be fine-tuned on proprietary data using Nvidia's NeMo Framework. According to Nvidia, leading robotics and AV firms including 1X, Agile Robots, Agility, Figure AI, Foretellix, Uber, Wayve, Waabi and XPENG are among the early adopters of the Cosmos platform.