Uninterrupted diffusion with Imagen Video

The Google Brain team has introduced Imagen Video, a text-to-video AI system that generates video clips from a text prompt. The text-conditioned video diffusion model can generate videos at up to 1280×768 resolution at 24 frames per second.

by Dmitry Spodarets

Updated October 13, 2022

Given a text prompt, Imagen Video generates high-definition video using a base video generation model followed by a sequence of interleaved spatial and temporal video super-resolution models.

Imagen Video is a cascade of so-called "diffusion" models. It consists of a text encoder (a frozen T5-XXL), a base video diffusion model, and interleaved spatial and temporal super-resolution diffusion models. A diffusion model generates new data (e.g., video) by learning to "restore" existing data samples that have been gradually "broken down" with noise.
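To make the "break down" and "restore" idea concrete, here is a minimal NumPy sketch of the standard noise-mixing formulation used by many diffusion models. It is an illustration, not Imagen Video's actual training code: the function names, the toy clip shape, and the `alpha_bar` value are all assumptions, and the paper's exact parameterization may differ.

```python
import numpy as np

def add_noise(x0, alpha_bar, rng):
    """'Break down' a clean sample by mixing it with Gaussian noise.

    alpha_bar in (0, 1] is the share of signal variance that is kept
    (1.0 = clean sample, values near 0 = almost pure noise).
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise

def recover_clean(x_t, predicted_noise, alpha_bar):
    """'Restore' the sample by inverting the mixing formula with a noise estimate."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * predicted_noise) / np.sqrt(alpha_bar)

rng = np.random.default_rng(0)
# Hypothetical toy clip with shape (time, height, width, channels).
video = rng.uniform(-1.0, 1.0, size=(4, 8, 8, 3))

noisy, noise = add_noise(video, alpha_bar=0.5, rng=rng)

# A trained network would predict `noise` from `noisy` and the text embedding.
# Here the true noise stands in for a perfect prediction.
restored = recover_clean(noisy, noise, alpha_bar=0.5)
```

In a real system the noise is not known at generation time. The network's estimate replaces it, and the restoration is applied over many small steps starting from pure noise.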

A notable design feature is the Video U-Net architecture. Its spatial operations are applied to each frame independently with shared parameters, which is done by folding the time axis into the batch axis (batch × time, height, width, channels). Its temporal operations act on the full 5-dimensional tensor (batch, time, height, width, channels).
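The two tensor layouts can be sketched in NumPy as follows. This is only an illustration of the reshaping pattern: `spatial_op` and `temporal_op` are hypothetical stand-ins for the real convolution and attention layers, and the tensor sizes are arbitrary.

```python
import numpy as np

def spatial_op(frames):
    """Stand-in spatial layer: works on (N, H, W, C), one frame at a time."""
    # Subtract each frame's own spatial mean; no information crosses frames.
    return frames - frames.mean(axis=(1, 2), keepdims=True)

def temporal_op(video):
    """Stand-in temporal layer: works on (B, T, H, W, C) and mixes across time."""
    # Subtract the per-pixel mean over the time axis.
    return video - video.mean(axis=1, keepdims=True)

def video_block(video):
    b, t, h, w, c = video.shape
    # Fold time into the batch axis so every frame is treated as its own image.
    frames = video.reshape(b * t, h, w, c)
    frames = spatial_op(frames)
    # Unfold back to 5 dimensions so the temporal layer sees whole clips.
    video = frames.reshape(b, t, h, w, c)
    return temporal_op(video)

x = np.random.default_rng(0).standard_normal((2, 4, 8, 8, 3))
y = video_block(x)
```

Because the reshape only regroups the batch and time axes, frame `t` of clip `b` sits at index `b * T + t` of the folded tensor, and the same spatial weights are reused for every frame.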

Imagen Video not only generates high-fidelity video but also offers a high degree of controllability and world knowledge. It can produce diverse videos and text animations in a variety of artistic styles, and it shows a 3D understanding of objects.

Imagen Video builds on Google's Imagen, an image generation system comparable to DALL-E 2. As previously reported, DALL-E 2 has dropped its beta waiting list, so users can now start using it at any time.
