Papers

Scalable Diffusion Models with Transformers

In this work, the researchers explore a new class of diffusion models based on the transformer architecture; train latent diffusion models, replacing the U-Net backbone with a transformer that operates on latent patches; and analyze the scalability of Diffusion Transformers (DiTs).
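
As an illustration of the "latent patches" idea, here is a minimal PyTorch sketch of a DiT-style patchify step, in which a VAE latent is split into non-overlapping patches that become transformer tokens; the module and dimensions below are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a latent feature map into non-overlapping patches and
    project each patch to a token (the DiT 'patchify' step)."""
    def __init__(self, patch_size=2, in_channels=4, hidden_dim=768):
        super().__init__()
        # A strided conv is equivalent to flattening each patch and
        # applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, hidden_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, z):
        # z: (B, C, H, W) latent from the VAE encoder, e.g. (B, 4, 32, 32)
        x = self.proj(z)                     # (B, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens

# A 32x32x4 latent with patch size 2 yields a sequence of 256 tokens.
tokens = PatchEmbed()(torch.randn(1, 4, 32, 32))
print(tokens.shape)  # torch.Size([1, 256, 768])
```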

NeRF-Art: Text-Driven Neural Radiance Fields Stylization

Neural radiance fields (NeRF) enable high-quality novel view synthesis. Editing NeRF, however, remains challenging. In this paper, the authors present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a single text prompt.
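
The stylization signal comes from CLIP. As a rough sketch of the general mechanism, the hypothetical helper below scores NeRF renderings against the prompt in CLIP embedding space; the paper's actual objectives (relative directional and contrastive losses, among others) are more involved than this single similarity term.

```python
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_style_loss(rendered, prompt):
    """Penalize dissimilarity between rendered views and a text prompt
    in CLIP space. `rendered` is a (B, 3, 224, 224) batch of NeRF
    renderings, already resized/normalized with CLIP's preprocessing
    and on the same device/dtype as the CLIP model."""
    img_emb = F.normalize(model.encode_image(rendered), dim=-1)
    txt_emb = F.normalize(
        model.encode_text(clip.tokenize([prompt]).to(device)), dim=-1)
    return 1.0 - (img_emb @ txt_emb.T).mean()
```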

ECON: Explicit Clothed humans Obtained from Normals

ECON combines the best aspects of implicit and explicit surfaces to infer high-fidelity 3D humans, even with loose clothing or in challenging poses. Quantitatively, ECON is more accurate than state-of-the-art methods, and perceptual studies show that its results are judged more realistic by a large margin.

Novel View Synthesis with Diffusion Models

3DiM is a diffusion model for 3D novel view synthesis from as few as a single image. On the SRN ShapeNet benchmark, videos that 3DiM generates from a single view achieve much higher fidelity than prior work while remaining approximately 3D consistent.
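
A notable detail of how 3DiM produces those videos is what the paper calls stochastic conditioning: frames are sampled autoregressively, and each denoising step conditions on a view drawn at random from the views generated so far. The sketch below illustrates that sampling loop; `denoise_step` is a hypothetical wrapper around the pose-conditional denoiser, not an API from the paper.

```python
import random
import torch

def sample_trajectory(denoise_step, input_view, input_pose, target_poses,
                      num_steps=256):
    """3DiM-style stochastic conditioning (illustrative sketch).
    denoise_step(x_t, t, cond_view, cond_pose, target_pose) is assumed
    to perform one reverse-diffusion update of the noisy target view."""
    views, poses = [input_view], [input_pose]
    for target_pose in target_poses:
        x = torch.randn_like(input_view)  # start each frame from pure noise
        for t in reversed(range(num_steps)):
            k = random.randrange(len(views))  # random conditioning view
            x = denoise_step(x, t, views[k], poses[k], target_pose)
        views.append(x)
        poses.append(target_pose)
    return views[1:]  # the generated novel views
```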

RANA: Relightable Articulated Neural Avatars

RANA is a relightable, articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. The framework is learned from monocular RGB videos and disentangles a person's geometry, texture, and lighting environment.
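
A common way to realize this kind of texture/lighting disentanglement is to model the light with low-order spherical harmonics and compose the relit color as albedo times an SH shading term evaluated at the surface normals. The sketch below makes that assumption for illustration; it is not RANA's exact shading code.

```python
import torch

def sh_shading(normals, sh_coeffs):
    """Diffuse shading under 2nd-order spherical-harmonics lighting.
    normals: (N, 3) unit surface normals.
    sh_coeffs: (9, 3) RGB lighting coefficients describing the scene.
    Returns (N, 3) shading values."""
    x, y, z = normals.unbind(-1)
    basis = torch.stack([
        torch.full_like(x, 0.282095),              # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,  # l=1
        1.092548 * x * y, 1.092548 * y * z,        # l=2
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ], dim=-1)                                     # (N, 9)
    return basis @ sh_coeffs                       # (N, 3)

# Relit color: per-point albedo modulated by the lighting-dependent shading.
# rgb = albedo * sh_shading(normals, sh_coeffs)
```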