Sophia - Data Phoenix (Page 3)

Let's build GPT: from scratch, in code, spelled out

In this video, Andrej Karpathy demonstrates how to build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3, and much more. Make sure that you watch at least parts of it!

Jan 27, 2023

by Sophia

Papers

RoDynRF: Robust Dynamic Radiance Fields

In this work, the authors address the robustness issue of dynamic radiance field reconstruction methods by jointly estimating the static and dynamic radiance fields along with the camera parameters (poses and focal length). Learn how they do it!

Jan 20, 2023

by Sophia

Papers

AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

AgileAvatar is a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters. To ensure the discrete parameters are optimized, a cascaded relaxation-and-search pipeline is implemented.

Jan 17, 2023

by Sophia

Papers

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Box2Mask is a novel single-shot instance segmentation approach, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Check the paper out!

Jan 15, 2023

by Sophia

Papers

Zero-Shot Text-Guided Object Generation with Dream Fields

Dream Fields can generate the geometry and color of a wide range of objects without 3D supervision. It combines neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Take a look!

Jan 11, 2023

by Sophia

Papers

InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds

InstantAvatar is a system that can reconstruct human avatars from a monocular video within seconds, and these avatars can be animated and rendered at an interactive rate. It converges 130x faster and can be trained in minutes instead of hours, way faster than competitors.

Jan 06, 2023

by Sophia

Papers

Scalable Diffusion Models with Transformers

In this work, the researchers explore a new class of diffusion models based on the transformer architecture; train latent diffusion models, replacing the U-Net backbone with a transformer that operates on latent patches; and analyze the scalability of Diffusion Transformers (DiTs).

Jan 03, 2023

by Sophia

Papers

NeRF-Art: Text-Driven Neural Radiance Fields Stylization

Neural radiance fields (NeRF) enable high-quality novel view synthesis. Editing NeRF, however, remains challenging. In this paper, the authors present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a single text prompt.

Dec 30, 2022

by Sophia

Papers

ECON: Explicit Clothed humans Obtained from Normals

ECON combines the best aspects of implicit and explicit surfaces to infer high-fidelity 3D humans, even with loose clothing or in challenging poses. ECON is more accurate than the state of the art. Perceptual studies also show that ECON’s perceived realism is better by a large margin.

Dec 27, 2022

by Sophia

Papers

3DHumanGAN: Towards Photo-realistic 3D-Aware Human Image Generation

3DHumanGAN is a 3D-aware generative adversarial network (GAN) that synthesizes images of full-body humans with consistent appearances under different view-angles and body-poses. The model is adversarially learned from a collection of web images needless of manual annotation.

Dec 26, 2022

by Sophia

Papers

Novel View Synthesis with Diffusion Models

3DiM is a diffusion model for 3D novel view synthesis from as few as a single image. Comparing it to the SRN ShapeNet dataset, it is clear that 3DiM's generated videos from a single view achieve much higher fidelity while being approximately 3D consistent.

Dec 23, 2022

by Sophia

Papers

UDE: A Unified Driving Engine for Human Motion Generation

Unified Driving Engine (UDE) is the first unified driving engine that enables generating human motion sequences from natural language or audio sequences. It can support both text-driven and audio-driven human motion generation.

Dec 21, 2022

by Sophia