Full-body avatars from single images and textual guidance (Talk)
- Yangyi Huang (Master's student)
- State Key Lab of CAD&CG, Zhejiang University
The reconstruction of the full-body appearance of clothed humans from single-view RGB images is a crucial yet challenging task, primarily due to depth ambiguities and the absence of observations of unseen regions. While existing methods have shown impressive results, they still suffer from limitations such as over-smooth surfaces and blurry textures, particularly lacking detail on the backside of the avatar. In this talk, I will delve into how we address these limitations by leveraging text guidance and pretrained text-image models, introducing two novel methods. First, I will present ELICIT, a data-efficient approach that utilizes a SMPL-based human body prior and a CLIP-based semantic prior to create an animatable human NeRF from a single image. This method tackles the challenge of creating a detailed back-side appearance through a CLIP embedding loss. Second, I will introduce TeCH, our latest project for reconstructing high-fidelity 3D clothed humans with consistent texture maps and detailed geometry. This approach employs a hybrid mesh representation and pretrained 2D text-to-image diffusion models to achieve remarkable results. Through these advancements, we aim to push the boundaries of creating digital humans, bridging the gap between single-image inputs and fully textured, realistic 3D avatars.
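To give a rough sense of the CLIP embedding loss mentioned above, the sketch below shows one common way such a loss can be formulated: renders of unseen (e.g. back-side) views are encoded with a pretrained CLIP image encoder and pulled toward a reference embedding via cosine similarity. This is only an illustrative approximation, not the ELICIT implementation; the `rendered_view` tensor and `reference_embedding` are hypothetical placeholders for the avatar renderer's output and the input-image (or text) embedding.

```python
# Illustrative sketch of a CLIP embedding loss for unseen-view supervision.
# Assumes `rendered_view` is a CLIP-normalized (B, 3, 224, 224) tensor
# produced by a differentiable renderer (placeholder, not from ELICIT).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)


def clip_embedding_loss(rendered_view: torch.Tensor,
                        reference_embedding: torch.Tensor) -> torch.Tensor:
    """Encourage a rendered view to stay semantically consistent with a
    reference CLIP embedding (e.g. of the input image or a text prompt)."""
    image_features = model.encode_image(rendered_view)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    reference = reference_embedding / reference_embedding.norm(dim=-1, keepdim=True)
    # Maximize cosine similarity, i.e. minimize (1 - cosine similarity).
    return (1.0 - (image_features * reference).sum(dim=-1)).mean()
```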
Biography: Yangyi is a Master's student at the State Key Lab of CAD&CG, Zhejiang University, under the supervision of Prof. Deng Cai. His research interests lie in the areas of digital humans, 3D generative modeling, and computer vision. Currently, his focus is on utilizing pretrained text-to-image diffusion models to create 3D avatars with full-body textures and geometry from single-image inputs and textual information. He completed his undergraduate studies in the Department of Computer Science and Technology at Zhejiang University.