Scaling up 3D content generation via 3D grounding for representation, data and algorithm (Talk)
Creating 3D virtual worlds will require generating diverse, high-quality 3D content that captures the intricacies of the real 3D world. While machine learning has achieved significant success in image and video generation, applying it to 3D content generation faces fundamental challenges: 3D training data is scarce, and the third dimension adds inherent complexity. We approach 3D content generation by revisiting 3D grounding in representation, data, and algorithms. First, we introduce a differentiable 3D representation that bridges neural fields and meshes via differentiable isosurfacing. This enables us not only to generate 3D meshes with varying topologies but also to regularize neural fields through the mesh. Second, we exploit 2D data priors to facilitate text-to-3D generation with a coarse-to-fine generation recipe. Specifically, we use our differentiable isosurfacing to extract 3D meshes and render high-resolution images differentiably, which enables generating high-frequency details in geometry and texture from text. Lastly, we develop a 3D generative algorithm that produces high-quality textured meshes by enforcing a 3D bottleneck in the generation process while supervising with 2D images through differentiable rendering.
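The abstract does not spell out how differentiable isosurfacing works. A minimal sketch, assuming the edge-interpolation step used by marching-tetrahedra-style methods: a mesh vertex is placed where a signed distance field (SDF) crosses zero along a grid edge, and because its position is a smooth function of the two SDF values, gradients can flow from the mesh back to the field. The function names `edge_zero_crossing` and `grad_wrt_sdf` are hypothetical, and the gradients are written out analytically only to keep the example self-contained; a real system would rely on an autodiff framework.

```python
# Hypothetical sketch: one edge of a marching-tetrahedra-style isosurface extraction.
# The extracted vertex depends smoothly on the SDF values s_a and s_b, which is
# what lets a mesh-space loss supervise the underlying neural field.

def edge_zero_crossing(p_a, p_b, s_a, s_b):
    """Point on segment p_a -> p_b where the linearly interpolated SDF is zero."""
    if s_a * s_b >= 0:
        raise ValueError("SDF values must have opposite signs")
    t = s_a / (s_a - s_b)  # fraction of the way from p_a to p_b
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))


def grad_wrt_sdf(p_a, p_b, s_a, s_b):
    """Analytic derivatives of the vertex position w.r.t. s_a and s_b."""
    d = (s_a - s_b) ** 2
    dt_da = -s_b / d  # d t / d s_a for t = s_a / (s_a - s_b)
    dt_db = s_a / d   # d t / d s_b
    diff = [b - a for a, b in zip(p_a, p_b)]
    return tuple(dt_da * x for x in diff), tuple(dt_db * x for x in diff)


if __name__ == "__main__":
    # SDF is -0.25 at the origin and 0.75 at (1, 0, 0): the surface sits at x = 0.25.
    print(edge_zero_crossing((0, 0, 0), (1, 0, 0), -0.25, 0.75))  # (0.25, 0.0, 0.0)
    print(grad_wrt_sdf((0, 0, 0), (1, 0, 0), -0.25, 0.75))
```

Raising either SDF value shifts the vertex along the edge, so a loss on the rendered or extracted mesh yields a nonzero gradient on the field values at the grid points that produced it.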
Biography: Jun Gao is a PhD student at the University of Toronto, advised by Prof. Sanja Fidler. He is also a Research Scientist at the NVIDIA Toronto AI Lab. His research focuses on the intersection of 3D computer vision and computer graphics, particularly on developing machine learning tools that enable large-scale, high-quality 3D content generation and drive real-world applications. His technical contributions have been deployed in products such as NVIDIA Picasso, GANVerse3D, Neural DriveSim and Toronto Annotation Suite.