We view computer vision as the process of inferring the causes behind the images that we observe; that is, we want to infer the story behind the picture. The most interesting stories involve people. Consequently, our research focuses on understanding humans and their interactions with each other and with the 3D world.
We approach this through capture, modeling, and synthesis of human behavior using computer vision, machine learning, and graphics.
We develop novel computer vision methods to capture the 3D movement of humans in complex scenes using a variety of input sources, from motion capture markers and IMU data to multi-view camera systems, RGB-D sensors, and monocular video. Our focus is on recovering the detailed body pose, facial expression, and hand configuration together with the structure of the 3D scene, so that we can model contact and interaction.
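As a minimal illustration of one common approach to this kind of capture (a sketch, not our released pipeline), the snippet below fits a parametric body model to detected 2D keypoints by minimizing reprojection error with gradient descent. It assumes the open-source smplx package and its separately downloaded model files; the model path, the keypoint tensor, and the orthographic camera are hypothetical placeholders.

```python
import torch
import smplx  # pip install smplx; model files from https://smpl-x.is.tue.mpg.de

# Hypothetical setup: 'models' is a placeholder directory holding the downloaded
# SMPL-X model files; keypoints_2d stands in for 2D joint detector output.
model = smplx.create('models', model_type='smplx', gender='neutral')
keypoints_2d = torch.rand(1, 22, 2)  # 22 body joints in normalized image coords

# Optimize pose parameters so projected 3D joints match the 2D evidence.
body_pose = torch.zeros(1, 63, requires_grad=True)     # 21 joints, axis-angle
global_orient = torch.zeros(1, 3, requires_grad=True)  # root orientation
optimizer = torch.optim.Adam([body_pose, global_orient], lr=0.01)

for _ in range(200):
    optimizer.zero_grad()
    output = model(body_pose=body_pose, global_orient=global_orient)
    joints_3d = output.joints[:, :22]   # first 22 joints = main body skeleton
    projected = joints_3d[..., :2]      # orthographic projection, for brevity
    loss = ((projected - keypoints_2d) ** 2).mean()
    loss.backward()
    optimizer.step()
```

A real system would add a perspective camera, robust losses, and pose priors; the point here is only the structure of optimization-based capture: a differentiable body model in the loop between image evidence and 3D parameters.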
Capturing people allows us to model them and to create virtual humans. We learn generative models of 3D body shape and appearance using traditional representations like 3D meshes as well as new implicit shape models. We also model how and why humans move so that we can generate realistic 3D motions conditioned on goals or language.
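For a concrete sense of what a mesh-based generative body model provides, here is a minimal sketch, again assuming the smplx package: sampling the shape parameters of SMPL-X yields a new identity as a complete 3D mesh. The model path and sampling scale are illustrative assumptions.

```python
import torch
import smplx

model = smplx.create('models', model_type='smplx')  # placeholder model path
betas = 0.5 * torch.randn(1, 10)  # sample shape coefficients for a new identity
output = model(betas=betas, return_verts=True)
vertices = output.vertices.detach()  # (1, 10475, 3) SMPL-X mesh vertices
faces = model.faces                  # triangle indices, ready for any mesh viewer
print(vertices.shape)
```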
Finally, we synthesize virtual humans in 3D scenes such that they look and behave naturally. Here, we model the affordances of the scene and how humans interact with the world. We also exploit neural rendering to create realistic-looking people.
To have an impact beyond academia, we develop applications in medicine and psychology, spin off companies, and license technology. We make most of our code and data available to the research community.