Diffusion Models for Human Motion Synthesis (Talk)
Character motion synthesis is a central challenge in computer animation and graphics. The successful adaptation of diffusion models to the field has boosted synthesis quality and provided intuitive controls such as text and music. One of the earliest and most popular such methods is the Motion Diffusion Model (MDM) [ICLR 2023]. In this talk, I will review how MDM incorporates domain know-how into the diffusion model and enables intuitive editing capabilities. Then, I will present two recent works, each offering a fresh take on motion diffusion and extending its abilities to new animation tasks. Multi-view Ancestral Sampling (MAS) [CVPR 2024] is an inference-time algorithm that samples 3D animations from 2D keypoint diffusion models. We demonstrated it by generating 3D animations for characters and scenarios that are challenging to record in elaborate motion capture systems, yet ubiquitous in in-the-wild videos. These include, for example, horse racing and professional rhythmic gymnastics. Monkey See, Monkey Do (MoMo) [SIGGRAPH Asia 2024] explores the attention space of the motion diffusion model. A careful analysis reveals the roles of the attention’s keys and queries throughout the generation process. With these findings in hand, we design a training-free method that generates motion following the distinct motifs of one motion while led by an outline dictated by another. To conclude the talk, I will give my modest take on the challenges in the field and our lab’s current work attempting to tackle some of them.
Biography: Guy Tevet is a Ph.D. student in the Computer Graphics lab at Tel-Aviv University, advised by Prof. Amit Bermano and Prof. Daniel Cohen-Or. His main field of study is human motion generation. In the summer of 2023, he was a visiting research student at the University of British Columbia, advised by Prof. Michiel Van de Panne and Dr. Xue Bin (Jason) Peng (Simon Fraser University). He was a research intern at Google during 2020-2021 in natural language processing, and a computer vision researcher at Apple during 2018-2020.