
VOCA: Capture, Learning, and Synthesis of 3D Speaking Styles



VOCA (Voice Operated Character Animation) is a framework that takes a speech signal as input and realistically animates a wide range of adult faces.

Code: We provide Python demo code that outputs a 3D head animation given a speech signal and a static 3D head mesh. The codebase also provides animation controls to alter the speaking style, identity-dependent facial shape, and head pose (i.e., head rotation around the neck) during animation. It further demonstrates how to sample 3D head meshes from the publicly available FLAME model, which can then be animated with the provided code (see the sketch below).
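
As a rough illustration, the demo can be thought of as a single call that maps an audio file and a template mesh to a sequence of animated meshes. The wrapper name run_voca_demo and its parameters below are illustrative assumptions, not the repository's actual entry point; please refer to the repository README for the real invocation.

    # Minimal sketch of the demo workflow. The function run_voca_demo and its
    # arguments are hypothetical stand-ins; the actual entry point in the
    # repository may be named and parameterized differently.
    from voca_demo import run_voca_demo  # hypothetical wrapper around the demo code

    run_voca_demo(
        audio_fname="sample_audio.wav",        # input speech signal
        template_fname="flame_template.ply",   # static 3D head mesh (FLAME topology)
        condition_subject=2,                   # speaking-style condition to animate with
        out_path="voca_output/",               # per-frame animated meshes written here
    )

In the same spirit, the controls for identity-dependent shape and head pose would be exposed either as additional parameters of such a call or as an editing step applied to the output mesh sequence.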

Dataset: We capture a unique 4D face dataset (VOCASET) containing about 29 minutes of 3D scans recorded at 60 fps with synchronized audio from 12 speakers. We provide the raw 3D scans, registrations in FLAME topology, and unposed registrations (i.e., registrations in "zero pose"); a loading sketch follows below.
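
For orientation, the sketch below shows how the registrations and audio of one captured sequence might be loaded. The directory layout and file names are assumptions (per-frame .ply meshes alongside a .wav file per sentence); trimesh and scipy are used purely for illustration.

    # Illustrative sketch only: the VOCASET directory layout below is hypothetical.
    from pathlib import Path

    import numpy as np
    import trimesh
    from scipy.io import wavfile

    sequence_dir = Path("VOCASET/subject01/sentence01")  # hypothetical layout

    # Registrations share the FLAME topology, so every frame has the same vertex count
    # and the sequence can be stacked into a single array.
    frame_files = sorted(sequence_dir.glob("*.ply"))
    vertices = np.stack(
        [trimesh.load(f, process=False).vertices for f in frame_files]
    )
    print(vertices.shape)  # (num_frames, num_vertices, 3), captured at 60 fps

    # Load the synchronized audio track for the same sentence.
    sample_rate, audio = wavfile.read(sequence_dir / "audio.wav")
    print(f"{len(frame_files) / 60.0:.2f} s of animation, "
          f"{audio.shape[0] / sample_rate:.2f} s of audio")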

Author(s): Daniel Cudeiro and Timo Bolkart and Cassidy Laidlaw and Anurag Ranjan and Michael Black
Department(s): Perceiving Systems
Release Date: 2019-05-01
License: The MIT License (MIT)
Copyright: Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V.
Repository: https://github.com/TimoBolkart/voca
External Link: https://voca.is.tue.mpg.de