Perceiving Systems, Computer Vision


2024


PuzzleAvatar: Assembling 3D Avatars from Personal Albums

Xiu, Y., Liu, Z., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, 43(6), ACM, December 2024 (article) To be published

Abstract
Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a novel model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album, while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only has high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also unique scalability to album photos and strong robustness. Our code and data are publicly available for research purposes.
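To make the token idea concrete, here is a minimal, hypothetical sketch of how per-asset learned tokens could be composed into text prompts and swapped to customize an avatar. The token strings and the compose_prompt helper are invented for illustration and are not the released PuzzleAvatar code.

```python
# Hypothetical sketch: composing learned "puzzle piece" tokens into a prompt,
# in the spirit of DreamBooth-style personalization. Token strings are invented.
subject_tokens = {
    "identity": "<tok_face>",      # learned token for the person's identity
    "hair":     "<tok_hair>",      # hairstyle token
    "upper":    "<tok_jacket>",    # garment tokens
    "lower":    "<tok_jeans>",
    "shoes":    "<tok_sneakers>",
}

def compose_prompt(tokens, pose="A-pose", swap=None):
    """Assemble a text prompt from per-asset tokens; swapping a token customizes the avatar."""
    parts = dict(tokens)
    if swap:                       # e.g. {"upper": "<tok_hoodie>"} to change the jacket
        parts.update(swap)
    return (f"a full-body photo of {parts['identity']} person in {pose}, "
            f"with {parts['hair']} hair, wearing {parts['upper']}, "
            f"{parts['lower']} and {parts['shoes']}")

print(compose_prompt(subject_tokens))
print(compose_prompt(subject_tokens, swap={"upper": "<tok_hoodie>"}))
```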

Page Code Video DOI [BibTex]



StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

Ye, C., Qiu, L., Gu, X., Zuo, Q., Wu, Y., Dong, Z., Bo, L., Xiu, Y., Han, X.

ACM Transactions on Graphics, 43(6), ACM, December 2024 (article) To be published

Abstract
This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field that has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step that slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy, which starts with a one-step normal estimator (YOSO) to derive an initial normal guess that is relatively coarse but reliable, followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the "stability" and "sharpness" required for accurate normal estimation. StableNormal represents an initial attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models have been made publicly available.
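The following toy Python snippet (synthetic data, not the authors' code) illustrates the intuition behind the variance reduction: a stochastic refiner started from a reliable coarse guess, as the YOSO stage provides, varies far less across repeated runs than one started from random noise.

```python
# Toy numerical illustration of the variance-reduction idea. All quantities are synthetic.
import numpy as np

rng = np.random.default_rng(0)
true_normal = np.array([0.0, 0.0, 1.0])

def stochastic_refine(init, steps=3, noise=0.1):
    """Noisy iterative estimator that drifts toward the true normal from `init`."""
    n = np.asarray(init, dtype=float)
    for _ in range(steps):
        n = 0.7 * n + 0.3 * true_normal + noise * rng.normal(size=3)
        n = n / np.linalg.norm(n)
    return n

coarse_init = np.array([0.1, 0.1, 1.0]) / np.linalg.norm([0.1, 0.1, 1.0])  # reliable coarse guess
random_runs = np.array([stochastic_refine(rng.normal(size=3)) for _ in range(200)])
guided_runs = np.array([stochastic_refine(coarse_init) for _ in range(200)])

print("spread with random init :", random_runs.std(axis=0))
print("spread with coarse init :", guided_runs.std(axis=0))
```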

Page Huggingface Demo Code Video DOI [BibTex]



Localization and recognition of human action in 3D using transformers

Sun, J., Huang, L., Hongsong Wang, C. Z. J. Q., Islam, M. T., Xie, E., Zhou, B., Xing, L., Chandrasekaran, A., Black, M. J.

Nature Communications Engineering, 13(125), September 2024 (article)

Abstract
Understanding a person’s behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing, and when the actions occur in the sequence. To promote the progress of the 3D action localization community, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT), for 3D action localization. Important baselines and evaluation metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, i.e., Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. The proposed LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance by using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.

paper DOI [BibTex]



EarthRanger: An Open-Source Platform for Ecosystem Monitoring, Research, and Management

Wall, J., Lefcourt, J., Jones, C., Doehring, C., O’Neill, D., Schneider, D., Steward, J., Krautwurst, J., Wong, T., Jones, B., Goodfellow, K., Schmitt, T., Gobush, K., Douglas-Hamilton, I., Pope, F., Schmidt, E., Palmer, J., Stokes, E., Reid, A., Elbroch, M. L., Kulits, P., Villeneuve, C., Matsanza, V., Clinning, G., Oort, J. V., Denninger-Snyder, K., Daati, A. P., Gold, W., Cunliffe, S., Craig, B., Cork, B., Burden, G., Goss, M., Hahn, N., Carroll, S., Gitonga, E., Rao, R., Stabach, J., Broin, F. D., Omondi, P., Wittemyer, G.

Methods in Ecology and Evolution, 13, British Ecological Society, September 2024 (article)

DOI [BibTex]



Re-Thinking Inverse Graphics with Large Language Models

Kulits, P., Feng, H., Liu, W., Abrevaya, V., Black, M. J.

Transactions on Machine Learning Research, August 2024 (article)

Abstract
Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Successfully disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the environment. This complexity limits the ability of existing carefully engineered approaches to generalize across domains. Inspired by the zero-shot ability of large language models (LLMs) to generalize to novel contexts, we investigate the possibility of leveraging the broad world knowledge encoded in such models to solve inverse-graphics problems. To this end, we propose the Inverse-Graphics Large Language Model (IG-LLM), an inverse-graphics framework centered around an LLM, that autoregressively decodes a visual embedding into a structured, compositional 3D-scene representation. We incorporate a frozen pre-trained visual encoder and a continuous numeric head to enable end-to-end training. Through our investigation, we demonstrate the potential of LLMs to facilitate inverse graphics through next-token prediction, without the application of image-space supervision. Our analysis enables new possibilities for precise spatial reasoning about images that exploit the visual knowledge of LLMs. We release our code and data at https://ig-llm.is.tue.mpg.de/ to ensure the reproducibility of our investigation and to facilitate future research.
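A minimal PyTorch sketch of the architectural idea follows: a frozen visual encoder feeds a language-model decoder, and a continuous numeric head regresses real-valued scene parameters instead of emitting digit tokens. All module choices and sizes below are illustrative stand-ins, not the IG-LLM implementation.

```python
# Illustrative sketch of a frozen visual encoder + autoregressive decoder + numeric head.
import torch
import torch.nn as nn

class NumericHead(nn.Module):
    """Maps the decoder's hidden state to continuous values (e.g., object position/scale)."""
    def __init__(self, hidden_dim, num_values):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                                 nn.Linear(hidden_dim, num_values))

    def forward(self, h):
        return self.mlp(h)

class ToyIGLLM(nn.Module):
    def __init__(self, feat_dim=768, hidden_dim=512, vocab=1000, num_values=9):
        super().__init__()
        self.visual_encoder = nn.Linear(2048, feat_dim)   # stands in for a frozen CLIP-style encoder
        for p in self.visual_encoder.parameters():
            p.requires_grad = False                       # frozen, as described above
        self.project = nn.Linear(feat_dim, hidden_dim)    # maps image features into the decoder space
        self.lm = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # stand-in for the LLM decoder
        self.token_head = nn.Linear(hidden_dim, vocab)            # discrete structure tokens
        self.numeric_head = NumericHead(hidden_dim, num_values)   # continuous pose/shape numbers

    def forward(self, image_feats):
        h0 = self.project(self.visual_encoder(image_feats)).unsqueeze(1)
        out, _ = self.lm(h0)
        return self.token_head(out[:, -1]), self.numeric_head(out[:, -1])

model = ToyIGLLM()
token_logits, numbers = model(torch.randn(2, 2048))
print(token_logits.shape, numbers.shape)  # (2, 1000) (2, 9)
```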

link (url) [BibTex]



Exploring Weight Bias and Negative Self-Evaluation in Patients with Mood Disorders: Insights from the BodyTalk Project

Meneguzzo, P., Behrens, S. C., Pavan, C., Toffanin, T., Quiros-Ramirez, M. A., Black, M. J., Giel, K., Tenconi, E., Favaro, A.

Frontiers in Psychiatry, 15, Sec. Psychopathology, May 2024 (article)

Abstract
Background: Negative body image and adverse body self-evaluation represent key psychological constructs within the realm of weight bias (WB), potentially intertwined with the negative self-evaluation characteristic of depressive symptomatology. Although WB encapsulates an implicit form of self-critical assessment, its exploration among people with mood disorders (MD) has been under-investigated. Our primary goal is to comprehensively assess both explicit and implicit WB, seeking to reveal specific dimensions that could interconnect with the symptoms of MDs. Methods: A cohort comprising 25 MD patients and 35 demographically matched healthy peers (with 83% female representation) participated in a series of tasks designed to evaluate the congruence between various computer-generated body representations and a spectrum of descriptive adjectives. Our analysis delved into multiple facets of body image evaluation, scrutinizing the associations between different body sizes and emotionally charged adjectives (e.g., active, apple-shaped, attractive). Results: No discernible differences emerged concerning body dissatisfaction or the correspondence of different body sizes with varying adjectives. Interestingly, MD patients exhibited a markedly higher tendency to overestimate their body weight (p = 0.011). Explicit WB did not show significant variance between the two groups, but MD participants demonstrated a notable implicit WB within a specific weight rating task for BMI between 18.5 and 25 kg/m2 (p = 0.012). Conclusions: Despite the striking similarities in the assessment of participants’ body weight, our investigation revealed an implicit WB among individuals grappling with MD. This bias potentially assumes a role in fostering self-directed negative evaluations, shedding light on a previously unexplored facet of the interplay between WB and mood disorders.

paper paper link (url) DOI [BibTex]



The Poses for Equine Research Dataset (PFERD)

Li, C., Mellbin, Y., Krogager, J., Polikovsky, S., Holmberg, M., Ghorbani, N., Black, M. J., Kjellström, H., Zuffi, S., Hernlund, E.

Nature Scientific Data, 11, May 2024 (article)

Abstract
Studies of quadruped animal motion help us to identify diseases, understand behavior and unravel the mechanics behind gaits in animals. The horse is likely the best-studied animal in this aspect, but data capture is challenging and time-consuming. Computer vision techniques improve animal motion extraction, but the development relies on reference datasets, which are scarce, not open-access and often provide data from only a few anatomical landmarks. Addressing this data gap, we introduce PFERD, a video and 3D marker motion dataset from horses using a full-body set-up of over 100 densely placed skin-attached markers and synchronized videos from ten camera angles. Five horses of diverse conformations provide data for various motions from basic poses (e.g., walking, trotting) to advanced motions (e.g., rearing, kicking). We further express the 3D motions with current techniques and a 3D parameterized model, the hSMAL model, establishing a baseline for 3D horse markerless motion capture. PFERD enables advanced biomechanical studies and provides a resource of ground truth data for the methodological development of markerless motion capture.

paper [BibTex]



InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction from Multi-view RGB-D Images

Huang, Y., Taheri, O., Black, M. J., Tzionas, D.

International Journal of Computer Vision (IJCV), 2024 (article)

Abstract
Humans constantly interact with objects to accomplish tasks. To understand such interactions, computers need to reconstruct these in 3D from images of whole bodies manipulating objects, e.g., for grasping, moving and using the latter. This involves key challenges, such as occlusion between the body and objects, motion blur, depth ambiguities, and the low image resolution of hands and graspable object parts. To make the problem tractable, the community has followed a divide-and-conquer approach, focusing either only on interacting hands, ignoring the body, or on interacting bodies, ignoring the hands. However, these are only parts of the problem. On the contrary, recent work focuses on the whole problem. The GRAB dataset addresses whole-body interaction with dexterous hands but captures motion via markers and lacks video, while the BEHAVE dataset captures video of body-object interaction but lacks hand detail. We address the limitations of prior work with InterCap, a novel method that reconstructs interacting whole-bodies and objects from multi-view RGB-D data, using the parametric whole-body SMPL-X model and known object meshes. To tackle the above challenges, InterCap uses two key observations: (i) Contact between the body and object can be used to improve the pose estimation of both. (ii) Consumer-level Azure Kinect cameras let us set up a simple and flexible multi-view RGB-D system for reducing occlusions, with spatially calibrated and temporally synchronized cameras. With our InterCap method we capture the InterCap dataset, which contains 10 subjects (5 males and 5 females) interacting with 10 daily objects of various sizes and affordances, including contact with the hands or feet. To this end, we introduce a new data-driven hand motion prior, as well as explore simple ways for automatic contact detection based on 2D and 3D cues. In total, InterCap has 223 RGB-D videos, resulting in 67,357 multi-view frames, each containing 6 RGB-D images, paired with pseudo ground-truth 3D body and object meshes. Our InterCap method and dataset fill an important gap in the literature and support many research directions. Data and code are available at https://intercap.is.tue.mpg.de.
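The snippet below is an illustrative PyTorch sketch (not the released code) of observation (i): a contact term that pulls body vertices flagged as in contact toward the nearest object surface points during fitting. Vertex counts and indices are toy values.

```python
# Illustrative contact term for joint body-object fitting.
import torch

def contact_loss(body_verts, object_points, contact_idx, thresh=0.02):
    """Penalize distance between detected contact vertices and their nearest object point.

    body_verts:    (V, 3) posed body vertices
    object_points: (P, 3) points sampled on the object surface
    contact_idx:   indices of body vertices flagged as in contact (from 2D/3D cues)
    """
    c = body_verts[contact_idx]               # (C, 3) contact vertices
    d = torch.cdist(c, object_points)         # (C, P) pairwise distances
    nearest = d.min(dim=1).values             # distance to the closest object point
    return torch.clamp(nearest - thresh, min=0.0).pow(2).mean()

body = torch.randn(6890, 3)                   # SMPL-sized toy vertex set
obj = torch.randn(2048, 3)
loss = contact_loss(body, obj, contact_idx=torch.arange(300, 330))
print(float(loss))
```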

Paper link (url) DOI [BibTex]


HMP: Hand Motion Priors for Pose and Shape Estimation from Video

Duran, E., Kocabas, M., Choutas, V., Fan, Z., Black, M. J.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024 (article)

Abstract
Understanding how humans interact with the world necessitates accurate 3D hand pose estimation, a task complicated by the hand’s high degree of articulation, frequent occlusions, self-occlusions, and rapid motions. While most existing methods rely on single-image inputs, videos provide useful cues to address the aforementioned issues. However, existing video-based 3D hand datasets are insufficient for training feedforward models to generalize to in-the-wild scenarios. On the other hand, we have access to large human motion capture datasets which also include hand motions, e.g. AMASS. Therefore, we develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions. This motion prior is then employed for video-based 3D hand motion estimation following a latent optimization approach. Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios. It produces stable, temporally consistent results that surpass conventional single-frame methods. We demonstrate our method’s efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets, with special emphasis on an occlusion-focused subset of HO3D.
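A schematic PyTorch sketch of the latent-optimization idea follows: a decoder standing in for the learned hand motion prior maps a low-dimensional code to a motion, and the code is optimized to match per-frame evidence. The decoder, dimensions and loss weights are placeholders, not the HMP model.

```python
# Schematic latent optimization against a motion prior (toy decoder, synthetic evidence).
import torch

class ToyMotionPrior(torch.nn.Module):
    """Stand-in decoder: latent code -> T frames of hand pose parameters."""
    def __init__(self, latent_dim=32, frames=30, pose_dim=45):
        super().__init__()
        self.decode = torch.nn.Linear(latent_dim, frames * pose_dim)
        self.frames, self.pose_dim = frames, pose_dim

    def forward(self, z):
        return self.decode(z).view(-1, self.frames, self.pose_dim)

prior = ToyMotionPrior()
observed = torch.randn(1, 30, 45)              # pretend per-frame pose evidence from a video
z = torch.zeros(1, 32, requires_grad=True)     # latent code to optimize
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    opt.zero_grad()
    pred = prior(z)
    loss = (pred - observed).pow(2).mean() + 1e-3 * z.pow(2).mean()  # data term + latent regularizer
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```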

webpage pdf code [BibTex]


2023


FLARE: Fast Learning of Animatable and Relightable Mesh Avatars

Bharadwaj, S., Zheng, Y., Hilliges, O., Black, M. J., Abrevaya, V. F.

ACM Transactions on Graphics, 42(6):204:1-204:15, December 2023 (article) Accepted

Abstract
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and are highly portable, they lack realism in terms of shape and appearance. Neural representations, on the other hand, are realistic but lack compatibility and are slow to train and render. Our key insight is that it is possible to efficiently learn high-fidelity 3D mesh representations via differentiable rendering by exploiting highly-optimized methods from traditional computer graphics and approximating some of the components with neural networks. To that end, we introduce FLARE, a technique that enables the creation of animatable and relightable mesh avatars from a single monocular video. First, we learn a canonical geometry using a mesh representation, enabling efficient differentiable rasterization and straightforward animation via learned blendshapes and linear blend skinning weights. Second, we follow physically-based rendering and factor observed colors into intrinsic albedo, roughness, and a neural representation of the illumination, allowing the learned avatars to be relit in novel scenes. Since our input videos are captured on a single device with a narrow field of view, modeling the surrounding environment light is non-trivial. Based on the split-sum approximation for modeling specular reflections, we address this by approximating the pre-filtered environment map with a multi-layer perceptron (MLP) modulated by the surface roughness, eliminating the need to explicitly model the light. We demonstrate that our mesh-based avatar formulation, combined with learned deformation, material, and lighting MLPs, produces avatars with high-quality geometry and appearance, while also being efficient to train and render compared to existing approaches.
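As a hedged sketch of the lighting component, the small PyTorch module below approximates a pre-filtered environment map with an MLP queried by reflection direction and modulated by roughness, in the spirit of the split-sum approximation. Layer sizes and the input encoding are illustrative assumptions, not the FLARE implementation.

```python
# Roughness-modulated MLP standing in for a pre-filtered environment map.
import torch
import torch.nn as nn

class PrefilteredEnvMLP(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),   # input: reflection direction + roughness
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Softplus(),   # output: RGB radiance (non-negative)
        )

    def forward(self, refl_dir, roughness):
        x = torch.cat([refl_dir, roughness], dim=-1)
        return self.net(x)

env = PrefilteredEnvMLP()
refl = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
rough = torch.rand(1024, 1)
radiance = env(refl, rough)        # queried per shading point during differentiable rendering
print(radiance.shape)              # (1024, 3)
```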

Paper Project Page Code DOI [BibTex]



From Skin to Skeleton: Towards Biomechanically Accurate 3D Digital Humans

(Honorable Mention for Best Paper)

Keller, M., Werling, K., Shin, S., Delp, S., Pujades, S., Liu, C. K., Black, M. J.

ACM Transactions on Graphics (TOG), 42(6):253:1-253:15, December 2023 (article)

Abstract
Great progress has been made in estimating 3D human pose and shape from images and video by training neural networks to directly regress the parameters of parametric human models like SMPL. However, existing body models have simplified kinematic structures that do not correspond to the true joint locations and articulations in the human skeletal system, limiting their potential use in biomechanics. On the other hand, methods for estimating biomechanically accurate skeletal motion typically rely on complex motion capture systems and expensive optimization methods. What is needed is a parametric 3D human model with a biomechanically accurate skeletal structure that can be easily posed. To that end, we develop SKEL, which re-rigs the SMPL body model with a biomechanics skeleton. To enable this, we need training data of skeletons inside SMPL meshes in diverse poses. We build such a dataset by optimizing biomechanically accurate skeletons inside SMPL meshes from AMASS sequences. We then learn a regressor from SMPL mesh vertices to the optimized joint locations and bone rotations. Finally, we re-parametrize the SMPL mesh with the new kinematic parameters. The resulting SKEL model is animatable like SMPL but with fewer, and biomechanically-realistic, degrees of freedom. We show that SKEL has more biomechanically accurate joint locations than SMPL, and the bones fit inside the body surface better than previous methods. By fitting SKEL to SMPL meshes we are able to "upgrade" existing human pose and shape datasets to include biomechanical parameters. SKEL provides a new tool to enable biomechanics in the wild, while also providing vision and graphics researchers with a better constrained model of the human body.
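The regression step can be pictured with the toy NumPy sketch below, which learns a linear map from (synthetic) mesh vertices to optimized joint locations; the actual SKEL regressor, data and sparsity structure differ, so this is only meant to show the kind of mapping being learned.

```python
# Toy vertex-to-joint regressor on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_verts, n_joints = 500, 100, 24
V = rng.normal(size=(n_samples, n_verts * 3))                        # flattened mesh vertices
true_W = rng.normal(size=(n_verts * 3, n_joints * 3)) * 1e-2
J = V @ true_W + 0.01 * rng.normal(size=(n_samples, n_joints * 3))   # "optimized" joint locations

# Least-squares regressor from vertices to joints; a dense solve shown only for illustration.
W, *_ = np.linalg.lstsq(V, J, rcond=None)
pred = V @ W
print("mean absolute joint error:", float(np.abs(pred - J).mean()))
```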

Project Page Paper DOI [BibTex]



BARC: Breed-Augmented Regression Using Classification for 3D Dog Reconstruction from Images

Rueegg, N., Zuffi, S., Schindler, K., Black, M. J.

International Journal of Computer Vision (IJCV), 131(8):1964–1979, August 2023 (article)

Abstract
The goal of this work is to reconstruct 3D dogs from monocular images. We take a model-based approach, where we estimate the shape and pose parameters of a 3D articulated shape model for dogs. We consider dogs as they constitute a challenging problem, given they are highly articulated and come in a variety of shapes and appearances. Recent work has considered a similar task using the multi-animal SMAL model, with additional limb scale parameters, obtaining reconstructions that are limited in terms of realism. Like previous work, we observe that the original SMAL model is not expressive enough to represent dogs of many different breeds. Moreover, we make the hypothesis that the supervision signal used to train the network, that is 2D keypoints and silhouettes, is not sufficient to learn a regressor that can distinguish between the large variety of dog breeds. We therefore go beyond previous work in two important ways. First, we modify the SMAL shape space to be more appropriate for representing dog shape. Second, we formulate novel losses that exploit information about dog breeds. In particular, we exploit the fact that dogs of the same breed have similar body shapes. We formulate a novel breed similarity loss consisting of two parts: one term is a triplet loss that encourages the shape of dogs from the same breed to be more similar than dogs of different breeds; the second is a breed classification loss. With our approach we obtain 3D dogs that, compared to previous work, are quantitatively better in terms of 2D reconstruction, and significantly better according to subjective and quantitative 3D evaluations. Our work shows that a priori side information about similarity of shape and appearance, as provided by breed labels, can help to compensate for the lack of 3D training data. This concept may be applicable to other animal species or groups of species. We call our method BARC (Breed-Augmented Regression using Classification). Our code is publicly available for research purposes at https://barc.is.tue.mpg.de/.
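A minimal PyTorch sketch of the breed triplet term is shown below, using random shape vectors in place of predicted SMAL shape coefficients; it is meant only to illustrate the loss, not to reproduce BARC's training code.

```python
# Breed similarity triplet term: same-breed shapes should be closer than different-breed shapes.
import torch
import torch.nn.functional as F

def breed_triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor/positive share a breed, negative is a different breed; all are shape vectors."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

anchor   = torch.randn(8, 20)                     # toy predicted shape coefficients
positive = anchor + 0.05 * torch.randn(8, 20)     # same-breed dog, slightly different shape
negative = torch.randn(8, 20)                     # different-breed dog
print(float(breed_triplet_loss(anchor, positive, negative)))
```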

On-line DOI [BibTex]



Virtual Reality Exposure to a Healthy Weight Body Is a Promising Adjunct Treatment for Anorexia Nervosa

Behrens, S. C., Tesch, J., Sun, P. J., Starke, S., Black, M. J., Schneider, H., Pruccoli, J., Zipfel, S., Giel, K. E.

Psychotherapy and Psychosomatics, 92(3):170-179, June 2023 (article)

Abstract
Introduction/Objective: Treatment results of anorexia nervosa (AN) are modest, with fear of weight gain being a strong predictor of treatment outcome and relapse. Here, we present a virtual reality (VR) setup for exposure to healthy weight and evaluate its potential as an adjunct treatment for AN. Methods: In two studies, we investigate VR experience and clinical effects of VR exposure to higher weight in 20 women with high weight concern or shape concern and in 20 women with AN. Results: In study 1, 90% of participants (18/20) reported symptoms of high arousal but verbalized low to medium levels of fear. Study 2 demonstrated that VR exposure to healthy weight induced high arousal in patients with AN and yielded a trend that four sessions of exposure improved fear of weight gain. Explorative analyses revealed three clusters of individual reactions to exposure, which need further exploration. Conclusions: VR exposure is a well-accepted and powerful tool for evoking fear of weight gain in patients with AN. We observed a statistical trend that repeated virtual exposure to healthy weight improved fear of weight gain with large effect sizes. Further studies are needed to determine the mechanisms and differential effects.

on-line DOI [BibTex]



Fast-SNARF: A Fast Deformer for Articulated Neural Fields

Chen, X., Jiang, T., Song, J., Rietmann, M., Geiger, A., Black, M. J., Hilliges, O.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pages: 1-15, April 2023 (article)

Abstract
Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space via iterative root finding. Fast-SNARF is a functional drop-in replacement for our previous work, SNARF, while significantly improving its computational efficiency. We contribute several algorithmic and implementation improvements over SNARF, yielding a speed-up of 150×. These improvements include voxel-based correspondence search, pre-computing the linear blend skinning function, and an efficient software implementation with CUDA kernels. Fast-SNARF enables efficient and simultaneous optimization of shape and skinning weights given deformed observations without correspondences (e.g. 3D meshes). Because learning of deformation maps is a crucial component in many 3D human avatar methods and since Fast-SNARF provides a computationally efficient solution, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
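The toy NumPy example below illustrates the correspondence search on a single point with two synthetic bones: given a deformed point, a fixed-point iteration recovers the canonical point whose skinned image matches it. Fast-SNARF accelerates this kind of search with voxelized weights and CUDA kernels; nothing below is the released implementation.

```python
# Toy inverse linear-blend-skinning via iterative root finding (synthetic bones and weights).
import numpy as np

def skin_weights(x_c, bone_centers):
    """Soft weights based on distance to two toy 'bones' in canonical space."""
    d = np.linalg.norm(x_c[None, :] - bone_centers, axis=1)
    w = np.exp(-d)
    return w / w.sum()

def lbs(x_c, bone_centers, rotations, translations):
    w = skin_weights(x_c, bone_centers)
    out = np.zeros(3)
    for wi, R, t in zip(w, rotations, translations):
        out += wi * (R @ x_c + t)
    return out

bone_centers = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
theta = 0.4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0], [np.sin(theta), np.cos(theta), 0], [0, 0, 1]])
rotations = [np.eye(3), Rz]
translations = [np.zeros(3), np.array([0.0, 0.1, 0.0])]

x_c_true = np.array([0.2, 0.7, 0.1])
x_d = lbs(x_c_true, bone_centers, rotations, translations)   # observed deformed point

# Fixed-point iteration: update the canonical guess by the skinning residual,
# using the identity as a cheap approximation of the Jacobian inverse.
x_c = x_d.copy()
for _ in range(50):
    residual = lbs(x_c, bone_centers, rotations, translations) - x_d
    x_c = x_c - residual
print("recovered canonical point:", x_c, " error:", np.linalg.norm(x_c - x_c_true))
```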

pdf publisher site code DOI [BibTex]



SmartMocap: Joint Estimation of Human and Camera Motion Using Uncalibrated RGB Cameras

Saini, N., Huang, C. P., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, 8(6):3206-3213, 2023 (article)

Abstract
Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this letter, we propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions unlike existing methods which represent bodies in the camera coordinate system. Second, we learn a probability distribution of short human motion sequences (~1sec) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints on the images. Finally, we show that our method can work on a variety of datasets ranging from aerial cameras to smartphones. It also gives more accurate results compared to the state-of-the-art on the task of monocular human mocap with a static camera.

publisher site link (url) DOI [BibTex]


Viewpoint-Driven Formation Control of Airships for Cooperative Target Tracking

Price, E., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, 8(6):3653-3660, 2023 (article)

Abstract
For tracking and motion capture (MoCap) of animals in their natural habitat, a formation of safe and silent aerial platforms, such as airships with on-board cameras, is well suited. In our prior work we derived formation properties for optimal MoCap, which include maintaining constant angular separation between observers w.r.t. the subject, a threshold distance to it, and keeping it centered in the camera view. Unlike multi-rotors, airships have non-holonomic constraints and are affected by ambient wind. Their orientation and flight direction are also tightly coupled. Therefore, a control scheme for multicopters that assumes independence of motion direction and orientation is not applicable. In this letter, we address this problem by first exploiting a periodic relationship between the airspeed of an airship and its distance to the subject. We use it to derive analytical and numeric solutions that satisfy the formation properties for optimal MoCap. Based on this, we develop an MPC-based formation controller. We perform theoretical analysis of our solution, boundary conditions of its applicability, extensive simulation experiments, and a real-world demonstration of our control method with an unmanned airship.

publisher's site DOI [BibTex]

2022


How immersive virtual reality can become a key tool to advance research and psychotherapy of eating and weight disorders

Behrens, S. C., Streuber, S., Keizer, A., Giel, K. E.

Frontiers in Psychiatry, 13, pages: 1011620, November 2022 (article)

Abstract
Immersive virtual reality technology (VR) still waits for its wide dissemination in research and psychotherapy of eating and weight disorders. Given the comparably high efforts in producing a VR setup, we outline that the technology’s breakthrough needs tailored exploitation of specific features of VR and user-centered design of setups. In this paper, we introduce VR hardware and review the specific properties of immersive VR versus real-world setups providing examples how they improved existing setups. We then summarize current approaches to make VR a tool for psychotherapy of eating and weight disorders and introduce user-centered design of VR environments as a solution to support their further development. Overall, we argue that exploitation of the specific properties of VR can substantially improve existing approaches for research and therapy of eating and weight disorders. To produce more than pilot setups, iterative development of VR setups within a user-centered design approach is needed.

DOI [BibTex]



iRotate: Active Visual SLAM for Omnidirectional Robots

Bonetto, E., Goldschmid, P., Pabst, M., Black, M. J., Ahmad, A.

Robotics and Autonomous Systems, 154, pages: 104102, Elsevier, August 2022 (article)

Abstract
In this paper, we present an active visual SLAM approach for omnidirectional robots. The goal is to generate control commands that allow such a robot to simultaneously localize itself and map an unknown environment while maximizing the amount of information gained and consuming as little energy as possible. Leveraging the robot’s independent translation and rotation control, we introduce a multi-layered approach for active V-SLAM. The top layer decides on informative goal locations and generates highly informative paths to them. The second and third layers actively re-plan and execute the path, exploiting the continuously updated map and local feature information. Moreover, we introduce two utility formulations to account for the presence of obstacles in the field of view and the robot’s location. Through rigorous simulations, real robot experiments, and comparisons with state-of-the-art methods, we demonstrate that our approach achieves similar coverage results with lower overall map entropy. This is obtained while keeping the traversed distance up to 39% shorter than the other methods and without increasing the wheels’ total rotation amount. Code and implementation details are provided as open-source and all the generated data is available online for consultation.

Code Data iRotate Data Independent Camera experiments link (url) DOI [BibTex]


AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation

Saini, N., Bonetto, E., Price, E., Ahmad, A., Black, M. J.

IEEE Robotics and Automation Letters, 7(2):4805-4812, IEEE, April 2022, Also accepted and presented in the 2022 IEEE International Conference on Robotics and Automation (ICRA) (article)

Abstract
In this letter, we present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments that uses a team of autonomous unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation. Existing methods are limited by calibrated cameras and off-line processing. Thus, we present the first method (AirPose) to estimate human pose and shape using images captured by multiple extrinsically uncalibrated flying cameras. AirPose calibrates the cameras relative to the person instead of in a classical way. It uses distributed neural networks running on each UAV that communicate viewpoint-independent information with each other about the person (i.e., their 3D shape and articulated pose). The person's shape and pose are parameterized using the SMPL-X body model, resulting in a compact representation that minimizes communication between the UAVs. The network is trained using synthetic images of realistic virtual environments, and fine-tuned on a small set of real images. We also introduce an optimization-based post-processing method (AirPose+) for offline applications that require higher MoCap quality. We make code and data available for research at https://github.com/robot-perception-group/AirPose. A video describing the approach and results is available at https://youtu.be/Ss48ICeqvnQ.

paper code video pdf DOI [BibTex]



Physical activity improves body image of sedentary adults. Exploring the roles of interoception and affective response

Srismith, D., Dierkes, K., Zipfel, S., Thiel, A., Sudeck, G., Giel, K. E., Behrens, S. C.

Current Psychology, Springer, 2022 (article) Accepted

Abstract
To reduce the number of sedentary people, an improved understanding of effects of exercise in this specific group is needed. The present project investigates the impact of regular aerobic exercise uptake on body image, and how this effect is associated with differences in interoceptive abilities and affective response to exercise. Participants were 29 sedentary adults who underwent a 12-week aerobic physical activity intervention comprised of 30–36 sessions. Body image was improved with large effect sizes. Correlations were observed between affective response to physical activity and body image improvement, but not with interoceptive abilities. Explorative mediation models suggest a neglectable role of a priori interoceptive abilities. Instead, body image improvement was achieved when positive valence was assigned to interoceptive cues experienced during exercise.

DOI [BibTex]


2021


The neural coding of face and body orientation in occipitotemporal cortex

Foster, C., Zhao, M., Bolkart, T., Black, M. J., Bartels, A., Bülthoff, I.

NeuroImage, 246, pages: 118783, December 2021 (article)

Abstract
Face and body orientation convey important information for us to understand other people's actions, intentions and social interactions. It has been shown that several occipitotemporal areas respond differently to faces or bodies of different orientations. However, whether face and body orientation are processed by partially overlapping or completely separate brain networks remains unclear, as the neural coding of face and body orientation is often investigated separately. Here, we recorded participants’ brain activity using fMRI while they viewed faces and bodies shown from three different orientations, while attending to either orientation or identity information. Using multivoxel pattern analysis we investigated which brain regions process face and body orientation respectively, and which regions encode both face and body orientation in a stimulus-independent manner. We found that patterns of neural responses evoked by different stimulus orientations in the occipital face area, extrastriate body area, lateral occipital complex and right early visual cortex could generalise across faces and bodies, suggesting a stimulus-independent encoding of person orientation in occipitotemporal cortex. This finding was consistent across functionally defined regions of interest and a whole-brain searchlight approach. The fusiform face area responded to face but not body orientation, suggesting that orientation responses in this area are face-specific. Moreover, neural responses to orientation were remarkably consistent regardless of whether participants attended to the orientation of faces and bodies or not. Together, these results demonstrate that face and body orientation are processed in a partially overlapping brain network, with a stimulus-independent neural code for face and body orientation in occipitotemporal cortex.

paper DOI [BibTex]



A pose-independent method for accurate and precise body composition from 3D optical scans

Wong, M. C., Ng, B. K., Tian, I., Sobhiyeh, S., Pagano, I., Dechenaud, M., Kennedy, S. F., Liu, Y. E., Kelly, N., Chow, D., Garber, A. K., Maskarinec, G., Pujades, S., Black, M. J., Curless, B., Heymsfield, S. B., Shepherd, J. A.

Obesity, 29(11):1835-1847, Wiley, November 2021 (article)

Abstract
Objective: The aim of this study was to investigate whether digitally re-posing three-dimensional optical (3DO) whole-body scans to a standardized pose would improve body composition accuracy and precision regardless of the initial pose. Methods: Healthy adults (n = 540), stratified by sex, BMI, and age, completed whole-body 3DO and dual-energy X-ray absorptiometry (DXA) scans in the Shape Up! Adults study. The 3DO mesh vertices were represented with standardized templates and a low-dimensional space by principal component analysis (stratified by sex). The total sample was split into a training (80%) and test (20%) set for both males and females. Stepwise linear regression was used to build prediction models for body composition and anthropometry outputs using 3DO principal components (PCs). Results: The analysis included 472 participants after exclusions. After re-posing, three PCs described 95% of the shape variance in the male and female training sets. 3DO body composition accuracy compared with DXA was as follows: fat mass R2 = 0.91 male, 0.94 female; fat-free mass R2 = 0.95 male, 0.92 female; visceral fat mass R2 = 0.77 male, 0.79 female. Conclusions: Re-posed 3DO body shape PCs produced more accurate and precise body composition models that may be used in clinical or nonclinical settings when DXA is unavailable or when frequent ionizing radiation exposure is unwanted.
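The modeling pipeline can be sketched as follows with synthetic data and scikit-learn: represent flattened scan vertices in a low-dimensional PCA space and regress a body-composition target from the principal components. The study additionally uses standardized mesh templates, sex stratification, and stepwise selection, which are omitted here.

```python
# Schematic PCA + regression pipeline on synthetic "scan" data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_subjects, n_verts = 472, 1000
latent = rng.normal(size=(n_subjects, 10))                              # underlying shape factors
basis = rng.normal(size=(10, n_verts * 3))
X = latent @ basis + 0.1 * rng.normal(size=(n_subjects, n_verts * 3))   # flattened scan vertices
fat_mass = 8.0 * latent[:, 0] + 30.0 + rng.normal(size=n_subjects)      # toy DXA-like target

X_tr, X_te, y_tr, y_te = train_test_split(X, fat_mass, test_size=0.2, random_state=0)

pca = PCA(n_components=0.95)        # keep components explaining 95% of the shape variance
Z_tr = pca.fit_transform(X_tr)
Z_te = pca.transform(X_te)

reg = LinearRegression().fit(Z_tr, y_tr)
print("components kept:", pca.n_components_, " test R^2:", round(reg.score(Z_te, y_te), 3))
```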


Wiley online address Author Version DOI Project Page [BibTex]


Separated and overlapping neural coding of face and body identity

Foster, C., Zhao, M., Bolkart, T., Black, M. J., Bartels, A., Bülthoff, I.

Human Brain Mapping, 42(13):4242-4260, September 2021 (article)

Abstract
Recognising a person's identity often relies on face and body information, and is tolerant to changes in low-level visual input (e.g., viewpoint changes). Previous studies have suggested that face identity is disentangled from low-level visual input in the anterior face-responsive regions. It remains unclear which regions disentangle body identity from variations in viewpoint, and whether face and body identity are encoded separately or combined into a coherent person identity representation. We trained participants to recognise three identities, and then recorded their brain activity using fMRI while they viewed face and body images of these three identities from different viewpoints. Participants' task was to respond to either the stimulus identity or viewpoint. We found consistent decoding of body identity across viewpoint in the fusiform body area, right anterior temporal cortex, middle frontal gyrus and right insula. This finding demonstrates a similar function of fusiform and anterior temporal cortex for bodies as has previously been shown for faces, suggesting these regions may play a general role in extracting high-level identity information. Moreover, we could decode identity across fMRI activity evoked by faces and bodies in the early visual cortex, right inferior occipital cortex, right parahippocampal cortex and right superior parietal cortex, revealing a distributed network that encodes person identity abstractly. Lastly, identity decoding was consistently better when participants attended to identity, indicating that attention to identity enhances its neural representation. These results offer new insights into how the brain develops an abstract neural coding of person identity, shared by faces and bodies.

on-line DOI Project Page [BibTex]



Skinned multi-infant linear body model

Hesse, N., Pujades, S., Romero, J., Black, M.

(US Patent 11,127,163, 2021), September 2021 (patent)

Abstract
A computer-implemented method for automatically obtaining pose and shape parameters of a human body. The method includes obtaining a sequence of digital 3D images of the body, recorded by at least one depth camera; automatically obtaining pose and shape parameters of the body, based on images of the sequence and a statistical body model; and outputting the pose and shape parameters. The body may be an infant body.

[BibTex]



The role of sexual orientation in the relationships between body perception, body weight dissatisfaction, physical comparison, and eating psychopathology in the cisgender population

Meneguzzo, P., Collantoni, E., Bonello, E., Vergine, M., Behrens, S. C., Tenconi, E., Favaro, A.

Eating and Weight Disorders - Studies on Anorexia, Bulimia and Obesity, 26(6):1985-2000, Springer, August 2021 (article)

Abstract
Purpose: Body weight dissatisfaction (BWD) and visual body perception are specific aspects that can influence one's own body image, and that can concur with the development or the maintenance of specific psychopathological dimensions of different psychiatric disorders. Sexual orientation is a fundamental but understudied aspect in this field, and, for this reason, the purpose of this study is to improve knowledge about the relationships among BWD, visual body size-perception, and sexual orientation. Methods: A total of 1033 individuals participated in an online survey. Physical comparison, depression, and self-esteem were evaluated, as well as sexual orientation and the presence of an eating disorder. A Figure Rating Scale was used to assess different valences of body weight, and mediation analyses were performed to investigate specific relationships between psychological aspects. Results: Bisexual women and gay men reported significantly higher BWD than other groups (p < 0.001); instead, higher body misperception was present in gay men (p = 0.001). Physical appearance comparison mediated the effect of sexual orientation on both BWD and perceptual distortion. No difference emerged between women with a history of eating disorders and those without, as regards the value of body weight attributed to attractiveness, health, and presence on social media. Conclusion: This study contributes to understanding the relationship between sexual orientations and body image representation and evaluation. Physical appearance comparisons should be considered as critical psychological factors that can improve and affect well-being. The impact on subjects with high levels of eating concerns is also discussed. Level of evidence: Level III: case–control analytic study.

DOI Project Page [BibTex]



Learning an Animatable Detailed 3D Face Model from In-the-Wild Images

Feng, Y., Feng, H., Black, M. J., Bolkart, T.

ACM Transactions on Graphics, 40(4):88:1-88:13, August 2021 (article)

Abstract
While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details enabling animation of reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.
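A conceptual PyTorch sketch of the detail-consistency idea follows: decoding a UV displacement from a person-specific detail code plus expression parameters, and requiring the output to stay consistent when detail codes are swapped between two images of the same person. The networks and dimensions are stand-ins; DECA's actual loss compares rendered reconstructions, which is omitted here.

```python
# Conceptual detail-consistency sketch with a toy displacement decoder.
import torch
import torch.nn as nn

detail_decoder = nn.Sequential(nn.Linear(128 + 53, 256), nn.ReLU(), nn.Linear(256, 256))
# input: person-specific detail code (128) + expression/jaw-pose parameters (53)
# output: flattened UV displacement map (toy resolution)

def displacement(detail_code, expression):
    return detail_decoder(torch.cat([detail_code, expression], dim=-1))

# Two images of the same person: their detail codes should be interchangeable.
detail_a, detail_b = torch.randn(1, 128), torch.randn(1, 128)
expr_a = torch.randn(1, 53)
disp_own  = displacement(detail_a, expr_a)
disp_swap = displacement(detail_b, expr_a)   # detail code taken from the other image
consistency_loss = (disp_own - disp_swap).pow(2).mean()
print(float(consistency_loss))
```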

pdf Sup Mat code video talk DOI Project Page [BibTex]


Red shape, blue shape: Political ideology influences the social perception of body shape

Quiros-Ramirez, M. A., Streuber, S., Black, M. J.

Nature Humanities and Social Sciences Communications, 8, pages: 148, June 2021 (article)

Abstract
Political elections have a profound impact on individuals and societies. Optimal voting is thought to be based on informed and deliberate decisions yet, it has been demonstrated that the outcomes of political elections are biased by the perception of candidates’ facial features and the stereotypical traits voters attribute to these. Interestingly, political identification changes the attribution of stereotypical traits from facial features. This study explores whether the perception of body shape elicits similar effects on political trait attribution and whether these associations can be visualized. In Experiment 1, ratings of 3D body shapes were used to model the relationship between perception of 3D body shape and the attribution of political traits such as ‘Republican’, ‘Democrat’, or ‘Leader’. This allowed analyzing and visualizing the mental representations of stereotypical 3D body shapes associated with each political trait. Experiment 2 was designed to test whether political identification of the raters affected the attribution of political traits to different types of body shapes. The results show that humans attribute political traits to the same body shapes differently depending on their own political preference. These findings show that our judgments of others are influenced by their body shape and our own political views. Such judgments have potential political and societal implications.

pdf on-line sup. mat. sup. figure author pdf DOI Project Page [BibTex]


Weight bias and linguistic body representation in anorexia nervosa: Findings from the BodyTalk project

(Top Cited Article 2021-2022)

Behrens, S. C., Meneguzzo, P., Favaro, A., Teufel, M., Skoda, E., Lindner, M., Walder, L., Quiros-Ramirez, M. A., Zipfel, S., Mohler, B., Black, M., Giel, K. E.

European Eating Disorders Review, 29(2):204-215, Wiley, March 2021 (article)

Abstract
Objective: This study provides a comprehensive assessment of own body representation and linguistic representation of bodies in general in women with typical and atypical anorexia nervosa (AN). Methods: In a series of desktop experiments, participants rated a set of adjectives according to their match with a series of computer generated bodies varying in body mass index, and generated prototypic body shapes for the same set of adjectives. We analysed how body mass index of the bodies was associated with positive or negative valence of the adjectives in the different groups. Further, body image and own body perception were assessed. Results: In a German‐Italian sample comprising 39 women with AN, 20 women with atypical AN and 40 age matched control participants, we observed effects indicative of weight stigmatization, but no significant differences between the groups. Generally, positive adjectives were associated with lean bodies, whereas negative adjectives were associated with obese bodies. Discussion: Our observations suggest that patients with both typical and atypical AN affectively and visually represent body descriptions not differently from healthy women. We conclude that overvaluation of low body weight and fear of weight gain cannot be explained by generally distorted perception or cognition, but require individual consideration.

on-line pdf DOI Project Page [BibTex]



Analyzing the Direction of Emotional Influence in Nonverbal Dyadic Communication: A Facial-Expression Study

Shadaydeh, M., Müller, L., Schneider, D., Thuemmel, M., Kessler, T., Denzler, J.

IEEE Access, 9, pages: 73780-73790, IEEE, 2021 (article)

Abstract
Identifying the direction of emotional influence in a dyadic dialogue is of increasing interest in the psychological sciences with applications in psychotherapy, analysis of political interactions, or interpersonal conflict behavior. Facial expressions are widely described as being automatic and thus hard to be overtly influenced. As such, they are a perfect measure for a better understanding of unintentional behavior cues about socio-emotional cognitive processes. With this view, this study is concerned with the analysis of the direction of emotional influence in dyadic dialogues based on facial expressions only. We exploit computer vision capabilities along with causal inference theory for quantitative verification of hypotheses on the direction of emotional influence, i.e., cause-effect relationships, in dyadic dialogues. We address two main issues. First, in a dyadic dialogue, emotional influence occurs over transient time intervals and with intensity and direction that are variant over time. To this end, we propose a relevant interval selection approach that we use prior to causal inference to identify those transient intervals where causal inference should be applied. Second, we propose to use fine-grained facial expressions that are present when strong distinct facial emotions are not visible. To specify the direction of influence, we apply the concept of Granger causality to the time-series of facial expressions over selected relevant intervals. We tested our approach on newly, experimentally obtained data. Based on quantitative verification of hypotheses on the direction of emotional influence, we were able to show that the proposed approach is promising to reveal the cause-effect pattern in various instructed interaction conditions.
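As a brief illustration of the core statistical tool, the Python example below tests Granger causality between two synthetic time series using statsmodels; in the study the series are fine-grained facial-expression intensities restricted to the selected relevant intervals.

```python
# Granger-causality test on synthetic dyadic time series.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 300
speaker_a = rng.normal(size=T)
# Make speaker B's expression depend on A's expression two frames earlier.
speaker_b = np.zeros(T)
for t in range(2, T):
    speaker_b[t] = 0.6 * speaker_a[t - 2] + 0.2 * rng.normal()

# Column order: the test asks whether the SECOND column Granger-causes the FIRST.
data = np.column_stack([speaker_b, speaker_a])
results = grangercausalitytests(data, maxlag=3)
p_value = results[2][0]["ssr_ftest"][1]     # lag-2 F-test p-value
print("p(A -> B), lag 2:", round(p_value, 4))
```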

DOI [BibTex]



Scientific Report 2016 - 2021
2021 (mpi_year_book)

Abstract
This report presents research done at the Max Planck Institute for Intelligent Systems from January 2016 to November 2021. It is our fourth report since the founding of the institute in 2011. Due to the fact that the upcoming evaluation is an extended one, the report covers a longer reporting period. This scientific report is organized as follows: we begin with an overview of the institute, including an outline of its structure, an introduction of our latest research departments, and a presentation of our main collaborative initiatives and activities (Chapter 1). The central part of the scientific report consists of chapters on the research conducted by the institute's departments (Chapters 2 to 6) and its independent research groups (Chapters 7 to 24), as well as the work of the institute's central scientific facilities (Chapter 25). For entities founded after January 2016, the respective report sections cover work done from the date of the establishment of the department, group, or facility. These chapters are followed by a summary of selected outreach activities and scientific events hosted by the institute (Chapter 26). The scientific publications of the featured departments and research groups published during the 6-year review period complete this scientific report.

Scientific Report 2016 - 2021 [BibTex]


Body Image Disturbances and Weight Bias After Obesity Surgery: Semantic and Visual Evaluation in a Controlled Study, Findings from the BodyTalk Project

Meneguzzo, P., Behrens, S. C., Favaro, A., Tenconi, E., Vindigni, V., Teufel, M., Skoda, E., Lindner, M., Quiros-Ramirez, M. A., Mohler, B., Black, M., Zipfel, S., Giel, K. E., Pavan, C.

Obesity Surgery, 31(4):1625-1634, 2021 (article)

Abstract
Purpose: Body image has a significant impact on the outcome of obesity surgery. This study aims to perform a semantic evaluation of body shapes in obesity surgery patients and a group of controls. Materials and Methods: Thirty-four obesity surgery (OS) subjects, stable after weight loss (average 48.03 ± 18.60 kg), and 35 overweight/obese controls (MC), were enrolled in this study. Body dissatisfaction, self-esteem, and body perception were evaluated with self-reported tests, and semantic evaluation of body shapes was performed with three specific tasks constructed with realistic human body stimuli. Results: The OS showed a more positive body image compared to HC (p < 0.001), higher levels of depression (p < 0.019), and lower self-esteem (p < 0.000). OS patients and HC showed no difference in weight bias, but OS used a higher BMI than HC in the visualization of positive adjectives (p = 0.011). Both groups showed a mental underestimation of their body shapes. Conclusion: OS patients are more psychologically burdened and have more difficulties in judging their bodies than overweight/obese peers. Their mental body representations seem not to be linked to their own BMI. Our findings provide helpful insight for the design of specific interventions in body image in obese and overweight people, as well as in OS.

paper pdf DOI Project Page [BibTex]


2020


Occlusion Boundary: A Formal Definition & Its Detection via Deep Exploration of Context

Wang, C., Fu, H., Tao, D., Black, M. J.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5):2641-2656, November 2020 (article)

Abstract
Occlusion boundaries contain rich perceptual information about the underlying scene structure and provide important cues in many visual perception-related tasks such as object recognition, segmentation, motion estimation, scene understanding, and autonomous navigation. However, there is no formal definition of occlusion boundaries in the literature, and state-of-the-art occlusion boundary detection is still suboptimal. With this in mind, in this paper we propose a formal definition of occlusion boundaries for related studies. Further, based on a novel idea, we develop two concrete approaches with different characteristics to detect occlusion boundaries in video sequences via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context) with deep models and conditional random fields. Experimental evaluations of our methods on two challenging occlusion boundary benchmarks (CMU and VSB100) demonstrate that our detectors significantly outperform the current state-of-the-art. Finally, we empirically assess the roles of several important components of the proposed detectors to validate the rationale behind these approaches.

official version DOI [BibTex]



3D Morphable Face Models - Past, Present and Future

Egger, B., Smith, W. A. P., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., Bernard, F., Bolkart, T., Kortylewski, A., Romdhani, S., Theobalt, C., Blanz, V., Vetter, T.

ACM Transactions on Graphics, 39(5):157, October 2020 (article)

Abstract
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.
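
For readers unfamiliar with the basic construction surveyed above, the sketch below builds a toy linear morphable model with PCA and samples a new shape from it. The random data stand in for registered 3D face scans, so this only illustrates the linear-model idea, not any particular model from the survey.

import numpy as np

rng = np.random.default_rng(0)
num_scans, num_vertices = 50, 1000
scans = rng.normal(size=(num_scans, num_vertices * 3))   # each row: one flattened, registered mesh

mean_shape = scans.mean(axis=0)
centered = scans - mean_shape
# SVD gives the principal components (rows of vt) and per-component standard deviations.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
stddev = s / np.sqrt(num_scans - 1)

num_components = 10
coeffs = rng.normal(size=num_components)                 # shape coefficients ("betas")
new_shape = mean_shape + (coeffs * stddev[:num_components]) @ vt[:num_components]
new_vertices = new_shape.reshape(num_vertices, 3)
print(new_vertices.shape)  # (1000, 3)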

project page pdf preprint DOI Project Page [BibTex]



AirCapRL: Autonomous Aerial Human Motion Capture Using Deep Reinforcement Learning

Tallamraju, R., Saini, N., Bonetto, E., Pabst, M., Liu, Y. T., Black, M., Ahmad, A.

IEEE Robotics and Automation Letters, 5(4):6678-6685, IEEE, October 2020, Also accepted and presented in the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (article)

Abstract
In this letter, we introduce a deep reinforcement learning (DRL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap). We focus on vision-based MoCap, where the objective is to estimate the trajectory of body pose and shape of a single moving person using multiple micro aerial vehicles. State-of-the-art solutions to this problem are based on classical control methods, which depend on hand-crafted system and observation models. Such models are difficult to derive and to generalize across different systems. Moreover, the non-linearities and non-convexities of these models lead to sub-optimal controls. In our work, we formulate this problem as a sequential decision-making task to achieve the vision-based motion capture objectives, and solve it using a deep neural network-based RL method. We leverage proximal policy optimization (PPO) to train a stochastic decentralized control policy for formation control. The neural network is trained in a parallelized setup in synthetic environments. We performed extensive simulation experiments to validate our approach. Finally, real-robot experiments demonstrate that our policies generalize to real-world conditions.
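
The abstract names PPO as the training algorithm. As a hedged sketch of that ingredient only, the snippet below trains a PPO policy with stable-baselines3; "CartPole-v1" is merely a stand-in for the multi-MAV formation-control environment, which is not publicly reproducible here.

import gymnasium as gym
from stable_baselines3 import PPO

# "CartPole-v1" is only a placeholder environment; the algorithm choice (PPO) matches the paper.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the learned policy for a few steps.
vec_env = model.get_env()
obs = vec_env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)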

link (url) DOI Project Page [BibTex]



Analysis of motor development within the first year of life: 3-D motion tracking without markers for early detection of developmental disorders

Parisi, C., Hesse, N., Tacke, U., Rocamora, S. P., Blaschek, A., Hadders-Algra, M., Black, M. J., Heinen, F., Müller-Felber, W., Schroeder, A. S.

Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz, 63(7):881–890, July 2020 (article)

Abstract
Children with motor development disorders benefit greatly from early interventions. An early diagnosis in pediatric preventive care (U2–U5) can be improved by automated screening. Current approaches to automated motion analysis, however, are expensive, require extensive technical support, and cannot be used in broad clinical application. Here we present an inexpensive, marker-free video analysis tool (KineMAT) for infants, which digitizes 3‑D movements of the entire body over time, allowing automated analysis in the future. Three-minute video sequences of spontaneously moving infants were recorded with a commercially available depth-imaging camera and aligned with a virtual infant body model (SMIL model). The virtual image generated allows any measurements to be carried out in 3‑D with high precision. We demonstrate the method on seven infants with different diagnoses. A selection of possible movement parameters was quantified and aligned with diagnosis-specific movement characteristics. KineMAT and the SMIL model allow reliable, three-dimensional measurements of spontaneous activity in infants with a very low error rate. Based on machine-learning algorithms, KineMAT can be trained to automatically recognize pathological spontaneous motor skills. It is inexpensive and easy to use and can be developed into a screening tool for preventive care for children.

pdf on-line w/ sup mat DOI Project Page [BibTex]



Learning and Tracking the 3D Body Shape of Freely Moving Infants from RGB-D sequences

Hesse, N., Pujades, S., Black, M., Arens, M., Hofmann, U., Schroeder, S.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42(10):2540-2551, 2020 (article)

Abstract
Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.
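
The abstract above involves fitting a learned body model to incomplete RGB-D point clouds. The PyTorch sketch below shows only the bare skeleton of such a fit, optimizing shape coefficients of a toy linear vertex model against a target point cloud with a one-sided chamfer term; the random "model", the missing pose and skinning terms, and the data are all placeholders, not SMIL.

import torch

torch.manual_seed(0)
num_vertices, num_betas = 500, 10
mean_verts = torch.randn(num_vertices, 3)
shape_basis = 0.05 * torch.randn(num_betas, num_vertices, 3)
target_points = torch.randn(800, 3)            # stand-in for a depth-camera point cloud

betas = torch.zeros(num_betas, requires_grad=True)
opt = torch.optim.Adam([betas], lr=0.05)

for step in range(200):
    verts = mean_verts + torch.einsum("b,bvc->vc", betas, shape_basis)
    # One-sided chamfer: each scan point is explained by its nearest model vertex.
    dists = torch.cdist(target_points, verts)  # (num_points, num_vertices)
    loss = dists.min(dim=1).values.mean() + 1e-3 * betas.pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))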

pdf Journal DOI Project Page [BibTex]



Machine learning systems and methods of estimating body shape from images

Black, M., Rachlin, E., Heron, N., Loper, M., Weiss, A., Hu, K., Hinkle, T., Kristiansen, M.

(US Patent 10,679,046), June 2020 (patent)

Abstract
Disclosed is a method including receiving an input image including a human, predicting, based on a convolutional neural network that is trained using examples consisting of pairs of sensor data, a corresponding body shape of the human and utilizing the corresponding body shape predicted from the convolutional neural network as input to another convolutional neural network to predict additional body shape metrics.
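
The claim describes chaining two networks: one predicts body shape from an image, a second predicts further metrics from that shape. Below is a minimal PyTorch sketch of such a two-stage pipeline; the layer choices and sizes are invented for illustration (the second stage is a small MLP rather than a convolutional network), and none of it reflects the patented system.

import torch
import torch.nn as nn

class ShapeNet(nn.Module):
    """Stage 1: image -> body-shape parameters (toy architecture)."""
    def __init__(self, num_shape_params=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_shape_params)

    def forward(self, image):
        return self.head(self.backbone(image))

class MetricsNet(nn.Module):
    """Stage 2: shape parameters -> additional body metrics (toy architecture)."""
    def __init__(self, num_shape_params=10, num_metrics=5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_shape_params, 64), nn.ReLU(), nn.Linear(64, num_metrics)
        )

    def forward(self, shape_params):
        return self.mlp(shape_params)

image = torch.randn(1, 3, 224, 224)
shape_params = ShapeNet()(image)      # stage 1: image -> shape
metrics = MetricsNet()(shape_params)  # stage 2: shape -> metrics (e.g., measurements)
print(shape_params.shape, metrics.shape)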

[BibTex]



General Movement Assessment from videos of computed 3D infant body models is equally effective compared to conventional RGB Video rating

Schroeder, S., Hesse, N., Weinberger, R., Tacke, U., Gerstl, L., Hilgendorff, A., Heinen, F., Arens, M., Bodensteiner, C., Dijkstra, L. J., Pujades, S., Black, M., Hadders-Algra, M.

Early Human Development, 144, pages: 104967, May 2020 (article)

Abstract
Background: General Movement Assessment (GMA) is a powerful tool to predict Cerebral Palsy (CP). Yet, GMA requires substantial training, hampering its implementation in clinical routine. This inspired a world-wide quest for automated GMA. Aim: To test whether a low-cost, marker-less system for three-dimensional motion capture from RGB depth sequences using a whole-body infant model may serve as the basis for automated GMA. Study design: Clinical case study at an academic neurodevelopmental outpatient clinic. Subjects: Twenty-nine high-risk infants were recruited and assessed at their clinical follow-up at 2-4 months corrected age (CA). Their neurodevelopmental outcome was assessed regularly up to 12-31 months CA. Outcome measures: GMA according to Hadders-Algra, performed by a masked GMA expert on conventional videos and on computed 3D body model (“SMIL motion”) videos of the same GMs. Agreement between the two GMAs was assessed, as well as the sensitivity and specificity of both methods for predicting CP at ≥12 months CA. Results: The agreement of the two GMA ratings was substantial, with κ=0.66 for the classification of definitely abnormal (DA) GMs and an ICC of 0.887 (95% CI 0.762;0.947) for a more detailed GM-scoring. Five children were diagnosed with CP (four bilateral, one unilateral CP). The GMs of the child with unilateral CP were twice rated as mildly abnormal. DA-ratings of both videos predicted bilateral CP well: sensitivity 75% and 100%, specificity 88% and 92% for conventional and SMIL motion videos, respectively. Conclusions: Our computed infant 3D full-body model is an attractive starting point for automated GMA in infants at risk of CP.
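
The evaluation above reports Cohen's kappa for rater agreement plus sensitivity and specificity for CP prediction. The following sketch shows how such metrics are typically computed with scikit-learn; the binary labels are made up and do not reproduce the study's data.

import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

rating_conventional = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])  # 1 = definitely abnormal GMs
rating_model_video  = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 1])
outcome_cp          = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])  # 1 = cerebral palsy

kappa = cohen_kappa_score(rating_conventional, rating_model_video)

tn, fp, fn, tp = confusion_matrix(outcome_cp, rating_model_video).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"kappa={kappa:.2f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")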

DOI Project Page [BibTex]



Real Time Trajectory Prediction Using Deep Conditional Generative Models

Gomez-Gonzalez, S., Prokudin, S., Schölkopf, B., Peters, J.

IEEE Robotics and Automation Letters, 5(2):970-976, IEEE, April 2020 (article)

arXiv DOI [BibTex]


Learning Multi-Human Optical Flow

Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J.

International Journal of Computer Vision (IJCV), 128(4):873-890, April 2020 (article)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.
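
Accuracy claims like the one above are usually measured with the average endpoint error (AEPE) between predicted and ground-truth flow fields. A minimal illustration follows; the random tensors are placeholders for real flow fields, not the paper's evaluation code.

import numpy as np

def average_endpoint_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and true flow vectors.

    Both arrays have shape (H, W, 2): horizontal and vertical displacement."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

h, w = 256, 512
rng = np.random.default_rng(0)
flow_gt = rng.normal(size=(h, w, 2))
flow_pred = flow_gt + 0.1 * rng.normal(size=(h, w, 2))
print(f"AEPE = {average_endpoint_error(flow_pred, flow_gt):.3f} px")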

pdf poster link (url) DOI Project Page [BibTex]


The iReAct study - A biopsychosocial analysis of the individual response to physical activity

Thiel, A., Sudeck, G., Gropper, H., Maturana, F. M., Schubert, T., Srismith, D., Widmann, M., Behrens, S., Martus, P., Munz, B., Giel, K., Zipfel, S., Niess, A. M.

Contemporary Clinical Trials Communications, 17, pages: 100508, March 2020 (article)

Abstract
Background Physical activity is a substantial promoter of health and well-being. Yet, while an increasing number of studies shows that the responsiveness to physical activity is highly individual, most studies approach this issue from only one perspective and neglect other contributing aspects. In reference to a biopsychosocial framework, the goal of our study is to examine how physically inactive individuals respond to two distinct standardized endurance trainings on various levels. Based on an assessment of activity- and health-related biographical experiences across the life course, our mixed-method study analyzes the responsiveness to physical activity using a transdisciplinary approach, considering physiological, epigenetic, motivational, affective, and body image-related aspects. Methods Participants are randomly assigned to two different training programs (High Intensity Interval Training vs. Moderate Intensity Continuous Training) for six weeks. After this first training period, participants switch training modes according to a two-period sequential-training-intervention (STI) design and train for another six weeks. In order to analyse baseline characteristics as well as acute and adaptive biopsychosocial responses, three extensive mixed-methods diagnostic blocks take place at the beginning of the study (t0) and after the first (t1) and the second (t2) training period, resulting in a net follow-up time of 15 weeks. The study is divided into five modules in order to cover a wide array of perspectives. Discussion The study's transdisciplinary mixed-method design makes it possible to interlace a multitude of subjective and objective data and therefore to draw an integrated picture of the biopsychosocial efficacy of two distinct physical activity programs. The results of our study can be expected to contribute to the development and design of individualised training programs for the promotion of physical activity.

DOI [BibTex]


Machine learning systems and methods for augmenting images

Black, M., Rachlin, E., Lee, E., Heron, N., Loper, M., Weiss, A., Smith, D.

(US Patent 10,529,137 B1), January 2020 (patent)

Abstract
Disclosed is a method including receiving visual input comprising a human within a scene, detecting a pose associated with the human using a trained machine learning model that detects human poses to yield a first output, estimating a shape (and optionally a motion) associated with the human using a trained machine learning model associated that detects shape (and optionally motion) to yield a second output, recognizing the scene associated with the visual input using a trained convolutional neural network which determines information about the human and other objects in the scene to yield a third output, and augmenting reality within the scene by leveraging one or more of the first output, the second output, and the third output to place 2D and/or 3D graphics in the scene.

[BibTex]



Influence of Physical Activity Interventions on Body Representation: A Systematic Review

Srismith, D., Wider, L., Wong, H. Y., Zipfel, S., Thiel, A., Giel, K. E., Behrens, S. C.

Frontiers in Psychiatry, 11, pages: 99, 2020 (article)

Abstract
Distorted representation of one's own body is a diagnostic criterion and core psychopathology of disorders such as anorexia nervosa and body dysmorphic disorder. Previous literature has raised the possibility of utilising physical activity intervention (PI) as a treatment option for individuals suffering from poor body satisfaction, which is traditionally regarded as a systematic distortion in “body image.” In this systematic review, conducted according to the PRISMA statement, the evidence on the effectiveness of PI on body representation outcomes is synthesised. We provide an update of 34 longitudinal studies evaluating the effectiveness of different types of PIs on body representation. No systematic risk of bias within or across studies was identified. The reviewed studies show that the implementation of structured PIs may be efficacious in increasing individuals’ satisfaction with their own body, and thus improving their subjective body image related assessments. However, there is no clear evidence regarding an additional or interactive effect of PI when implemented in conjunction with established treatments for clinical populations. We argue for theoretically sound, mechanism-oriented, multimethod approaches to future investigations of body image disturbance. Specifically, we highlight the need to consider expanding the theoretical framework for the investigation of body representation disturbances to include further body representations besides body image.

DOI [BibTex]


2019


Decoding subcategories of human bodies from both body- and face-responsive cortical regions

Foster, C., Zhao, M., Romero, J., Black, M. J., Mohler, B. J., Bartels, A., Bülthoff, I.

NeuroImage, 202(15):116085, November 2019 (article)

Abstract
Our visual system can easily categorize objects (e.g. faces vs. bodies) and further differentiate them into subcategories (e.g. male vs. female). This ability is particularly important for objects of social significance, such as human faces and bodies. While many studies have demonstrated category selectivity to faces and bodies in the brain, how subcategories of faces and bodies are represented remains unclear. Here, we investigated how the brain encodes two prominent subcategories shared by both faces and bodies, sex and weight, and whether neural responses to these subcategories rely on low-level visual, high-level visual or semantic similarity. We recorded brain activity with fMRI while participants viewed faces and bodies that varied in sex, weight, and image size. The results showed that the sex of bodies can be decoded from both body- and face-responsive brain areas, with the former exhibiting more consistent size-invariant decoding than the latter. Body weight could also be decoded in face-responsive areas and in distributed body-responsive areas, and this decoding was also invariant to image size. The weight of faces could be decoded from the fusiform body area (FBA), and weight could be decoded across face and body stimuli in the extrastriate body area (EBA) and a distributed body-responsive area. The sex of well-controlled faces (e.g. excluding hairstyles) could not be decoded from face- or body-responsive regions. These results demonstrate that both face- and body-responsive brain regions encode information that can distinguish the sex and weight of bodies. Moreover, the neural patterns corresponding to sex and weight were invariant to image size and could sometimes generalize across face and body stimuli, suggesting that such subcategorical information is encoded with a high-level visual or semantic code.
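
Decoding analyses of the kind described above are commonly implemented as cross-validated linear classifiers over voxel patterns of a region of interest. The sketch below shows that generic setup on synthetic data; it illustrates the type of analysis only and is not the study's pipeline.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 120, 300
labels = rng.integers(0, 2, size=n_trials)          # 0 = female body, 1 = male body (toy labels)
signal = np.outer(labels - 0.5, rng.normal(size=n_voxels))
patterns = signal + rng.normal(size=(n_trials, n_voxels))  # stand-in for ROI voxel responses

clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, patterns, labels, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")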

paper pdf DOI Project Page [BibTex]


Active Perception based Formation Control for Multiple Aerial Vehicles

Tallamraju, R., Price, E., Ludwig, R., Karlapalem, K., Bülthoff, H. H., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, 4(4):4491-4498, IEEE, October 2019 (article)

Abstract
We present a novel robotic front-end for autonomous aerial motion-capture (mocap) in outdoor environments. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple micro-aerial vehicles (MAVs). However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article, we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive this decoupling using Gaussian observation model assumptions within the CDT algorithm. We preserve convexity in optimization by embedding all the non-convex constraints, including those for dynamic obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented.
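
The convex core described above is a receding-horizon quadratic program. The cvxpy sketch below sets up one toy MPC step with a quadratic tracking objective, linear point-mass dynamics, and box input constraints; the paper's non-convex angular-configuration and obstacle terms, which it handles as external control inputs, are omitted, and all numbers are invented.

import numpy as np
import cvxpy as cp

horizon, dt = 10, 0.1
# Discrete-time 2D point-mass dynamics: state = [position, velocity].
A = np.block([[np.eye(2), dt * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
B = np.block([[0.5 * dt**2 * np.eye(2)], [dt * np.eye(2)]])

x0 = np.array([0.0, 0.0, 0.0, 0.0])        # current state
goal = np.array([2.0, 1.0, 0.0, 0.0])      # desired viewpoint (stand-in)

x = cp.Variable((4, horizon + 1))
u = cp.Variable((2, horizon))

cost = 0
constraints = [x[:, 0] == x0]
for k in range(horizon):
    cost += cp.sum_squares(x[:, k + 1] - goal) + 0.1 * cp.sum_squares(u[:, k])
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.norm(u[:, k], "inf") <= 2.0]

cp.Problem(cp.Minimize(cost), constraints).solve()
print("first control input:", u.value[:, 0])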

pdf DOI Project Page [BibTex]



Method for providing a three dimensional body model

Loper, M., Mahmood, N., Black, M.

(US Patent 10,417,818), September 2019 (patent)

Abstract
A method for providing a three-dimensional body model which may be applied for an animation, based on a moving body, wherein the method comprises providing a parametric three-dimensional body model, which allows shape and pose variations; applying a standard set of body markers; optimizing the set of body markers by generating an additional set of body markers and applying the same for providing 3D coordinate marker signals for capturing shape and pose of the body and dynamics of soft tissue; and automatically providing an animation by processing the 3D coordinate marker signals in order to provide a personalized three-dimensional body model, based on estimated shape and an estimated pose of the body by means of predicted marker locations.

MoSh Project pdf [BibTex]


Decoding the Viewpoint and Identity of Faces and Bodies

Foster, C., Zhao, M., Bolkart, T., Black, M., Bartels, A., Bülthoff, I.

Journal of Vision, 19(10): 54c, pages: 54-55, Arvo Journals, September 2019 (article)


link (url) DOI Project Page [BibTex]



Perceptual Effects of Inconsistency in Human Animations

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

ACM Trans. Appl. Percept., 16(1):2:1-2:18, February 2019 (article)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person’s movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. From these data, we estimated both the kinematics of the actions as well as the performer’s individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. Using these stimuli we conducted three experiments in an immersive virtual reality environment. First, a group of participants detected which of two stimuli was inconsistent. Performance was very low, and results were only marginally significant. Next, a second group of participants rated perceived attractiveness, eeriness, and humanness of consistent and inconsistent stimuli, but these judgements of animation characteristics were not affected by consistency of the stimuli. Finally, a third group of participants rated properties of the objects rather than of the performers. Here, we found strong influences of shape-motion inconsistency on perceived weight and thrown distance of objects. This suggests that the visual system relies on its knowledge of shape and motion and that these components are assimilated into an altered perception of the action outcome. We propose that the visual system attempts to resist inconsistent interpretations of human animations. Actions involving object manipulations present an opportunity for the visual system to reinterpret the introduced inconsistencies as a change in the dynamics of an object rather than as an unexpected combination of body shape and body motion.

publisher pdf DOI Project Page [BibTex]



Scientific Report 2016 - 2018
2019 (mpi_year_book)

Abstract
This report presents research done at the Max Planck Institute for Intelligent Systems from January 2016 to December 2018. It is our third report since the founding of the institute in 2011. This status report is organized as follows: we begin with an overview of the institute, including its organizational structure (Chapter 1). The central part of the scientific report consists of chapters on the research conducted by the institute’s departments (Chapters 2 to 5) and its independent research groups (Chapters 6 to 18), as well as the work of the institute’s central scientific facilities (Chapter 19). For entities founded after January 2016, the respective report sections cover work done from the date of the establishment of the department, group, or facility.

Scientific Report 2016 - 2018 [BibTex]


The Virtual Caliper: Rapid Creation of Metrically Accurate Avatars from 3D Measurements

Pujades, S., Mohler, B., Thaler, A., Tesch, J., Mahmood, N., Hesse, N., Bülthoff, H. H., Black, M. J.

IEEE Transactions on Visualization and Computer Graphics, 25(5):1887-1897, IEEE, 2019 (article)

Abstract
Creating metrically accurate avatars is important for many applications such as virtual clothing try-on, ergonomics, medicine, immersive social media, telepresence, and gaming. Creating avatars that precisely represent a particular individual is challenging however, due to the need for expensive 3D scanners, privacy issues with photographs or videos, and difficulty in making accurate tailoring measurements. We overcome these challenges by creating “The Virtual Caliper”, which uses VR game controllers to make simple measurements. First, we establish what body measurements users can reliably make on their own body. We find several distance measurements to be good candidates and then verify that these are linearly related to 3D body shape as represented by the SMPL body model. The Virtual Caliper enables novice users to accurately measure themselves and create an avatar with their own body shape. We evaluate the metric accuracy relative to ground truth 3D body scan data, compare the method quantitatively to other avatar creation tools, and perform extensive perceptual studies. We also provide a software application to the community that enables novices to rapidly create avatars in fewer than five minutes. Not only is our approach more rapid than existing methods, it exports a metrically accurate 3D avatar model that is rigged and skinned.
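
The abstract states that the user-made distance measurements are linearly related to 3D body shape. As an illustration of that idea only (synthetic data, not the authors' calibration), the sketch below fits a linear map from a handful of measurements to SMPL-style shape coefficients.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_subjects, n_measurements, n_betas = 200, 4, 10
true_map = rng.normal(size=(n_measurements, n_betas))

measurements = rng.normal(size=(n_subjects, n_measurements))        # e.g., height, arm span, ...
betas = measurements @ true_map + 0.01 * rng.normal(size=(n_subjects, n_betas))

model = LinearRegression().fit(measurements, betas)
new_person = rng.normal(size=(1, n_measurements))
predicted_betas = model.predict(new_person)     # would drive the avatar's body shape
print(predicted_betas.shape)  # (1, 10)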

Project Page IEEE Open Access IEEE Open Access PDF DOI [BibTex]