Computer Vision – Our Goal
Computer vision is often treated as a problem of pattern recognition, 3D reconstruction, or image processing. While these all play supporting roles, our view is that the goal of computer vision is to infer what is not in the picture. The goal is to recognize the unseen. This is different from the Aristotelian view that “vision is knowing what is where by looking.” We see vision as the process of inferring the causes and motivations behind the images that we observe; that is, we want to infer the story behind the picture.
The most interesting stories involve people. Consequently, our research focuses on understanding humans and their actions in the world. We aim to recover human behavior in detail, including human-human interactions and human interactions with the environment.
Developing Models and Algorithms to Train Computers
Humans interact with each other and manipulate the world through their bodies, faces, hands and speech. If computers are to understand humans and our behavior, then they are going to have to understand much more about us than they currently do. For example, they need to recognize when we are picking up something heavy and might need help. They need to understand when we are distracted. They need to understand that changes in our behavior may signal medical or psychological changes.
To address this, we develop datasets, tools, models, and algorithms to train computers to recover human movement in unconstrained scenes at a level not previously possible. From single images or videos, we estimate full 3D body pose, including the motion of the face and the pose of the hands. We also recover the 3D structure of the world, its motion, and the objects in it so that human movement can be placed in context.
This is quite different from previous work in which the human body is treated in isolation, removed from the world around it. We see the interesting space as the one where people are present in, and interacting with, the 3D world. By estimating 3D models of people, scenes, and objects, we are able to place people in context and reason about their physical interactions with the world.
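As an illustrative sketch only (not our actual pipeline), one way to see what "placing people in context" buys us: once the body, the camera, and the scene are expressed in a single world frame, simple geometric reasoning such as ground-contact detection becomes possible. All names and values below are hypothetical.

```python
# Illustrative sketch: transform estimated body vertices from camera to world
# coordinates and test which vertices touch the ground plane (z = 0 in world).
import numpy as np

def to_world(verts_cam, R_wc, t_wc):
    """Map body vertices from camera coordinates to world coordinates."""
    return verts_cam @ R_wc.T + t_wc

def ground_contact(verts_world, ground_z=0.0, tol=0.01):
    """Boolean mask of vertices within `tol` meters of the ground plane."""
    return np.abs(verts_world[:, 2] - ground_z) < tol

# Toy data: two body vertices seen 2 m in front of a camera mounted 0.8 m
# above the floor (y points down in the camera frame, z is up in the world).
verts_cam = np.array([[0.0, 0.0, 2.0],    # a torso vertex
                      [0.1, 0.8, 2.0]])   # a foot vertex near the floor
R_wc = np.array([[1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0],
                 [0.0, -1.0, 0.0]])       # hypothetical camera-to-world rotation
t_wc = np.array([0.0, 0.0, 0.8])          # hypothetical camera-to-world translation

verts_world = to_world(verts_cam, R_wc, t_wc)
print(ground_contact(verts_world))        # prints [False  True]: the foot is in contact
```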
Recreating Natural Behavior with Computer Vision
We also seek to go deeper and understand the goals behind people's movements. To this end we relate natural language descriptions of human behavior to 3D movement. The goal is to learn from images and language how people behave and then recreate natural behavior in virtual 3D scenes.
To advance this agenda, Perceiving Systems combines computer vision with machine learning and computer graphics to capture, model, and synthesize digital humans. We see the virtual human as more than a useful artifact. We see it as a tool for understanding ourselves. If we can simulate a virtual human in a virtual world behaving in ways that are indistinguishable from a real human, then we assert that we have captured something about what it means to be human.
Computer Vision – Our Impact
We want to have an impact beyond the academic discipline of computer vision. Consequently, we develop applications in medicine and psychology in collaboration with medical colleagues. We have also spun off two companies that are using our 3D body model technology. One of these, Body Labs Inc., was acquired by Amazon in 2017. We also make code and data available as open source or for license, and our SMPL body model is now in wide use. Finally, we are responsible for, or contribute to, widely used datasets and evaluation benchmarks that help push the state of the art and provide a platform for industry to understand what works, how well, and why.
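As a concrete, hedged illustration of how the SMPL model is typically used, the sketch below poses it with the publicly available smplx Python package. It assumes the SMPL model files have been downloaded separately; the model path and the zero pose and shape values are placeholders.

```python
# Minimal sketch: load the SMPL body model via the `smplx` package and generate
# a mesh for the mean body shape in the rest pose. Paths are illustrative.
import torch
import smplx

model = smplx.create("models/", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)         # shape coefficients (mean body shape)
body_pose = torch.zeros(1, 69)     # 23 body joints x 3 axis-angle parameters
global_orient = torch.zeros(1, 3)  # root orientation

output = model(betas=betas, body_pose=body_pose,
               global_orient=global_orient, return_verts=True)

vertices = output.vertices.detach().numpy()[0]  # (6890, 3) mesh vertices
joints = output.joints.detach().numpy()[0]      # 3D joint locations
print(vertices.shape, joints.shape)
```

In fitting pipelines, these shape and pose parameters are what gets optimized so that the model explains the image evidence.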
Modeling 3D Humans and Animals
Our approach to understanding humans and their behavior is grounded on 3D models of the body and its movement. Such models facilitate reasoning about human-object interaction, contact, social touch, and emotion. They make explicit what is implicit in images -- the form of the body and its relationship to t...
Human Pose, Shape, and Motion
As humans, we influence the world through our bodies. We express our emotions through our facial expressions and body posture. We manipulate and change the world with our hands. For computers to be full partners with humans, they have to see us and understand our behavior. They have to recognize our f...
Behavior, Action, and Language
Much of our work focuses on capturing or estimating human movement. For this we seek metrically accurate 3D movement with increasing levels of detail. We are interested, however, in more than the movement of the joints, the facial muscles, the fingers, etc. What we really seek is what is behind human...
Synthesizing People
Much of the work in Perceiving Systems focuses on the capture and modeling of humans and their motion. How do we know if our models are any good? What does it mean to have a good model of humans and their behavior? We argue for something akin to a Turing Test for avatars. Specifically, given a novel 3D scene, a digit...
Society, Medicine, and Psychology
Our bodies and our health are intertwined. The question we ask is: how can we leverage our models of the human body to detect and treat disease? To answer this, we collaborate with doctors and psychologists to relate body shape and movement to health. Specifically,...
Scenes, Structure and Motion
Humans and animals live in, and interact with, the 3D world around them. To understand humans, then, we must understand the surfaces that support them and the objects with which they interact. To that end, we develop methods to estimate the structure and motion of the world from a single image, video, or mu...
Beyond Mocap
To understand human and animal movement, we want to capture it, model it, and then simulate it. Most methods for capturing human motion are restricted to laboratory environments and/or limited volumes. Most do not take into account the complex and rich environment in which humans usually operate. Nor do th...
Datasets
Datasets with ground truth have driven many of the recent advances in computer vision. They allow evaluation and comparison so the field knows what works and what does not. They also provide training data to machine learning methods that are hungry for data. Code is equally important as it supp...
Robot Perception Group
Our focus is on vision-based perception in multi-robot systems. Our goal is to understand how teams of robots, especially flying robots, can act (navigate, cooperate ...
Holistic Vision Group
Our goal is to understand the process of perception, to learn the representations that allow complex reasoning about visual input, inferring actions and predicting their consequences. We seek fundamental principles, algorithms and implementations for solving this task. In the past two years, we have made s...
Data Team
Computer vision research today is driven by data. Capturing and processing data is both specialized and time consuming. Our unique multi-disciplinary Data Team supports researchers in collecting and processing data ranging from 4D body scans to crowd-sourced image labeling. For example, we help design, organize and run hum...
Completed Projects
These projects represent work in Perceiving Systems between Jan 2011 and the present that has been superseded by new work or that we are no longer pursuing.
- Human Pose, Shape and Action
- 3D Pose from Images
- 2D Pose from Images
- Beyond Motion Capture
- Action and Behavior
- Body Perception
- Body Applications
- Pose and Motion Priors
- Clothing Models (2011-2015)
- Reflectance Filtering
- Learning on Manifolds
- Markerless Animal Motion Capture
- Multi-Camera Capture
- 2D Pose from Optical Flow
- Neural Prosthetics and Decoding
- Part-based Body Models
- Intrinsic Depth
- Lie Bodies
- Layers, Time and Segmentation
- Understanding Action Recognition (JHMDB)
- Intrinsic Video
- Intrinsic Images
- Action Recognition with Tracking
- Neural Control of Grasping
- Flowing Puppets
- Faces
- Deformable Structures
- Model-based Anthropometry
- Modeling 3D Human Breathing
- Optical flow in the LGN
- FlowCap
- Smooth Loops from Unconstrained Video
- PCA Flow
- Efficient and Scalable Inference
- Motion Blur in Layers
- Facade Segmentation
- Smooth Metric Learning
- Robust PCA
- 3D Recognition
- Object Detection