Perceiving Systems, Computer Vision

Synthesizing Environment-Specific People in Photographs

2024

Conference Paper

We present ESP, a novel method for context-aware full-body generation that enables photo-realistic synthesis and inpainting of people wearing clothing that is semantically appropriate for the scene depicted in an input photograph. ESP is conditioned on a 2D pose and on contextual cues that are extracted from the photograph of the scene and integrated into the generation process; the clothing is modeled explicitly with human parsing masks (HPMs). The generated HPMs are used as tight guiding masks for inpainting, so that no changes are made to the original background. Our models are trained on a dataset of in-the-wild photographs of people covering a wide range of environments. We analyze the method quantitatively and qualitatively and show that ESP outperforms the state of the art on the task of contextual full-body generation.
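Why a tight mask leaves the background untouched can be illustrated with a simple per-pixel composite. The sketch below is not the ESP implementation. It assumes that an image is a grid of RGB tuples and that an HPM is a grid of integer part labels in which 0 means "background". The function name `composite_with_hpm` is hypothetical. Pixels the mask labels as background are always copied from the original photograph; only pixels covered by a body or clothing label take the generated value.

```python
from typing import List, Tuple

# Assumed representations for this sketch (not taken from ESP):
# an image is a list of rows of RGB tuples, and a human parsing mask (HPM)
# is a list of rows of integer part labels, where 0 means "background".
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
LabelMap = List[List[int]]

BACKGROUND = 0  # hypothetical label id for background


def composite_with_hpm(original: Image, generated: Image, hpm: LabelMap) -> Image:
    """Copy generated pixels only where the HPM marks a person; keep the original elsewhere."""
    if not (len(original) == len(generated) == len(hpm)):
        raise ValueError("image and mask heights must match")
    out: Image = []
    for orig_row, gen_row, mask_row in zip(original, generated, hpm):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("image and mask widths must match")
        out.append([
            gen_px if label != BACKGROUND else orig_px
            for orig_px, gen_px, label in zip(orig_row, gen_row, mask_row)
        ])
    return out


# Usage: a 2x2 photo, a fully "generated" red image, and a mask with two person pixels.
original = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
generated = [[(255, 0, 0), (255, 0, 0)], [(255, 0, 0), (255, 0, 0)]]
hpm = [[0, 3], [5, 0]]  # 3 and 5 stand for arbitrary clothing/body part labels
result = composite_with_hpm(original, generated, hpm)
```

Because the selection is made per pixel from the mask, every background pixel in `result` is bit-identical to the input photograph, whatever the generator produced there. The tighter the mask, the fewer original pixels are replaced.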

Author(s): Mirela Ostrek and Carol O’Sullivan and Michael Black and Justus Thies
Book Title: European Conference on Computer Vision (ECCV 2024)
Year: 2024
Month: October
Series: LNCS
Publisher: Springer Cham

Department(s): Neural Capture and Synthesis, Perceiving Systems
Bibtex Type: Conference Paper (inproceedings)
Paper Type: Conference

Event Name: European Conference on Computer Vision (ECCV 2024)
Event Place: Milan, Italy

Degree Type: PhD
Digital: True
State: Accepted
URL: https://esp.is.tue.mpg.de/

BibTeX

@inproceedings{esp,
  title = {Synthesizing Environment-Specific People in Photographs},
  author = {Ostrek, Mirela and O'Sullivan, Carol and Black, Michael and Thies, Justus},
  booktitle = {European Conference on Computer Vision (ECCV 2024)},
  series = {LNCS},
  publisher = {Springer Cham},
  month = oct,
  year = {2024},
  url = {https://esp.is.tue.mpg.de/},
  month_numeric = {10}
}