Mining Visual Knowledge from Large Pre-trained Models (Talk)
Computer vision has made huge progress in the past decade under the dominant supervised learning paradigm, that is, training large-scale neural networks for each task on ever-larger datasets. However, in many cases, collecting data or annotations at scale is intractable. In contrast, humans can easily adapt to new vision tasks with very little data or few labels. To bridge this gap, we found that rich visual knowledge already exists in large pre-trained models, i.e., models trained on large-scale internet images with either self-supervised or generative objectives. We proposed different techniques to extract this implicit knowledge and use it to accomplish specific downstream tasks where data is constrained, including recognition, dense prediction, and generation. Specifically, I will present the following three works. First, I will introduce an efficient and effective way to adapt pre-trained vision transformers to a variety of low-shot downstream tasks while tuning less than 1 percent of the model parameters. Second, I will show that accurate visual correspondences emerge from a strong generative model (i.e., a diffusion model) without any supervision. Finally, I will demonstrate that an adapted diffusion model can complete a photo with the true scene contents using only a few casually captured reference images.
Biography: Luming Tang is a final-year PhD student at Cornell University, working with Prof. Bharath Hariharan. Before that, he studied physics as an undergraduate at Tsinghua University. He has broad research interests in computer vision and machine learning, including generative models and representation learning, with a particular focus on solving challenging real-world vision problems where data or annotations are constrained.