Controlling Text-to-Image Diffusion by Orthogonal Finetuning
2023
Conference Paper
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy, which characterizes the pairwise relationship between neurons on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
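The abstract's central claim, that multiplying every neuron by one shared orthogonal matrix leaves hyperspherical energy unchanged, can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. Assumptions: neurons are the columns of a weight matrix (stored here as a list of vectors), the finetuned weight is W = R W0, R comes from a Cayley parameterization R = (I + Q)(I - Q)^{-1} of a skew-symmetric Q, and hyperspherical energy is taken as the sum over i != j of 1 / ||w_i_hat - w_j_hat|| for unit-normalized neurons. The COFT-style radius constraint is approximated, as a hypothetical stand-in, by rescaling Q to Frobenius norm at most eps; since R - I = 2Q(I - Q)^{-1} and ||(I - Q)^{-1}|| <= 1 for skew-symmetric Q, this keeps ||R - I||_F <= 2 * eps. All helper names (`cayley`, `constrain`, `rotate`, `hyperspherical_energy`, ...) are hypothetical.

```python
import math
import random

Matrix = list[list[float]]
Vector = list[float]


def identity(d: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    d = len(a)
    aug = [row[:] + e for row, e in zip(a, identity(d))]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def skew(a: Matrix) -> Matrix:
    """Map unconstrained parameters A to a skew-symmetric Q = A - A^T."""
    d = len(a)
    return [[a[i][j] - a[j][i] for j in range(d)] for i in range(d)]


def constrain(q: Matrix, eps: float) -> Matrix:
    """Hypothetical COFT-style step: rescale Q so that ||Q||_F <= eps (stays skew-symmetric)."""
    norm = math.sqrt(sum(x * x for row in q for x in row))
    if norm <= eps:
        return q
    return [[x * eps / norm for x in row] for row in q]


def cayley(q: Matrix) -> Matrix:
    """R = (I + Q)(I - Q)^{-1}, which is orthogonal whenever Q is skew-symmetric."""
    d = len(q)
    plus = [[float(i == j) + q[i][j] for j in range(d)] for i in range(d)]
    minus = [[float(i == j) - q[i][j] for j in range(d)] for i in range(d)]
    return matmul(plus, inverse(minus))


def rotate(r: Matrix, neurons: list[Vector]) -> list[Vector]:
    """Apply w_i <- R w_i to every neuron, i.e. W = R W0 with neurons as columns."""
    return [[sum(r[i][k] * w[k] for k in range(len(w))) for i in range(len(r))]
            for w in neurons]


def hyperspherical_energy(neurons: list[Vector]) -> float:
    """Sum over ordered pairs i != j of 1 / ||w_i_hat - w_j_hat|| on the unit sphere."""
    units = []
    for w in neurons:
        n = math.sqrt(sum(x * x for x in w))
        units.append([x / n for x in w])
    energy = 0.0
    for i, u in enumerate(units):
        for j, v in enumerate(units):
            if i != j:
                energy += 1.0 / math.dist(u, v)
    return energy


def frobenius_dist(a: Matrix, b: Matrix) -> float:
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


# Toy layer: n = 6 random neurons of dimension d = 4 (illustrative values, not from the paper).
random.seed(0)
d, n, eps = 4, 6, 0.1
w0 = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
params = [[random.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)]

r = cayley(constrain(skew(params), eps))
w = rotate(r, w0)

print("HE before:", hyperspherical_energy(w0))
print("HE after: ", hyperspherical_energy(w))
print("||R - I||_F:", frobenius_dist(r, identity(d)), "<= 2*eps =", 2 * eps)
```

Because the energy depends only on pairwise angles between normalized neurons, and an orthogonal R preserves inner products, the two printed energies agree up to floating-point error even though the weights themselves change; a generic additive update W0 + dW would in general change the energy.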
Author(s): | Qiu*, Z. and Liu*, W. and Feng, H. and Xue, Y. and Feng, Y. and Liu, Z. and Zhang, D. and Weller, A. and Schölkopf, B. |
Book Title: | Advances in Neural Information Processing Systems 36 (NeurIPS 2023) |
Volume: | 36 |
Pages: | 79320–79362 |
Year: | 2023 |
Month: | December |
Editors: | A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine |
Publisher: | Curran Associates, Inc. |
Department(s): | Empirical Inference, Perceiving Systems |
BibTeX Type: | Conference Paper (conference) |
Event Name: | 37th Annual Conference on Neural Information Processing Systems |
Event Place: | New Orleans, USA |
Note: | *equal contribution |
State: | Published |
URL: | https://proceedings.neurips.cc/paper_files/paper/2023/file/faacb7a4827b4d51e201666b93ab5fa7-Paper-Conference.pdf |
BibTeX: | @conference{Qiuetal23,
  title = {Controlling Text-to-Image Diffusion by Orthogonal Finetuning},
  author = {Qiu*, Z. and Liu*, W. and Feng, H. and Xue, Y. and Feng, Y. and Liu, Z. and Zhang, D. and Weller, A. and Sch{\"o}lkopf, B.},
  booktitle = {Advances in Neural Information Processing Systems 36 (NeurIPS 2023)},
  volume = {36},
  pages = {79320--79362},
  editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  publisher = {Curran Associates, Inc.},
  month = dec,
  year = {2023},
  note = {*equal contribution},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/faacb7a4827b4d51e201666b93ab5fa7-Paper-Conference.pdf},
  month_numeric = {12}
} |