Combine and conquer: representation learning from multiple data distributions (Talk)
It is becoming less and less controversial to say that the days of learning representations through label supervision are over. Recent work shows that such regimes are not only expensive, but also suffer from various generalisation/robustness issues. This is somewhat unsurprising, as perceptual data (vision, language) are rich and cannot be well represented by a single label --- doing so inevitably results in the model learning spurious features that trivially correlate with the label. In this talk, I will introduce my work during my PhD at Oxford, which looks at representation learning from multiple sources of data, e.g. vision and language. We show, for both generative models (VAEs) and discriminative models, that learning to extract common abstract concepts across multiple modalities/domains can yield higher-quality and more generalisable representations. Additionally, we also look at improving the data-efficiency of such models, both by 1) using fewer multimodal pairs through contrastive-style objectives and 2) "generating" multimodal pairs via masked image modelling.
Biography: Yuge Shi is a DPhil student at the University of Oxford, supervised by Philip Torr and Siddharth Narayanaswamy. Her research focuses on representation learning from multiple sources of perceptual data (e.g. vision, language), with applications to classic machine learning problems such as self-supervised learning, semi-supervised learning and domain generalisation. She also co-founded GirlsWhoML, an organisation that provides free machine learning and coding workshops for those who identify as female/non-binary, and served as a core committee member of the Oxford Women in Computer Science society for three years.