Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and that Graph Neural Networks (GNNs), which incorporate a relational inductive bias, can shift the learning process towards exploiting such relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation task from visual observations involving cluttered and irregularly shaped objects. We investigate two GNN approaches and empirically assess their capability to generalize to scenarios with novel objects and with an increasing number of objects. The first approach, based on Graph Networks (GN), relies on explicitly defined edge attributes. Not only does it consistently underperform an auto-encoder baseline that we modified to predict future states, but our results also indicate that the choice of edge attributes can significantly influence the predictions. Consequently, we develop the Auto-Predictor, which does not rely on explicitly defined edge attributes. It outperforms both the baseline and the GN-based models. Overall, our results show the sensitivity of GNN-based approaches to the task representation and the efficacy of relational inductive biases, and they advocate choosing lightweight approaches that implicitly reason about relations over ones that leave these decisions to human designers.
Biography: Fabio Ferreira is a Master's student at Karlsruhe Institute of Technology (KIT) and works as a student researcher in the High Performance Humanoid Technologies Lab (H2T) under the supervision of Prof. Tamim Asfour. For his Master's thesis research in the Interactive Perception and Robot Learning Lab (IPRL) at Stanford University, under the supervision of Prof. Jeannette Bohg, he investigated the efficacy of graph neural networks for learning object dynamics models from vision. His current research interest revolves around learning physics models from vision and using such models for action planning in robotics. He is also interested in gaining more insight into learning representation spaces, to better understand the effects of design decisions on these spaces for unsupervised or reinforcement learning downstream tasks. Fabio earned his B.Sc. in Computer Science at Karlsruhe University of Applied Sciences, with an emphasis on software engineering, graphs and computer vision.