Existing optical flow algorithms (bottom left) do not make use of the semantics of the scene (top left). Our approach computes motion differently depending on the semantic class label of each region, resulting in more precise flow (bottom right). This also helps refine the segmentation of the foreground objects (top right).
Existing optical flow methods make generic, spatially homogeneous assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types, and we define different models of image motion in these regions depending on the type of object. For example, we model the motion of roads with homographies, vegetation with spatially smooth flow, and independently moving objects such as cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models in dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method on the KITTI 2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.
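To make the idea of class-dependent motion models concrete, below is a minimal sketch (not the released code) of refining an initial dense flow field by fitting a homography to the flow inside road pixels and an affine model inside car pixels, using numpy and OpenCV. The label ids, the pixel-count threshold, and the assumption that an initial flow estimate is available are all hypothetical choices for illustration.

```python
# Minimal sketch: refine an initial dense flow per semantic region by fitting a
# class-specific parametric motion model (homography for "road", affine for "car").
# Label ids and the initial flow are placeholders, not the paper's actual pipeline.
import numpy as np
import cv2

ROAD, CAR = 1, 2  # hypothetical semantic label ids

def refine_flow_by_class(flow, labels):
    """flow: HxWx2 initial optical flow; labels: HxW semantic segmentation."""
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    refined = flow.copy()

    for cls in (ROAD, CAR):
        mask = labels == cls
        if mask.sum() < 50:  # too few pixels to fit a reliable model
            continue
        src = np.stack([xs[mask], ys[mask]], axis=1).astype(np.float32)
        dst = src + flow[mask].astype(np.float32)  # correspondences implied by the flow

        if cls == ROAD:
            # Planar region: a single homography explains its motion
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
            if H is None:
                continue
            proj = cv2.perspectiveTransform(src[None], H)[0]
        else:
            # Rigidly moving object: affine motion (deviations would be added on top)
            A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
            if A is None:
                continue
            proj = src @ A[:, :2].T + A[:, 2]

        # Replace the flow in this region with the parametric model's prediction
        refined[mask] = proj - src
    return refined
```

In the paper the parametric models act as constraints within a layered energy rather than as a hard post-hoc replacement; this sketch only illustrates the per-class modeling idea.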
Code can be downloaded here.
Precomputed segmentation and flow fields for the training set of KITTI 2015 can be downloaded here.