Session 3

Thibault Neveu, Visual Behavior
Date: Feb 10, 2022
Advances in monocular depth estimation through supervised and self-supervised learning make it possible to infer a semantically coherent depth map. However, monocular depth estimation still lacks some features needed for practical embedded robotics applications. In such constrained scenarios, the scale and shift ambiguity, the computational requirements, and the lack of temporal stability are some of the barriers to production. In this talk, I introduce ongoing work to include spatio-temporal inductive biases in the models. I first introduce how the flexibility of transformer architectures could improve model efficiency at run time. Then, I take a quick tour of how we can leverage multi-view geometry, through optical expansion and depth from motion, to solve the inherent scale ambiguity of monocular models.

Talk by Thibault Neveu, CTO @ Visual Behavior
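As background for the scale and shift ambiguity mentioned above: a monocular network typically predicts depth only up to an unknown scale and shift, so evaluation pipelines (e.g. in the MiDaS line of work) align the prediction to a reference via least squares before computing metrics. The sketch below is illustrative, not the speaker's method; the function name and variables are assumptions.

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align a scale/shift-ambiguous depth prediction to a reference.

    Solves min_{s,t} || s * pred + t - gt ||^2 in closed form via
    least squares, then returns the aligned prediction. Both inputs
    are flat arrays of per-pixel depth values.
    """
    # Design matrix [pred, 1] so the solution gives scale s and shift t.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s * pred + t
```

This closed-form alignment is what makes scale/shift-invariant evaluation possible offline, but it requires ground-truth depth, which is exactly what an embedded robot lacks at run time; the talk's multi-view cues (optical expansion, depth from motion) aim to recover metric scale without it.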
Join the next ACV here: