Monocular vision-based tracking and real-time 3D environmental mapping are among the most challenging problems for mobile robot localization and navigation in large-scale unstructured natural environments. In such scenery, the scene structure observed by a monocular camera can range from less than one meter to hundreds of meters.
The conventional DSO system can generate very accurate maps of close scenes, but the reconstruction accuracy of distant points is still not comparable with that of close ones, since depth error grows quadratically with distance. Our proposed method addresses this problem by incorporating the idea of multi-baseline stereo into the existing DSO system: pixels are differentiated by their depths, and proper sets of frames are selected for each to perform reconstruction.
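The multi-baseline frame-selection idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the pose representation, and the `baseline_ratio` threshold are all illustrative assumptions.

```python
import numpy as np

def select_frames_for_pixel(inv_depth, keyframes, ref_center, baseline_ratio=0.1):
    """Pick keyframes whose baseline to the reference frame is wide
    enough relative to a point's depth (multi-baseline stereo idea).

    inv_depth      : inverse depth of the pixel in the reference frame
    keyframes      : list of (frame_id, camera-center 3-vector) pairs
    ref_center     : camera center of the reference frame (3-vector)
    baseline_ratio : required baseline-to-depth ratio (illustrative value)
    """
    depth = 1.0 / inv_depth
    selected = []
    for frame_id, center in keyframes:
        baseline = np.linalg.norm(np.asarray(center) - np.asarray(ref_center))
        # Distant points need wide baselines to get a usable triangulation
        # angle; close points can be refined from nearby frames as well.
        if baseline >= baseline_ratio * depth:
            selected.append(frame_id)
    return selected
```

With these assumed values, a point at 100 m keeps only keyframes at least 10 m from the reference frame, while a point at 2 m can use any keyframe farther than 0.2 m away.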
A multi-scale reconstruction framework is presented: the red triangle represents the current reference frame for all levels, the blue triangles and dots represent the keyframes within the optimization window, the gray triangle represents the current frame, and the dashed triangles and dots denote keyframes outside the optimization window. At the first level, the current frame is tracked against the current reference frame and all keyframes in the window; this level is exactly the same as conventional DSO. At higher levels, the inverse depth map of the current reference frame is further refined by optimizing the depths using distant frames.
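Routing each pixel to a level of this framework by its inverse depth could look like the sketch below; the two-threshold scheme and the threshold values are hypothetical choices for illustration, not taken from the paper.

```python
def assign_level(inv_depth, thresholds=(0.5, 0.05)):
    """Assign a pixel to a reconstruction level by its inverse depth.

    Thresholds are illustrative: inv_depth > 0.5 (closer than 2 m) maps
    to level 0 (handled as in conventional DSO), inv_depth > 0.05
    (closer than 20 m) to level 1, and anything farther to level 2,
    where wider-baseline frames refine the depth estimate.
    """
    for level, t in enumerate(thresholds):
        if inv_depth > t:
            return level
    return len(thresholds)
```

Close pixels thus stay in the ordinary DSO optimization window, while progressively more distant pixels are deferred to levels that draw on keyframes with larger baselines.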
The qualitative evaluation of the reconstructions of MS-DSO and DSO on the Symphony Lake Dataset is illustrated alongside a scene model from Google Earth. The number of well-reconstructed points in each point cloud can be treated as a measure of reconstruction quality. As observed from these two pairs of point clouds, our proposed MS-DSO provides a much larger number of well-reconstructed points than DSO with the same parameters.
© 2019 DREAM Lab – Georgia Tech Lorraine