The database used for detecting an object is extracted from "reference views" of the object. In the case of planar objects, a typical choice for a reference view is a frontal, full-resolution image of the object. Additional views can be added to provide greater robustness to viewpoint changes. In this special case, the object geometry is also fairly simple, since all the reference features lie on a single plane.
In the case of 3D objects, there can be several reference views describing the object, and features from different views need to be described in 3D coordinates that are consistent across all views. Qualcomm's AR team has developed two alternative pipelines for extracting 3D object representations, also known as "3D object databases", depending on whether a 3D model of the object at hand is known or unknown. For 3D objects, geometric verification is performed by estimating a single rotation and translation that consistently explains the observed correspondences.
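The verification step described above can be illustrated with a minimal sketch. It assumes a candidate rotation R and translation t have already been estimated elsewhere (for example by a pose solver run on a subset of the matches) and only checks how many 2D-3D correspondences that pose explains. The function names, the pinhole intrinsics (f, cx, cy), and the 3-pixel threshold are hypothetical choices for illustration, not part of the pipeline described here.

```python
import math

def project(point, R, t, f, cx, cy):
    """Pinhole projection of a 3D object point into pixel coordinates."""
    # Camera-frame coordinates: X_cam = R * X_obj + t
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None  # point lies behind the camera
    return (f * x / z + cx, f * y / z + cy)

def count_inliers(matches, R, t, f, cx, cy, max_err_px=3.0):
    """Count 2D-3D matches whose reprojection error is within max_err_px.

    matches is a list of (point3d, pixel) pairs, where point3d comes from
    the object database and pixel is the matched image feature location.
    """
    inliers = 0
    for point3d, pixel in matches:
        proj = project(point3d, R, t, f, cx, cy)
        if proj is None:
            continue
        err = math.hypot(proj[0] - pixel[0], proj[1] - pixel[1])
        if err <= max_err_px:
            inliers += 1
    return inliers

# Example: identity rotation, object 5 units in front of the camera.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 5.0]
matches = [
    ((0, 0, 0), (320, 240)),  # exact match
    ((1, 0, 0), (421, 240)),  # 1 pixel off
    ((0, 1, 0), (300, 300)),  # wrong correspondence
]
n_inliers = count_inliers(matches, R, t, f=500.0, cx=320.0, cy=240.0)
```

A pose that leaves most correspondences with a small reprojection error is considered geometrically consistent; a detector would typically accept the object only when the inlier count exceeds some application-specific minimum.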
3D Object Feature Extraction
If the 3D model is given, features extracted from multiple object views are projected forward along their viewing rays, and their depth is estimated at the intersection with the model, which is positioned consistently with the camera pose of the view in question. If the 3D model is not available, which is most often the case, the problem becomes much more difficult. To resolve this, we have developed a specialized SLAM-based scanning tool that extracts object features from a multitude of views and estimates their 3D coordinates using standard structure-from-motion techniques.
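For the known-model case, the depth estimation can be sketched as a ray-triangle intersection. The example below uses the standard Möller-Trumbore test and assumes the model is a triangle mesh whose vertices have already been transformed into the camera frame of the view in question. The helper names and the pinhole intrinsics (f, cx, cy) are hypothetical; the actual pipeline may represent the model and compute the intersection differently.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: ray parameter of the hit point, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # keep only hits in front of the camera

def lift_feature(pixel, f, cx, cy, triangle):
    """Cast a ray through a pixel and return the 3D hit point in the camera frame."""
    # With the ray direction scaled so that z = 1, the ray parameter equals depth.
    d = ((pixel[0] - cx) / f, (pixel[1] - cy) / f, 1.0)
    t = ray_triangle((0.0, 0.0, 0.0), d, *triangle)
    return None if t is None else (t * d[0], t * d[1], t * d[2])

# Example: one model triangle lying in the plane z = 5 in front of the camera.
triangle = ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))
hit = lift_feature((320.0, 240.0), 500.0, 320.0, 240.0, triangle)
miss = lift_feature((1320.0, 240.0), 500.0, 320.0, 240.0, triangle)
```

For a full mesh, the same test would be run against every candidate triangle and the nearest hit kept. The resulting camera-frame point would then be mapped back into object coordinates using the inverse of the view's camera pose, so that features from all views share one coordinate system.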