Depth From Stereo
Depth-from-stereo (DFS) is a set of algorithms that take two images, a left and right stereo pair, and produce a disparity or depth map by correlating and tracking features between the two views. This is in some ways similar to how human stereo vision provides depth perception. With two views of the same scene, separated by a known distance, the shift of features between the views allows these algorithms to infer how far away an object is.
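To make the idea of correlating features between the two views concrete, here is a minimal sketch of classic block matching along a single scanline. It is not the Adreno algorithm, whose internals are not described here. It assumes a rectified stereo pair (matching features lie on the same row), grayscale intensities stored as plain lists, and a sum-of-absolute-differences (SAD) cost; the function name and parameters are hypothetical.

```python
def match_disparity(left_row, right_row, x, half_window=1, max_disparity=4):
    """Estimate the disparity (in pixels) at column x of the left scanline.

    Assumes a rectified pair, so a feature at column x in the left image
    appears at column x - d in the right image for some d >= 0.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        # Skip candidates whose window would fall outside either row.
        if x - d - half_window < 0 or x + half_window >= len(left_row):
            continue
        # Sum of absolute differences between the two windows.
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# A bright feature centred at column 5 in the left view and column 3 in the right.
left = [0, 0, 0, 0, 10, 50, 10, 0, 0, 0]
right = [0, 0, 10, 50, 10, 0, 0, 0, 0, 0]
shift = match_disparity(left, right, x=5)
```

In this toy input the feature has moved two pixels between the views, so the lowest-cost candidate is a shift of 2. Production implementations add sub-pixel refinement, smoothness constraints and handling of textureless regions, none of which is shown here.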
There is a subtle difference between disparity and depth. The output of DFS is a disparity map, a texture containing the measured disparity for features at each point in the view. Disparity is the 2D distance a feature has shifted between the two input images. Using the properties of the lenses in photographic content, or the camera setup for rendered content, these disparity values can be converted into depth values to produce a depth map, a texture where the value of each texel is the distance from the camera to the object at that point.
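As an illustration of the disparity-to-depth conversion, the sketch below uses the standard relation for an idealized, rectified pinhole stereo pair: depth = focal length × baseline / disparity. The parameter names (`focal_length_px`, `baseline_m`) and the example values are assumptions for illustration; real camera setups also need lens calibration and rectification before this formula applies.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert one disparity value (pixels) to depth (metres).

    Assumes an idealized rectified pinhole stereo pair:
        depth = focal_length_px * baseline_m / disparity_px
    """
    if disparity_px <= 0:
        # No measurable shift: treat the point as infinitely far away.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def disparity_map_to_depth_map(disparity_map, focal_length_px, baseline_m):
    """Apply the conversion to every texel of a 2D disparity map."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


# Hypothetical setup: 500 px focal length, cameras 64 mm apart.
depth_map = disparity_map_to_depth_map([[16.0, 32.0, 0.0]], 500.0, 0.064)
```

Note the inverse relationship: larger disparities correspond to nearer objects, and depth resolution degrades for distant objects, where disparity approaches zero.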
[Video: DFS_Example (0:13), Sep 4, 2024]

Use cases
DFS is used across a wide range of industries, from automotive advanced driver assistance systems (ADAS) to artistic image post-processing such as applying a synthetic depth-of-field effect.
One area where DFS (in particular high-performance, low-latency DFS) is critical is mixed-reality XR devices. Due to the physical design of a VR headset, the outward-facing cameras that provide the passthrough images can’t be located where the user’s eyes actually are. Even if the spacing between the two cameras matches the user’s eye spacing exactly, the cameras are still offset unnaturally forward from the user’s eyes. If the images from these cameras were shown to the user unaltered, the result would be physically very uncomfortable. Objects would appear at the wrong scale, and as the user moved their head, the cameras would sweep a different path through space than the user’s eyes (due to the offset position), causing nausea and a distorted perception of the world.
DFS is applied to these camera images to produce a depth map, which is then used to reproject the camera images into an image showing what the scene would have looked like if the cameras were located at the same position in space as the user’s own eyes. This gives a natural sense of scale and movement and is critical for a viable mixed-reality VR headset.
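The reprojection step can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions, not the headset's actual pipeline: it assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a virtual eye camera with the same intrinsics and orientation, and a pure translation `camera_in_eye` giving the camera position in the eye's coordinate frame.

```python
def reproject_pixel(u, v, depth_m, fx, fy, cx, cy, camera_in_eye):
    """Map a camera pixel with known depth to the virtual eye's image.

    camera_in_eye is the (x, y, z) position of the camera in the eye's
    frame, in metres; z > 0 means the camera sits forward of the eye.
    """
    # Back-project the pixel to a 3D point in the camera frame.
    x_cam = (u - cx) * depth_m / fx
    y_cam = (v - cy) * depth_m / fy
    z_cam = depth_m

    # Express the same point in the eye frame (translation only).
    tx, ty, tz = camera_in_eye
    x_eye, y_eye, z_eye = x_cam + tx, y_cam + ty, z_cam + tz

    # Project into the eye's image plane.
    return (fx * x_eye / z_eye + cx, fy * y_eye / z_eye + cy)


# Hypothetical values: a point 1 m from the camera, camera 0.25 m ahead of the eye.
u_eye, v_eye = reproject_pixel(
    420.0, 240.0, 1.0, 500.0, 500.0, 320.0, 240.0, (0.0, 0.0, 0.25)
)
```

With these values the pixel moves toward the image centre, because the same object is farther from the eye than from the camera and so appears smaller; this is the scale correction described above. A full reprojection also has to resolve occlusions and fill holes where the eye sees surfaces the camera did not, which this sketch omits.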
GPU DFS Strengths
The Adreno GPU DFS solution is built into the latest Adreno Motion Engine and provides a high-performance, low-latency, low-power implementation. Such a solution can be critical in areas such as XR, where latency is paramount.
It is designed from the ground up to offer high-quality results in as little total time and with as little latency as possible. The algorithm is highly optimized for the Adreno GPU, executing in many use cases in well under 1 millisecond. Being GPU-based, it allows systems to pipeline the DFS operations back-to-back with the rendering workload that consumes the depth buffer, reducing the latency between the two operations to nearly zero.
Takeaway
Qualcomm Technologies’ Adreno GPU depth-from-stereo support provides a high-performance, low-latency solution for generating depth maps. A wide range of use cases, including demanding XR platforms, benefit from getting depth maps quickly and in a form that can be integrated directly into their existing GPU pipelines.

