As we strive toward immersive virtual reality (VR), we are guided by the extreme requirements associated with visual quality, sound quality, and intuitive interactions. A previous blog post provided intuition about how VR works and explained the need for minimized “motion to photon” latency (MTP). In this blog post, I want to dive deeper into motion tracking, which is not only the starting point of the MTP path but also essential for generating high-quality visuals and sounds.
Precise, low-latency motion tracking of head movements is crucial for intuitive interactions with the virtual world — you need it for presence. For example, when turning your head to explore the virtual world, accurate head tracking will provide the pose needed to generate the proper visuals (and sounds). Similarly, if your head is stable and not moving, the visuals need to stay completely still; otherwise, it may feel like you are on a boat. Motion is often characterized by how many degrees of freedom are possible in movement: either 3 degrees of freedom (3-DOF) or 6 degrees of freedom (6-DOF).
3-DOF detects rotational movement around the X, Y, and Z axes — the orientation. For head movements, that means being able to yaw, pitch, and roll your head (figure above), while keeping the rest of your body in the same location. 3-DOF in VR allows you to look around the virtual world from fixed points — think of a camera on a tripod. For many 360° spherical videos, 3-DOF will provide very immersive content, such as viewing sporting events from a particular seat or nature from a particular lookout point.
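To make the 3-DOF idea concrete, here is a minimal Python sketch (the function names and axis conventions are my own for illustration, not any Snapdragon API) that composes yaw, pitch, and roll angles into a single 3×3 orientation matrix:

```python
import math

def rot_z(yaw):
    # Yaw: rotation about the vertical (Z) axis
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(pitch):
    # Pitch: nodding the head up and down
    c, s = math.cos(pitch), math.sin(pitch)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(roll):
    # Roll: tilting the head toward a shoulder
    c, s = math.cos(roll), math.sin(roll)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def orientation(yaw, pitch, roll):
    # Compose the three rotations (Z, then Y, then X) into one orientation
    return matmul(matmul(rot_z(yaw), rot_y(pitch)), rot_x(roll))
```

With only these three angles, the viewpoint can rotate but never translate, which is exactly the camera-on-a-tripod behavior described above.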
6-DOF detects rotational movement and translational movement — the orientation and position. This means that you can now move away from fixed viewpoints in the virtual world, translating in the X, Y, and Z directions. 6-DOF in VR is very beneficial for experiences like gaming, where you can move freely in the virtual world and look around corners. However, even simple things, like looking at objects on a desk or shifting your head side-to-side, can be compelling with 6-DOF. 6-DOF is more immersive since it captures our real movement and removes the sensory conflict between our vision and vestibular system.
6-DOF motion tracking through VIO
One solution for providing precise 6-DOF motion tracking is visual-inertial odometry (VIO). So what is VIO, and how does it work? VIO estimates the relative position and orientation of a moving device in an unknown environment using a camera and motion sensors. VIO takes advantage of the complementary strengths of the camera and inertial sensors.
For example, a single camera can estimate relative position, but it cannot provide absolute scale — the actual distances between objects, or the size of objects in meters or feet. Inertial sensors provide absolute scale and take measurement samples at a higher rate, thereby improving robustness for fast device motion. However, inertial sensors, particularly low-cost MEMS varieties, are prone to substantial drift in position estimates when compared with cameras. So VIO blends together the best of both worlds to accurately estimate device pose.
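A toy illustration of why this fusion helps: in the 1-D sketch below (all numbers are assumptions for illustration; production VIO uses a full 6-DOF filter, not this simple complementary filter), a biased gyro drifts when integrated on its own, while occasional absolute camera measurements keep the fused estimate bounded:

```python
def fuse_step(angle, gyro_rate, dt, camera_angle=None, alpha=0.8):
    # High-rate inertial prediction: integrate the gyro reading
    predicted = angle + gyro_rate * dt
    if camera_angle is None:
        return predicted                      # no camera frame this step
    # Low-rate camera correction: blend in the absolute measurement
    return alpha * predicted + (1 - alpha) * camera_angle

# Simulate a stationary head: the true angle is 0, but the gyro reads a bias.
bias = 0.05                  # rad/s gyro bias (assumed)
dt = 0.001                   # 1 kHz inertial sampling rate (assumed)
gyro_only, fused = 0.0, 0.0
for step in range(2000):     # two seconds of simulated motion
    cam = 0.0 if step % 33 == 0 else None    # camera frames at roughly 30 Hz
    gyro_only += bias * dt                   # inertial-only estimate drifts
    fused = fuse_step(fused, bias, dt, cam)  # fused estimate stays bounded
```

After two seconds, the inertial-only estimate has drifted by about 0.1 rad, while the fused estimate stays within about 0.01 rad of the truth — the camera pins down what the gyro alone cannot.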
An optimized on-device 6-DOF solution
At Qualcomm Technologies, we’ve spent a lot of time and effort figuring out how to make on-device VIO work within the power and thermal constraints of mobile devices. Motion tracking is essential not only for VR, but also for many other applications, such as drones, robotics, augmented reality, and indoor navigation. The on-device aspect is crucial for VR since it allows the VR headset (and you!) to be completely mobile — you can enjoy room-scale VR and not worry about being tethered to a PC or getting tangled in wires.
Qualcomm Technologies is uniquely positioned to support superior mobile VR experiences by designing for precise, low-latency motion tracking. By taking a system approach and custom designing specialized engines across the SoC, Qualcomm Snapdragon processors are engineered to provide an efficient heterogeneous computing solution that is optimized from end-to-end for latency, power, and performance (check out our VR white paper for more details). Let me walk you through how we achieve an optimized VIO solution on Snapdragon processors. Looking at the diagram below, you can see that the camera, accelerometers, and gyroscopes are the inputs to the Snapdragon VIO subsystem. The inertial sensors, which are the accelerometers and gyroscopes, run at a much higher frequency than the camera.
In the Snapdragon VIO subsystem, the camera feature processing block detects and tracks features. The inertial data processing block runs at the high frequencies of the inertial sensors and fuses the data. The VIO algorithms run on the Hexagon DSP for efficiency. The magic really happens when fusing the camera and inertial sensor data. Fusing this data allows for continuous localization of the head position. Finally, we need the head position at a high frequency, so we generate both an accurate, high-rate pose and a predicted pose. This 6-DOF pose can then be used to quickly update the user’s view in the VR world with the precise visuals (and sound).
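To give a rough sense of what a predicted pose means (this is a simple constant-velocity model of my own, not Qualcomm's actual predictor), the latest 6-DOF estimate can be extrapolated forward by the expected motion-to-photon latency:

```python
def predict_pose(yaw, pos, angular_vel, linear_vel, latency_s):
    # Extrapolate the latest pose estimate forward by the expected
    # motion-to-photon latency, assuming roughly constant angular and
    # linear velocity over that short interval.
    predicted_yaw = yaw + angular_vel * latency_s
    predicted_pos = [p + v * latency_s for p, v in zip(pos, linear_vel)]
    return predicted_yaw, predicted_pos
```

For instance, if the head is turning at 2 rad/s and the pipeline latency is 20 ms, the renderer should draw the scene about 0.04 rad ahead of the last measured orientation, so the image lands where the head will actually be.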
Our VIO solution is efficient for a few key reasons. First, the sensors and the processing are physically close to each other in the same device, which reduces wire and interface delays. Second, all the processing happens on a highly integrated Snapdragon processor, which means that all the functionality is on the same chip. This reduces latency and power because of efficient interfaces and reduced memory transactions. Third, our system software is optimized for VIO, and our Snapdragon VR SDK gives customers and developers access so they can take advantage of it. We look forward to seeing all the amazing 3-DOF and 6-DOF VR experiences that developers come up with.
Want to learn more? Be sure to sign up for our newsletter to receive the latest information about mobile computing.