Autonomous vehicles require a comprehensive understanding of their static road environment and of other road users (vehicles, pedestrians, etc.). Deep neural networks have shown great potential towards this goal, as deep learning methods are making breakthroughs in object detection, recognition, and tracking. However, these networks require a large memory footprint and significant computational power. A key challenge for our industry has been designing an embedded platform with sufficient compute power that can operate within the power and thermal constraints of a vehicle.
Additionally, autonomous vehicles need to localize themselves with sub-meter to centimeter-level accuracy, either relative to global coordinates or within the world represented by an HD map. While some of today’s sensors can provide that capability, they can cost hundreds of thousands of dollars, which makes them too expensive to commercialize broadly.
Solving these two problems is critical if autonomous driving is to become a reality. In response, Qualcomm Research developed the Drive Data Platform (DDP), which gives autonomous vehicles a complete picture of their road environment, together with road conditions provided by a network of other vehicular “nodes” that constantly share information with each other. DDP is made possible through our significant advances in on-device machine learning and edge analytics, highly accurate positioning, and heterogeneous connectivity.
DDP hits the road in San Diego
DDP recently completed a major milestone in support of the 2017 Consumer Electronics Show (CES) in Las Vegas, successfully demonstrating machine learning and highly accurate autonomous vehicular positioning. During this demonstration, three roving Qualcomm Research vehicles circled San Diego’s busy streets, positioning themselves accurately even in GPS-challenged environments.
Serving as moving sensor and data collection platforms, our vehicles detected landmarks such as road signs and lane markers using computer vision and machine learning methods, then triangulated each landmark’s position relative to the vehicle’s 6-DOF (six degrees of freedom) pose to place it in a 3D map frame. Each vehicle’s highly accurate 6-DOF pose was obtained by fusing GPS/GNSS with Visual-Inertial Odometry (VIO), which leverages commercial-grade sensors.
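To make the map-frame placement concrete, here is a minimal sketch of moving a landmark from the vehicle frame into the map frame once the 6-DOF pose is known. The function names, the yaw-pitch-roll (Z-Y-X) angle convention, and the vehicle axes (x forward, y left, z up) are assumptions made for illustration; they are not taken from the DDP implementation.

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix built from Z-Y-X (yaw, pitch, roll) angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def to_map_frame(pose, point_vehicle):
    """Apply p_map = R * p_vehicle + t for a pose (x, y, z, yaw, pitch, roll)."""
    x, y, z, yaw, pitch, roll = pose
    rot = rotation_zyx(yaw, pitch, roll)
    trans = (x, y, z)
    return tuple(
        sum(rot[i][j] * point_vehicle[j] for j in range(3)) + trans[i]
        for i in range(3)
    )

# Hypothetical example: vehicle at (10, 5, 0) heading along +y (yaw = 90 degrees),
# road sign seen 2 m ahead and 1 m above the vehicle origin.
sign_in_map = to_map_frame((10.0, 5.0, 0.0, math.pi / 2, 0.0, 0.0), (2.0, 0.0, 1.0))
```

With this pose the sign lands at approximately (10, 7, 1) in the map frame: 2 m further along the direction the vehicle is facing and 1 m up.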
As the vehicles made multiple journeys throughout San Diego, they continuously uploaded their data in real time to our proprietary cloud processing engine. This crowdsourcing of data was instrumental in generating and updating accurate 3D localization information for HD maps.
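As a rough illustration of why crowdsourcing helps, the sketch below averages repeated observations of the same landmark reported by different journeys. It assumes that data association is already solved (each observation carries a hypothetical `landmark_id`) and uses a plain mean; the actual cloud processing engine is proprietary and is not described here.

```python
from collections import defaultdict

def fuse_landmark_observations(observations):
    """Average map-frame observations per landmark.

    observations: iterable of (landmark_id, (x, y, z)) tuples.
    Returns a dict mapping landmark_id to its mean (x, y, z) position.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for landmark_id, (x, y, z) in observations:
        acc = sums[landmark_id]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1  # number of observations seen for this landmark
    return {
        landmark_id: (sx / n, sy / n, sz / n)
        for landmark_id, (sx, sy, sz, n) in sums.items()
    }

# Hypothetical uploads from two journeys past the same sign.
uploads = [
    ("sign_1", (10.0, 7.0, 1.0)),
    ("sign_1", (10.2, 6.8, 1.2)),
    ("lane_4", (3.0, 0.0, 0.0)),
]
fused = fuse_landmark_observations(uploads)
```

Each additional pass over a landmark adds another observation to the average, which is why repeated journeys can improve the map even when individual measurements are noisy.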
Enabling massive edge analytics in a SoC
For DDP, we focus on edge analytics, which lets us perform on-device machine learning inference and deliver an accurate understanding of the surrounding environment in both a geometric and a semantic sense. We are currently developing the framework and infrastructure to provide optimized deep neural networks for autonomous driving in a solution that consumes less than 10 watts. We demonstrated this capability in October at ECCV 2016 (European Conference on Computer Vision) with our proof-of-concept Single Shot Multibox Detector (SSD), a cutting-edge deep neural network that was trained on 16 classes of objects. SSD was integrated on a Qualcomm Snapdragon 820-powered tablet consuming less than 6-8 watts. In live view, it analyzed street scenes in Amsterdam, classifying cars, bicycles, people, and other objects in the scene. This demonstration showed that a very large and computationally complex deep network like SSD, which is typically run on servers and in the cloud, could run in real time on a smartphone-class SoC (System on Chip). We envision SSD as a critical cog in DDP, helping vehicles understand all elements in a scene and thus improving safety. For example, should SSD detect a pedestrian crossing the street, it could cue the vehicle to stop to avoid an accident.
In addition to using SSD for object detection, we are also developing Vertex SSD, a special variant of SSD that detects traffic sign vertices with pixel-level accuracy, which we also demonstrated at CES this year.
The output of Vertex SSD, coupled with the precise 6-DOF pose of the camera at the moment each frame was captured, is the foundation for precise mapping of the landmarks.
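One way to picture how pixel-accurate vertices and camera poses combine is two-view triangulation. The sketch below back-projects the same sign vertex from two frames into rays and takes the midpoint of their closest approach. The pinhole intrinsics, the identity rotations, and the function names are illustrative assumptions; the midpoint method stands in for whatever mapping procedure DDP actually uses.

```python
def pixel_to_ray(pixel, intrinsics, rotation):
    """Back-project a pixel into a ray direction expressed in the map frame.

    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera, z axis forward.
    rotation:   3x3 camera-to-map rotation taken from the 6-DOF pose.
    """
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    return tuple(sum(rotation[i][j] * d_cam[j] for j in range(3)) for i in range(3))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Midpoint of the shortest segment between two rays (camera centre + direction)."""
    w = tuple(p - q for p, q in zip(origin1, origin2))
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the vertex cannot be triangulated")
    s = (b * e - c * d) / denom  # distance parameter along the first ray
    t = (a * e - b * d) / denom  # distance parameter along the second ray
    p1 = tuple(o + s * k for o, k in zip(origin1, dir1))
    p2 = tuple(o + t * k for o, k in zip(origin2, dir2))
    return tuple((m + n) / 2.0 for m, n in zip(p1, p2))

# Hypothetical setup: the camera moves 2 m forward between the two frames.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (1000.0, 1000.0, 640.0, 360.0)
ray1 = pixel_to_ray((740.0, 400.0), INTRINSICS, IDENTITY)
ray2 = pixel_to_ray((640.0 + 500.0 / 3.0, 360.0 + 200.0 / 3.0), INTRINSICS, IDENTITY)
vertex = triangulate_midpoint((0.0, 0.0, 0.0), ray1, (0.0, 0.0, 2.0), ray2)
```

In this synthetic case both rays pass through the point (0.5, 0.2, 5.0), so the midpoint recovers it up to floating-point error. With real detections the rays rarely intersect exactly, which is why errors in either the pixel location or the pose feed directly into the mapped position.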
Creating highly accurate and cost-effective positioning
To achieve highly accurate positioning, we have developed algorithms to tightly couple inputs from multiple standard vehicle component sensors (GPS/GNSS receiver, camera and inertial sensors), to build a cost-effective solution. The foundation of this technology is Visual-Inertial Odometry (VIO), which we’ve continuously perfected over the last four years.
VIO leverages the strengths of a monocular camera and inertial sensors (accelerometer and gyroscope) and tightly fuses their inputs to generate the 6-DOF pose trajectory of the camera with less than 1% drift. Fusion is needed because monocular vision provides accurate constraints on motion but cannot distinguish between different scales, while inertial sensors provide a direct, although noisy and biased, estimate of the scale of the motion. We developed an Extended Kalman Filter (EKF)-based engine to jointly estimate the 6-DOF pose of the camera, the location and depth of all key vision features, and the intrinsic bias and scale errors of the inertial sensors.
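The scale ambiguity can be illustrated with a deliberately simplified sketch. It is not the EKF described above: it assumes we already have per-interval displacement vectors from monocular vision (correct direction, unknown scale) and from integrated inertial data (metric but noisy), and it solves for the single scale factor that best aligns them in the least-squares sense. The function name and inputs are hypothetical.

```python
def estimate_scale(visual_steps, inertial_steps):
    """Least-squares scale s minimising sum ||s * v_k - m_k||^2.

    visual_steps:   displacement vectors from monocular vision (arbitrary scale).
    inertial_steps: matching metric displacement vectors from inertial integration.
    """
    num = sum(
        v_i * m_i
        for v, m in zip(visual_steps, inertial_steps)
        for v_i, m_i in zip(v, m)
    )
    den = sum(v_i * v_i for v in visual_steps for v_i in v)
    if den == 0.0:
        raise ValueError("no visual motion observed; scale is unobservable")
    return num / den

# Hypothetical data: vision reports half the true metric motion.
visual = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
inertial = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
scale = estimate_scale(visual, inertial)
```

Here the recovered scale is 2.0. A real filter estimates this jointly with pose, feature depth, and sensor biases rather than in a separate batch step, since inertial displacements drift quickly when biases are left uncorrected.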
We have also developed a tight fusion engine between the VIO engine and the GPS/GNSS measurement engine. Although GPS/GNSS receivers provide good position fixes in open-sky conditions, their performance degrades significantly in multipath environments (e.g., urban canyons). To counter this, we leverage the good-quality relative motion measurement from VIO to select or reject GPS satellite measurements. Specifically, we apply a robust outlier rejection algorithm, based on RANSAC, to track the pseudorange measurements from the satellites over hundreds of GPS epochs. Measurements that are not consistent with the most plausible position hypothesis are rejected, and the position is calculated from the remaining measurements over a sliding window for fusion with VIO. In this way, even when the vehicle is traveling through urban canyons, we can recover the highly accurate GPS positioning that can currently only be achieved in open-sky conditions.
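The following is a heavily simplified, single-epoch sketch of RANSAC-style pseudorange screening. It assumes a position predicted from VIO is available, so the only unknown left is the receiver clock bias (expressed in meters). Each iteration samples one satellite to hypothesize that bias and counts how many other measurements agree. The function name, the 5 m threshold, and the synthetic data are illustrative assumptions; the engine described above works over hundreds of epochs and a sliding window.

```python
import math
import random

def ransac_pseudoranges(sat_positions, pseudoranges, predicted_position,
                        threshold_m=5.0, iterations=50, seed=0):
    """Return (inlier_indices, clock_bias_m) using a one-parameter RANSAC.

    Assumes at least one measurement is supplied.
    """
    rng = random.Random(seed)
    # Measured pseudorange minus geometric range from the VIO-predicted position.
    # For clean signals this is the shared receiver clock bias; multipath adds
    # an extra, satellite-specific delay on top of it.
    residuals = [
        pr - math.dist(sat, predicted_position)
        for sat, pr in zip(sat_positions, pseudoranges)
    ]
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single satellite fixes the clock-bias hypothesis.
        bias_hypothesis = residuals[rng.randrange(len(residuals))]
        inliers = [
            i for i, r in enumerate(residuals)
            if abs(r - bias_hypothesis) <= threshold_m
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model (mean clock bias) on the consensus set only.
    clock_bias = sum(residuals[i] for i in best_inliers) / len(best_inliers)
    return best_inliers, clock_bias

# Synthetic example: true clock bias of 100 m, satellites 2 and 5 hit by multipath.
sats = [(2.0e7, 0.0, 0.0), (0.0, 2.0e7, 0.0), (0.0, 0.0, 2.0e7),
        (1.5e7, 1.5e7, 0.0), (1.0e7, 0.0, 1.8e7), (0.0, 1.2e7, 1.6e7)]
errors = [0.5, -0.3, 60.0, 0.2, -0.4, 150.0]
origin = (0.0, 0.0, 0.0)
ranges = [math.dist(s, origin) + 100.0 + e for s, e in zip(sats, errors)]
inliers, bias = ransac_pseudoranges(sats, ranges, origin)
```

In this synthetic case the two multipath-affected satellites are rejected and the clock bias is estimated from the four consistent measurements. A fuller implementation would also solve for position from the surviving pseudoranges rather than trusting the predicted position outright.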
Establishing heterogeneous connectivity
Inter-vehicular access to car sensors and on-board inertial sensors will also be paramount for ensuring safe vehicular transportation in the years ahead. For example, if a vehicle detects ice at a particular spot on the road, the precise location of this hazard could be communicated to vehicles 500 meters away, alerting them to the danger and helping to prevent a serious accident.
We began turning this concept into a reality by pioneering V2X (Vehicle-to-X), which extends the range of a vehicle’s sensors by enabling inter-vehicular communication. Building on our experience with V2X using Dedicated Short Range Communications (DSRC), based on 802.11p, we engineered C-V2X (Cellular V2X), which will reuse the upper layers of DSRC while providing better-performing lower layers that have an active evolution path via 5G NR (New Radio). A few years ago, we created a device-to-device capability that enabled users to discover and interact with the world around them, and we have since evolved that technology and applied it to vehicular transit. As part of 3GPP Release 14, C-V2X will support relative vehicle speeds of up to 500 km/h, as well as the high device densities expected on busy roads. We are also developing future Release 15/16 enhancements that will use 5G NR to provide vehicular communications with additional functionality optimized for use cases that demand high throughput, very low latency, and high reliability.