If you’re looking to create great mobile experiences, optimization isn’t optional: it’s a crucial step that helps transform good ideas into great execution. In our previous “Start Cooking with Heterogeneous Computing Tools on QDN” blog, we discussed the concept of heterogeneous computing and how it can help you get more from mobile hardware by sending computational tasks to the best suited processor. Heterogeneous computing is designed to help you achieve better application performance while improving thermal and power efficiency.
However, not all systems capable of heterogeneous computing are created equal, and it’s important to understand why. Heterogeneous computing is both a computational technique and a hardware architecture. To get the most from it, you are better served by hardware architected for heterogeneous computing from the ground up, along with a software stack that facilitates heterogeneous computing techniques. It’s the combination of purpose-built hardware and a software stack offering granular control within a larger framework of system abstraction that allows for the deep optimizations that heterogeneous computing can deliver.
The Qualcomm Snapdragon Mobile Platform is designed on these principles. This starts with the microarchitecture – the choices made in platform circuitry that include how individual processors are engineered for high performance and how to optimize compute paths between the processors. Let’s look at the main components of the Snapdragon mobile platform and a few of the microarchitecture considerations that went into its design:
Qualcomm Kryo 280 CPU
Designed to handle complex workloads like web browsing and in-game artificial intelligence, the Kryo 280 is an octa-core CPU with independent high-efficiency and high-performance core clusters. During normal operation, the high-efficiency cores run most tasks, while the high-performance cores activate for anything that needs more processing power.
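To make the two-cluster idea concrete, here is a toy sketch of routing work by estimated load. It is only an illustration: on a real device the operating system's scheduler makes this decision, and the `Task` type, the `pick_cluster` function, and the 0.8 threshold are all hypothetical names and values, not anything taken from the Kryo 280 or its documentation.

```python
from dataclasses import dataclass

# Hypothetical cutoff: estimated load (as a fraction of one efficiency core's
# capacity) above which a task is sent to the performance cluster.
UPGRADE_THRESHOLD = 0.8


@dataclass
class Task:
    name: str
    load: float  # estimated demand; 1.0 means one efficiency core fully busy


def pick_cluster(task: Task) -> str:
    """Return 'efficiency' for light tasks and 'performance' for heavy ones."""
    return "performance" if task.load > UPGRADE_THRESHOLD else "efficiency"


# Illustrative workloads with made-up load estimates.
tasks = [Task("ui_scroll", 0.2), Task("game_ai", 1.5), Task("music", 0.1)]
placement = {t.name: pick_cluster(t) for t in tasks}

for name, cluster in placement.items():
    print(f"{name}: {cluster} cluster")
```

In this sketch only the heavy task lands on the performance cluster, which mirrors the behavior described above: light work stays on the efficient cores and the faster cores wake up when needed.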
Qualcomm Hexagon 682 DSP
With the Hexagon wide Vector eXtensions (HVX), the Hexagon DSP excels at applications requiring heavy vector data processing, such as six-degrees-of-freedom (6-DOF) head motion tracking for virtual reality, image processing, and neural network computations. With a 1024-bit instruction word capability and dual execution of the control code processor and the computational code processor within the DSP, Hexagon can achieve breakthrough performance without draining system power.
Qualcomm Adreno 540 GPU
Ideal for arithmetic-heavy workloads that require substantial parallel number crunching, like 3D graphics rendering and camcorder image stabilization, the Adreno GPU is engineered to achieve improved power efficiency and 40% better performance than its predecessors. Designed to deliver up to 25% faster graphics rendering and 60x more display colors compared to previous designs, the Adreno GPU supports real-life-quality visuals and can perform demanding display feats like stitching together 4K 360-degree video in real time.
Heterogeneous computing in microarchitecture design
Beyond the performance enhancements to the individual processors, the Snapdragon mobile platform was designed to optimize how the processors work together. For example, the Hexagon DSP can bypass DDR memory, and the CPU cycles otherwise spent shuffling data, by streaming data directly from sensors to the DSP cache. Similarly, the Adreno GPU supports 64-bit virtual addressing, allowing for shared virtual memory (SVM) and efficient co-processing with the Kryo CPU. These are just two of the microarchitecture design choices that make the Snapdragon mobile platform well suited to heterogeneous computing.
As we noted at the beginning of this post, heterogeneous computing is also a technique. Truly supporting it requires a software stack that gives developers both the abstractions and the control to apply the hardware's optimizations as their application requires.
To program the DSP or the GPU for heterogeneous computation, and to maximize their performance, developers can use the Qualcomm Hexagon SDK and the Qualcomm Adreno SDK, respectively. These SDKs open a toolbox of controls allowing for precision manipulation of data and computational resources.
For system-wide heterogeneous computing control, the Qualcomm Symphony System Manager SDK provides software utilities designed to achieve better performance and lower power consumption from the Snapdragon mobile platform. Symphony is designed to manage the entire platform in different configurations so that the most efficient and effective combination of processors and specialized cores is chosen to get the job done as quickly as possible, with minimal power consumption.
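The trade-off described here, finishing quickly while spending as little power as possible, can be sketched as a small selection problem. The example below is a hedged illustration and does not use the Symphony API: the `Estimate` type, the `choose_processor` function, and all timing and power figures are hypothetical. It picks the processor with the lowest estimated energy among those that can meet a deadline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Estimate:
    processor: str
    time_ms: float   # estimated time to finish the task on this processor
    power_mw: float  # estimated average power draw while running it

    @property
    def energy_uj(self) -> float:
        # milliwatts * milliseconds = microjoules
        return self.time_ms * self.power_mw


def choose_processor(estimates: Iterable[Estimate],
                     deadline_ms: float) -> Optional[Estimate]:
    """Pick the lowest-energy processor that still meets the deadline."""
    feasible = [e for e in estimates if e.time_ms <= deadline_ms]
    if not feasible:
        return None  # nothing is fast enough; the caller must relax the deadline
    return min(feasible, key=lambda e: e.energy_uj)


# Made-up numbers for one image-filter task; not measurements of any device.
estimates = [
    Estimate("cpu", time_ms=12.0, power_mw=900.0),
    Estimate("gpu", time_ms=4.0, power_mw=1500.0),
    Estimate("dsp", time_ms=6.0, power_mw=400.0),
]

best = choose_processor(estimates, deadline_ms=8.0)
print(best.processor if best else "no processor meets the deadline")
```

With these assumed numbers the DSP wins at an 8 ms deadline because it is fast enough and cheapest in energy, while tightening the deadline to 5 ms would force the choice to the GPU. A real system manager weighs far more than this, such as thermal state and which cores are already busy.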
Developers can build their applications directly on top of these SDKs, and many do. However, there is also a growing ecosystem of SDKs, frameworks, and supporting libraries for accelerating development within a given application domain. Two examples are QDN's Adreno SDK for Vulkan, which targets the Vulkan graphics API, and our recently released Snapdragon VR SDK.
How to Put Heterogeneous Computing Techniques into Practice with Tools from Qualcomm Developer Network