“I’d like to program the DSP and move our app toward heterogeneous computing. How will that increase processing power, improve compute efficiency and reduce power consumption?”
Most of the questions I hear from software developers boil down to that one. In my previous post I described the work several developers are doing with Hexagon™ Vector eXtensions (HVX) in Hexagon SDK 3.0. They’re using the SDK for video, computational imaging, computer vision (CV), machine learning and always-on use cases that can run less efficiently on the CPU and contribute to battery drain.
In this post I’ll give you three reasons why they’re using SDK 3.0.
1. Increased processing power: Up to 1024 bits of data per clock cycle, simultaneous processing
The width of HVX – the number of bits it can process at a time – lets your app do more work in each clock cycle.
The standard mobile CPU is 32 bits wide and its computational coprocessor extends that by a factor of 4 to 128 bits wide. The basic Hexagon processor is 64 bits wide – double the normal width of a mobile CPU – and the HVX coprocessor extends that by a factor of 16 to 1024 bits wide. As shown in red in the diagram below, the Hexagon DSP can tie two 512-bit vector mode units together for 1024 bits of data in a single clock cycle. HVX combines those wide vectors with Hexagon's Very Long Instruction Word (VLIW) architecture to achieve its tremendous work-width. The result is that, measured by data width per clock cycle, Hexagon with HVX has between 8 and 32 times higher performance than a standard mobile CPU.
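If you want to check the width arithmetic yourself, a few lines of Python will do it. This is only a back-of-the-envelope sketch: the helper name `lanes` and the choice of 8-bit pixels as the element size are my assumptions for illustration, and real-world speedups also depend on clock speed, memory bandwidth and how well your code vectorizes.

```python
# Back-of-the-envelope check of the datapath widths quoted in the text.
# The element size (8-bit pixels) is an illustrative assumption.

WIDTHS_BITS = {
    "mobile CPU scalar": 32,
    "mobile CPU coprocessor": 128,
    "Hexagon scalar": 64,
    "HVX, one 512-bit unit": 512,
    "HVX, two units tied together": 1024,
}

def lanes(width_bits, element_bits=8):
    """Number of elements of element_bits that fit in one register."""
    return width_bits // element_bits

for name, width in WIDTHS_BITS.items():
    print(f"{name:30s} {width:5d} bits -> {lanes(width):4d} pixels per operation")

# Ratios behind the "8 to 32 times" range.
hvx = WIDTHS_BITS["HVX, two units tied together"]
ratio_vs_coprocessor = hvx // WIDTHS_BITS["mobile CPU coprocessor"]
ratio_vs_scalar = hvx // WIDTHS_BITS["mobile CPU scalar"]
print(ratio_vs_coprocessor, ratio_vs_scalar)  # prints "8 32"
```

The two ratios are simply 1024 divided by the width of the mobile CPU's coprocessor and of its scalar unit, which is where the lower and upper ends of the range come from.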
HVX has another big advantage. On the mobile CPU, if you go to the coprocessor, you idle the scalar processor, so you get one or the other. But on Hexagon, the coprocessor and the scalar can execute simultaneously, so you get both.
Why is that important? Say you’re processing on the mobile CPU in control code mode and you switch to computational mode on the coprocessor. If you need any control code, you have to stop and go back from the coprocessor to the main CPU. With Hexagon, both the control code processor on the DSP and the computational code processor on HVX can run at the same time for tight coupling of control and computational code. That allows the DSP to take the result of an HVX computation and use it in a control code decision in the next clock cycle. Corephotonics and SMI take advantage of tightly coupled control and computational code on HVX in their applications.
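To make the pattern concrete, here is a conceptual model of a loop in which every computational step feeds a control decision. It is plain Python, not Hexagon or HVX code; `vector_max`, `find_first_hot_block` and the sample data are hypothetical names and values I made up for illustration, and the 128-byte block size simply mirrors the 1024-bit width mentioned above.

```python
# Conceptual model only: plain Python, not Hexagon or HVX code.
VECTOR_BYTES = 128  # 1024 bits, the HVX width quoted in the text

def vector_max(block):
    """Stand-in for the computational (vector) step: reduce one block."""
    return max(block)

def find_first_hot_block(data, limit):
    """Return the index of the first 128-byte block whose peak exceeds limit.

    Each iteration pairs a vector-style computation with a control
    decision that depends on its result.
    """
    for start in range(0, len(data), VECTOR_BYTES):
        peak = vector_max(data[start:start + VECTOR_BYTES])  # compute
        if peak > limit:                                      # control
            return start // VECTOR_BYTES
    return None

# 384 bytes = 3 blocks; the single bright sample sits in the third block.
samples = bytes([10] * 300 + [200] + [10] * 83)
print(find_first_hot_block(samples, limit=128))  # prints 2
```

On a processor that has to hand off between coprocessor and CPU, every pass through a loop like this pays for the switch. When both run at the same time, the branch can use the vector result right away.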
2. Improved compute efficiency: Streaming that bypasses the CPU
Streaming mode on HVX is suited to very-low-power camera processing. By streaming data directly from the camera to the Hexagon DSP local memory (L2 Cache), HVX makes possible new use cases beyond photography to computer vision (CV), computational imaging and machine learning.
HVX bypasses the DDR memory controller and external memory for camera processing functions that can be supported with the existing L2 Cache size. That is designed to reduce latency, battery spend and CPU/DDR access requirements for camera and vision processing algorithms.
In the example of camera streaming, say your app needs to move data from the camera sensor to a processor in the device. Without HVX, the app reads that data into DDR memory, then the CPU reads DDR to perform calculations on it, consuming tens of milliwatts of power and incurring a round-trip delay to DDR memory before any real processing has even begun. With Hexagon SDK 3.0 you can process the stream directly on HVX in the Hexagon DSP, eliminate the interim storage and transfer steps, and leave the CPU out altogether.
In fact, the CPU and other elements on the Qualcomm Snapdragon processor are designed to remain idle through the entire operation. Normally, the entire chip would consume much more power when other elements are utilized, but why involve them when all you need is the DSP?
By running the data straight into the DSP, HVX makes it possible to process the data locally, minimizing overall power consumption. That’s especially useful in apps that combine vision and machine learning, which can work in isolation with data right off the camera. If machine learning on the DSP recognizes an event that requires additional processing, HVX can wake the other elements on the chip to handle it.
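The streaming idea in this section can be modeled in a few lines. The sketch below is a conceptual illustration in plain Python, not SDK code: `fake_sensor` and `stream_vertical_blur` are hypothetical names, and the three-line window is an assumed example of a working set small enough to stay in local memory. The point is that lines are processed as they arrive and the full frame is never stored.

```python
from collections import deque

def stream_vertical_blur(lines, window=3):
    """Yield one averaged line for each input line once the window is full.

    Only `window` lines are held at any time, standing in for data that
    stays in local memory instead of a full frame stored in DDR.
    """
    buf = deque(maxlen=window)  # oldest line drops out automatically
    for line in lines:
        buf.append(line)
        if len(buf) == window:
            # Average each pixel column across the buffered lines.
            yield [sum(col) // window for col in zip(*buf)]

def fake_sensor(height, width):
    """Hypothetical camera: line y is filled with the value y."""
    for y in range(height):
        yield [y] * width

out = list(stream_vertical_blur(fake_sensor(height=6, width=4)))
print(len(out), out[0], out[-1])  # prints "4 [1, 1, 1, 1] [4, 4, 4, 4]"
```

Because both the sensor and the filter are generators, no more than three lines exist in memory at once, however tall the frame is.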
3. Reduced power consumption: Always-on senses
Smartphones get smarter and are more useful when they’re always alert. A big part of staying alert is always-on senses, which are tied to the DSP:
- Always-on voice – The device is in a low-power state, waiting to recognize your voice or a keyword when you address it.
- Always-on touch – The device monitors the screen in the background, then wakes up when you touch it.
- Always-on sensors – A sensor hub is connected to the compass, accelerometer and other sensors, waiting for an event that requires additional processing.
- Always-on vision – The camera streams into the device and triggers processing only on threshold events.
HVX solves two lingering problems in the world of always-on senses. First, staying alert all the time takes a lot of battery life. But HVX running on the DSP is designed to require minimal power for itself and to not need processing on other elements to keep the senses always on. Second, vision has been the most elusive of the always-on senses because the data stream from the camera goes through the main chip. HVX, however, is designed to offer a direct path from the camera to the DSP. Like the other senses, always-on vision with HVX is designed to involve the rest of the chip only when additional processing is needed.
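Here is what "involve the rest of the chip only when additional processing is needed" can look like as logic. This is a minimal sketch in plain Python under my own assumptions, not the way any shipping always-on vision feature works: the frame-differencing test, the names `mean_abs_diff` and `watch`, the tiny 16-pixel frames and the threshold of 20 are all hypothetical.

```python
def mean_abs_diff(prev, curr):
    """Average absolute per-pixel change between two equal-sized frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def watch(frames, threshold):
    """Yield indices of frames that changed enough to warrant a wake-up."""
    prev = None
    for i, frame in enumerate(frames):
        if prev is not None and mean_abs_diff(prev, frame) > threshold:
            yield i  # threshold event: hand off for heavier processing
        prev = frame

still = [50] * 16                  # hypothetical 16-pixel frame, no motion
moved = [50] * 8 + [250] * 8       # half the pixels change brightness
frames = [still, still, moved, moved, still]
print(list(watch(frames, threshold=20)))  # prints [2, 4]
```

Frames 1 and 3 match their predecessors, so the cheap check absorbs them and nothing else needs to wake up. Only the two transitions cross the threshold.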
I’ve posted before about the high performance and low power consumption you get when you program the Hexagon DSP. Give it a try for virtually always-on senses.
You can start working with HVX right now, because it’s available in commercial devices powered by the Snapdragon 820, including the OnePlus 3, HTC 10, Samsung Galaxy S7 and LG G5. Several manufacturers rely on HVX to differentiate the camera and video features of their devices.
The Hexagon SDK is available for Windows and Linux. Download Hexagon SDK 3.0 to see how you can increase processing power, improve compute efficiency and reduce power consumption in your own apps.