Application porting to Windows on Snapdragon, using the sse2neon header file (Part 1 of 2)
Have you been working on porting your applications from x86 to Windows on Snapdragon? What kinds of challenges do you face when you start porting your apps? For one thing, x86-specific libraries don’t port directly to the Windows on Snapdragon platform. In those cases, you will need to look at alternatives. In this two-part series, we’ll examine one such alternative: sse2neon.h.
In this post we’ll take a look at the x86 instruction set and see how it compares with a similar instruction set that works on Windows on Snapdragon platforms. In the next post we’ll see how you can use the sse2neon header file to port an app from x86 to Windows on Snapdragon.
What is Intel SSE (Streaming SIMD Extensions)?
Intel SSE (Streaming SIMD Extensions) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture. The original SSE instructions work on single-precision floating-point data. Intel later extended SSE with SSE2, SSE3, SSSE3 and SSE4.
Function calls starting with _mm, like _mm_add_ps() and _mm_storeu_ps(), are Intel SSE intrinsics. Typical Intel SSE applications are digital signal processing and graphics processing.
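As a rough illustration of how these intrinsics are typically used, the sketch below adds two float arrays four elements at a time. The helper name add_arrays and the HAVE_SSE macro are hypothetical, and _mm_loadu_ps is an additional SSE intrinsic not mentioned above. The preprocessor guard is an assumption that lets the same file build on targets without SSE, where only the scalar loop runs.

```c
#include <stddef.h>

/* Assumed feature check: use SSE only when the compiler targets it. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats. */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Four floats per iteration; the "u" variants accept unaligned pointers. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Each _mm_add_ps call performs four additions with a single instruction, which is where the benefit for signal and graphics processing comes from.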
What is NEON?
The Windows on Snapdragon platform is based on a Reduced Instruction Set Computer (RISC) architecture. NEON is the implementation of the Advanced SIMD architecture for the Windows on Snapdragon platform.
The Windows on Snapdragon platform supports both 64-bit and 32-bit execution states. Both support SIMD and floating-point instructions.
The NEON intrinsics are defined in the applicable header file. Typical NEON applications are multimedia and signal processing, 3D graphics, speech, image processing and other applications where fixed-point and floating-point performance is important.
What are intrinsics?
Intrinsic functions (also called built-in functions in compiler theory) are those whose implementation is handled by the compiler.
Intrinsics look like a function call, but they don’t require an actual function call. When the compiler encounters intrinsics, it substitutes a sequence of automatically generated instructions, like an inline function. Intrinsics are used to access SIMD instructions directly from C/C++ for improved application performance.
Intrinsics are easier to use than assembly language. A compiler that implements an intrinsic function typically enables it only when the program requests optimization; otherwise, it falls back to a default implementation provided by the language runtime system. Because the code is expressed as intrinsics instead of raw assembly, the compiler can assist the programmer: it takes care of register allocation and of calling conventions across function call boundaries, and it can also optimize the generated code.
For more on Intel SSE intrinsics, see the Intel Intrinsics Guide; for NEON intrinsics, see Intrinsics – Developer Site.
What is the difference between Intel SSE and NEON?
Intel SSE code is usually run in desktop and server environments, whereas NEON code runs on edge devices and mobile hardware.
The following code adds two 128-bit vectors using Intel SSE intrinsics:
#include <xmmintrin.h>
__m128 add(__m128 c, __m128 d)
{
    return _mm_add_ps(c, d);
}
The same function written with NEON intrinsics looks like this:
#include <header file name>
float32x4_t add(float32x4_t c, float32x4_t d)
{
    return vaddq_f32(c, d);
}
One difference between Intel SSE intrinsics and NEON intrinsics is the type of the input arguments and of the result. Intel SSE uses the __m128 type and NEON uses float32x4_t. SSE types describe the width of the entire vector register (128 bits); NEON types describe the width of each component (32 bits) and the component count (4).
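The naming difference can be checked with a few size computations. The sketch below is only an illustration: the vec4f alias and the three helper functions are hypothetical names, arm_neon.h is assumed to be the NEON intrinsics header, and a plain struct stands in on targets that have neither instruction set.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
typedef __m128 vec4f;                     /* named after the register width */
#elif defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>                     /* assumed NEON intrinsics header */
typedef float32x4_t vec4f;                /* named after lane width x lane count */
#else
typedef struct { float lane[4]; } vec4f;  /* stand-in for other targets */
#endif

/* Total width in bits: the "128" in __m128. */
size_t vec4f_bits(void) { return sizeof(vec4f) * 8; }

/* Width of one component in bits: the "32" in float32x4_t. */
size_t vec4f_lane_bits(void) { return sizeof(float) * 8; }

/* Number of components: the "4" in float32x4_t. */
size_t vec4f_lane_count(void) { return sizeof(vec4f) / sizeof(float); }
```

Both types describe the same 16 bytes of data; only the naming convention differs.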
Another difference between Intel SSE and NEON types is the treatment of unsigned quantities. SSE offers only one integer register type (__m128i), so the same type holds four 32-bit integers whether they are signed or unsigned. NEON encodes the signedness of the data in the type itself by offering register types like uint32x4_t and int32x4_t.
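A greater-than comparison makes the signedness difference visible. In the sketch below, the function names and the USE_SSE2 / USE_NEON macros are hypothetical, and arm_neon.h is assumed to be the NEON intrinsics header. On SSE2 the only 32-bit greater-than compare is signed, so the unsigned version uses a common workaround (flipping the sign bit of both operands first); on NEON the choice between vcgtq_s32 and vcgtq_u32 follows from the vector type.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 integer intrinsics */
#define USE_SSE2 1
#elif defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>    /* assumed NEON intrinsics header */
#define USE_NEON 1
#endif

/* Returns 1 if x > y when the lanes are treated as signed 32-bit integers. */
int lane0_gt_signed(uint32_t x, uint32_t y)
{
#if defined(USE_SSE2)
    __m128i a = _mm_set1_epi32((int)x);   /* __m128i carries no signedness */
    __m128i b = _mm_set1_epi32((int)y);
    return _mm_cvtsi128_si32(_mm_cmpgt_epi32(a, b)) != 0;
#elif defined(USE_NEON)
    int32x4_t a = vdupq_n_s32((int32_t)x);
    int32x4_t b = vdupq_n_s32((int32_t)y);
    return vgetq_lane_u32(vcgtq_s32(a, b), 0) != 0;
#else
    return (int32_t)x > (int32_t)y;       /* scalar stand-in */
#endif
}

/* Returns 1 if x > y when the lanes are treated as unsigned 32-bit integers. */
int lane0_gt_unsigned(uint32_t x, uint32_t y)
{
#if defined(USE_SSE2)
    /* Flip the sign bit of both operands, then reuse the signed compare. */
    __m128i bias = _mm_set1_epi32((int)0x80000000u);
    __m128i a = _mm_xor_si128(_mm_set1_epi32((int)x), bias);
    __m128i b = _mm_xor_si128(_mm_set1_epi32((int)y), bias);
    return _mm_cvtsi128_si32(_mm_cmpgt_epi32(a, b)) != 0;
#elif defined(USE_NEON)
    uint32x4_t a = vdupq_n_u32(x);        /* signedness is part of the type */
    uint32x4_t b = vdupq_n_u32(y);
    return vgetq_lane_u32(vcgtq_u32(a, b), 0) != 0;
#else
    return x > y;                         /* scalar stand-in */
#endif
}
```

With x = 0xFFFFFFFF and y = 1, the signed comparison sees -1 > 1 and reports false, while the unsigned comparison sees 4294967295 > 1 and reports true.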
What are the benefits of the sse2neon header?
If you run an Intel SSE application on Windows on Snapdragon in emulation mode instead of porting it natively, it will suffer from reduced performance and poor thermal efficiency compared to a native port. The sse2neon header addresses those problems by letting the existing SSE intrinsic calls compile to NEON code.
When porting an application, you can manually replace SSE intrinsics with NEON intrinsics, but doing so is time-consuming. Manual replacement is suitable for short, isolated snippets of code; for anything larger, you can port much faster by using the sse2neon header.
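The general idea behind a translation header can be sketched in a few lines. This is not the actual sse2neon source: every shim_ name below is hypothetical, arm_neon.h is assumed to be the NEON intrinsics header, and only three operations are covered. The point is that the calling code is written once against the SSE-style names and each target supplies its own definitions.

```c
#if defined(__ARM_NEON) || defined(_M_ARM64)
#include <arm_neon.h>                      /* assumed NEON intrinsics header */
typedef float32x4_t shim_m128;
static inline shim_m128 shim_loadu_ps(const float *p) { return vld1q_f32(p); }
static inline shim_m128 shim_add_ps(shim_m128 a, shim_m128 b) { return vaddq_f32(a, b); }
static inline void shim_storeu_ps(float *p, shim_m128 v) { vst1q_f32(p, v); }
#elif defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
typedef __m128 shim_m128;
static inline shim_m128 shim_loadu_ps(const float *p) { return _mm_loadu_ps(p); }
static inline shim_m128 shim_add_ps(shim_m128 a, shim_m128 b) { return _mm_add_ps(a, b); }
static inline void shim_storeu_ps(float *p, shim_m128 v) { _mm_storeu_ps(p, v); }
#else
/* Scalar stand-in so the sketch also builds on other targets. */
typedef struct { float lane[4]; } shim_m128;
static inline shim_m128 shim_loadu_ps(const float *p)
{
    shim_m128 v;
    for (int i = 0; i < 4; ++i) v.lane[i] = p[i];
    return v;
}
static inline shim_m128 shim_add_ps(shim_m128 a, shim_m128 b)
{
    shim_m128 v;
    for (int i = 0; i < 4; ++i) v.lane[i] = a.lane[i] + b.lane[i];
    return v;
}
static inline void shim_storeu_ps(float *p, shim_m128 v)
{
    for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}
#endif

/* Application code: identical on every target. */
void add4(const float *a, const float *b, float *out)
{
    shim_storeu_ps(out, shim_add_ps(shim_loadu_ps(a), shim_loadu_ps(b)));
}
```

A real translation header keeps the original _mm names, so existing source files need little more than a changed include, which is why it scales better than rewriting each call by hand.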
What's next?
Read part 2 of this series to see how you can use sse2neon.h to port your applications from x86 to Windows on Snapdragon more effectively and quickly.
Visit the Windows on Snapdragon portal and browse the documentation today to get started.
Want to ask questions or provide feedback? Bring your suggestions and get prompt support from our technical team on Developer Discord.
