Practical approach to Arm Neon Optimization

Ignitarium

@Ignitarium

September 30, 2021

Smart Mobility AI

Arm Neon was introduced to improve multimedia encoding/decoding, user interface, graphics and gaming performance on mobile devices. Over the years it has been used to accelerate signal processing algorithms and functions, speeding up not only multimedia audio and video applications but also deep learning and AI applications such as voice recognition, facial recognition and computer vision. Neon offers flexibility and support for a wide range of data types, and it is well supported by open-source compilers and libraries; these are key reasons why applications are able to capitalize on it.

This blog is presented as a guide with practical points about the Arm Neon architecture that every software developer can use.

Arm Neon Architecture across versions

Even though the Neon concept is the same across architecture versions, its implementation differs. The major differences between Neon in Armv7 (32-bit) and Armv8 (64-bit, AArch64) are as follows:

In Armv7, there are sixteen 128-bit registers, Q0-Q15, which are also accessible as thirty-two 64-bit registers, D0-D31. The first sixteen D registers (D0-D15) can in turn be accessed as thirty-two 32-bit registers, S0-S31. In Armv8 (AArch64), by contrast, there are thirty-two 128-bit registers named V0-V31. Doubling the register count does not give D0-D63: there are still only D0-D31 and S0-S31, because each D or S register is now the lower part of the corresponding V register (D0 is the low 64 bits of V0, S0 the low 32 bits). So an instruction that writes a D or S register operates on the lower part and automatically zeroes the upper part of that V register.

Because the register arrangement changed in Armv8, the permutation instructions (those that reorder data, such as zip, unzip and transpose) also work differently. In Armv7 they work in place: VZIP, VUZP and VTRN overwrite both of their operand registers. In Armv8 (AArch64) they require a separate destination register, and each half of the result is produced by its own instruction (ZIP1/ZIP2, UZP1/UZP2, TRN1/TRN2).
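A small C sketch of the difference, assuming a GCC- or Clang-compatible compiler: on AArch64 it uses the ZIP1/ZIP2 intrinsics (vzip1q_f32, vzip2q_f32), on 32-bit Neon targets the vzipq_f32 intrinsic that maps to VZIP, and elsewhere a scalar loop with the same semantics. The function name zip_f32x4 is hypothetical.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: interleave (zip) two 4-lane vectors a and b.
 * lo receives {a0, b0, a1, b1}; hi receives {a2, b2, a3, b3}. */
void zip_f32x4(const float a[4], const float b[4], float lo[4], float hi[4])
{
#if defined(__aarch64__)
    /* AArch64: ZIP1 and ZIP2 each write one half of the result
     * into a separate destination register. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(lo, vzip1q_f32(va, vb));
    vst1q_f32(hi, vzip2q_f32(va, vb));
#elif defined(__ARM_NEON)
    /* Armv7: one VZIP.32 rewrites both operand registers in place;
     * the intrinsic exposes the two results as a float32x4x2_t pair. */
    float32x4x2_t r = vzipq_f32(vld1q_f32(a), vld1q_f32(b));
    vst1q_f32(lo, r.val[0]);
    vst1q_f32(hi, r.val[1]);
#else
    /* Portable scalar fallback with the same semantics. */
    for (int i = 0; i < 2; i++) {
        lo[2 * i]     = a[i];
        lo[2 * i + 1] = b[i];
        hi[2 * i]     = a[i + 2];
        hi[2 * i + 1] = b[i + 2];
    }
#endif
}
```

Called with a = {0, 1, 2, 3} and b = {4, 5, 6, 7}, each of the three paths fills lo with {0, 4, 1, 5} and hi with {2, 6, 3, 7}.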

Because of these architectural changes, Neon assembly code is not portable between Armv7 and Armv8 AArch64. Developers need to re-analyze their hand-written code, which often results in an entirely different solution.

Arm Neon Development Support

Arm and its community provide different possibilities for a developer to make use of the Neon technologies:

Auto-vectorization by the compiler: Arm compilers can generate optimized SIMD code that takes advantage of Neon. In terms of development time and cost, this is an advantage for the developer. For complex algorithms, however, the compiler-generated code may not be optimized enough; in such cases developers have to turn to intrinsics or hand-tuned assembly. Auto-vectorization includes:

Loop vectorization: unrolling loops to reduce the number of iterations while performing more operations in each iteration.

Superword-Level Parallelism (SLP) vectorization: bundling scalar operations together to make use of full-width Advanced SIMD instructions.

In the Armv8 architecture, Neon is enabled by default. To use it, the developer specifies a Neon-capable target: an Armv8 AArch64 target, or a CPU such as the Cortex-A53 when compiling for AArch32 state.

In the Armv7 architecture, Neon is optional. Developers can enable Neon with compiler options such as -mcpu, -march and -mfpu. Auto-vectorization is enabled by default at higher optimization levels (-O2 and above), and the -fno-vectorize option disables it. At -O1, auto-vectorization is disabled by default and can be enabled with -fvectorize. At -O0, auto-vectorization is always disabled; if you specify -fvectorize, the compiler ignores it.
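As an illustration, the hypothetical functions below are written in a shape that loop and SLP vectorizers generally handle well. The build lines in the comment are assumptions based on the options described above; whether a given compiler actually vectorizes them should be confirmed from its reports or from the generated assembly.

```c
#include <stddef.h>

/* Loop-vectorization candidate (hypothetical):
 * unit-stride accesses, no loop-carried dependency, and restrict
 * pointers promising that dst and src do not overlap.
 * Assumed example builds:
 *   armclang --target=aarch64-arm-none-eabi -O2 -c vec.c                 (vectorizer on)
 *   armclang --target=aarch64-arm-none-eabi -O2 -fno-vectorize -c vec.c  (vectorizer off) */
void scale_add(float *restrict dst, const float *restrict src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* SLP candidate (hypothetical): four independent scalar additions on
 * adjacent elements that a vectorizer can bundle into one 4-lane add. */
void add4(float *restrict d, const float *restrict a, const float *restrict b)
{
    d[0] = a[0] + b[0];
    d[1] = a[1] + b[1];
    d[2] = a[2] + b[2];
    d[3] = a[3] + b[3];
}
```

With Clang-based compilers such as armclang, -Rpass=loop-vectorize reports which loops were vectorized. GCC offers -fopt-info-vec for the same purpose and spells the on/off switches -ftree-vectorize and -fno-tree-vectorize.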

Neon intrinsics: Intrinsics let developers write code that is better optimized for the Neon architecture than compiler-generated code, which is an advantage because the developer knows the application better than the compiler does. The Neon intrinsics are a set of C and C++ functions, declared in the arm_neon.h header file and supported by the Arm compilers and GCC. They accelerate development by offering much of the freedom of Neon assembly while leaving register allocation to the compiler. At compile time, each intrinsic is replaced by the appropriate Neon instruction or sequence of Neon instructions.
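A minimal intrinsics sketch, assuming an image-processing use case: the hypothetical brighten_u8 function adds a constant to 8-bit pixels 16 at a time using vld1q_u8, vdupq_n_u8, vqaddq_u8 and vst1q_u8 from arm_neon.h, and falls back to scalar code for the tail, or for the whole buffer on builds without Neon.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: brighten an 8-bit grayscale buffer by `delta`,
 * saturating at 255 instead of wrapping around. */
void brighten_u8(uint8_t *px, size_t n, uint8_t delta)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    uint8x16_t vdelta = vdupq_n_u8(delta);   /* broadcast delta to all 16 lanes */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(px + i);     /* load 16 pixels */
        v = vqaddq_u8(v, vdelta);            /* lane-wise saturating add */
        vst1q_u8(px + i, v);                 /* store 16 pixels */
    }
#endif
    /* Scalar tail, or the whole buffer when Neon is unavailable. */
    for (; i < n; i++) {
        unsigned s = (unsigned)px[i] + delta;
        px[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```

The saturating add clamps every lane at 255 in a single instruction, which keeps the vector loop free of per-lane comparisons; the compiler chooses which registers hold v and vdelta.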

Hand-coded Neon assembler: When performance is critical, experienced developers can write assembly instructions directly to generate better-optimized code. In some regions of an algorithm, Arm and Neon instructions can be used in parallel for independent operations.
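A short hand-written AArch64 sequence, shown as GCC/Clang extended inline assembly inside C so that it stays in the same language as the other sketches. The function add_f32x4 is hypothetical, and the scalar fallback exists only so that the file also builds on non-AArch64 targets.

```c
#if defined(__aarch64__)
/* Hypothetical helper, AArch64 only: dst[0..3] = a[0..3] + b[0..3]
 * using hand-written Neon instructions. */
void add_f32x4(float *dst, const float *a, const float *b)
{
    __asm__ volatile(
        "ld1  {v0.4s}, [%[a]]\n\t"       /* v0 = a[0..3] */
        "ld1  {v1.4s}, [%[b]]\n\t"       /* v1 = b[0..3] */
        "fadd v0.4s, v0.4s, v1.4s\n\t"   /* four float additions at once */
        "st1  {v0.4s}, [%[d]]\n\t"       /* dst[0..3] = v0 */
        : /* no register outputs; the result goes to memory */
        : [a] "r"(a), [b] "r"(b), [d] "r"(dst)
        : "v0", "v1", "memory");
}
#else
/* Scalar fallback so the sketch still builds on other targets. */
void add_f32x4(float *dst, const float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}
#endif
```

Unlike the intrinsics approach, the developer here chooses the registers (v0, v1) and must declare them, and the memory side effects, as clobbers so the compiler does not rely on their old contents.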

Neon-enabled libraries: Arm and its community offer open-source libraries that already use Neon, and developers can plug these libraries directly into their development environment. A few such libraries are:
- Arm Compute Library: This Library is a collection of low-level functions optimized for Arm CPU and GPU architectures targeted at image processing, computer vision, and machine learning.
- Ne10: An open-source C library, hosted on GitHub by Arm, of common processing-intensive functions heavily optimized for Arm. Ne10 has a modular structure consisting of several smaller libraries.
- Libyuv: Open-source project that includes YUV scaling and conversion functionality.
- Skia: Open-source 2D graphics library used as the graphics engine for web browsers and operating systems.

Arm vs Neon performance improvements

Various examples have shown that very good performance gains are possible with the Neon SIMD instruction set. The level of optimization achieved, however, depends on how much of your code can be vectorized. For example, an IIR filter depends on previously calculated output samples, so Neon cannot deliver as much improvement there as it can for filters such as FIR, where pure vectorization can be applied.
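The difference can be seen in plain scalar C: in the hypothetical fir function each output is computed from inputs only, while in the first-order iir1 function each output needs the previous one. Neither function uses Neon; they only illustrate the dependency structure that decides how much a vectorizer or a hand-written Neon loop can exploit.

```c
#include <stddef.h>

/* Hypothetical FIR filter (assumes taps >= 1 and n >= taps).
 * y[i] depends only on inputs, so neighbouring outputs are independent
 * and several of them can be computed in parallel Neon lanes. */
void fir(const float *x, float *y, size_t n, const float *h, size_t taps)
{
    for (size_t i = taps - 1; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}

/* Hypothetical first-order IIR filter: y[i] = b*x[i] + a*y[i-1].
 * The loop-carried dependency on the previous output forces the
 * outputs to be produced one after another. */
void iir1(const float *x, float *y, size_t n, float a, float b)
{
    float prev = 0.0f;
    for (size_t i = 0; i < n; i++) {
        prev = b * x[i] + a * prev;
        y[i] = prev;
    }
}
```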

The following examples detail the level of optimization that can be achieved with Neon instructions, when compared to Arm instructions:

In complex video codec (MPEG-4) processing, Neon provides a 1.6x to 2.5x performance boost over Arm11.
In audio processing (AAC, voice recognition algorithms), an FFT takes 3.8 µs with Neon versus 15.2 µs on Arm11, a 4x performance boost.
In the FFmpeg FFT, Neon provides a 12x performance boost over Arm11.

The following example demonstrates the performance improvement that Neon can achieve. Consider a simple array multiply-accumulate program (c[i] = c[i] + a[i] * b[i]).
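A sketch of the C loop together with a Neon intrinsics version that, like the Neon code in the cycle table, handles eight elements per iteration. The element type (int32_t, so two 4-lane registers per iteration), the function names and the scalar tail are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reference C loop (hypothetical name): c[i] = c[i] + a[i] * b[i]. */
void mac_scalar(int32_t *c, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = c[i] + a[i] * b[i];
}

/* Neon version (hypothetical name): 8 elements per iteration in two
 * 4-lane registers, then a scalar loop for any remaining elements. */
void mac_neon(int32_t *c, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        int32x4_t c0 = vld1q_s32(c + i), c1 = vld1q_s32(c + i + 4);
        int32x4_t a0 = vld1q_s32(a + i), a1 = vld1q_s32(a + i + 4);
        int32x4_t b0 = vld1q_s32(b + i), b1 = vld1q_s32(b + i + 4);
        c0 = vmlaq_s32(c0, a0, b0);   /* c0 += a0 * b0, lane-wise */
        c1 = vmlaq_s32(c1, a1, b1);   /* c1 += a1 * b1, lane-wise */
        vst1q_s32(c + i, c0);
        vst1q_s32(c + i + 4, c1);
    }
#endif
    for (; i < n; i++)   /* tail, or all elements without Neon */
        c[i] = c[i] + a[i] * b[i];
}
```

Both functions should produce identical results; a length that is not a multiple of 8 exercises the scalar tail of mac_neon.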

The cycle estimates below compare a handwritten Arm11 assembly implementation of this loop, which processes two elements per iteration, with a handwritten Neon assembly implementation, which processes eight elements per iteration.

Now let us estimate the cycles for both the Arm and Neon versions with a loop count of 256 (all divisions are integer divisions):

Operation	Arm11 assembly (cycles)	Neon SIMD (cycles)
Load and store	(256/2)*(2+2+2+2) = 1024	(256/8)*(2+2+2+2) = 256
Multiply and accumulate	(256/2)*(3+1+1+1+3+1+1+1) = 1536	(256/8)*(2+1+1+2+1+1) = 256
Branch operations	(255/2)*1+4 = 131	(255/8)*1+4 = 35
Total	2691	547

In this example, the Neon version needs 547 cycles instead of 2691, about 80% fewer cycles (roughly a 4.9x speedup) than the Arm assembly code.
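The arithmetic in the table can be reproduced with a small C model. The per-iteration cycle costs are taken from the table and are estimates rather than measurements; the loop_cost type and total_cycles function are hypothetical.

```c
/* Hypothetical cost model for the multiply-accumulate loop.
 * Per-iteration cycle counts come from the table above and are
 * modelling assumptions, not measurements. */
typedef struct {
    int elems_per_iter;   /* 2 for the Arm11 loop, 8 for the Neon loop */
    int ldst_cycles;      /* load/store cycles per iteration */
    int mac_cycles;       /* multiply-accumulate cycles per iteration */
} loop_cost;

int total_cycles(loop_cost c, int n)
{
    int iters  = n / c.elems_per_iter;
    int ldst   = iters * c.ldst_cycles;
    int mac    = iters * c.mac_cycles;
    int branch = (n - 1) / c.elems_per_iter + 4;  /* integer division, as in the table */
    return ldst + mac + branch;
}
```

With the Arm11 costs {2, 8, 12} and the Neon costs {8, 8, 8} for n = 256, the model returns 2691 and 547 cycles; 547/2691 is about 0.20, which is where the roughly 80% reduction comes from.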

This blog originally appeared on Ignitarium.com's Blog Page.

#armneon #multimedia #armarchitecture #mobiledevices #armsoftware
