Why Arm’s new Helium technology matters

Dr Dominic Binks, VP Technology at Audio Analytic, writes: We are often asked if we accelerate using NEON on Arm. The answer we give is perhaps seen as odd, but for us, optimizing for NEON is a rather academic exercise...

Our lightweight code performs fixed-point calculations, which balances the trade-offs between space, time and value range while ensuring cross-platform consistency. The main driver for this is that our primary target processors are Cortex-M class processors operating at ~50MHz with a few hundred kilobytes of RAM. Optimizing separately for a Cortex-A class processor that normally operates above 300MHz, often has 1Gbyte of RAM or more and is typically superscalar is somewhat redundant. A 10% improvement obtained on a Cortex-M4, for example, will carry over to a NEON-class processor, which means we need only maintain a single code base and our test requirements are simplified.

Choosing fixed point code gives us maximal cross-platform consistency and better control over memory usage but comes at a small price: simple arithmetic operations are made more complex by the need to shift according to the required dynamic range of the calculation.
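To illustrate the shifting overhead described above, here is a minimal sketch of a fixed-point multiply. It assumes the common Q15 format (16-bit values with 15 fractional bits); the format and the names q15_t and q15_mul are chosen for illustration only and are not taken from our code base.

```c
#include <stdint.h>

/* Q15 (assumed format): 1 sign bit, 15 fractional bits, value = raw / 32768. */
typedef int16_t q15_t;

/* Multiply two Q15 values. The raw product has 30 fractional bits, so it
 * must be rounded and shifted right by 15 to return to Q15.
 * Note: right-shifting a negative signed value is implementation-defined
 * in C; this sketch assumes the usual arithmetic shift. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t wide = (int32_t)a * (int32_t)b; /* Q30 intermediate */
    wide += 1 << 14;                        /* add half an LSB to round */
    wide >>= 15;                            /* back to Q15 */
    if (wide > INT16_MAX) {                 /* only -1.0 * -1.0 can overflow */
        wide = INT16_MAX;
    }
    return (q15_t)wide;
}
```

With this sketch, q15_mul(16384, 16384) multiplies 0.5 by 0.5 and returns 8192, which is 0.25 in Q15. A floating-point multiply would need none of the widening, rounding, shifting or saturation steps.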

With Helium, Arm’s recently announced lightweight M-Profile vector extensions (MVE) for the Armv8.1-M architecture, the game changes. If optimizing for NEON is largely an academic exercise, exploiting MVE is far from it. With Helium, next-generation Cortex-M processors gain the vectorising capabilities of NEON-class processors as well as a few other abilities that deliver tangible improvements in performance, and cost, in the microcontroller space.

Helium builds upon NEON’s capabilities. Optimising a few selected routines can quickly yield a 50% or greater improvement in execution speed, but these vector extensions offer other advantages too.

Helium offers another valuable trade-off with regard to memory usage control: half-precision floating-point support. Our code sees a 50% speed-up by performing operations in batches of four. Half-precision would widen those batches to eight and avoid some of the extra steps involved in shifting fixed-point values around, while maintaining a more compact data representation at an acceptable accuracy trade-off.
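The batch sizes follow from simple lane arithmetic. The sketch below assumes Helium's 128-bit vector registers and, as an illustration, 32-bit fixed-point elements; the helper names are hypothetical.

```c
#include <stddef.h>

#define VECTOR_BITS 128u /* Helium (MVE) vector register width */

/* Number of elements that fit in one vector for a given element width. */
static size_t lanes_per_vector(size_t element_bits)
{
    return VECTOR_BITS / element_bits;
}

/* Vector iterations needed to cover n elements, rounding up so that a
 * partial final batch still counts as one iteration. */
static size_t vector_iterations(size_t n, size_t element_bits)
{
    size_t lanes = lanes_per_vector(element_bits);
    return (n + lanes - 1) / lanes;
}
```

Under these assumptions a 1024-element buffer takes 256 vector iterations with 32-bit values and 128 with 16-bit half-precision values, while also occupying half the memory.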

Helium also helps with the hassle of dealing with the leftovers: when a vector's length is not an exact multiple of the number of available lanes, the remaining elements need special processing steps. There are several techniques for dealing with this, but Helium offers streamlined loops that resolve the problem. This is not necessarily a huge performance gain, but it delivers simpler, smaller code.
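The difference can be sketched in portable C. The first function below shows the conventional pattern of a full-width loop followed by a scalar clean-up loop. The second emulates a predicated loop, where the final iteration simply enables fewer lanes. The inner lane loops stand in for single vector instructions, and all names are illustrative; on Helium the lane masking is handled by the hardware's tail-predicated loops rather than by explicit code like this.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* four 32-bit lanes in a 128-bit vector */

/* Conventional approach: full-width body plus a separate scalar tail. */
static void add_with_scalar_tail(int32_t *dst, const int32_t *a,
                                 const int32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) { /* one vector add */
            dst[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    for (; i < n; ++i) { /* leftovers, one element at a time */
        dst[i] = a[i] + b[i];
    }
}

/* Predicated approach (emulated): one loop, no clean-up code. The last
 * iteration processes only the lanes that are still within bounds. */
static void add_predicated(int32_t *dst, const int32_t *a,
                           const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
        size_t remaining = n - i;
        size_t active = remaining < LANES ? remaining : LANES;
        for (size_t lane = 0; lane < active; ++lane) { /* masked vector add */
            dst[i + lane] = a[i + lane] + b[i + lane];
        }
    }
}
```

Both functions produce the same result for any length, including lengths such as 7 that leave a remainder of three. The predicated form has a single loop body, which is where the code-size saving comes from.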

Optimizing for Helium makes the goal of truly wireless, battery-powered devices a reality. For example, when combined with the piezoelectric MEMS microphone from Vesper, which can boot the processor on reaching a suitable sound level, a Helium-enabled processor can quickly analyse the observed sound, identifying whether it is a known sound before returning to sleep. It will be able to do this at a lower power level than comparable systems and could potentially eliminate the need for a DSP in the system, thus simplifying the architecture and reducing the BOM (Bill of Materials).

Helium matters. Edge-based machine learning, also known as tinyML, offers OEMs major benefits, such as reducing the cost of cloud computing. It also appeals to consumers, as processing is done without data leaving the device. Arm’s Helium technology means that you can, as my colleague Chris puts it, ‘supercharge AI at even lower-power and lower-cost’.
