[PATCH] Improved strlen for ARM, around 29% faster
dalias at aerifal.cx
Wed Oct 3 17:54:23 UTC 2012
On Sat, Sep 29, 2012 at 02:48:48AM +0200, Gabriel Gonzalez wrote:
> This version for ARM improves performance mainly by unrolling the loop for iterations
> and reducing the instructions needed to look for the null character.
> A deeper analysis of this can be found at http://www.gabrielgonzalezgarcia.com/2012/10/02/mystrlen-vs-android-bionics-strlen-on-arm-cpu/
> where you can find some data which back up the performance improvement.
> I have only tested it on a little-endian CPU, so the BIG ENDIAN chunk might need some testing.
I suspect this code is still considerably slower than the good C
implementation, which looks something like:
(glibc uses a similar C implementation, but theirs seems to have a bug
whereby it drops out of the fast loop whenever it hits high bytes.)
Basically, the idea is that you can test all bytes of a machine word
for a null byte in parallel rather than branching on each byte. I
don't have access to real ARM hardware to test it on (just qemu) so
I'd be happy to hear which is actually faster.
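
For illustration, here is a minimal sketch of the word-at-a-time idea. It
is not necessarily the code referred to above: the names word_strlen,
ONES, HIGHS and HASZERO are made up for this example, and it assumes that
reading a whole aligned word which contains the terminator is harmless on
the target (an aligned word never crosses a page boundary, but strict ISO
C does not promise that such a read is valid).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative names, not taken from any particular libc. */
#define ONES  ((size_t)-1 / UCHAR_MAX)       /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))   /* 0x8080...80 */

/* Nonzero iff at least one byte of x is zero. The "& ~(x)" term masks
 * out bytes that already had their high bit set, so high bytes such as
 * 0x80..0xff do not cause a false hit. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

size_t word_strlen(const char *s)
{
    const char *start = s;
    size_t w;

    /* Go byte by byte until s is aligned to the word size. */
    for (; (uintptr_t)s % sizeof(size_t); s++)
        if (!*s)
            return (size_t)(s - start);

    /* Fast loop: test a whole word per iteration. Assumption: the
     * aligned word holding the terminator may be read in full. */
    for (;;) {
        memcpy(&w, s, sizeof w);   /* word load without aliasing issues */
        if (HASZERO(w))
            break;
        s += sizeof w;
    }

    /* A zero byte is somewhere in this word; locate it byte by byte. */
    for (; *s; s++)
        ;
    return (size_t)(s - start);
}
```

The final byte loop works the same way on little and big endian, since it
does not depend on where in the word the zero byte was detected. Whether
this beats a hand-unrolled assembly loop on a given ARM core would still
need to be measured.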