[PATCH] Improved strlen for ARM, around 29% faster

Thu Oct 4 00:14:04 UTC 2012

On Wed, Oct 03, 2012 at 08:38:04PM +0200, Gabriel Gonzalez wrote:
> Hi Rich,
> 
>   You replied before I was able to run the test but, yeah, you are
> right: once it hits an aligned address, my algorithm currently does
> the test a word at a time, using a 3-instruction check to look for
> the null character.
> 
>   As requested, I attach a plot of muslstrlen vs mystrlen. As you
> stated, my ASM version outperforms the C version.

Thanks. It's a shame GCC can't get things like this right, because we
really shouldn't have to be writing per-arch asm when the desired asm
is _identical_ on each arch except for the mnemonic and register
names. I wish GCC's optimizer had the ability to detect loops with <N
arithmetic operations and basically brute-force search for a way to do
the same thing with less register shuffling and nonsensical overhead
inside the loop. On all but the tightest loops, it doesn't seem to
matter; GCC's inefficient register shuffling still gives 95% or better
of the performance of hand-written asm. But on loops with really
short bodies like this, the performance really suffers.
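
For readers following along, the word-at-a-time approach under
discussion can be sketched in C roughly as below. This is only an
illustration under stated assumptions: it is neither the attached asm
patch nor a verbatim copy of musl's source, and the names ONES, HIGHS,
HASZERO and sketch_strlen are made up for this example. The zero-byte
test is the usual subtract / and-not / mask trick, which is presumably
the kind of 3-instruction check mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define ONES  ((size_t)-1 / 0xff)   /* 0x01 repeated in every byte */
#define HIGHS (ONES * 0x80)         /* 0x80 repeated in every byte */
/* Nonzero iff at least one byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

size_t sketch_strlen(const char *s)
{
    const char *a = s;
    size_t w;

    /* Byte-by-byte until s is aligned to the word size. */
    for (; (uintptr_t)s % sizeof w; s++)
        if (!*s) return (size_t)(s - a);

    /* One word per iteration. The load may read a few bytes past the
     * terminator; an aligned word never crosses a page boundary, which
     * is why real implementations get away with this, but strictly
     * speaking ISO C does not promise it is valid. */
    for (;;) {
        memcpy(&w, s, sizeof w);    /* compilers turn this into a plain load */
        if (HASZERO(w)) break;
        s += sizeof w;
    }

    /* Locate the exact terminator inside the final word. */
    for (; *s; s++);
    return (size_t)(s - a);
}
```

The inner loop is only a load, the three-operation test, a branch and
a pointer increment, so any extra register moves the compiler adds
there are a large fraction of the total work.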

Rich