Quick and dirty malloc() support for realpath.
Rob Landley
rob at landley.net
Mon Oct 26 21:34:26 UTC 2009
On Monday 26 October 2009 07:20:23 Mike Frysinger wrote:
> On Sunday 25 October 2009 15:19:49 Rob Landley wrote:
> > - int readlinks = 0;
> > + int readlinks = 0, allocated = 0;
> > ...
> > + if (!got_path) {
> > + got_path = alloca(PATH_MAX);
> > + allocated++;
> > + }
> > ...
> > + if (allocated) got_path = strdup(got_path);
>
> it doesnt make any sense to treat "allocated" as an integer that gets
> incremented. you're pointlessly forcing gcc to generate load/update/store
> instructions when it only needs a store instruction. i.e. use stdbool like
> evolution intended.
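For context, the quoted hunks amount to roughly the pattern below. This is only a sketch under assumptions: realpath_sketch() is a hypothetical stand-in that copies the path instead of resolving it, a local array takes the place of the patch's alloca(PATH_MAX), the strdup() step is spelled out with malloc() and memcpy(), and the PATH_MAX fallback value is made up for systems that do not define it.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* assumed fallback, not from the patch */
#endif

/* Hypothetical stand-in for realpath(): copies path unchanged
 * rather than resolving symlinks, to show only the buffer handling. */
char *realpath_sketch(const char *path, char *got_path)
{
	char scratch[PATH_MAX];	/* the patch uses alloca(PATH_MAX) here */
	int allocated = 0;
	size_t len = strlen(path);

	if (len >= PATH_MAX) return NULL;

	/* Caller passed NULL: work in the scratch buffer and remember it. */
	if (!got_path) {
		got_path = scratch;
		allocated++;
	}

	/* Placeholder for the real path resolution work. */
	memcpy(got_path, path, len + 1);

	/* Hand back a heap copy; same effect as strdup(got_path). */
	if (allocated) {
		char *copy = malloc(len + 1);
		if (!copy) return NULL;
		memcpy(copy, got_path, len + 1);
		got_path = copy;
	}
	return got_path;
}
```

With a caller-supplied buffer the function returns that buffer; with NULL it returns memory the caller must free().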
I did that because instruction sets that have an increment instruction can
produce smaller code by avoiding the 32 bit constant, and on something like
arm using a variable smaller than integer size can produce significantly
_larger_ code due to the masking and shifting the compiler generates to fake
the smaller sizes it hasn't got instructions for. Also, in my experience
_Bool is about as real-world useful as the bit field notation with the colons,
and is really there to keep the language pedants and the c++ guys happy
without actually accomplishing much. I've never seen it actually produce
better code.
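A small illustration of where that masking comes from (the function names are made up for this example). The narrow version has to wrap at 8 bits, so a target without byte-sized arithmetic generally needs an extra instruction to truncate the result, while the int-sized version does not:

```c
/* Result must be truncated to 8 bits, so 255 + 1 wraps to 0.
 * On a target with only word-sized arithmetic the compiler
 * typically adds a mask (or shift pair) to get this behavior. */
unsigned char bump_narrow(unsigned char c)
{
	return (unsigned char)(c + 1);
}

/* Native word size: a plain add, no truncation needed. */
unsigned int bump_wide(unsigned int c)
{
	return c + 1;
}
```

Whether the extra instruction actually shows up depends on the compiler and on how the value is used afterwards.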
But by all means let's test it:
gcc -v 2>&1 | tail -n 1
gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
cat > hello.c << EOF
#include <stdio.h>
int main(int argc, char *argv[])
{
    int allocated=0;
    if (argc==2) allocated++;
    printf("allocated=%d\n", allocated);
}
EOF
gcc -Os -s hello.c
objdump -d a.out
000000000040052c <main>:
40052c: 31 d2 xor %edx,%edx
40052e: 83 ff 02 cmp $0x2,%edi
400531: be 3c 06 40 00 mov $0x40063c,%esi
400536: 0f 94 c2 sete %dl
400539: bf 01 00 00 00 mov $0x1,%edi
40053e: 31 c0 xor %eax,%eax
400540: e9 db fe ff ff jmpq 400420 <__printf_chk@plt>
And the optimizer's constant propagation expanded it to use the constant
assignment anyway. But just to be sure let's switch to allocated=1 and...
000000000040052c <main>:
40052c: 31 d2 xor %edx,%edx
40052e: 83 ff 02 cmp $0x2,%edi
400531: be 3c 06 40 00 mov $0x40063c,%esi
400536: 0f 94 c2 sete %dl
400539: bf 01 00 00 00 mov $0x1,%edi
40053e: 31 c0 xor %eax,%eax
400540: e9 db fe ff ff jmpq 400420 <__printf_chk@plt>
Yup, the optimizer is actually coercing the two into producing identical code
on x86-64, which isn't particularly surprising.
Let's change the variable type to _Bool and...
000000000040052c <main>:
40052c: 31 d2 xor %edx,%edx
40052e: 83 ff 02 cmp $0x2,%edi
400531: be 3c 06 40 00 mov $0x40063c,%esi
400536: 0f 94 c2 sete %dl
400539: bf 01 00 00 00 mov $0x1,%edi
40053e: 31 c0 xor %eax,%eax
400540: e9 db fe ff ff jmpq 400420 <__printf_chk@plt>
Again, exactly the same code.
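To reproduce the comparison without editing hello.c between runs, the three variants can sit side by side in one file, roughly as below. The function names are made up for this sketch, and taking the count as a parameter instead of reading main()'s argc is an assumption that may change the surrounding code the compiler emits.

```c
#include <stdbool.h>

/* Variant 1: int flag set by increment, as in the patch. */
int flag_increment(int argc)
{
	int allocated = 0;
	if (argc == 2) allocated++;
	return allocated;
}

/* Variant 2: int flag set by constant assignment. */
int flag_assign(int argc)
{
	int allocated = 0;
	if (argc == 2) allocated = 1;
	return allocated;
}

/* Variant 3: _Bool flag via stdbool.h. */
int flag_bool(int argc)
{
	bool allocated = false;
	if (argc == 2) allocated = true;
	return allocated;
}
```

Compiling this with gcc -Os -c and running objdump -d on the object file puts the three bodies next to each other for comparison.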
Now let's try arm. I've got a gcc 4.2.1 for armv5l lying around.
With the increment:
000083f8 <main>:
83f8: e3500002 cmp r0, #2 ; 0x2
83fc: 13a01000 movne r1, #0 ; 0x0
8400: 03a01001 moveq r1, #1 ; 0x1
8404: e59f0000 ldr r0, [pc, #0] ; 840c <.text+0xe0>
8408: eaffffbe b 8308 <.text-0x24>
840c: 00008420 andeq r8, r0, r0, lsr #8
With the integer assignment=1:
000083f8 <main>:
83f8: e3500002 cmp r0, #2 ; 0x2
83fc: 13a01000 movne r1, #0 ; 0x0
8400: 03a01001 moveq r1, #1 ; 0x1
8404: e59f0000 ldr r0, [pc, #0] ; 840c <.text+0xe0>
8408: eaffffbe b 8308 <.text-0x24>
840c: 00008420 andeq r8, r0, r0, lsr #8
And with the use of _Bool:
000083f8 <main>:
83f8: e3500002 cmp r0, #2 ; 0x2
83fc: 13a01000 movne r1, #0 ; 0x0
8400: 03a01001 moveq r1, #1 ; 0x1
8404: e59f0000 ldr r0, [pc, #0] ; 840c <.text+0xe0>
8408: eaffffbe b 8308 <.text-0x24>
840c: 00008420 andeq r8, r0, r0, lsr #8
It's really looking like gcc's optimizer is doing a whole lot of "not caring"
about the difference in this instance. The constant propagation is dropping
the distinction so it's actually _less_ optimized than using the INC
instruction (on x86, anyway), but oh well.
But I can change it if it makes you happy.
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds