Quick and dirty malloc() support for realpath.

Mon Oct 26 21:34:26 UTC 2009

On Monday 26 October 2009 07:20:23 Mike Frysinger wrote:
> On Sunday 25 October 2009 15:19:49 Rob Landley wrote:
> > -	int readlinks = 0;
> > +	int readlinks = 0, allocated = 0;
> > ...
> > +	if (!got_path) {
> > +		got_path = alloca(PATH_MAX);
> > +		allocated++;
> > +	}
> > ...
> > +	if (allocated) got_path = strdup(got_path);
>
> it doesnt make any sense to treat "allocated" as an integer that gets
> incremented.  you're pointlessly forcing gcc to generate load/update/store
> instructions when it only needs a store instruction.  i.e. use stdbool like
> evolution intended.

I did that because instruction sets that have an increment instruction can
produce smaller code by avoiding the 32-bit constant, and on something like
arm using a variable smaller than integer size can produce significantly 
_larger_ code due to the masking and shifting the compiler generates to fake 
the smaller sizes it hasn't got instructions for.  Also, in my experience 
_Bool is about as real-world useful as the bit field notation with the colons, 
and is really there to keep the language pedants and the c++ guys happy 
without actually accomplishing much.  I've never seen it actually produce 
better code.
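
To illustrate the masking point: a minimal sketch of why a variable narrower than int can cost extra instructions. The helper names are made up for illustration and are not from the patch. C requires the narrow result to wrap, so a target with only word-sized arithmetic has to truncate after the add. Whether a given compiler actually emits extra instructions depends on the target and the optimization level.

```c
#include <limits.h>

/* Hypothetical helpers, not from the original patch. */

/* Increment done in a full-width int: a plain add. */
int bump_int(int n)
{
  return n + 1;
}

/* Increment stored back into an unsigned char: the result must wrap
 * modulo UCHAR_MAX+1, so a CPU without byte-wide arithmetic has to
 * truncate (mask) the value after adding. */
unsigned char bump_byte(unsigned char n)
{
  n++;
  return n;
}
```

Compiling both with -Os and comparing the objdump output for the target in question shows whether the truncation costs anything there.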

But by all means let's test it:

gcc -v 2>&1 | tail -n 1
gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)

cat > hello.c << EOF
#include <stdio.h>

int main(int argc, char *argv[])
{
  int allocated=0;

  if (argc==2) allocated++;

  printf("allocated=%d\n", allocated);
}
EOF

gcc -Os -s hello.c
objdump -d a.out

000000000040052c <main>:
  40052c:	31 d2                	xor    %edx,%edx
  40052e:	83 ff 02             	cmp    $0x2,%edi
  400531:	be 3c 06 40 00       	mov    $0x40063c,%esi
  400536:	0f 94 c2             	sete   %dl
  400539:	bf 01 00 00 00       	mov    $0x1,%edi
  40053e:	31 c0                	xor    %eax,%eax
  400540:	e9 db fe ff ff       	jmpq   400420 <__printf_chk@plt>

And the optimizer's constant propagation turned the increment into a constant
assignment anyway.  But just to be sure let's switch allocated++ to
allocated=1 and...

000000000040052c <main>:
  40052c:	31 d2                	xor    %edx,%edx
  40052e:	83 ff 02             	cmp    $0x2,%edi
  400531:	be 3c 06 40 00       	mov    $0x40063c,%esi
  400536:	0f 94 c2             	sete   %dl
  400539:	bf 01 00 00 00       	mov    $0x1,%edi
  40053e:	31 c0                	xor    %eax,%eax
  400540:	e9 db fe ff ff       	jmpq   400420 <__printf_chk@plt>

Yup, the optimizer is actually coercing the two into producing identical code 
on x86-64, which isn't particularly surprising. 

Let's change the variable type to _Bool and...

000000000040052c <main>:
  40052c:	31 d2                	xor    %edx,%edx
  40052e:	83 ff 02             	cmp    $0x2,%edi
  400531:	be 3c 06 40 00       	mov    $0x40063c,%esi
  400536:	0f 94 c2             	sete   %dl
  400539:	bf 01 00 00 00       	mov    $0x1,%edi
  40053e:	31 c0                	xor    %eax,%eax
  400540:	e9 db fe ff ff       	jmpq   400420 <__printf_chk@plt>

Again, exactly the same code.
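
For anyone wanting to repeat the comparison, the three variants can sit side by side in one file. This is a sketch, not the file used for the listings above: the function names are hypothetical, and it assumes the _Bool variant assigns the flag rather than incrementing it.

```c
#include <stdbool.h>

/* Hypothetical function names; each mirrors one variant of hello.c. */

/* Variant 1: int flag set by increment. */
int flag_increment(int argc)
{
  int allocated = 0;

  if (argc == 2) allocated++;

  return allocated;
}

/* Variant 2: int flag set by constant assignment. */
int flag_assign(int argc)
{
  int allocated = 0;

  if (argc == 2) allocated = 1;

  return allocated;
}

/* Variant 3: _Bool flag (spelled bool via stdbool.h). */
int flag_bool(int argc)
{
  bool allocated = false;

  if (argc == 2) allocated = true;

  return allocated;
}
```

Building this with gcc -Os -c and running objdump -d on the object file lets the three function bodies be compared directly.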

Now let's try arm.  I've got a gcc 4.2.1 for armv5l lying around:

With the increment:

000083f8 <main>:
    83f8:	e3500002 	cmp	r0, #2	; 0x2
    83fc:	13a01000 	movne	r1, #0	; 0x0
    8400:	03a01001 	moveq	r1, #1	; 0x1
    8404:	e59f0000 	ldr	r0, [pc, #0]	; 840c <.text+0xe0>
    8408:	eaffffbe 	b	8308 <.text-0x24>
    840c:	00008420 	andeq	r8, r0, r0, lsr #8

With the integer assignment=1:

000083f8 <main>:
    83f8:	e3500002 	cmp	r0, #2	; 0x2
    83fc:	13a01000 	movne	r1, #0	; 0x0
    8400:	03a01001 	moveq	r1, #1	; 0x1
    8404:	e59f0000 	ldr	r0, [pc, #0]	; 840c <.text+0xe0>
    8408:	eaffffbe 	b	8308 <.text-0x24>
    840c:	00008420 	andeq	r8, r0, r0, lsr #8

And with the use of _Bool:

000083f8 <main>:
    83f8:	e3500002 	cmp	r0, #2	; 0x2
    83fc:	13a01000 	movne	r1, #0	; 0x0
    8400:	03a01001 	moveq	r1, #1	; 0x1
    8404:	e59f0000 	ldr	r0, [pc, #0]	; 840c <.text+0xe0>
    8408:	eaffffbe 	b	8308 <.text-0x24>
    840c:	00008420 	andeq	r8, r0, r0, lsr #8

It's really looking like gcc's optimizer is doing a whole lot of "not caring"
about the difference in this instance.  The constant propagation is dropping
the distinction, so on x86 the result is actually _less_ optimized than using
the INC instruction would be, but oh well.

But I can change it if it makes you happy.
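
If changed as suggested, the pattern from the quoted patch would look roughly like the following. This is only a sketch under stated assumptions: sketch_realpath() and resolve_into() are hypothetical names, resolve_into() is a stand-in that just copies its input, a local array replaces alloca(), and the strdup() from the patch is spelled out with malloc()/memcpy() to stay within ISO C.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096  /* fallback for strict ISO C builds; an assumption */
#endif

/* Hypothetical stand-in for the real path resolution: it only copies. */
static void resolve_into(const char *path, char *out)
{
  strncpy(out, path, PATH_MAX - 1);
  out[PATH_MAX - 1] = 0;
}

/* If the caller passes no buffer, work in a stack buffer and return a
 * malloc()ed copy that the caller must free(). */
char *sketch_realpath(const char *path, char *got_path)
{
  char scratch[PATH_MAX];
  bool allocated = false;

  if (!got_path) {
    got_path = scratch;
    allocated = true;
  }

  resolve_into(path, got_path);

  if (allocated) {
    /* Equivalent of strdup(got_path); returns NULL if malloc() fails. */
    size_t len = strlen(got_path) + 1;
    char *copy = malloc(len);

    if (copy) memcpy(copy, got_path, len);
    got_path = copy;
  }

  return got_path;
}
```

A caller that passes NULL gets a heap copy back and has to free() it; a caller that passes its own PATH_MAX-sized buffer gets that same buffer returned.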

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds