[PATCH 0/8] ARC updates to uClibc

Bernhard Reutner-Fischer rep.dot.nop at gmail.com
Wed Feb 18 08:03:35 UTC 2015


On February 18, 2015 6:51:17 AM GMT+01:00, Vineet Gupta <Vineet.Gupta1 at synopsys.com> wrote:
>On Monday 16 February 2015 08:34 PM, Bernhard Reutner-Fischer wrote:
>>> While at it I also did some arch specific adjustment in sigaction path
>>> - inlining the rt_sigaction syscall stub detour to reduce branch return
>>> stack mispredicts etc - which is what 6/8 does !
>> This sounds suspicious.
>> IIRC we already had that argument, last time around _dl_do_reloc and
>> _dl_do_lazy_reloc.
>> Could it be that your port has a bug here ( missed optimisation )
>> around ifunc handling? Sounds like back then on ARM
>> https://gcc.gnu.org/PR40887#c6
>> 
>> What am I missing?
>
>
>I don't think my use-case is close to the ARM issue you pointed to above,
>as there is no ifunc or function pointer involved.

I was more thinking about the reloc functors.
Does GCC 5 produce identical code for ARC master's way of explicit function calls compared to using a function pointer, as suggested and used in all other ports?
If not then I'd consider this a bug.

>
>With orig code, we get 2 function calls on ARC:
>
>0000b504 <__libc_sigaction>:
>    b504:	push_s     blink
>    b506:	sub_s      sp,sp,12
>    b508:	bl.d       36b20 <__st_r13_to_r15>
>...
>
>    b540:	bl.d       b750 <__syscall_rt_sigaction>   <--- DIRECT CALL
>    b544:	mov_s      r3,8
>    b546:	add_s      sp,sp,20
>    b548:	mov_s      r12,12
>    b54a:	b          36b88 <__ld_r13_to_r15_ret>
>    b54e:	nop_s
>
>0000b750 <__syscall_rt_sigaction>:
>    b750:	mov        r8,134
>    b754:	swi                                <---- SYSCALL TRAP INTO KERNEL
>    b758:	cmp        r0,0xfffffc00
>    b75c:	bls_s      b76a
>    b75e:	st.a       blink,[sp,-4]
>    b762:	bl         b550 <__syscall_error>
>    b766:	ld.ab      blink,[sp,4]
>    b76a:	j_s        [blink]
>
>The small function call is not necessarily good micro-architecturally when
>returning due to limited number of call return stack entries. That cost is
>amortized if function is largish.
>
>I do understand that these small syscall wrappers are a common uClibc design
>pattern and exist all over the place but given that this was all arch code I
>took the liberty of removing the one hop and the code now looks as below:
>
>0000b4d8 <__libc_sigaction>:
>    b4d8:	st.a       gp,[sp,-4]
>    b4dc:	sub_s      sp,sp,20
>    b4de:	add        gp,pcl,0x00065284
>    b4e6:	breq_s     r1,0,b516
>    b4e8:	ld_s       r3,[r1,4]
>...
>    b516:	mov        r8,134
>    b51a:	mov_s      r3,8
>    b51c:	swi
>    b520:	cmp        r0,0xfffffc00
>    b524:	bls_s      b532
>    b526:	st.a       blink,[sp,-4]
>    b52a:	bl         b53c <__syscall_error>
>    b52e:	ld.ab      blink,[sp,4]
>    b532:	ld.a       gp,[sp,20]
>    b536:	j_s.d      [blink]
>    b538:	add_s      sp,sp,4
>    b53a:	nop_s

I would have assumed / hoped that GCC 5 should generate this 2nd variant for extern inline __syscall_rt_sigaction.

Doesn't it do that?
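
To make the question concrete, here is a rough sketch of the two shapes being compared. It is not uClibc code: fake_trap(), check_result(), both stub names and the two callers are made up for illustration, fake_trap() merely stands in for the swi, and the sketch uses GCC's noinline / always_inline attributes rather than extern inline so that each shape is forced. Only the number 134 and the 0xfffffc00-style error check are taken from the listings above.

```c
#include <assert.h>
#include <errno.h>

/* Number loaded into r8 in the listings above. */
#define NR_DEMO 134L

/* Hypothetical stand-in for the kernel trap (the swi in the listings).
   Returns a negative errno value on failure, like a Linux syscall. */
static long fake_trap(long nr, long arg)
{
    if (arg < 0)
        return -EINVAL;
    return nr + arg;
}

/* Raw results in the range -1023..-1 are errors; this mirrors the
   cmp r0,0xfffffc00 / bls_s pair, written so it also holds on 64-bit. */
static long check_result(long ret)
{
    if ((unsigned long)ret > (unsigned long)-1024L) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}

/* Variant 1: out-of-line stub, the caller pays one call and one return. */
__attribute__((noinline))
long stub_outofline(long arg)
{
    return check_result(fake_trap(NR_DEMO, arg));
}

/* Variant 2: same body, forced into the caller, no extra call/return. */
static inline __attribute__((always_inline))
long stub_inline(long arg)
{
    return check_result(fake_trap(NR_DEMO, arg));
}

/* Hypothetical callers playing the role of __libc_sigaction. */
long sigaction_via_call(long arg)   { return stub_outofline(arg); }
long sigaction_via_inline(long arg) { return stub_inline(arg); }

/* Both shapes must be observably identical; only the codegen differs. */
void check_variants_agree(long arg)
{
    assert(sigaction_via_call(arg) == sigaction_via_inline(arg));
}
```

Building this with optimisation and comparing the disassembly of sigaction_via_call and sigaction_via_inline should show whether the compiler folds the stub into its caller.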

TIA


