[PATCH 0/8] ARC updates to uClibc
Bernhard Reutner-Fischer
rep.dot.nop at gmail.com
Wed Feb 18 08:03:35 UTC 2015
On February 18, 2015 6:51:17 AM GMT+01:00, Vineet Gupta <Vineet.Gupta1 at synopsys.com> wrote:
>On Monday 16 February 2015 08:34 PM, Bernhard Reutner-Fischer wrote:
>>> While at it, I also did some arch-specific adjustments in the sigaction
>>> path - inlining the rt_sigaction syscall stub detour to reduce branch
>>> return stack mispredicts etc. - which is what 6/8 does!
>> This sounds suspicious.
>> IIRC we already had that argument, last time around _dl_do_reloc and
>> _dl_do_lazy_reloc.
>> Could it be that your port has a bug here (missed optimisation)
>> around ifunc handling? Sounds like back then on ARM
>> https://gcc.gnu.org/PR40887#c6
>>
>> What am I missing?
>
>
>I don't think my use case is close to the ARM issue you pointed to above,
>as there is no ifunc or function pointer involved.
I was thinking more about the reloc functors.
Does GCC 5 (ARC master) produce identical code for explicit function calls compared to using a function pointer, as suggested and used in all other ports?
If not, then I'd consider this a bug.
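A small C test case can make the function-pointer versus explicit-call question concrete. The sketch below is hypothetical: `do_reloc`, `do_lazy_reloc` and their `int` signatures are illustrative stand-ins, not uClibc's real `_dl_do_reloc`/`_dl_do_lazy_reloc`. Both styles compute the same result; whether a given compiler turns the constant function pointer into a direct (possibly inlined) call is exactly what one would check in the generated code.

```c
#include <assert.h>

/* HYPOTHETICAL relocation handlers; names, signatures and bodies are
   placeholders, not uClibc's real ldso functions. */
typedef int (*reloc_handler)(int reloc);

static int do_reloc(int reloc)      { return reloc * 2; }  /* placeholder work */
static int do_lazy_reloc(int reloc) { return reloc + 1; }  /* placeholder work */

/* Style A: the loop receives a function pointer ("functor"). */
static int process_relocs_fp(const int *relocs, int n, reloc_handler fn)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += fn(relocs[i]);
    return sum;
}

/* Style B: explicit, direct calls selected by a flag. */
static int process_relocs_direct(const int *relocs, int n, int lazy)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += lazy ? do_lazy_reloc(relocs[i]) : do_reloc(relocs[i]);
    return sum;
}

/* Each call site passes a compile-time-constant pointer, so an optimizing
   compiler may propagate it and emit a direct call instead of an indirect one. */
int run_fp(const int *relocs, int n, int lazy)
{
    return lazy ? process_relocs_fp(relocs, n, do_lazy_reloc)
                : process_relocs_fp(relocs, n, do_reloc);
}

int run_direct(const int *relocs, int n, int lazy)
{
    return process_relocs_direct(relocs, n, lazy);
}
```

Building this with `-O2` for the target and comparing the `objdump -d` output of `run_fp` and `run_direct` would show whether that particular compiler treats both styles the same.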
>
>With the original code, we get two function calls on ARC:
>
>0000b504 <__libc_sigaction>:
> b504: push_s blink
> b506: sub_s sp,sp,12
> b508: bl.d 36b20 <__st_r13_to_r15>
>...
>
> b540: bl.d b750 <__syscall_rt_sigaction> <--- DIRECT CALL
> b544: mov_s r3,8
> b546: add_s sp,sp,20
> b548: mov_s r12,12
> b54a: b 36b88 <__ld_r13_to_r15_ret>
> b54e: nop_s
>
>0000b750 <__syscall_rt_sigaction>:
> b750: mov r8,134
> b754: swi <---- SYSCALL TRAP INTO KERNEL
> b758: cmp r0,0xfffffc00
> b75c: bls_s b76a
> b75e: st.a blink,[sp,-4]
> b762: bl b550 <__syscall_error>
> b766: ld.ab blink,[sp,4]
> b76a: j_s [blink]
>
>The small function call is not necessarily good micro-architecturally
>when returning, due to the limited number of call return stack entries.
>That cost is amortized if the function is largish.
>
>I do understand that these small syscall wrappers are a common uClibc
>design pattern and exist all over the place, but given that this was all
>arch code, I took the liberty of removing the one hop, and the code now
>looks as below:
>
>0000b4d8 <__libc_sigaction>:
> b4d8: st.a gp,[sp,-4]
> b4dc: sub_s sp,sp,20
> b4de: add gp,pcl,0x00065284
> b4e6: breq_s r1,0,b516
> b4e8: ld_s r3,[r1,4]
>...
> b516: mov r8,134
> b51a: mov_s r3,8
> b51c: swi
> b520: cmp r0,0xfffffc00
> b524: bls_s b532
> b526: st.a blink,[sp,-4]
> b52a: bl b53c <__syscall_error>
> b52e: ld.ab blink,[sp,4]
> b532: ld.a gp,[sp,20]
> b536: j_s.d [blink]
> b538: add_s sp,sp,4
> b53a: nop_s
I would have assumed (or hoped) that GCC 5 generates this second variant for an extern inline __syscall_rt_sigaction.
Doesn't it do that?
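A minimal C sketch of the two shapes being compared: an out-of-line stub that the caller reaches with a real call (as in the first listing) and an inline stub the compiler may fold into the caller (as in the second). Everything here is hypothetical: `fake_raw_syscall` stands in for the `swi` trap so the sketch runs on any host, `134` is only the value loaded into r8 in the listings above, and `static inline` is used as a portable analogue of extern inline; this is not uClibc's actual code. The error check mirrors the `cmp r0,0xfffffc00` / `bls_s` pair, and storing the negated value into errno is an assumption about what `__syscall_error` does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Value loaded into r8 before `swi` in the listings; the fake trap ignores it. */
#define SYSCALL_NR 134

/* HYPOTHETICAL stand-in for the `swi` trap: returns 0 on success or a
   negative errno value, like a raw kernel return. */
static inline int32_t fake_raw_syscall(int32_t nr, int32_t arg)
{
    (void)nr;
    return arg < 0 ? -EINVAL : 0;
}

/* Mirrors `cmp r0,0xfffffc00; bls_s`: values above 0xfffffc00 viewed as
   unsigned 32-bit (i.e. -1023..-1) are errors. */
static inline int32_t decode_result(int32_t raw)
{
    if ((uint32_t)raw > 0xfffffc00u) {
        errno = -raw;  /* assumed role of __syscall_error */
        return -1;
    }
    return raw;
}

/* Shape 1: out-of-line stub, always reached through a call. */
__attribute__((noinline))
static int32_t stub_out_of_line(int32_t arg)
{
    return decode_result(fake_raw_syscall(SYSCALL_NR, arg));
}

/* Shape 2: inline stub; with optimization, the trap and the error check
   can be emitted directly in the caller. */
static inline int32_t stub_inline(int32_t arg)
{
    return decode_result(fake_raw_syscall(SYSCALL_NR, arg));
}

int32_t sigaction_like_out_of_line(int32_t arg) { return stub_out_of_line(arg); }
int32_t sigaction_like_inline(int32_t arg)      { return stub_inline(arg); }
```

Both functions behave identically; comparing their `-O2` disassembly shows whether the extra call hop survives with a given compiler and target.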
TIA