-fsigned-char on all arches

Will Newton will.newton at gmail.com
Fri Apr 13 19:14:25 UTC 2007


On 4/13/07, Rob Landley <rob at landley.net> wrote:

> > I believe this ambiguity in the definition of "char" was intended to
> > allow the implementation to choose the most efficient implementation
> > of the datatype. Pulling -fsigned-char on all architectures disallows
> > this efficiency.
>
> And back in the 1970's when that was relevant, "char" wasn't guaranteed to be
> 8 bits either.  On some platforms it was 12 bits.  These days, such crappy
> platforms have gone the way of the dodo.

How char is defined is still relevant today. It's relevant to me, and
it's relevant to other people who care about code size, performance
and correctness.

> You could make the exact same argument for short, int, and long.  And yet
> those data types don't have a built-in indeterminacy.  Instead for current
> 32-bit and 64-bit systems we actually have a standard
> (http://www.unix.org/whitepapers/64bit.html with rationale
> http://www.unix.org/version2/whatsnew/lp64_wp.html) specifying how many bits
> each type has.  This is because most modern chips aren't crap, and can
> actually support all this reasonably.

Yes, this is irrelevant for non-char types. What are you talking about?

> > Is there a reason why signed char has been made the default rather
> > than unsigned char? Because i386 is signed? Almost all embedded (i.e.
> > modern RISC type) architectures default to unsigned chars. ARM saves
> > approx 1k of text size on libuClibc.so if built without -fsigned-char.
>
> I don't know, ask Mike.  As I said, I chose unsigned for busybox.  I suspect
> he wanted to make "char" behave the same way as "short", "int", and "long",
> which is a reasonable approach.  For Busybox, I liked our string handling
> being naturally 8-bit clean and that this didn't bloat arm (and with only
> 43,000 transistors arm has to be the minimal bar to compare other chips to.
> If they can't do as good a job as arm, they really shouldn't be bothering).
> It also didn't bloat x86/x86-64, and between arm and x86/x86-64 that covers
> something like 90% of all processors ever produced.  (And I believe the next
> biggest category, by volume, is variants of the Z80...)

Just so you know what I'm talking about: the problem is that when you
load a char on an architecture like ARM with -fsigned-char, you end up
with an instruction sequence like:

; load byte into reg
; sign extend reg to 32 bits

On ARM this sign extension takes two instructions; on our chip it's one
(wow, look at us go), but it's still an overhead, and typically one
paid by architectures that are more RISC in nature.

x86 doesn't care because it has 8-bit registers; RISCs may not.

If your char is not signed you don't need to sign-extend, which is why
architectures that default to unsigned char prefer to stay that way.
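
To make that concrete, here's a minimal sketch (my own illustrative
example, not code from uClibc): a function that returns a plain char
promoted to int. Whether the compiler can get away with a bare byte
load or has to add a sign extension depends entirely on whether plain
char is signed, i.e. on the -fsigned-char/-funsigned-char default.

/* Illustrative only: what the compiler emits for p[0] depends on the
   signedness of plain char.  Unsigned char needs only a zero-extending
   byte load; signed char needs an extra sign extension to int. */
int first_byte(const char *p)
{
        return p[0];
}

/* With unsigned char an ARM compiler can emit roughly:

     ldrb r0, [r0]         ; zero-extending byte load, done

   whereas with signed char (assuming it can't or won't use ldrsb) it
   ends up with something like:

     ldrb r0, [r0]         ; load byte into reg
     mov  r0, r0, lsl #24  ; sign extend reg to 32 bits...
     mov  r0, r0, asr #24  ; ...taking the two extra instructions
                           ; described above */

That extra pair of instructions on every plain-char load is where
savings like the ~1k of libuClibc.so text come from.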

NB: whilst we aren't close to ARM shipment volumes (we aren't primarily
in the business of shipping general-purpose cores), we have shipped
over 3 million units and hope to ship many more in the future.

> So you have a currently closed source, undistributed, out-of-tree fork.  If it
> had been merged, Mike might have noticed the change inconvenienced you when
> he made it, but it wasn't, so he didn't.
>
> Wake me when you have public code.  Until then, I don't care enough to
> continue this conversation.  (Not that I'm the person you'd need to convince
> anyway: Mike is.)

This isn't an issue just for me: the code works on architectures like
ours, but it's suboptimal and its correctness is questionable.

I was after a rationale or justification for this decision, which I
haven't managed to find as yet. If it would be accepted, I would be
happy to submit a patch that fixes the warnings and restores
architecture-dependent char behaviour.
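
To illustrate the correctness side, here is another sketch of my own
(not code from uClibc): anything that reads raw bytes through plain
char and then compares or promotes them behaves differently depending
on the signedness default, so forcing -fsigned-char on a platform
whose ABI says char is unsigned can silently change behaviour.

#include <stdio.h>

/* Illustrative only: the byte 0xff promotes to -1 if plain char is
   signed and to 255 if it is unsigned, so the result of this
   comparison depends on the -fsigned-char/-funsigned-char default. */
int main(void)
{
        char c = '\xff';

        if (c == 0xff)
                printf("plain char is unsigned here\n");
        else
                printf("plain char is signed here (0xff promotes to %d)\n", c);

        return 0;
}

It's only a toy, but it's the kind of difference that makes me want
char to follow the platform ABI rather than a blanket flag.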


