Bernd Schmidt bernds_cb1 at t-online.de
Mon Jun 9 10:55:13 UTC 2008

Denys Vlasenko wrote:
> A small story. I wrote a glibc's nscd replacement last year.
> glibc's nscd, when run on Google machines (and not only on them),
> was crashing or locking up, and since it was broken by design,

References?  I assume this has been reported and discussed somewhere?

> On MMU machine,
> if you call crypt() while your machine is in OOM state -
> it may crash because crypt touches static buffers which are
> allocated on request and also can be swapped out, and in OOM
> crunch kernel simply CANNOT satisfy your request "give me
> that zeroed page in my bss". IT IS NOWHERE TO BE TAKEN FROM.
> On Linux, this will either invoke oom killer or will leave
> the machine locked up in OOM if oom killer is disabled.

That's if you leave overcommit enabled, which is known and documented to 
be unreliable, but it is not the only option.
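
For reference, the overcommit policy on Linux is exposed through the 
vm.overcommit_memory sysctl.  The following is a minimal sketch of how a 
program might check it; the helper names are hypothetical, and the path 
passed in is assumed to be /proc/sys/vm/overcommit_memory.

```c
#include <stdio.h>

/* Hypothetical helper: read the overcommit mode from the given sysctl
 * file (normally /proc/sys/vm/overcommit_memory).
 * Returns 0, 1 or 2 on success, -1 if the file cannot be read or parsed. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Hypothetical helper: describe a mode value.
 * 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting";
    default: return "unknown";
    }
}
```

With mode 2 the kernel refuses allocations beyond the commit limit up 
front, so the failure shows up as an error return from the allocating 
call rather than as a fault on first touch.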

> IOW: crypt() which uses static buffers is as likely
> to die/lock up on OOMing machine as malloc() based one.

Incorrect on some systems, notably nommu, which is quite a likely setup 
for uClibc.  Also, an order-5 malloc on nommu is much more likely to 
fail than taking an order-0 page fault on mmu, because it has to find 32 
contiguous free pages rather than a single one.  A 70k allocation _will_ 
fail occasionally, even on systems that have plenty of memory free.
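
To show where the order-5 figure comes from, here is a rough sketch of 
the arithmetic.  It assumes 4 KiB pages and that the request is rounded 
up to a power-of-two number of pages; page_order is a hypothetical 
helper written for illustration, not a kernel or uClibc function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest order such that (1 << order) pages of
 * page_size bytes each can hold a request of the given size. */
unsigned page_order(size_t bytes, size_t page_size)
{
    size_t pages;
    unsigned order = 0;

    assert(page_size > 0);

    /* Round the byte count up to whole pages. */
    pages = (bytes + page_size - 1) / page_size;

    /* Round the page count up to the next power of two. */
    while (((size_t)1 << order) < pages)
        order++;

    return order;
}
```

Under those assumptions, page_order(70 * 1024, 4096) gives 5: 70k needs 
18 pages, which rounds up to 32 contiguous pages.  A single page-sized 
request gives order 0.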

Also, for most of the other cases it's not (primarily) the use of malloc 
I'm objecting to.  It's the lousy error handling we have with __uc_malloc.
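
As an illustration of the kind of error handling meant here, the 
following is a minimal sketch of a lazily allocated buffer that reports 
failure to its caller instead of handling it inside the allocator.  All 
names (scratch_get, alloc_fn, failing_alloc) are hypothetical and are 
not taken from uClibc; the allocator is passed in only so that the 
failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator signature, matching malloc(). */
typedef void *(*alloc_fn)(size_t);

/* Buffer that would otherwise have lived in bss; allocated on first use. */
static char *scratch;

/* Return the scratch buffer, allocating and zeroing it on first use.
 * On allocation failure, set errno to ENOMEM and return NULL so the
 * caller can fail the operation cleanly. */
char *scratch_get(size_t size, alloc_fn alloc)
{
    if (scratch == NULL) {
        scratch = alloc(size);
        if (scratch == NULL) {
            errno = ENOMEM;
            return NULL;
        }
        memset(scratch, 0, size);
    }
    return scratch;
}

/* Hypothetical allocator that always fails, to simulate OOM. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

A caller in the style of crypt() would check for NULL and return NULL 
itself, leaving errno set, so the application sees an ordinary error 
rather than dying in the middle of the call.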

> Even if libc would be perfect and not crash on OOM, and all
> libraries and programs would be similarly perfect and not crash
> on OOM (fat chance), you still CANNOT do anything useful on
> OOMing machine - yes, programs maybe don't crash, they "only"
> refuse to do what you ask them to do, they error out:
> $ ls
> cannot exec ls: Cannot allocate memory
> $ cat httpd.log
> cannot exec cat: Cannot allocate memory

Which is perfectly good, documented behaviour, and infinitely better 
than crashing in the middle of a program and leaving things in an 
inconsistent state.  Even when you get OOM, there are different modes of 
failure, and we can and should do all we can to control the negative 
effects.  Applications may not take enough care, but uClibc is a system 
library and must not take such a sloppy approach.

> This actually _improves_ our performance in near-OOM conditions.
> How? Going back to crypt(). If we will go back and reinstate
> static buffers there, busybox's data+bss size will jump from 8k
> to 80k - tenfold increase. On NOMMU, if you have N running
> busybox daemons, you already have additional N*72k bytes
> allocated and sitting there, totally unused.

Well, wasting memory at run-time is inherent in the design of busybox. 
Have you considered that it might be busybox that is broken?

> This will be a measurable, real drop in memory utilisation
> efficiency. Just start 1000 copies of "busybox sleep 10"
> and measure how many more megabytes that would require.

This only shows that busybox is unsuitable for that workload and people 
should install normal GNU utilities if that's what they want to run.

I want this discussion to end.  We have two people who think that the 
uc_malloc patches as implemented now are a poor idea.  Can we get some 
more feedback one way or another, please?

This footer brought to you by insane German lawmakers.
Analog Devices GmbH      Wilhelm-Wagenfeld-Str. 6      80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
