libdl usage count wrapping

Kevin Day thekevinday at gmail.com
Tue Sep 23 22:04:33 UTC 2008


On Tue, Sep 23, 2008 at 3:06 PM, Carmelo Amoroso <carmelo73 at gmail.com> wrote:
> I'll look at these two issues soon.
>
> Thanks,
> Carmelo
> Vallevand, Mark K wrote:
>> Wow.  I just ran into this problem.  Or, something very similar.  I
>> reported a memory leak in dlopen() dlclose() last week.  I've got a fix
>> for that problem, and my program doesn't leak any more.  But, now it's
>> crashing consistently after a period of time.  The program makes heavy
>> use of dlopen() dlclose().
>>
>> Looking at dlopen() dlclose(), I'm probably not going to look for
>> another fix there.  I'm going to fix my program to dlopen() once and
>> leave libraries open.
>>
>> Regards.
>> Mark K Vallevand
>>
>> We old folks have to find our cushions and pillows in our tankards.
>> Strong beer is the milk of the old.
>> - Martin Luther
>>
>>
>>
>> -----Original Message-----
>> From: uclibc-bounces at uclibc.org [mailto:uclibc-bounces at uclibc.org] On
>> Behalf Of Phil Estes
>> Sent: Monday, September 22, 2008 9:45 PM
>> To: uclibc at uclibc.org
>> Subject: libdl usage count wrapping
>>
>> Recently I was looking into an issue where someone was claiming pam was
>> segfaulting after a lot of usage (lots of calls to authenticate
>> users--usually around 5K calls).  My investigation led me to the point
>> where I realized that the dlopen() and dlclose() management of ref.
>> counting is not balanced, which leads to the heaviest "DL_NEEDED"
>> libraries basically getting incremented to the point of overflowing
>> "unsigned short usage_count".  This leads to a nasty situation where
>> libc.so is munmapped (because usage_count == 0), and the next call to a
>> C runtime function traps, of course.
>>
>> Since I was working with 0.9.29 snapshot from 2006, I decided to see if
>> anything in SVN has changed that might impact this.  I was interested to
>> find the 17530 changeset and accompanying discussion
>> ( http://uclibc.org/lists/uclibc/2007-January/017165.html ) and while
>> testing it, noted a 10x improvement, given that more of the dependent
>> libs are added to the list that is walked at do_dlclose that includes a
>> decrement of usage_count.  However, it is still not exact in that
>> libc.so's usage_count continues to rise even with matching dlopen() and
>> dlclose() calls each iteration through the example program (I'll attach
>> below), and now needs 65K iterations of loading/unloading a lib to get a
>> segfault.
>>
>> One way to 'watch' this in gdb is to add a breakpoint in libdl.c around
>> line 247 or so and use the commands interface to do the following
>> output:
>> commands <brnum>
>> silent
>> printf "%d : %s\n",(*tpnt1)->usage_count, lpntstr
>> cont
>> end
>>
>> Now continue and watch libc's usage_count climb, and if you are patient
>> enough, you will get a segfault somewhere after 65,500 iterations.
>>
>> Obviously one potential fix is to "handle" the wrap of usage_count in
>> ldso/dl-elf.c by checking for 0 after the increment and resetting it to
>> "near max", which would never allow the wrap to zero that causes the
>> segfault.
>>
>> However, it would be interesting to know if the uClibc maintainers think
>> usage_count is important enough for some of the core libs to either (a)
>> be correct, or (b) be protected from the segfault condition which is
>> less likely than it used to be given the aforementioned changes, but
>> still potential for long-running apps (like pam running on a system with
>> a very long uptime).  I'm slightly interested in trying to fix, but
>> given I'm no expert on all the various lists and pointers employed via
>> dlopen() it seems like someone with some skill in the area should make
>> sure it's done right.  My hunch is that it has to do with the init_fini
>> list creation and the filtering out of RTLD_GLOBAL libs, but I'm not
>> sure yet without more debug...but the init_fini list seems to be what is
>> walked at dlclose that has any relation to decrementing usage_count.
>>
>> Thanks for any thoughts/input,
>> Phil Estes
>> estesp at linux.vnet.ibm.com
>>
>> Here's my example program for creating the segfault condition.  It can
>> obviously be doctored to load different libs, more libs, fewer libs, etc.
>>
>> ---stress-dlopen.c----
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <dlfcn.h>
>>
>> #define NUMLIBS 4
>> #define ITERS 50000
>>
>> void *handles[NUMLIBS];
>> /*
>> char *names[NUMLIBS] = { "/lib/security/pam_deny.so",
>>                          "/lib/security/pam_warn.so",
>>                          "/lib/libpam.so.0" };
>> */
>> char *names[NUMLIBS] = { "/usr/lib/libxml2.so.2",
>>                          "/usr/lib/libpcap.so",
>>                          "/usr/lib/libpng.so.2",
>>                          "/usr/lib/libxslt.so.1" };
>>
>> int openlibs()
>> {
>>       int i, errors = 0;
>>       for (i = 0; i < NUMLIBS; i++) {
>>               handles[i] = dlopen(names[i], RTLD_NOW);
>>               if (handles[i] == NULL) {
>>                       errors++;
>>                       fprintf(stderr, "%s\n", dlerror());
>>               }
>>       }
>>       return errors;
>> }
>>
>> int closelibs()
>> {
>>       int i;
>>       for (i = 0; i < NUMLIBS; i++) {
>>               if (handles[i] != NULL) {
>>                       dlclose(handles[i]);
>>                       handles[i] = NULL;
>>               }
>>       }
>>       return 0;
>> }
>>
>> int main(void)
>> {
>>       int retcode = 0;
>>       unsigned int i = 0, cnt = ITERS;
>>
>>       for (i = 0; i < cnt; i++) {
>>               /* i and cnt are unsigned, so print them with %u */
>>               printf("\n Call ... %u/%u\n", i, cnt);
>>               retcode = openlibs();
>>               printf("opened %d\n", retcode);
>>               retcode = closelibs();
>>               printf("closed %d\n", retcode);
>>       }
>>       return 0;
>> }
>>
>>
>>
>>
>> _______________________________________________
>> uClibc mailing list
>> uClibc at uclibc.org
>> http://busybox.net/cgi-bin/mailman/listinfo/uclibc

I felt the need to test this on my uClibc 0.9.28.3
and got the following segfault:

 Call ... 32766/50000
File not found
opened 1
closed 0

 Call ... 32767/50000
File not found
opened 1
Segmentation fault (core dumped)

What's particularly interesting is that 32767 is just under the magical 32768,
which makes me think this is an integer overflow issue.
Possibly a signed 16-bit integer somewhere; if it were changed to an
unsigned one, the limit would be 65535 (I am assuming).
That would match the 65k issue from 0.9.29 svn.

-- 
Kevin Day


