__uc_malloc

Denys Vlasenko vda.linux at googlemail.com
Mon Jun 9 15:52:07 UTC 2008


On Monday 09 June 2008 16:56, Bernd Schmidt wrote:
> > In a bigger picture, no. The machine is unusable one way or another.
> 
> But what are the consequences of failure?  Okay, so I agree that a 
> production machine should be big enough not to go OOM.  Reality is more 
> difficult, maybe there's a memory leak somewhere, in any case OOM does 
> happen occasionally.  The question then becomes: where and how do we 
> fail, and what are the consequences.  Do we fail at a point where the 
> possibility of a memory allocation failure (or other hard error) is 
> documented and can be dealt with, or do we fail in a random place?  In 
> one of these cases it's possible to engineer a certain measure of 
> robustness, in the other case it's impossible.  The machine may be 
> unusable in both cases, but in one case the damage can be contained more 
> effectively.

Do you assume that __uc_malloc prevents this?

Let's take a look at real-world cases.

	Oracle database.

For obvious reasons, they have to go to great lengths to protect
against data corruption. And in my experience, they do it well.
Their primary weapon is that *even if the application crashes*,
the on-disk data format is designed to survive that without data
loss. Oracle rightly believes that "we will never crash" is not
an attainable goal. "We will design to survive crashes without data loss"
is more realistic.

But they also have OOM protection code in the db. Basically, on OOM
the database will generally hang, not crash.

In the hypothetical case of Oracle linked against uClibc with this dreaded
__uc_malloc thing, can they still do it? Yes, it's trivial:

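   #ifdef __UCLIBC__
   /* Called by uClibc when one of its internal allocations fails. */
   static void uc_malloc_failed(size_t size) {
       /* log_message() stands for the application's own logger */
       log_message("failed to allocate %lu bytes", (unsigned long)size);
       return; /* uclibc will retry the allocation */
   }
   #endif

   int main(void) {
   #ifdef __UCLIBC__
       __uc_malloc_failed = uc_malloc_failed;
   #endif
       /* ... rest of the program ... */
   }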
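
To make the "handler returns, allocation is retried" behaviour concrete, here is
a minimal sketch of an allocator with a failure callback. It is not uClibc's
source: retrying_malloc, alloc_failed_hook and raw_alloc are hypothetical
stand-ins for __uc_malloc, __uc_malloc_failed and malloc, and flaky_alloc only
exists to simulate two failed attempts.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real uClibc names are __uc_malloc and
 * __uc_malloc_failed. raw_alloc is swappable only for the demo. */
static void *(*raw_alloc)(size_t) = malloc;
static void (*alloc_failed_hook)(size_t);

/* Never returns NULL for a non-zero size: either the allocation
 * eventually succeeds, or the process exits. */
static void *retrying_malloc(size_t size)
{
	for (;;) {
		void *p = raw_alloc(size);
		if (p != NULL || size == 0)
			return p;
		if (alloc_failed_hook == NULL)
			_exit(1);            /* no handler installed: give up */
		alloc_failed_hook(size);     /* may log, free caches, sleep... */
		/* handler returned, so loop and retry */
	}
}

/* Demo scaffolding: an allocator that fails a fixed number of times. */
static int fail_budget;
static int failures_seen;

static void *flaky_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

static void count_failure(size_t size)
{
	(void)size;
	failures_seen++;
}

/* Returns how many times the handler ran before the allocation succeeded. */
static int demo_retry(void)
{
	fail_budget = 2;
	failures_seen = 0;
	raw_alloc = flaky_alloc;
	alloc_failed_hook = count_failure;

	void *p = retrying_malloc(64);
	assert(p != NULL);
	free(p);
	return failures_seen;
}
```

The caller of retrying_malloc never sees NULL. What happens on failure is
decided entirely by whoever installs the hook: a database can log and wait,
a small utility can leave the hook unset and exit.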



	Firefox

This project was never concerned about checking for OOM,
and I can't really blame them.

firefox-1.5-source.tar.bz2#utar/mozilla/dom/src/base/nsGlobalWindow.cpp:

Try to change the following function to check for, and to work properly
in the case of, "new nsDOMWindowUtils(this)" failing:

NS_IMETHODIMP
nsGlobalWindow::GetInterface(const nsIID & aIID, void **aSink)
{
  NS_ENSURE_ARG_POINTER(aSink);
  *aSink = nsnull;

  if (aIID.Equals(NS_GET_IID(nsIDocCharset))) {
    FORWARD_TO_OUTER(GetInterface, (aIID, aSink), NS_ERROR_NOT_INITIALIZED);

    if (mDocShell) {
      nsCOMPtr<nsIDocCharset> docCharset(do_QueryInterface(mDocShell));
      if (docCharset) {
        *aSink = docCharset;
        NS_ADDREF(((nsISupports *) *aSink));
      }
    }
  }
  else if (aIID.Equals(NS_GET_IID(nsIWebNavigation))) {
    FORWARD_TO_OUTER(GetInterface, (aIID, aSink), NS_ERROR_NOT_INITIALIZED);

    if (mDocShell) {
      nsCOMPtr<nsIWebNavigation> webNav(do_QueryInterface(mDocShell));
      if (webNav) {
        *aSink = webNav;
        NS_ADDREF(((nsISupports *) *aSink));
      }
    }
  }
  else if (aIID.Equals(NS_GET_IID(nsIWebBrowserPrint))) {
    FORWARD_TO_OUTER(GetInterface, (aIID, aSink), NS_ERROR_NOT_INITIALIZED);

    if (mDocShell) {
      nsCOMPtr<nsIContentViewer> viewer;
      mDocShell->GetContentViewer(getter_AddRefs(viewer));
      if (viewer) {
        nsCOMPtr<nsIWebBrowserPrint> webBrowserPrint(do_QueryInterface(viewer));
        if (webBrowserPrint) {
          *aSink = webBrowserPrint;
          NS_ADDREF(((nsISupports *) *aSink));
        }
      }
    }
  }
  else if (aIID.Equals(NS_GET_IID(nsIScriptEventManager))) {
    nsCOMPtr<nsIDocument> doc(do_QueryInterface(mDocument));
    if (doc) {
      nsIScriptEventManager* mgr = doc->GetScriptEventManager();
      if (mgr) {
        *aSink = mgr;
        NS_ADDREF(((nsISupports *) *aSink));
      }
    }
  }
  else if (aIID.Equals(NS_GET_IID(nsIDOMWindowUtils))) {
    FORWARD_TO_OUTER(GetInterface, (aIID, aSink), NS_ERROR_NOT_INITIALIZED);

    nsCOMPtr<nsISupports> utils(do_QueryReferent(mWindowUtils));
    if (utils) {
      *aSink = utils;
      NS_ADDREF(((nsISupports *) *aSink));
    } else {
      nsDOMWindowUtils *utilObj = new nsDOMWindowUtils(this);
      nsCOMPtr<nsISupports> utilsIfc =
                              NS_ISUPPORTS_CAST(nsIDOMWindowUtils *, utilObj);
      if (utilsIfc) {
        mWindowUtils = do_GetWeakReference(utilsIfc);
        *aSink = utilsIfc;
        NS_ADDREF(((nsISupports *) *aSink));
      }
    }
  }
  else {
    return QueryInterface(aIID, aSink);
  }

  return *aSink ? NS_OK : NS_ERROR_NO_INTERFACE;
}

"Fixing" Firefox to check for NULL would be a titanic effort.
With an "allocator with callback" it's surprisingly easy.
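
One possible policy for such a callback, sketched under assumptions: keep an
emergency reserve, release it on the first failure so the retried allocation
can succeed, and exit only when the reserve is already gone. The names
reserve_init, oom_handler and RESERVE_BYTES are hypothetical; the handler has
the same signature as uc_malloc_failed in the Oracle example, so under uClibc
it would be installed with "__uc_malloc_failed = oom_handler;".

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define RESERVE_BYTES (256 * 1024)   /* hypothetical reserve size */

static void *emergency_reserve;      /* released on the first failure */

/* Call once at startup, while memory is still plentiful.
 * Returns 0 on success, -1 if the reserve could not be allocated. */
static int reserve_init(void)
{
	emergency_reserve = malloc(RESERVE_BYTES);
	if (emergency_reserve == NULL)
		return -1;
	/* Touch the pages so an overcommitting kernel really backs them. */
	memset(emergency_reserve, 0xA5, RESERVE_BYTES);
	return 0;
}

/* Failure callback: void (*)(size_t), like uc_malloc_failed. */
static void oom_handler(size_t size)
{
	static const char msg[] = "out of memory, releasing reserve\n";
	static const char fatal[] = "out of memory, reserve exhausted\n";

	(void)size;
	if (emergency_reserve != NULL) {
		/* write(2) instead of stdio, since stdio may allocate itself. */
		if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
			/* nothing sensible left to do about a failed write */
		}
		free(emergency_reserve);
		emergency_reserve = NULL;
		return;                  /* the allocator is expected to retry */
	}
	if (write(STDERR_FILENO, fatal, sizeof(fatal) - 1) < 0) {
		/* ignore */
	}
	_exit(1);
}
```

None of the call sites change: the function above keeps calling "new" without
checks, and the whole OOM policy lives in one place.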

> There are computers which run software that is more complex than 
> busybox, where it's not just a question of the user experience of 
> printing "out of memory" on a shell prompt.

Yes, I picked Oracle as an example of an extreme opposite of busybox.
__uc_malloc would not be a problem for it.


> With that, I'll give up on convincing you; unless someone else objects 
> soon I'll take Bernhard's and Daniel's messages as consensus to start 
> fixing things.

I am merely explaining my view on the matter.
It's unusual to have a perfect, 100% agreement on everything.
You can revert it.
--
vda


