uClibc++ associative_base performance
gkajmowi at tbaytel.net
Thu Jul 5 23:36:25 UTC 2007
On Wednesday 04 July 2007 03:18, Asier Llano Palacios wrote:
> Quick performance analysis. I think I'm maybe biassed by the tests I've
> been performing, but I'd like to contribute a performance bump to uClibc
> ++, and I'd like to know your point of view.
> Current performance situtation
> uClibc++-0.2.1: My test spent 9 seconds.
> uClibc++-0.2.2: My test spent 1 minutes and 30 seconds (aprox).
> Target time: Less than 1 second.
Eek. I expected that there would be a performance drop under certain
workloads. Still, this is worse than I expected.
I changed the underlying infrastructure (as well as representation) of the map
and set classes so that I might be able to comply with one corner case of the
spec (upon element deletion, all iterators still need to be valid). The
quick and dirty way to do that was to use a list as an underlying
representation. It makes me happy to reuse code in such a manner.
> General analysis:
> After some profiling I realized that most of the time is spent in
> std::map operations. In 0.2.1 the time is spent in the std::map::insert
> operation, and in uClibc++-0.2.2 the time is mostly spent in the
> std::map::find operation.
> Performance by operation:
> Having a look at the algorithms used in the associative containers in
> uClibc++ I've been looking which operations are not scalable. We will
> have a look at the media of the timing spent for each operation as a
> function of the number of elements in the collection.
> find = O(log(n)) <--- Binary search (fast)
> insert = O(n) <--- Binary search, but linear storage (slow)
> iterate = O(1)
> find = O(n) <-- Linear search (very slow)
> insert = O(n) <-- Linear search to insert (because of the find)
> iterate = O(1)
workload you would be running that would take that amount of time. On a low
memory system (which is what uClibc++ is geared towards), it's hard to have
enough memory to have enough records that this should matter. If it's a
matter of inner loop performance, you would be best either taking a reference
to the node that you are using over and over, or using iterators to go
through the entire contents. Both should result in constant-time
If you have a very large amount of memory to the point that you are able to
fill up a very large data set than I would suggest going with libstdc++ - it
has algorithms that are optimized for speed whereas I went for readability
(at least in my own mind) and size.
> What I'd like to contribute
> I'd like to contribute a performance improvement in every associative
> container by improving the associative_base. This way the set, mset, map
> and mmap. I'll try to implement it so that it still passes the same
> number of tests with the following timing:
> find = O(log(n)) <-- binary search in a tree
> insert = O(log(n)) <-- binary search, tree bouncing.
> iterate = O(1) <-- walking in a tree (it will be slower than
> the previous but will not grow with n).
> Of course, I wan't to mantain:
> - Small code, small binary size and small memory footprint
> - Iterators point to the same position even after element insertion
> and deletion.
> You'll have news about this implementation as soon as I finish it.
I'm happy to consider any contribution that you would like to make.
> The only thing I don't like about uClibc++ is its LGPL license, its
> quite restrictive to me. I work for an enterprise where I develop and
> contribute all the general purpose code (the code that is not specific
> for our application). I consider myself a community contributor. I use
> uClibc++ and I have other application specific libraries (that I don't
> see the point of sharing them, because they are too specific of our
> embedded architecture). Even if I contribute the other libraries I
> wouldn't contribute them under LGPL. While mantaining the source code
> like it is now, I would like to build the uClibc++ and my specific
> libraries in one shared library to make the binary size of the code
> smaller (having only one "general purpose for me" shared library). I've
> tested it and I could gain some binary size by joining both libraries
> compilation. I think that I cannot join LGPL and other binary code in a
> single library. Is there any possibility to have a less restrictive
> license (like X11 or BSD)?. I don't want to fork, nor sell uClibc++, I
> only want to have linking flexibility with uClibc++ and some non LGPL
I personally have objections to using a license similar to BSD or X11. One
person has made the argument that I should instead be using the GPL with
linking exception (same thing that libstdc++ uses), but I haven't bothered
converting as of yet.
> Thank you,
> ----------------------------------------- PLEASE NOTE
> ------------------------------------------- This message, along with any
> attachments, may be confidential or legally privileged. It is intended only
> for the named person(s), who is/are the only authorized recipients. If this
> message has reached you in error, kindly destroy it without review and
> notify the sender immediately. Thank you for your help.
> µSysCom uses virus scanning software but excludes any liability for viruses
> contained in any attachment.
It is so privileged that you sent it to a publically accessible mailing list
with a published public archive? Unencrypted? Wow. And I was just starting
to think that you knew what you were talking about. I like the bit about how
I'm supposed to know to "destroy it without review" the document when the
super-secret stamp is attached at the bottom.
Finally, the message is scanned for viruses, but you disclaim liability. Why
tell me this? And if somehow the document contained a virus in the main
message (as opposed to an attachment) would I then be able to sue you because
your disclaimer doesn't cover it?
Thanks for the input. I look forward to seeing what you come up with.
More information about the uClibc