We recently discovered that the cubicweb.org site (the one you are
probably visiting right now) was suffering from a memory leak. The
Munin graphs showed memory consumption increasing steadily soon
after the instance was started, and this would only stop when all the
memory on the host was exhausted. This was clearly caused by a memory
leak somewhere, either in CubicWeb itself or in a cube used by the
instance.
Since Python has a garbage collector, the leak was either occurring in
a C extension or caused by objects that could not be garbage
collected. As explained in the gc module documentation, a common cause
of the latter is objects with a __del__ method that are part of a
reference cycle.
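The minimal sketch below reproduces that pattern with two hypothetical stand-in classes, Handler and Opener (they are not taken from any real library). Each call to make_cycle() leaves behind a Handler/Opener pair that only reference each other. On Python 2, and on Python 3 before 3.4, the collector refuses to free such a cycle because one of its members defines __del__, and parks the objects in gc.garbage instead. Since Python 3.4 (PEP 442) these cycles are collected, so on a modern interpreter the list comes back empty.

```python
import gc
import sys


class Handler(object):
    """Hypothetical class with a finalizer."""

    def __init__(self):
        self.opener = None

    def __del__(self):
        # Merely defining __del__, even an empty one, is what blocks
        # collection of the cycle on Python 2.
        pass


class Opener(object):
    """Hypothetical container that keeps references to its handlers."""

    def __init__(self):
        self.handlers = []


def make_cycle():
    """Build a Handler <-> Opener cycle and drop every outside reference."""
    handler = Handler()
    opener = Opener()
    opener.handlers.append(handler)  # opener -> handler
    handler.opener = opener          # handler -> opener: closes the cycle


def uncollectable_handlers():
    """Run a full collection and return the Handler objects left in gc.garbage."""
    gc.collect()
    return [obj for obj in gc.garbage if isinstance(obj, Handler)]
```

Calling make_cycle() a few times and then uncollectable_handlers() shows one stuck Handler per cycle on Python 2, and none on Python 3.4 or later (check sys.version_info to see which case applies).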
To track it down, we used the "gc" view, an administrative view of
CubicWeb that members of the managers group can reach by appending
"?vid=gc" to the URL of the root of their instance. This view uses the
gc module from the Python standard library to list the objects that
are not garbage collected.
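The actual code of the CubicWeb view is not reproduced here. As a rough idea of the kind of report such a view can build with the standard gc module, the hypothetical helper below counts objects by class, both in gc.garbage (objects the collector could not free) and among all objects the collector tracks. A class whose count keeps growing between two calls is a good leak suspect. The names gc_summary and type_name are illustrative assumptions.

```python
import gc
from collections import Counter


def type_name(obj):
    """Return a dotted 'module.Class' name for the type of obj."""
    cls = type(obj)
    # A few C types lack __module__; fall back to a placeholder for them.
    module = getattr(cls, "__module__", None) or "?"
    return "%s.%s" % (module, cls.__name__)


def gc_summary(limit=20):
    """Count uncollectable and tracked objects by type, most frequent first.

    Returns two lists of (type name, count) pairs: the first for gc.garbage,
    the second for every object tracked by the collector.
    Pass limit=None to get every type instead of the top `limit` ones.
    """
    gc.collect()  # free whatever can be freed before counting
    garbage = Counter(type_name(obj) for obj in gc.garbage)
    tracked = Counter(type_name(obj) for obj in gc.get_objects())
    return garbage.most_common(limit), tracked.most_common(limit)
```

Counting by class rather than listing individual objects keeps the report short, even when thousands of instances of the same class are involved.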
The view showed thousands of instances of
mercurial.url.httphandler. This class does have a __del__ method, and
its instances are part of a reference cycle with
urllib2.OpenerDirector. Mercurial is used by the vcsfile cube, which
regularly polls remote repositories over HTTP; this causes httphandler
objects to be instantiated (and leaked). This problem had gone undetected in Mercurial
because processes using Mercurial over HTTP are usually short-lived,
and the leaked memory is reclaimed by the operating system as soon as
they exit. A discussion with the developers followed on the #mercurial
IRC channel, and a patch fixing the leak was submitted. To work around
the problem with versions of Mercurial up to the current one, a new
version of vcsfile including a monkey patch for Mercurial was released
and deployed on cubicweb.org!
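The monkey patch shipped with vcsfile is not reproduced here, and the sketch below should not be read as its content. It only illustrates the general technique on a hypothetical HTTPHandler class (the class, its close_all() method and the patch_handler() helper are all assumptions, not Mercurial's API): the class is patched in place at import time by deleting the __del__ method that makes the cycle uncollectable, and the cleanup it used to perform becomes the caller's job.

```python
import gc
import weakref


class HTTPHandler(object):
    """Hypothetical third-party handler whose finalizer causes the leak."""

    def __init__(self, parent):
        self.parent = parent          # handler -> opener
        parent.handlers.append(self)  # opener -> handler: closes the cycle
        self.connections = ["conn"]   # stand-in for open HTTP connections

    def close_all(self):
        self.connections = []

    def __del__(self):
        self.close_all()


class Opener(object):
    """Hypothetical opener holding its handlers."""

    def __init__(self):
        self.handlers = []


def patch_handler(cls):
    """Monkey patch cls in place so that cycles through it can be collected.

    Without __del__, the cycle is collectable on Python 2 as well; the
    cleanup that __del__ performed must now be done with close_all().
    """
    if "__del__" in vars(cls):
        del cls.__del__


# Apply the patch once, before any handler is created.
patch_handler(HTTPHandler)
```

The trade-off of this approach is that connections are no longer closed automatically, so code creating handlers should call close_all() explicitly (for instance in a try/finally block) once it is done with them.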